INTRODUCTION
Throughout the book we illustrate concepts with practical examples. In addition, we include a group of real cases based on the actual implementation of experimental design
methods. In this section, we discuss the highlights of a number of these cases.
In the marketing area, consumer testing is an important and widely used tool. But most
marketing professionals hold firmly to the approach of changing only one variable at a time,
which is often called "split-run testing" (also referred to as A/B splits, test-control, or
champion-challenger testing). Only recently have marketing managers begun to embrace
multifactor techniques that simultaneously test marketing variables. These experimental
design methods are particularly well suited to product testing in supermarkets.
In one significant marketing application, and one of the cases in the book, a major magazine
publisher sought to increase sales of its popular magazine in a chain of supermarkets.
The firm identified 10 factors to test, including a discount on multiple copies (no or yes),
an additional display rack in the snack food area (no or yes), and an on-shelf advertisement
(no or yes). After considering a number of alternatives, the publisher implemented a 24-run
Plackett-Burman experimental design (see Chapter 6). Each run consisted of a particular
combination of settings of each of the 10 factors.
A key part of the experiment was to decide how many stores to include and how long to
test, in order to achieve statistically significant results. A total of 48 stores were included,
and the experiment ran for two weeks. As a result, the firm identified several changes that
increased sales by 20%, and equally important, it gained insights into which changes would
have a negative effect or no effect.
Direct mail is a common marketing channel, and firms use it for a wide range of products
including credit cards, clothing, and magazines. Typically, response rates are very low,
and a small increase in response can mean large financial benefits. Mother Jones magazine
had extensive experience in direct mail testing aimed at increasing its subscription rates.
Its protocol was to test only one change, such as the color of the envelope, in each mailing to potential subscribers. Using a fractional factorial design (Chapter 5), the firm was able
to test seven factors simultaneously in a single mailing, gaining valuable and immediate insights that led to large increases in response. Moreover, the results were attained with a
sample size (the number of people receiving the mailing) that was much smaller than would
have been needed if the seven factors had been tested one factor at a time.
A leading office supplies retailer designed and implemented an e-mail test targeted at
small business customers, a group the retailer wanted to attract to its stores and Web site.
The retailer identified 13 factors that it wanted to include in this experiment, with each factor
having two possible values. The factors included the background color of the e-mail
(white or blue), a discount offer (normal price or 15% discount), a free gift (no gift or a pen
and pencil set), and products pictured (few or many). Testing all possible combinations of
the 13 factors would have required $2^{13} = 8{,}192$ different e-mail designs! But using a fractional factorial design, a methodology that we discuss in Chapter 5, the firm was able to successfully test all 13 factors with just 32 different designs.
Peak Electronics, a manufacturer of printed circuit boards, was faced with a recurring
problem. In the circuit board production process, most of the holes on each board are
plated with a thin layer of copper so that current can flow from one side to the other. Some
holes, however, are not meant to be plated and instead are tented, meaning that they are
protected by a thin layer of photographic film. During the manufacturing process, a significant number of these tents were breaking, and their holes were being plated. The result was
the number-one cause of rework at the firm, because the copper in these holes had to be
scraped out.
At the time, Peak was using film supplied by DuPont. The sales representative of Hercules, a competing filmmaker, suggested that Peak perform an experiment using the Hercules film to test the effect on broken tents of a number of key manufacturing variables. The
sales representative designed the test and helped Peak analyze the results.
With the explosive growth of the Internet, Web site design has become an important issue,
as firms attempt to attract a greater number of people to visit their sites and order their products or services. PhoneHog is a subscription-based service through which consumers get free
long-distance phone calls. Participants sign up for the program and earn phone minutes by
visiting Internet sites, entering sweepstakes, or trying new products and services. The PhoneHog case in Chapter 8 describes how experimental design can be used to improve a Web site
to obtain more customers. In this case there were 10 factors to be tested, with the number of
variations, or levels, for each factor ranging between 2 and 10. For example, the top image on
the Web page had four possible designs: (1) photos of five people talking on the phone with
the PhoneHog logo on the right, (2) a cartoon image of a pig peeking through the O in the
PhoneHog logo on a blue background, (3) the same image of the pig on a white background,
and (4) the photos of the five people talking on the phone with a different PhoneHog logo on
the right. If every possible combination of factor levels were included in the experiment, a total of 1,658,880 test Web pages would have been required. In fact, the experiment consisted
of just 45 different Web pages (each page a combination of factor levels), with each person arriving at the site randomly assigned to one of them. The number of visitors to the site and the
number of visitors who clicked on an icon to request additional information were recorded. As
a result of this experiment, the click-through rate, which is the number of clicks divided by
the number of visitors, increased by 35%.
1.3
The field of experimental design began with the pioneering work of Sir Ronald Fisher,
whose classic book, The Design of Experiments, was published in 1935. Fisher was responsible for statistical analysis at an agricultural experiment station in England, and his early
work on experimental design was applied to improving crop yields and solving other agricultural problems. Over the years, applications of experimental design to industrial problems have been widespread, with particular attention given to problems in the chemical
industry, such as maximizing chemical yields and assays. In 1978, George E. P. Box, William G. Hunter, and J. Stuart Hunter published Statistics for Experimenters (second edition,
2005), a book that became, and still is, a standard text in the field.
Beginning about 1980, U.S. manufacturing firms, faced with competitive challenges, especially from Japanese companies, took a renewed interest in quality management and design of experiments. In the 1980s the American Society for Quality (ASQ) and many other
organizations started to offer numerous seminars on experimental design. However, little or
no attention was given to the application of experimental design to service organizations.
More recently that has slowly begun to change, and several articles have appeared showing that multivariable experimental design techniques provide powerful approaches to service problems. "The New Mantra: MVT" (Forbes, March 11, 1996) discussed the experimental design applications to services by a quality consulting firm, while "Tests Lead Lowe's
to Revamp Strategy" (Wall Street Journal, March 11, 1999) explained how that firm helped
Lowe's improve its advertising policy. The article "Boost Your Marketing ROI with Experimental Design" (Almquist and Wyner, Harvard Business Review, October 1, 2001) told
how another consulting firm used experimental design to improve marketing decisions. In
short, business leaders are beginning to realize that experimental design has widespread
applications to management decision making, particularly in service organizations.
In Chapter 3, we extend the discussion in Chapter 2 and focus on comparing more than
two population means. For example, we might want to compare the effectiveness of three
different advertising strategies by testing them in a number of stores. We present two
statistical models: the completely randomized design and the randomized block design. In Chapter 3, we emphasize two important ideas, randomization and blocking, that are used
throughout the book.
The heart of the book begins with Chapter 4, where we focus on so-called 2-level factorial
designs. In these designs, there are k factors to be tested, and each factor is studied at two
different values (levels). For example, in a Web site test, one factor might be the banner
headline (version 1 vs. version 2), while another might be the image under the headline
(product photo vs. happy user). In the full factorial design, the experimenter tests all combinations of factors and levels, with each combination called a run. With k factors, there are
$2^k$ runs. Thus, testing two factors requires $2^2 = 4$ runs, testing three factors requires $2^3 = 8$
runs, and so forth. The main effect of a factor is the difference in response at one level of the
factor versus the other. For example, for the image under the banner headline, the main effect of that factor is the difference in response if the happy user image is employed rather
than the product photo. In some instances, there may be an interaction between factors. For
example, the difference in response between version 1 and version 2 of the banner headline
may depend on which image under the banner is used. In Chapter 4, we show how main
and interaction effects are estimated and discuss the various approaches for determining
which effects are statistically significant.
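The main effect and interaction described above can be sketched numerically. The following is a minimal illustration, not the book's procedure: the function names are hypothetical, levels are coded as -1 and +1, and the four response values are invented purely for demonstration.

```python
from itertools import product

def full_factorial(k):
    """All 2**k runs of a 2-level design, coded -1 (low) and +1 (high)."""
    return [list(run) for run in product([-1, 1], repeat=k)]

def effect(design, responses, columns):
    """Average response where the product of the chosen columns is +1,
    minus the average where it is -1. One column gives a main effect;
    two or more columns give an interaction effect."""
    plus, minus = [], []
    for run, y in zip(design, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        (plus if sign == 1 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Factor 0: banner headline (version 1 = -1, version 2 = +1).
# Factor 1: image (product photo = -1, happy user = +1).
design = full_factorial(2)

# Hypothetical responses (made up for illustration), listed in run order
# (-1,-1), (-1,+1), (+1,-1), (+1,+1).
responses = [2.0, 3.0, 2.5, 5.5]

headline_effect = effect(design, responses, [0])
image_effect = effect(design, responses, [1])
interaction = effect(design, responses, [0, 1])
```

A nonzero interaction here means the headline effect differs depending on which image is shown, which is the situation the paragraph above describes.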
The focus of Chapter 5 is on 2-level fractional factorial designs. Full factorial designs are
useful for experimenting with relatively few factors. As the number of factors increases, the
number of runs required in a full factorial design increases dramatically. In fact, the inclusion of each additional factor in a full factorial design doubles the number of runs required,
with 4 factors requiring $2^4 = 16$ runs, 5 factors requiring $2^5 = 32$ runs, and so forth. If full
factorials were the only option, the experimental design approach would have limited value.
In a fractional factorial design, the experiment requires only a fraction of the number of
runs needed for a full factorial design. For example, a full factorial design with seven factors
requires $2^7 = 128$ runs, or separate experiments. But as we shall see, it is possible to construct a fractional design requiring only 16 runs that provides nearly as much information
as in a full factorial design. In some instances, a fractional experiment may produce results
that are difficult to interpret. We show how a follow-up experiment can be designed and executed to resolve these ambiguities.
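To make the 16-run idea concrete, here is a minimal sketch of one way to build a seven-factor design in 16 runs. The generators E = ABC, F = BCD, G = ACD are an assumption chosen for illustration (one common choice); the text does not say which generators the book uses.

```python
from itertools import product

# Full factorial in four base factors A, B, C, D: 2**4 = 16 runs.
base = list(product([-1, 1], repeat=4))

# Assumed generators (not taken from the text): the remaining three
# factor columns are products of base columns.
#   E = A*B*C,  F = B*C*D,  G = A*C*D
design = [(a, b, c, d, a * b * c, b * c * d, a * c * d)
          for a, b, c, d in base]

def columns_orthogonal(runs):
    """True if every pair of factor columns has a zero dot product."""
    k = len(runs[0])
    return all(sum(run[i] * run[j] for run in runs) == 0
               for i in range(k) for j in range(i + 1, k))

def columns_balanced(runs):
    """True if every factor is at each level in half of the runs."""
    k = len(runs[0])
    return all(sum(run[i] for run in runs) == 0 for i in range(k))

full_factorial_runs = 2 ** 7       # 128 runs for seven factors
fractional_runs = len(design)      # 16 runs
```

Each of the seven columns is balanced and orthogonal to the others, so the seven main effects can be estimated from one-eighth of the full factorial's runs.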
In Chapter 6, we discuss Plackett-Burman designs. The number of runs required in a
fractional factorial design is a power of 2. Thus, the number of runs would be 8, 16, 32, 64,
and so forth. In a Plackett-Burman design, the number of runs required is a multiple of 4,
so the number of runs would be 4, 8, 12, 16, and so forth. For example, in a particular situation, if the experimenter were limited to fractional factorial designs, she might have to
choose between a design of 16 runs and a design of 32 runs. There is a rather large gap between allowable run sizes. The Plackett-Burman designs give the experimenter additional
options that may be advantageous. We discuss the characteristics of Plackett-Burman designs and illustrate their use with several case examples.
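As an illustration of a run size that is a multiple of 4 but not a power of 2, the sketch below builds the 12-run Plackett-Burman design by cyclically shifting a first row and appending a row of all low levels. The first row is the commonly tabulated generator for 12 runs; the function name is hypothetical, and this smaller design is shown only as an example of the construction.

```python
# Commonly tabulated first row for the 12-run Plackett-Burman design
# (+1 = high level, -1 = low level), covering up to 11 factors.
GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts of the generator
    followed by one run with every factor at its low level."""
    n = len(GENERATOR)
    rows = [GENERATOR[n - shift:] + GENERATOR[:n - shift]
            for shift in range(n)]
    rows.append([-1] * n)
    return rows

design = plackett_burman_12()

def columns_orthogonal(runs):
    """True if every pair of factor columns has a zero dot product."""
    k = len(runs[0])
    return all(sum(run[i] * run[j] for run in runs) == 0
               for i in range(k) for j in range(i + 1, k))
```

With fewer than 11 factors, the experimenter would simply use a subset of the columns.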
The designs in Chapters 4 through 6 are all 2-level designs, with each factor being set at
one value or another. In Chapter 7, we extend the analysis to include designs in which factors may be at more than two levels. We show how regression analysis can be used to estimate effects, and we discuss the construction and analysis of simple fractional designs that
include factors at more than two levels.
The last chapter of the book, Chapter 8, is devoted mainly to the most advanced topic.
The designs discussed in earlier chapters have an important property called orthogonality.
In an orthogonal design, effects are estimated independently of one another. That means
that the particular estimate of one effect is not influenced by the estimated value of another.
In Chapter 8, we consider nonorthogonal designs involving many factors and several levels.
We show how regression analysis can be used to analyze these designs, and we illustrate the
approach with the PhoneHog case, which was described earlier in this chapter. Chapter 8
ends with a discussion of experimental design software, focusing on two software products.
1.5
R. A. Fisher, The Life of a Scientist, is an interesting biography of Sir Ronald Fisher written
by his daughter, Joan Fisher Box (1978). Fisher is one of the statisticians included in The
Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, by David
Salsburg (2001). The title of the book comes from a paper that Fisher wrote, which is
included in Fisher's The Design of Experiments. As the story goes, a lady claimed that by
tasting it she could tell whether milk or tea was put into the cup first. Fisher designed an
experiment to test her claim. Salsburg's book has stories of other great statisticians including
William Gosset, famous for the t-distribution, which we discuss in Chapter 2. The online
(and free) encyclopedia Wikipedia (www.wikipedia.org) has interesting biographical information on Fisher, Gosset, and many other important figures in the world of statistics.
The NBC television white paper "If Japan Can, Why Can't We?" which was broadcast in
1980, was a milestone that marked the beginning of a quality revolution in manufacturing in
the United States. W. Edwards Deming was featured on the program, and he castigated
American firms for shoddy quality. Deming, a statistician with a Ph.D. in physics, gave a series of
lectures in Japan in 1950 that greatly influenced that country's quality efforts. The Deming
Prize, the highest award for quality in Japan, is named in his honor. Deming's (1982) book,
Out of the Crisis, is a good source for learning about his quality management ideas.
Not long after that NBC program, the work of Genichi Taguchi, a Japanese consultant
and former professor, began receiving widespread attention from manufacturers in the
United States, particularly in the automobile industry. Taguchi methods became a familiar
buzzword for his approaches to experimental design. Statisticians have often criticized
Taguchi's statistical methods, but there is general agreement that his engineering ideas are
very useful. He is probably best known for two concepts: robust design and the Taguchi loss
function. Robust design means designing a product or process that is insensitive to
environmental factors. For example, a robust cake recipe would produce a good cake even with
considerable variation in baking time and oven temperature. The Taguchi loss function is an appealing alternative to the traditional approach to determine whether a product or process
meets customer specifications. For example, to illustrate the traditional approach, suppose
the plating thickness in millimeters of a printed circuit board is acceptable if it falls within
certain upper and lower specification limits. So, a board having thickness just below the
upper specification would be judged acceptable, whereas a board whose thickness was just
above that limit would be classified as defective. In reality, there is a target that is ideal, and
the closer each board comes to that target, the better. In contrast, under Taguchi's loss
function, the loss associated with an individual board would be equal to a constant times the
squared deviation between the board's thickness and the target. With this function,
doubling the distance from the target would quadruple the loss. In the traditional approach,
where each board is either in or out of specifications, two processes might have the same
fraction of boards meeting specifications but, in reality, very different quality levels. One
process might have most of its acceptable boards with thicknesses close to the target,
whereas the other might have a more uniform distribution with board thicknesses evenly
spread within the window defined by the specification limits. The second process would have much
lower quality than the first, but under the traditional approach, the quality of products
produced under the two systems would be judged as equal.
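The contrast between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions: the target, the specification limits, the constant k = 1, and both sets of thickness values are hypothetical numbers invented for the example.

```python
def taguchi_loss(y, target, k=1.0):
    """Loss = constant times the squared deviation from the target."""
    return k * (y - target) ** 2

def fraction_in_spec(values, lower, upper):
    """Traditional view: share of items inside the specification limits."""
    return sum(lower <= v <= upper for v in values) / len(values)

def average_loss(values, target, k=1.0):
    """Taguchi view: mean loss over all items."""
    return sum(taguchi_loss(v, target, k) for v in values) / len(values)

# Hypothetical plating thicknesses in millimeters (made up for illustration).
TARGET, LOWER, UPPER = 1.0, 0.9, 1.1
process_a = [0.98, 0.99, 1.00, 1.00, 1.01, 1.02]  # clustered near the target
process_b = [0.90, 0.94, 0.98, 1.02, 1.06, 1.10]  # spread across the window

same_under_traditional = (
    fraction_in_spec(process_a, LOWER, UPPER)
    == fraction_in_spec(process_b, LOWER, UPPER)
)
loss_a = average_loss(process_a, TARGET)
loss_b = average_loss(process_b, TARGET)
```

Both hypothetical processes have every board in specification, yet the average loss of the second is many times that of the first, which is the distinction the loss function is meant to capture.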
In recent years, the Six Sigma approach to quality has been embraced by numerous organizations. Six Sigma was originally developed at Motorola in the mid-1980s and refined
first by Allied Signal and more recently by General Electric. Six Sigma has many similarities
to total quality management (TQM) and other programs in the past, but it also has some distinctive characteristics. One is its focus on defining and responding to customer needs. In doing so, it takes a broader view of quality management compared to some more narrowly focused programs of the past, better integrating quality activities into all areas of the
organization and aligning these activities with the strategic goals of the firm. In addition, Six
Sigma programs have been more widely applied to service processes, including many implementations in hospitals and other health care organizations. The design of efficient experiments is an important component of the Six Sigma approach. One of the many books on Six
Sigma is The Six Sigma Way: How GE, Motorola, and Other Top Companies are Honing Their
Performance, by Peter S. Pande, Robert P. Neuman, and Roland R. Cavanagh (2000).
EXERCISES
Exercise 1
Search the Web for the work of Sir Ronald Fisher on experimental design, including
his earliest efforts performing agricultural experiments at the Rothamsted Experimental Station in the United Kingdom.
Exercise 2
Read Mother Jones (Case 3 in the case study appendix) and Peak Electronics: The
Broken Tent Problem (Case 4). Both of these cases describe a company's first exposure to experimental design methods.
(a) Mother Jones: Suppose the organization wanted to test each of the seven factors in
a separate mailing. What specific shortcomings would this approach have compared to the approach in the case?
(b) Peak Electronics: Suppose the company did not use experimental design to examine and solve the broken tent problem. Imagine how they would have approached
the problem instead. What difficulties would they have likely encountered? Would
it have been possible to identify interactions between factors? If so, how?
Exercise 3
Pick a Web site on the Internet. Suppose you were designing an experiment for
increasing visitors' response to a product or service offered on the site. What seven factors
do you think would be most important to test? In each case, if possible, specify two levels
(values) for each of the factors.
in nature. Here, any number (obviously within a certain interval) is a feasible outcome.
We call such random variables continuous random variables, and we use continuous distributions to characterize the variability. The normal distribution, the t-distribution, and the
F-distribution are important examples.
The notation $P[Y = y]$ stands for the probability that the random variable Y takes on the value y.
Example 1 The random variable Y describes a customer's purchasing decision. Possible
outcomes are 1 (purchase) and 0 (no purchase). Based on historical data, it is estimated that
5% of customers will place an order. Thus, $P[Y = 1] = 0.05$, and hence $P[Y = 0] = 1 - 0.05 = 0.95$.
Example 2 Let the random variable Y be the number of flaws. The possible outcomes are
y = 0 (no flaw), y = 1 (exactly one flaw), and y = 2 (two flaws), with
$P[Y = 0] = 0.90$, $P[Y = 1] = 0.08$, and $P[Y = 2] = 0.02$.
Example 3 Let the random variable Y be the number showing on a thrown die. The possible outcomes are y = 1, 2, 3, 4, 5, 6. Assuming the die is fair, the outcomes are equally
likely. Hence $P[Y = 1] = P[Y = 2] = \cdots = P[Y = 6] = 1/6$.
Example 4 Let the random variable Y be the number of times a customer orders from
a catalog during a specified time period. The possible outcomes are y = 0, 1, 2, 3, with
$P[Y = 0] = 0.2$, $P[Y = 1] = 0.5$, $P[Y = 2] = 0.2$, $P[Y = 3] = 0.1$. Note that the probabilities sum to 1. Notice also that ordering four or more times has zero probability; it cannot
occur.
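We can easily calculate the probabilities of various events. For example, in Example 4 the probability
of at most two orders is
$$P[Y \le 2] = P[Y = 0 \text{ or } Y = 1 \text{ or } Y = 2] = P[Y = 0] + P[Y = 1] + P[Y = 2] = 0.2 + 0.5 + 0.2 = 0.9$$
and the probability of at least one order is
$$P[Y \ge 1] = P[Y = 1] + P[Y = 2] + P[Y = 3] = 0.5 + 0.2 + 0.1 = 0.8$$
or, equivalently,
$$P[Y \ge 1] = 1 - P[Y = 0] = 1 - 0.2 = 0.8$$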
$$\mu = \sum_{y} y\,P[Y = y]$$
We use the Greek letter $\mu$ to denote the mean. It is the weighted sum of the possible outcomes, with each outcome weighted by its probability. The mean or expected value is the
long-run average.
Example 1 $\mu = (0)(0.95) + (1)(0.05) = 0.05$.
Example 2 $\mu = (0)(0.90) + (1)(0.08) + (2)(0.02) = 0.12$. The mean number of flaws is 0.12.
Example 3 $\mu = (1)(1/6) + (2)(1/6) + \cdots + (6)(1/6) = 3.5$.
Example 4 $\mu = (0)(0.2) + (1)(0.5) + (2)(0.2) + (3)(0.1) = 1.2$. There are, on
average, 1.2 orders per customer. Of course, the number of orders can only be an integer;
however, in the long run (i.e., over many customers) the number of orders averages to 1.2.
$$\sigma^2 = \sum_{y} (y - \mu)^2\,P[Y = y]$$
The variance, denoted by the Greek letter sigma squared ($\sigma^2$), is a measure of spread. It is
the weighted sum of squared deviations from the mean, with squared deviations weighted
by their probability of occurrence.
$$\sigma = \sqrt{\sum_{y} (y - \mu)^2\,P[Y = y]}$$
The standard deviation is equal to the square root of the variance. If the units of the random
variable are, say, dollars, the variance will be in units of dollars squared. That makes the variance difficult to interpret. Taking the square root of the variance to obtain the standard
deviation expresses the spread of the distribution in the same units as the random variable; in this case, dollars.
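Example 1 $\sigma = [(0 - 0.05)^2(0.95) + (1 - 0.05)^2(0.05)]^{0.5} = [0.0475]^{0.5} = 0.218$.
Example 2 $\sigma = [(0 - 0.12)^2(0.90) + (1 - 0.12)^2(0.08) + (2 - 0.12)^2(0.02)]^{0.5} = [0.1456]^{0.5} = 0.382$ (flaws).
Example 3 $\sigma = [2.9167]^{0.5} = 1.71$.
Example 4 $\sigma = [(0 - 1.2)^2(0.2) + (1 - 1.2)^2(0.5) + (2 - 1.2)^2(0.2) + (3 - 1.2)^2(0.1)]^{0.5} = [0.76]^{0.5} = 0.872$.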
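The mean, variance, and standard deviation calculations for the four examples can be checked with a short sketch. The function names are hypothetical, and each distribution is stored as a dictionary mapping outcomes to probabilities. For Example 2, the value P[Y = 2] = 0.02 is an assumption, chosen because it reproduces the stated mean of 0.12 and standard deviation of 0.382.

```python
from math import sqrt

def mean(dist):
    """Weighted sum of outcomes, each weighted by its probability."""
    return sum(y * p for y, p in dist.items())

def variance(dist):
    """Weighted sum of squared deviations from the mean."""
    mu = mean(dist)
    return sum((y - mu) ** 2 * p for y, p in dist.items())

def std_dev(dist):
    """Square root of the variance, in the units of the random variable."""
    return sqrt(variance(dist))

purchase = {0: 0.95, 1: 0.05}               # Example 1
flaws = {0: 0.90, 1: 0.08, 2: 0.02}         # Example 2 (P[Y = 2] assumed)
die = {y: 1 / 6 for y in range(1, 7)}       # Example 3
orders = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}   # Example 4

# Probability of an event: add the probabilities of its outcomes.
p_at_least_one_order = sum(p for y, p in orders.items() if y >= 1)
```

Rounding `std_dev(die)` to two decimals gives 1.71, matching Example 3.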
$$P[Y = y] = \frac{n!}{y!\,(n - y)!}\,\pi^{y}(1 - \pi)^{n - y} \quad \text{for } y = 0, 1, 2, \ldots, n$$
Here $n!$ denotes the factorial $n(n - 1)\cdots(2)(1)$; for example, $5! = (5)(4)(3)(2)(1) = 120$. By definition, $0! = 1$.
The number of trials $n$ and the probability of success in a single trial $\pi$ are called the parameters of the binomial distribution.
It can be shown that the mean of the binomial distribution is given by
$$\mu = n\pi$$
The binomial distribution is tabulated in statistics textbooks. Also, its probabilities can easily be determined using functions in computer packages such as Excel or Minitab.
Example Assume that a production process is characterized by a 10% defect rate. That is,
the probability of producing a defective item is P[defective] = 0.10; the probability of producing a good item is P[good] = 0.9. Assume also that the quality of each item (defective
or not) is independent of the quality of every other item.
With $n = 10$ items, the probability of exactly two defective items is
$$P[Y = 2] = \frac{10!}{2!\,8!}(0.1)^2(0.9)^8 = 0.1937$$
Such probabilities can be calculated either from the expression above, or from the binomial function of readily available computer programs. Note that the probabilities, summed over the possible outcomes, add to 1. Computer programs also calculate the cumulative probabilities such as
$$P[Y \le 1] = P[Y = 0] + P[Y = 1] = 0.3487 + 0.3874 = 0.7361,$$
or in general
$$P[Y \le y] = \sum_{i=0}^{y} P[Y = i] \quad \text{for } y = 0, 1, 2, \ldots, n$$
The Excel function BINOMDIST(y, n, π, FALSE) returns $P[Y = y]$, if n is the number of
trials and π is the probability of success. Replacing FALSE with TRUE returns the cumulative probability, $P[Y \le y]$. In Minitab, the calculations are carried out by using the convenient pull-down menu "Calc > Probability Distributions > Binomial."
Probabilities such as
$$P[1 \le Y \le 3] = P[Y = 1] + P[Y = 2] + P[Y = 3] = P[Y \le 3] - P[Y \le 0] = 0.9872 - 0.3487 = 0.6385$$
can be calculated by summing the individual probabilities, or as the difference of two cumulative probabilities.
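For readers without Excel or Minitab, the binomial calculations above can be reproduced with the Python standard library. This is a minimal sketch; the function names `binom_pmf` and `binom_cdf` are hypothetical helpers, not part of any package mentioned in the text.

```python
from math import comb

def binom_pmf(y, n, pi):
    """P[Y = y] for a binomial distribution with n trials and
    success probability pi."""
    return comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def binom_cdf(y, n, pi):
    """Cumulative probability P[Y <= y], summing P[Y = i] for i = 0..y."""
    return sum(binom_pmf(i, n, pi) for i in range(y + 1))

# The defect-rate example: n = 10 items, 10% defect rate.
p_two_defects = binom_pmf(2, 10, 0.1)
p_at_most_one = binom_cdf(1, 10, 0.1)
p_one_to_three = binom_cdf(3, 10, 0.1) - binom_cdf(0, 10, 0.1)
```

The last line uses the difference of two cumulative probabilities, the second of the two routes described above.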
$$P[a \le Y \le b] = \int_a^b f(y)\,dy = P[Y \le b] - P[Y \le a]$$
Percentiles of the distribution are defined by cumulative probabilities. The (100p)th percentile is given by $y_p$, the value of the random variable for which the area under the curve
from $-\infty$ to $y_p$ equals p; that is, $p = P[Y \le y_p]$.
Figure 2.1 [Density curves plotted against the value y: standard normal distribution; normal with mean 3 and standard deviation 2]
The probability that a continuous random variable is exactly equal to a particular value
is zero; that is, P[ Y = a] = 0, for any a. Hence,
$\ldots = 0.2743$ and $P[Z \ldots] = 1 - 0.2743 = 0.7257$
Figure 2.2 [Standard normal density with the 97.5th percentile marked; horizontal axis: value]
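$$P[Z \le 0.7] = 0.7580 \quad \text{and} \quad P[Z > 0.7] = 1 - P[Z \le 0.7] = 1 - 0.7580 = 0.2420$$
$$P[-0.5 \le Z \le 1.0] = P[Z \le 1.0] - P[Z \le -0.5] = 0.8413 - 0.3085 = 0.5328$$
The 97.5th percentile of the standard normal distribution is 1.96.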
Suppose Y has a normal distribution with mean $\mu$ and standard deviation $\sigma$, and that for
any value a, we want the probability that Y is less than or equal to a. We convert the probability statement about Y into an equivalent statement about Z. We have
$$P[Y \le a] = P\left[\frac{Y - \mu}{\sigma} \le \frac{a - \mu}{\sigma}\right] = P\left[Z \le \frac{a - \mu}{\sigma}\right]$$
Note that on either side of the inequality we subtract the mean $\mu$ and divide by the standard
deviation $\sigma$. The random variable $Z = (Y - \mu)/\sigma$ follows a standard normal distribution,
and the probability in the above equation can be looked up in the z-table. Similarly,
$$P[a \le Y \le b] = P\left[\frac{a - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right] = P\left[\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right]$$
Example The weight of toothpaste Y in a 2.7 ounce tube follows a normal distribution
with mean $\mu = 2.8$ and standard deviation $\sigma = 0.05$. The fraction of underfilled tubes is
$$P[Y \le 2.7] = P\left[\frac{Y - 2.8}{0.05} \le \frac{2.7 - 2.8}{0.05}\right] = P[Z \le -2.0] = 0.0228$$
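The standardization step can be mirrored in code. The sketch below is one possible illustration using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `normal_cdf_via_z` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def normal_cdf_via_z(a, mu, sigma):
    """P[Y <= a] for Y ~ Normal(mu, sigma), computed by converting
    to the standard normal: Z = (Y - mu) / sigma."""
    z = (a - mu) / sigma
    return STANDARD_NORMAL.cdf(z)

# Toothpaste example: mean 2.8, standard deviation 0.05, cutoff 2.7.
fraction_underfilled = normal_cdf_via_z(2.7, 2.8, 0.05)

# The same probability without the conversion step.
direct = NormalDist(2.8, 0.05).cdf(2.7)
```

Both routes give the same answer; the conversion matters mainly when only a z-table is at hand.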
Statistical software allows us to obtain cumulative probabilities for any normal random
variable directly, without the conversion to the standard normal. The Excel function
NORMDIST(y, μ, σ, TRUE) returns the cumulative probability, that is, the area under the density
curve below y, for a normal random variable with mean μ and standard deviation σ.
Some percentiles of the normal distribution with mean $\mu$ and standard deviation $\sigma$ are
the following:
50th percentile $y_{0.50} = \mu$
5th percentile $y_{0.05} = \mu - (1.645)\sigma$ and 95th percentile $y_{0.95} = \mu + (1.645)\sigma$
2.5th percentile $y_{0.025} = \mu - (1.96)\sigma$ and 97.5th percentile $y_{0.975} = \mu + (1.96)\sigma$
99th percentile $y_{0.99} = \mu + (2.326)\sigma$
The Excel function NORMINV(p, μ, σ) returns the (100p)th percentile of a normal distribution with mean μ and standard deviation σ. For example, suppose a random variable Y
has a normal distribution with mean 100 and standard deviation 20. Then NORMINV
(0.99, 100, 20) = 146.53, and $P[Y \le 146.53] = 0.99$. In Minitab, cumulative probabilities
and percentiles (referred to as "inverse" cumulative probabilities) of a normal distribution
are obtained with the pull-down menu "Calc > Probability Distributions > Normal."
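The same percentiles can be obtained outside Excel and Minitab. The following sketch uses `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later) as a stand-in for the inverse cumulative probability; the helper name `normal_percentile` is hypothetical.

```python
from statistics import NormalDist

def normal_percentile(p, mu, sigma):
    """(100p)th percentile of a normal distribution with mean mu and
    standard deviation sigma: the value y_p with P[Y <= y_p] = p."""
    return NormalDist(mu, sigma).inv_cdf(p)

# The example from the text: mean 100, standard deviation 20.
y_99 = normal_percentile(0.99, 100, 20)

# Multipliers of sigma for the standard percentiles listed above.
z_95 = NormalDist().inv_cdf(0.95)
z_975 = NormalDist().inv_cdf(0.975)
z_99 = NormalDist().inv_cdf(0.99)
```

Because every normal percentile has the form $\mu + z\sigma$, the three multipliers are all that is needed to reproduce the list above for any mean and standard deviation.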
The t -Distribution
The t-distribution has one parameter, $\nu$ ...
The only difference is that the tails of t-distributions are slightly heavier than those of the
standard normal (as explained later). The standard deviation of the t-distribution with
$\nu$ ...
to the density of the standard normal. Notice that for very large or small values y of the random
variables, the densities and hence the tail areas are larger for the t-distributions compared to the normal. This gives t-distributions a somewhat larger chance to generate large
deviations from the mean. The t-distribution converges to the standard normal as the degrees of freedom approach infinity.
Percentiles and cumulative probabilities of the t-distribution can be calculated using Excel or any other statistical software. With Excel, percentiles are found using the TINV function. The user specifies $\alpha$, the area in both tails of the distribution, and the number of degrees of freedom $\nu$. Thus $\alpha/2$ is the upper tail probability and $1 - (\alpha/2)$ is the corresponding
cumulative probability. For example, for a t-distribution with 3 degrees of freedom,
TINV(0.10, 3) returns the value 2.3534, which is the 95th percentile of the distribution.
Other examples are
95th percentile of t(10): $t_{0.95}(10) = 1.8125$
Figure 2.3 [Densities of t-distributions and the standard normal; horizontal axis: value]
2.2181
1.96
... $\chi^2(\nu)$, with the symbol $\chi$ denoting the Greek lowercase letter chi.
Figure 2.4 shows the densities of three chi-square distributions, with 3, 6, and 10 degrees
of freedom. The mean of the chi-square distribution is the same as the degrees of freedom:
$\mu = \nu$. Its standard deviation is $\sigma = \sqrt{2\nu}$.
The F-Distribution
The F-distribution takes on values from 0 to $\infty$, and it is skewed to the right. It has two
parameters, its degrees of freedom $\nu_1 > 0$ and $\nu_2 > 0$, which in most statistical applications
are positive integers. We use the notation $F(\nu_1, \nu_2)$ to describe an F-distribution with $\nu_1$ and
$\nu_2$ degrees of freedom.
The mean of the $F(\nu_1, \nu_2)$ distribution, $\mu = \nu_2/(\nu_2 - 2)$, depends only on $\nu_2$, and it
is always slightly larger than 1. The standard deviation depends on both parameters, $\nu_1$
and $\nu_2$.
Figure 2.5 shows densities of four F-distributions: F(4, 10) and F(4, 20), and F(8, 10) and
F(8, 20). Percentiles and cumulative probabilities can be calculated with standard statistics
software. For example, the 95th percentiles of these four F-distributions are
$F_{0.95}(4, 10) = 3.4780$
$F_{0.95}(4, 20) = 2.8661$
$F_{0.95}(8, 10) = 3.072$
$F_{0.95}(8, 20) = 2.4471$
Figure 2.4 [Densities of three chi-square distributions; horizontal axis: value y]
Figure 2.5 [Densities of four F-distributions; horizontal axis: value y. Legend: F(8, 10): 95th percentile = 3.072; F(4, 10): 95th percentile = 3.478; F(8, 20); F(4, 20)]
2.3
DESCRIBING DATA
In this book, we focus on methods for designing experiments and analyzing the resulting
data. As part of this process, simple graphical displays such as data plots and histograms,
and summary measures such as the mean, median, and standard deviation, provide extremely useful complements to the more formal statistical methodology. In this section, we
discuss these simple tools for displaying, summarizing, and analyzing data. In most cases
the data are a sample from a larger underlying population. Occasionally, if the population
is small, the data will consist of all of its elements.
Categorical data are observations that are grouped into qualitative categories. Examples
are marital status (single, married, divorced, widowed), advertising media (radio, television,
print), and type of real estate (residential, commercial). For categorical data, we can calculate relative frequencies of observed outcomes. For example, it may be that among 500
clients who received an advertising message, 20 made a purchase and 480 did not. Then the
(sample) proportion of clients who purchased is p = 20/500 = 0.04 (4%), and the proportion of clients who did not is 1 - p = 0.96 (96%). We can display these proportions
in a bar chart or a pie chart. For more than two outcomes, there are more proportions
(adding up to 1), more bars in the bar chart, and more pie slices in the pie chart.
Continuous data, on the other hand, reflect measurements that can be any (possibly
rounded) value within a certain interval. We display continuous measurement data using
dot diagrams and histograms. In dot diagrams, each measurement is displayed as a dot on a
line graph (the x-axis). In a histogram, the observations are binned into nonoverlapping
equal-width intervals on the x-axis and the frequencies (either absolute or relative) are displayed on the y-axis.
Statistical software makes it easy to construct bar and pie charts for categorical data and
dot diagrams and histograms for continuous measurement data. Illustrative examples are
shown at the end of this section.
Summary statistics are useful for describing data sets. The center (or location) of a data
set is measured by the mean or median, while its variability is described best by the standard
deviation or the interquartile range. Assume that we have a sample of n observations
$y_1, y_2, \ldots, y_n$, such as the dollar purchases of n customers or the annual donations to a college made by n alumni. The arithmetic mean (average) is given by
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
The median is the "middle" observation in rank. First, order the observations according
to their size, $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$. The median is the observation with rank $(n + 1)/2$.
The percentile of order p, where p is a number between 0 and 1, is the observation with
rank $(n + 1)p$. If this is not an integer, we take the average of the two observations with adjacent ranks. 100p% of the observations are smaller than the percentile, while 100(1 - p)% are larger.
The range is the difference between the largest and the smallest observation:
$$\text{Range} = y_{(n)} - y_{(1)}$$
The interquartile range is the difference between the 75th percentile (the third quartile)
and the 25th percentile (the first quartile):
$$\text{IQR} = y_{((n+1)0.75)} - y_{((n+1)0.25)}$$
The range is very sensitive to extreme observations. The interquartile range covers the
middle 50% of the observations and is less sensitive to extreme values.
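The location and spread measures above can be sketched in a few lines of Python. This is a minimal illustration, not the book's software: the function name `percentile` and the data in `purchases` are hypothetical, and the rank rule is the $(n + 1)p$ convention described in the text (statistical packages may use slightly different conventions).

```python
def percentile(values, p):
    """Percentile of order p (0 < p < 1) using the rank (n + 1) * p rule."""
    ordered = sorted(values)
    n = len(ordered)
    # Keep the rank inside 1..n so extreme orders never index outside the data.
    rank = min(max((n + 1) * p, 1.0), float(n))
    if abs(rank - round(rank)) < 1e-9:
        # Integer rank: take that observation (ranks are 1-based).
        return ordered[round(rank) - 1]
    lower = int(rank)  # rank of the observation just below
    # Non-integer rank: average the two observations with adjacent ranks.
    return (ordered[lower - 1] + ordered[lower]) / 2


# Hypothetical dollar purchases of n = 7 customers.
purchases = [12, 15, 20, 22, 30, 35, 41]

mean = sum(purchases) / len(purchases)
median = percentile(purchases, 0.5)
data_range = max(purchases) - min(purchases)
iqr = percentile(purchases, 0.75) - percentile(purchases, 0.25)

print(mean, median, data_range, iqr)
```

With an even number of observations the median rank $(n + 1)/2$ is not an integer, and the function averages the two middle observations.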
The sample standard deviation is the most commonly used measure of variability. For a
sample of n observations, it is defined as
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
The sample standard deviation is nonnegative; it is zero only if there is no variability and all
observations are the same. The standard deviation approximates the "average" distance of
the observations from their mean. In many data sets (reasonably symmetric and bell-shaped), the cumulative probabilities of the normal distribution will apply approximately
and about 95% of the observations will fall within two standard deviations from the mean,
while about 2/3 of the observations will fall within one standard deviation.
The square of the standard deviation results in the sample variance,
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$
The numerator, the sum of the squared deviations from the sample average, is referred to as
the sum of squares, corrected for the mean. The denominator, n - 1, reflects the degrees of
freedom of the sum of squares. The degrees of freedom of a sum of squares are the number
of "independent" components that are needed for its calculation. The sum of the deviations
from a sample average, $\sum_{i=1}^{n} (y_i - \bar{y})$, is always zero, and consequently specifying any n - 1
deviations determines the final deviation; the value of the last deviation must equal the negative of the sum of the others. The division of the sum of squares by its degrees of freedom
makes the sample variance an unbiased estimate of the population variance.
Consider pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$. The correlation
coefficient
$$r = \frac{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$$
is always between -1 and +1. Its sign indicates the direction of the linear association. For
positive values of r, above-average values on y tend to occur with above-average values on
x. The absolute value of r indicates the strength of the linear association. A correlation of +1
occurs if the observations plotted on a scatter diagram lie on a straight line with positive
slope. A correlation of -1 occurs if the observations plotted on a scatter diagram lie on a
straight line with negative slope. A high correlation does not necessarily imply causality; this
is especially relevant if one analyzes data from observational studies (as compared to data
from designed experiments). Statistical software such as Excel and Minitab can be used to
carry out the calculations. The Excel function CORREL(array1, array2) returns the correlation coefficient. The user enters the n pairs of observations into two columns of the spreadsheet, with array1 being the cell range for one variable and array2 being the cell range for
the other.
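For readers without a spreadsheet at hand, the sample standard deviation and the correlation coefficient can be sketched directly from their definitions. The helper names `sample_sd` and `correlation` and the data below are illustrative assumptions, not part of the original text; both functions use the n - 1 divisor.

```python
import math


def sample_sd(values):
    """Sample standard deviation with the n - 1 (degrees of freedom) divisor."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((v - mean) ** 2 for v in values)  # corrected for the mean
    return math.sqrt(sum_of_squares / (n - 1))


def correlation(x, y):
    """Correlation coefficient r of paired observations (x_i, y_i)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / ((n - 1) * sample_sd(x) * sample_sd(y))


# Hypothetical data: y_up lies exactly on a line with positive slope,
# y_down on a line with negative slope.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y_up = [2 * v + 1 for v in x]
y_down = [10 - 3 * v for v in x]

print(round(sample_sd(x), 3))
print(round(correlation(x, y_up), 6), round(correlation(x, y_down), 6))
```

The two correlations illustrate the limiting cases in the text: points on a straight line with positive slope give +1, and points on a line with negative slope give -1.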
TABLE 2.1

Class    No donation    Donation    Percentage
1957         157            95         37.7
1967         159           120         43.0
1977         215           119         35.6
1987         234           108         31.6
1997         300           128         29.9
All        1,065           570         34.9
[Table 2.1, continued: donation amounts by class year (number, mean, median, standard deviation, maximum), reported for all donations and separately for donations of at least $2,000 and below $2,000; entries illegible.]
DONATION
                        No      Yes      All     Percentage donating
No prior attendance     647     182      829          21.95
Prior attendance        418     388      806          48.14
All                   1,065     570    1,635          34.86

              ALL DONATIONS           < $2,000               >= $2,000
Attendance    No.   Mean   Median     No.   Mean   Median    No.   Mean    Median
No            182    134      50      182    134      50     (no data)
Yes           388    460     100      367    210     100      21   4,820   3,000

more likely to give. The
information in Table 2.1 shows that almost 50% of those who attend fund-raising events are
donating, while the proportion of donors among nonattending alumni is only 22%. Also,
the magnitude of the donation increases if alumni attend such events.
A scatter plot of 2004 donations against 2003 donations is shown in Figure 2.7. We consider the n = 410 alumni who have given donations of $1,000 or less in both years. The scatter plot and the correlation coefficient r = 0.812 indicate that the magnitudes of donations
in different years are strongly related.
[Figure: box plots of 2004 donations by class (1957, 1967, 1977, 1987, 1997).]
[Figure 2.7: scatter plot of 2004 donation against 2003 donation.]

sample to the population from which the sample was drawn. The population consists of all
elements, whereas the sample consists of a subset of the population. It is important that the
sample be representative of the population; otherwise, reliable inferences about characteristics of the population would not be possible. Characteristics of the population are usually
referred to as parameters, and summaries that are calculated from the sample are referred to
as sample statistics.
Consider the population of all graduating seniors at State University. There are about one
thousand each year; we denote the population size by N = 1,000. We may be interested in
population characteristics such as the average grade point average (GPA), the average number of weekly study hours, and the proportion of smokers (a percentage) among graduating seniors at State University. These characteristics can be determined without any uncertainty if we are willing and able to collect information on all graduating seniors. We call this
a census. Of course, asking students about this information may be subject to error; some
respondents may not tell the truth.
If the population is large, a census is not feasible, and sampling becomes an alternative.
The sample size is denoted by n; usually it is much smaller than the population size N. A
random sampling method guarantees that the sample results are "representative." In this
case, one can use statistical tools to assess the likely size of the resulting sampling error.
Random sampling guarantees that each possible sample has the same likelihood of being selected. For a large population size N and a small sample size n, many samples are possible;
in fact, there are
$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$$
different samples. Under random sampling, each of these samples is equally likely to be selected.
school year. We enter the names of the students into a column of length 1,000 and select 60
of them at random and without replacement by executing the Minitab command "Calc >
Random Data > Sample From Columns." The first 30 students in the sample become the
students for tutorial A, and the second group of 30 students use tutorial B.
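The same selection can be sketched outside Minitab. The snippet below is a minimal illustration using Python's standard library; the student labels and the seed are hypothetical, and `random.sample` draws without replacement.

```python
import math
import random

N, n = 1000, 60

# Number of different samples of size n from a population of size N.
num_samples = math.comb(N, n)

# Hypothetical roster standing in for the column of 1,000 student names.
students = [f"student_{i:04d}" for i in range(1, N + 1)]

rng = random.Random(2024)           # fixed seed only to make the sketch repeatable
sample = rng.sample(students, n)    # random sample without replacement

tutorial_a = sample[:30]            # first 30 selected students use tutorial A
tutorial_b = sample[30:]            # remaining 30 use tutorial B

print(len(str(num_samples)), "digits in the number of possible samples")
print(tutorial_a[:3], tutorial_b[:3])
```

Because the sample is already in random order, taking the first 30 and the last 30 students is itself a random split into the two tutorial groups.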
Assume that gender plays a role. A sampling strategy such as the one just discussed may
not be optimal because it could lead to an unbalanced gender composition in the sample.
The student body at State includes about the same number of men and women. However,
it could be, by bad luck of the draw, that the first sample for A includes only 40%
women, while the second for B includes 65%. It is better to take stratified random samples.
From the 500 women, select at random 30 and randomly divide the 30 into two groups of
2.5 STATISTICAL INFERENCE
$\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$
for the sample average $\bar{Y}$. This sampling distribution has a certain mean $\mu_{\bar{Y}}$ and standard
deviation $\sigma_{\bar{Y}}$.
• Sample averages fluctuate around $\mu$. Averages of some samples are smaller, and averages of others are larger; however, the mean of sample averages from repeated
samples will be $\mu$.
• The standard deviation of the sampling distribution of $\bar{Y}$ is given by $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$.
Averaging reduces the variability, with averages varying less than individual population values. Sample results from a single observation (n = 1) fluctuate around
the population mean with standard deviation $\sigma$. With larger n, the standard deviation $\sigma/\sqrt{n}$ shrinks, so averages from large samples lie close to the population mean.
But, of course, taking a very large sample would in most cases be prohibitively
expensive.
• For reasonably large sample sizes, the distribution of $\bar{Y}$ is approximately normal,
regardless of the distribution of Y.
The bulleted paragraphs are consequences of the central limit theorem, one of the most important results in statistics.
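The standard deviation of the sample
average, $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, quantifies the estimation error; it tells us how far the estimate could be
from the true population mean. Then a 95% confidence interval for the population mean is
given by the interval
$$\bar{y} \pm 1.96\,\sigma_{\bar{y}} \quad \text{or} \quad \bar{y} \pm 1.96\,\sigma/\sqrt{n}$$
When the population standard deviation $\sigma$ is unknown, it is replaced by the sample standard deviation $s = \sqrt{\sum (y_i - \bar{y})^2/(n - 1)}$. With the standard error $se_{\bar{y}} = s/\sqrt{n}$, the interval becomes
$$\bar{y} \pm 1.96\,se_{\bar{y}} \quad \text{or} \quad \bar{y} \pm 1.96\,s/\sqrt{n}$$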
The factor 1.96 follows from the central limit effect and the approximating normal distribution; it is the 97.5th percentile of the standard normal distribution. Using the factor 2
rather than 1.96 results in a close approximation.
For small sample sizes, and under the additional assumption that the distribution of Y
in the population is normal, we replace the factor 1.96 with the 97.5th percentile of the
t-distribution with n - 1 degrees of freedom. Then the 95% confidence interval is given by
$$\bar{y} \pm [t_{0.975}(n - 1)]\,s/\sqrt{n}$$
where $t_{0.975}(n - 1)$ is the 97.5th percentile of the t-distribution with n - 1 degrees of freedom. For sample sizes larger than 30, the difference between percentiles of the t- and normal distributions is small, and it does not matter which distribution is used.
Thus, 95% confidence intervals cover the true population mean in 95% of repeated
samples. Intervals with other coverages, such as 90% or 99% confidence intervals, can be
obtained by using different percentiles, such as $t_{0.95}(n - 1)$ for a 90% or $t_{0.995}(n - 1)$ for a
99% confidence interval.
Example
A random sample of 60 customers selected from among all customers who have
ordered from a catalog in 2005 showed an average purchase amount of $\bar{y} = 125$ dollars,
with a sample standard deviation of s = 24 dollars. The standard error of the average is
$se_{\bar{y}} = 24/\sqrt{60} = 3.098$, and a 95% confidence interval for the mean purchase amount in
the population is given by
$$125 \pm (1.96)(3.098) \quad \text{or} \quad (118.9 \text{ to } 131.1)$$
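The arithmetic of this example can be checked with a short sketch. The function name `mean_confidence_interval` is an illustrative assumption; the factor 1.96 is the normal percentile used in the text, so the sketch applies to reasonably large samples.

```python
import math


def mean_confidence_interval(y_bar, s, n, z=1.96):
    """Normal-approximation confidence interval for a population mean."""
    se = s / math.sqrt(n)  # standard error of the sample average
    return y_bar - z * se, y_bar + z * se


# Catalog example: n = 60, average 125 dollars, sample standard deviation 24.
lower, upper = mean_confidence_interval(125, 24, 60)
print(round(lower, 1), round(upper, 1))
```

Passing a different `z`, such as a t-percentile for a small sample, gives the corresponding interval without changing the function.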
The sample proportion
$$p = \frac{\text{number of successes}}{n} = \bar{y} = (y_1 + y_2 + \cdots + y_n)/n$$
is an average of n sample responses. Each response is the outcome of a discrete random variable Y with possible values 0 or 1 (smoker), and associated probabilities $1 - \pi$ and $\pi$. The
variable Y follows a binomial distribution from a single trial and with success probability $\pi$.
Section 2.2.1 shows that its mean is $\pi$, and its standard deviation is $\sqrt{\pi(1 - \pi)}$. Applying
the central limit effect (Section 2.5.1) to the sample proportion $P = \bar{Y}$, we find that for reasonably large samples, the sampling distribution of a proportion can be approximated by a
normal distribution with mean $\pi$ and standard deviation $\sigma_p = \sqrt{\pi(1 - \pi)/n}$. Sample
proportions fluctuate around the population proportion $\pi$, and their standard deviation
decreases with the square root of the sample size.
The sample size needs to be large for the central limit theorem to take effect, certainly
much larger than when averages of continuous measurement data are considered. Sample
sizes of 100 or more will be sufficient as long as the population proportion is not too close
to 0 or 1. If $\pi$ is close to the boundary (0 or 1), the distribution of the sample proportion
will be skewed (and not normal) even for large values of n.
The sample proportion p provides an estimate of the population proportion $\pi$. The substitution of this estimate
into the standard deviation of the sample proportion $\sqrt{\pi(1 - \pi)/n}$ provides the standard
error $se_p = \sqrt{p(1 - p)/n}$. The standard error quantifies the estimation error, telling us
how far the estimate can be from the true population proportion. An approximate 95%
confidence interval for the population proportion $\pi$ is given by the interval
$$p \pm 1.96\,se_p \quad \text{or} \quad p \pm 1.96\sqrt{p(1 - p)/n}$$
Example A random sample of 400 customers selected from all our catalog
customers found that 108, or 27%, are repeat customers; in other words, 27% is our best estimate for the proportion of repeat buyers in the population of all our customers. A 95%
confidence interval for the population proportion is
$$0.27 \pm (1.96)\sqrt{(0.27)(0.73)/400}$$
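A small sketch evaluates this interval numerically. The function name `proportion_confidence_interval` is hypothetical; the calculation is the normal approximation given above.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se


# Catalog example: 108 repeat customers among 400 sampled customers.
lower, upper = proportion_confidence_interval(108, 400)
print(round(lower, 4), round(upper, 4))
```

The half-width is about 0.0435, so the interval runs from roughly 22.6% to 31.4%.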
hypotheses. A sample from the customer base is taken, and the average purchasing amount
and the sample proportion of repeat customers are calculated. An experiment with two different advertising strategies is also conducted, and the average sales response for each group
is calculated.
Hypotheses address unknown population characteristics. The research hypothesis (i.e.,
the hypothesis we put forward as the hypothesis to be tested) is called the alternative hypothesis, $H_1$. The opposite of the research hypothesis becomes the null hypothesis, $H_0$. It is
the status quo or the fallback hypothesis in case we cannot show that the research hypothesis is more appropriate. In our first example, $H_0: \mu \le 115$ and $H_1: \mu > 115$. In the second
example, $H_0: \pi \ge 0.30$ and $H_1: \pi < 0.30$. In the third example, $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \ne 0$.
The burden of proof always lies on the research (i.e., the alternative) hypothesis. If our
sample or experiment does not provide enough evidence against the null hypothesis, we will
not embrace the research hypothesis and will retain the status quo. We are aware that
sample information may not always give an accurate picture of the population, as sample
statistics are fraught with sampling error. We want to be reasonably confident that we do
not reject the null hypothesis (the status quo) in error. That is, if in fact the null hypothesis
is correct, we want to fix the probability of rejecting it at a certain low value, say, 5%. This value
is referred to as the significance level of the test.
Consider $H_0: \mu \le 115$ and $H_1: \mu > 115$.
A random sample is taken, and from that sample we calculate the sample statistics
$\bar{y}$ and s.
The test statistic is the difference between the sample average and the hypothesized value,
that is, $\bar{y} - 115$. If the difference is positive and large, we reject $H_0: \mu \le 115$ and conclude
$H_1: \mu > 115$; otherwise, we retain $H_0$. But the sample mean is subject to sampling variability, and its standard error, $se_{\bar{y}} = s/\sqrt{n}$, must be taken into account and used to standardize the difference. This results in the standardized test statistic
$$TS = \frac{\bar{y} - 115}{se_{\bar{y}}} = \frac{\bar{y} - 115}{s/\sqrt{n}}$$
If this test statistic is large, larger than what could be expected under the null hypothesis, we reject the null hypothesis. Under the null hypothesis that the population mean is 115, the standardized test statistic follows a t-distribution with n - 1 degrees of freedom (or a standard
normal distribution, if n is large). The probability that the t-distributed random variable exceeds the computed test statistic can be found in t-tables or by using certain functions in statistical software packages. For example, one can use the Excel function TDIST(t, n - 1, 1),
where t is the value of the standardized test statistic, n - 1 is the number of degrees of freedom, and 1 indicates that the user wants the upper tail probability (replacing the 1 with a 2
would return the probability in both tails of the distribution). We call this the probability value,
$$\text{probability value} = P\left[t(n - 1) \ge \frac{\bar{y} - 115}{se_{\bar{y}}}\right]$$
A small probability value indicates that under the null hypothesis it would be unlikely to
observe such a large sample test statistic. In this case, we reject $H_0$ in favor of the alternative
$H_1$. The significance level 0.05 is taken as the cutoff value. On the other hand, a large
probability value (larger than the significance level 0.05) makes it plausible that the sample
test statistic resulted from the null hypothesis, and therefore we would retain $H_0$.
Example 1
A random sample of n = 60 customers has average purchase amount $\bar{y} = 125$ dollars and
standard deviation s = 25. We wish to test a research hypothesis about the mean purchasing amount of our catalog customers, and we hypothesize that it is larger than 115 dollars.
That is, $H_1: \mu > 115$. The standardized test statistic is
$$TS = \frac{125 - 115}{25/\sqrt{60}} = 3.10$$
This statistic is quite large; certainly larger than 2, which is a reasonable cutoff, because it is
close to the 97.5th percentile (1.96) of the standard normal distribution. The probability
value
$$\text{probability value} = P[t(59) \ge 3.10] = 0.0015$$
is very small, which makes the null hypothesis highly unlikely. We reject the null hypothesis in favor of the alternative that the population mean is in fact larger than 115.
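The test statistic of Example 1 can be reproduced with a minimal sketch. The function name `one_sided_mean_test` is hypothetical. Python's standard library has no t-distribution, so the tail probability below uses the standard normal approximation that the text allows for large n; it is therefore somewhat smaller than the t(59)-based value of 0.0015.

```python
import math
from statistics import NormalDist


def one_sided_mean_test(y_bar, s, n, mu0):
    """Standardized test statistic and upper-tail normal probability value."""
    ts = (y_bar - mu0) / (s / math.sqrt(n))
    p_normal = 1 - NormalDist().cdf(ts)  # normal approximation, not t(n - 1)
    return ts, p_normal


# Example 1: n = 60, average 125, standard deviation 25, hypothesized mean 115.
ts, p_value = one_sided_mean_test(125, 25, 60, 115)
print(round(ts, 2), round(p_value, 5))
print("reject H0" if p_value < 0.05 else "retain H0")
```

Either way the probability value is far below the 0.05 significance level, so the conclusion of the example does not depend on which distribution is used.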
Example 2
We are interested in the proportion of repeat buyers, and we want to test the research hypothesis $H_1: \pi < 0.30$ against the null hypothesis $H_0: \pi \ge 0.30$. We
reject the null hypothesis if the sample proportion p is much smaller than the hypothesized
value of 0.30. Under the null hypothesis, the standard deviation of p in repeated samples of
size n is $\sqrt{0.3(1 - 0.3)/n}$; see Section 2.5.3. The standardized test statistic becomes
$$TS = \frac{p - 0.30}{\sqrt{0.3(1 - 0.3)/n}}$$
Suppose a random sample of 400 catalog customers found that 108, or 27%, were repeat
customers. The value of the standardized test statistic is TS = -1.31, and the
$$\text{probability value} = P[Z \le -1.31] = 0.0951$$
is larger than the standard significance level, and therefore we retain the null hypothesis.
There is not enough evidence to say that the proportion of repeat buyers is less than 30%.
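The calculation in Example 2 can be sketched as follows. The function name `lower_tail_proportion_test` is an illustrative assumption; the standard deviation is evaluated at the hypothesized proportion, as in the text.

```python
import math
from statistics import NormalDist


def lower_tail_proportion_test(successes, n, pi0):
    """Test statistic and lower-tail probability value for H1: pi < pi0."""
    p = successes / n
    sd_under_h0 = math.sqrt(pi0 * (1 - pi0) / n)  # uses pi0, not the sample p
    ts = (p - pi0) / sd_under_h0
    return ts, NormalDist().cdf(ts)


# Example 2: 108 repeat customers among 400, hypothesized proportion 0.30.
ts, p_value = lower_tail_proportion_test(108, 400, 0.30)
print(round(ts, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "retain H0")
```

Using the unrounded test statistic gives a probability value that agrees with the tabled 0.0951 to about three decimal places.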
we need to know whether a certain sample size is sufficient for estimating a population characteristic to the desired accuracy.
Estimating a Mean
Assume that we want to estimate an unknown population mean $\mu$, and suppose that we
want to be 95% confident that the estimate is within $\pm B$ units of the true value. How large
a sample size is needed? The standard deviation of the sample average is $\sigma/\sqrt{n}$, and a 95%
confidence interval is given by $\bar{y} \pm 2\sigma/\sqrt{n}$, where the factor 2 approximates 1.96. For the estimate to be within $\pm B$ units, the
quantity $2\sigma/\sqrt{n}$ must equal B. Solving the equation $B = 2\sigma/\sqrt{n}$ leads to the required
sample size
$$n = \left(\frac{2\sigma}{B}\right)^2$$
If no planning value for $\sigma$ is available, we could first take a small preliminary sample of (say) 50 observations and use it to
estimate $\sigma$.
Example
Assume that we want to estimate the mean GPA for undergraduate students at
the Central University. Similar studies on GPA may have been conducted at other comparable schools, and we may even have access to estimates of the variability in GPA at Central
University for previous years. Suppose these studies indicate that a good planning value for
the standard deviation among individual GPAs is $\sigma = 0.8$.
Suppose that we want to be 95% confident that our estimate is within $\pm 0.15$ of the true
population mean. How large must the sample be? Using the equation just given, we find that
$$n = \left(\frac{2(0.8)}{0.15}\right)^2 = 113.8 \approx 114$$
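The sample size formula is easy to wrap in a helper. The function name `sample_size_for_mean` is hypothetical; the result is rounded up because a fractional observation cannot be sampled.

```python
import math


def sample_size_for_mean(sigma, bound, z=2.0):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. n = (z * sigma / B)^2."""
    n_exact = (z * sigma / bound) ** 2
    # Subtract a tiny tolerance so floating-point noise does not add an extra unit.
    return math.ceil(n_exact - 1e-9)


# GPA example: planning value sigma = 0.8, desired bound B = 0.15.
n_required = sample_size_for_mean(0.8, 0.15)
print(n_required)
```

Halving the bound B quadruples the required sample size, since B enters the formula squared.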
Estimating a Proportion
Assume that we want to estimate an unknown proportion $\pi$, and suppose that we want
to be 95% confident that our estimate is within $\pm B$ units of the true population proportion.
How large a sample do we need?
The standard deviation of the sample proportion is $\sqrt{\pi(1 - \pi)/n}$, and an approximate
95% confidence interval for the population proportion is $p \pm 2\sqrt{\pi(1 - \pi)/n}$. Solving
the equation $B = 2\sqrt{\pi(1 - \pi)/n}$ leads to the required sample size
$$n = \frac{4\pi(1 - \pi)}{B^2} \le \frac{1}{B^2}$$
The value $1/B^2$ is an upper bound on n, the required sample size. It results from setting
$\pi = \tfrac{1}{2}$. The function $\pi(1 - \pi)$ resembles a half-dome shape, with a maximum value of $\tfrac{1}{4}$
when $\pi = \tfrac{1}{2}$. If we had prior knowledge about the proportion $\pi$, we could substitute this
value into the equation just given. Previous studies with similar objectives would help with
this selection. On the other hand, we could use the
upper bound $n = 1/B^2$ if no prior guess on $\pi$ is available; this corresponds to setting $\pi = \tfrac{1}{2}$.
Example
We know from past studies that two-party elections are close, with the probability $\pi$
of a vote for either candidate close to one-half. There is
much interest in "calling" an upcoming close election, and we want to estimate the proportion of votes for the candidate of the incumbent party from a random sample of likely
voters. We want to be 95% confident that our estimate is within $\pm 0.02$ (i.e., 2 percentage
points) of the true value. How large a sample is needed? The above equation implies that we
should take a sample of size
$$n = \left(\frac{1}{0.02}\right)^2 = 2{,}500$$
Although this is not an overly large number, the challenge of sampling is in making sure that
a true random sample is taken, and that each possible sample from the population of interest is given the same chance of being selected. We need to be certain that our sample does
not exclude voters that are difficult to reach, nor do we want to include in our sample people
who will not be eligible or willing to vote at election time.
How does the sample size change if we want to be 99% or 90% confident? For that we
need to replace the factor 2 (which is roughly the 97.5th percentile of the standard normal
distribution) with the 99.5th percentile (which is 2.576), or the 95th percentile (which is
1.645), and solve for n.
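The effect of the confidence level on the required sample size can be sketched as follows. The function name `sample_size_for_proportion` is hypothetical, and the conservative planning value $\pi = 0.5$ is assumed; the percentiles come from `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(bound, z=2.0, pi=0.5):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= bound."""
    n_exact = (z / bound) ** 2 * pi * (1 - pi)
    # Tiny tolerance guards against floating-point noise before rounding up.
    return math.ceil(n_exact - 1e-9)


z_99 = NormalDist().inv_cdf(0.995)  # 99.5th percentile, about 2.576
z_90 = NormalDist().inv_cdf(0.95)   # 95th percentile, about 1.645

n_95 = sample_size_for_proportion(0.02)          # factor 2, as in the example
n_99 = sample_size_for_proportion(0.02, z=z_99)
n_90 = sample_size_for_proportion(0.02, z=z_90)
print(n_90, n_95, n_99)
```

Under these assumptions the required sample grows from about 1,700 voters at 90% confidence to more than 4,100 at 99% confidence.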
We test $H_0: \mu_A - \mu_B = 0$ against $H_1: \mu_A - \mu_B \ne 0$, where $\mu_A$ and $\mu_B$
are the average purchase amounts of customers exposed to advertising strategies A and
B, respectively.
Assume we conduct the following experiment. One subset of $n_1$ customers is drawn
randomly from our regular customer base (the population) and sent advertisement A.
A second, and different, randomly selected subset of size
$n_2$ is sent advertisement B. Purchases over the next 6 months are monitored. Suppose in this particular experiment we
selected $n_1 = n_2 = 30$ customers in each group, and found that $\bar{y}_A = 132$ and $s_A = 20$,
and $\bar{y}_B = 141$ and $s_B = 25$. Is this enough evidence to conclude that the effects of the two
strategies differ?
Here we base our decision on the difference between the two sample means, $\bar{y}_A - \bar{y}_B$.
However, one must realize that many different independent pairs of samples could have been
drawn, and that the difference of the resulting means would have changed with each pair of
samples. What is the sampling variability of the difference of sample averages from two independent random samples? Another version of the central limit effect implies the following:
• The mean of the sampling distribution of $\bar{Y}_A - \bar{Y}_B$ is given by $\mu_A - \mu_B$, which says
that the sampling distribution is centered at the difference of the population means.
• The standard deviation of the sampling distribution of $\bar{Y}_A - \bar{Y}_B$ is given by
$$\sigma_{\bar{Y}_A - \bar{Y}_B} = \sqrt{\frac{\sigma_A^2}{n_1} + \frac{\sigma_B^2}{n_2}}$$
• The sample standard deviations $s_A$ and $s_B$ can be substituted for the unknown
population standard deviations. This results in the standard error
$$se_{\bar{y}_A - \bar{y}_B} = \sqrt{\frac{s_A^2}{n_1} + \frac{s_B^2}{n_2}}$$
and the standardized test statistic $TS = (\bar{y}_A - \bar{y}_B)/se_{\bar{y}_A - \bar{y}_B}$.
We reject $H_0: \mu_A - \mu_B = 0$ in favor of the two-sided alternative $H_1: \mu_A - \mu_B \ne 0$ if the test
statistic is a large positive or large negative value, with $\pm 2$ being a good cutoff value. In addition, we can calculate the probability value
$$\text{probability value} = P[Z \ge |TS|] + P[Z \le -|TS|] = 2P[Z \ge |TS|]$$
Z follows the standard normal distribution, and the probability can be looked up in the
z-table. Because of the two-sided nature of the alternative hypothesis we must double the
tail probability. This was not needed in the one-sided alternative of the two previous examples. We reject the null hypothesis in favor of the alternative hypothesis if the probability value is smaller than the significance level 0.05. Equivalently, for a 5% significance level,
we reject the null hypothesis if a 95% confidence interval fails to include the value zero.
Example
In our experiment, $\bar{y}_A = 132$
and $s_A = 20$, and $\bar{y}_B = 141$ and $s_B = 25$. The standard error of the difference of the two
averages is
$$se_{\bar{y}_A - \bar{y}_B} = \sqrt{\frac{20^2}{30} + \frac{25^2}{30}} = 5.85$$
A 95% confidence interval for $\mu_A - \mu_B$ is
$(132 - 141) \pm (1.96)(5.85)$. The confidence interval extends from -20.46 to 2.46. The
value zero is within this interval, which indicates that the "no difference" hypothesis cannot
be rejected with these data.
The identical conclusion is reached with the probability value. The test statistic for $H_0: \mu_A - \mu_B = 0$ and $H_1: \mu_A - \mu_B \ne 0$ is $(132 - 141)/5.85 = -1.54$, with probability value
$2P[Z \ge 1.54] = 2(0.0618) = 0.1236$. Since it is larger than the significance level 0.05, we
find no reason to reject the null hypothesis. The effects of the two advertising strategies are
about the same.
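The two-sample comparison can be reproduced with a short sketch. The function name `two_sample_z_test` is an illustrative assumption; it uses the normal approximation described above and returns the standard error, the test statistic, the two-sided probability value, and the 95% confidence interval.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Two-sided comparison of two means from independent samples."""
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    ts = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(ts)))  # double the tail probability
    interval = (diff - z * se, diff + z * se)
    return se, ts, p_value, interval


# Advertising example: strategy A (132, 20, n = 30) versus B (141, 25, n = 30).
se, ts, p_value, interval = two_sample_z_test(132, 20, 30, 141, 25, 30)
print(round(se, 2), round(ts, 2), round(p_value, 4))
print(tuple(round(v, 2) for v in interval))
```

The interval contains zero exactly when the two-sided probability value exceeds 0.05, which is the equivalence stated in the text.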
the assignment of treatments A and B to the available time slots needs to be addressed. The same issue arises in
medical trials for evaluating the effectiveness of new drugs, where the treatments are the
drugs that are tested and the subjects are the experimental units.
One design approach is to randomize the assignment of treatments to the experimental
units. The experimenter would list the experimental units (the test fields, the available
times, or the available subjects) and randomly assign treatments to units. Randomization
is important and certainly better than a nonrandom arrangement, as it spreads the existing
variability among the experimental units fairly across all treatments. However, the experimenter can do considerably better if the experimental units can be grouped into groups or
blocks, such that the units are homogeneous within the same block but differ across blocks.
For example, test fields close together are more similar than fields far apart. Or, experiments
run on the same day benefit from more homogeneous conditions than experiments that are
conducted on different days. Or, a within-subject comparison of the effectiveness of a drug
is exposed to fewer interfering variables than a comparison across subjects. In randomized
block experiments, one randomizes the assignment within each block. For example, if 20
experiments need to be carried out over 5 days, the experimenter would randomize the order of two A and two B experiments on each day. Or, instead of assigning a certain blood
pressure medication to 50 patients and "no treatment" to 50 others and comparing the
blood pressure readings of these two groups after a period of 3 months, a better approach
would be to establish the initial blood pressure (the no-treatment group) on all 100 patients,
then put all patients on the new medicine, and analyze changes after 3 months.
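The day-by-day randomization described above can be sketched in a few lines. The function name `randomized_block_schedule`, the day labels, and the seed are hypothetical; each day is one block, and the order of two A and two B runs is shuffled independently within each block.

```python
import random


def randomized_block_schedule(blocks, treatments, rng):
    """Randomize the run order of the given treatments within each block."""
    schedule = {}
    for block in blocks:
        order = list(treatments)  # fresh copy so every block gets all treatments
        rng.shuffle(order)        # random order within this block only
        schedule[block] = order
    return schedule


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
days = ["day 1", "day 2", "day 3", "day 4", "day 5"]
schedule = randomized_block_schedule(days, ["A", "A", "B", "B"], rng)

for day, order in schedule.items():
    print(day, order)
```

Every day contributes the same number of A and B runs, so day-to-day differences cannot be confused with the treatment effect.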
Example In Table 2.2, we report results of a blood pressure experiment on 10 patients.
Initial blood pressures (x) and blood pressures after 3 months on the new drug (y) are listed.
The table also lists the summary statistics (mean and standard deviation) of the initial blood
pressure and the blood pressure after 3 months. Furthermore, it lists the changes for each
patient, $d_i = x_i - y_i$, the average change, and the standard deviation of the
difference.

TABLE 2.2

                      Initial Blood    Blood Pressure         Reduction
Patient               Pressure (x)     after 3 Months (y)     (x - y)
1                         190              181                    9
2                         221              211                   10
3                         212              200                   12
4                         232              218                   14
5                         200              185                   15
6                         178              175                    3
7                         186              169                   17
8                         220              212                    8
9                         204              191                   13
10                        196              187                    9
Mean                      203.9            192.9                 11.0
Standard deviation         17.22            16.69                 4.06

A two-sample test treating the two samples as independent fails to show any improvement due to the medication. The test statistic
$$TS = \frac{203.9 - 192.9}{\sqrt{17.22^2/10 + 16.69^2/10}} = 1.45$$
and its probability value $P[Z \ge 1.45] = 0.0735$ are inconclusive and do not allow us to reject the null hypothesis $H_0: \mu_{\text{Initial}} - \mu_{\text{After}} = 0$. Note that we used the one-tail probability,
because our research hypothesis specifies an improvement, $H_1: \mu_{\text{Initial}} - \mu_{\text{After}} > 0$. Also, observe that we used the normal distribution; we could have used the t-distribution with the
appropriate degrees of freedom, but the results would have been very similar, and our conclusions would not have changed.
The assumption that the two samples in this experiment are independent is incorrect.
Two blood pressure readings (x and y) are taken on the same person. If the initial reading
on one subject is high compared to all other subjects, we would expect that also his or her
reading after 3 months would be high compared to the other patients. Each subject acts as
his or her own block. The variability between the two readings from the same subject is
small, certainly much smaller than the variability across patients. It is the differences in the
blood pressure readings that need to be analyzed. Taking differences eliminates the subject
variability, which constitutes a large part of the variability that we see in Figure 2.8.
[Figure 2.8: blood pressure readings; horizontal axis: Blood pressure.]
[Figure 2.9: Blood Pressure Experiment. Measurements for the Same Subject Are Connected; horizontal axis: Blood pressure.]
In Figure 2.9, we connect the measurements
that come from the same subject. It is obvious from this graph that the type of treatment
makes a big difference. In all subjects, blood pressure is reduced by the medication.
The correct test procedure in this blocked (paired) experiment is to analyze the differences and test $H_0: \mu_{\text{Initial}} - \mu_{\text{After}} = 0$ against $H_1: \mu_{\text{Initial}} - \mu_{\text{After}} > 0$. The test statistic
$$TS = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{11.0}{4.06/\sqrt{10}} = 8.57$$
is large, and its probability value, 0.00001, is essentially zero. Hence, there is very strong evidence that the medication has
lowered the blood pressure. The average reduction is 11 units; the 95% confidence interval
for the reduction is given by
$$\bar{d} \pm t_{0.975}(n - 1)\,s_d/\sqrt{n}$$
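The contrast between the paired and the independent-samples analysis can be sketched with the readings of Table 2.2. The variable names are illustrative, and the t-percentile 2.262 for 9 degrees of freedom is entered as a standard table value (an assumption of this sketch), because Python's standard library does not provide the t-distribution.

```python
import math
from statistics import mean, stdev

# Blood pressure readings from Table 2.2 (10 patients).
initial = [190, 221, 212, 232, 200, 178, 186, 220, 204, 196]
after = [181, 211, 200, 218, 185, 175, 169, 212, 191, 187]

n = len(initial)
diffs = [x - y for x, y in zip(initial, after)]  # reduction for each patient

# Paired analysis: work with the within-subject differences.
d_bar, s_d = mean(diffs), stdev(diffs)
ts_paired = d_bar / (s_d / math.sqrt(n))

# Incorrect analysis for comparison: treats the two columns as independent.
se_independent = math.sqrt(stdev(initial) ** 2 / n + stdev(after) ** 2 / n)
ts_independent = (mean(initial) - mean(after)) / se_independent

# 95% confidence interval for the mean reduction.
T_975_9DF = 2.262  # 97.5th percentile of t with 9 degrees of freedom (table value)
half_width = T_975_9DF * s_d / math.sqrt(n)
interval = (d_bar - half_width, d_bar + half_width)

print(round(ts_paired, 2), round(ts_independent, 2))
print(tuple(round(v, 1) for v in interval))
```

The paired statistic differs slightly from the 8.57 in the text because the text rounds the standard deviation of the differences to 4.06 before dividing.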
Comment. Here we assess whether a particular drug "works." Of course, one should be
concerned that the observed effect is a combination of two effects: the real effectiveness of the
drug and a placebo effect due to the person's belief of being given something useful. Apart
from much higher sample sizes, FDA-approved drug studies usually compare a new experimental drug to the currently available "best-practice" drug. The best-practice drug could be
a placebo. In such a study, one would divide patients into two groups (preferably, at random)
and conduct the experiment discussed in this example with both groups. This would result in
two sets of blood pressure differences (final readings minus initial readings), one set for each
group. The procedure in Section 2.5.7 for comparing the means of two independent samples
can be applied to test whether the mean effectiveness of these two drugs is different.
for this product. The company commissioned AdTel, a marketing research company, to
assess the impact and
potential payoff of a $6 million television advertising campaign versus the current $2 million
strategy. Management had estimated that a 15% sales increase (established with 90% confidence or higher) would be required to justify the added expense.
AdTel maintained a 2,000-family panel. It also employed a dual-cable television system
to determine the sales effect of television advertising alternatives. AdTel had two separate cable circuits. Television sets owned by half of the test families were wired to cable A, while
those of the other half were wired to cable B. The panels were carefully balanced according
to demographic characteristics and shopping preferences. By the push of a button, AdTel
was able to block the commercial broadcast on one side of the cable and simultaneously cut
in the desired test commercial, while the other side carried the regular program. The panel
families recorded their purchases in weekly diaries.
The basic study covered a period of 18 months. The first 6 months represented a control
period, where both circuits received the same advertising at the level of the $2 million campaign. The next 12 months represented the test period where advertising for panel A tripled.
To avoid distortions by families joining and dropping the panel during the test, a static
sample was created that only included those families returning at least 80% of their diaries.
Panel A contained 829 families, while panel B comprised 922. The average monthly volumes
per family and the monthly market shares of Barrett's peanut butter for the 18 months
(6 pretest and 12 test periods) are shown in Table 2.3.
Time sequence graphs of average sales volumes for Panels A and B are shown in Figure 2.10.
Time series graphs of market shares for Barrett's peanut butter are given in Figure 2.11.
The pretest data (months 1-6) show that there is no appreciable difference between the
two panels. The graphs also show convincingly that sales and market shares, for both panels A and B, change with the reporting period. Hence period is an important blocking variable, and the analysis needs to be conducted with the monthly differences between A and B.
TABLE 2.3

Period    Pretest     Volume    Volume    Volume    Market Share   Market Share   Market Share
(month)   and Test    Panel A   Panel B   A - B     Panel A        Panel B        A - B
   1      Pretest       43        41         2         50.0           50.0           0.0
   2      Pretest       22        23        -1         30.0           30.0           0.0
   3      Pretest       31        31         0         39.5           40.0          -0.5
   4      Pretest       17        18        -1         24.0           23.0           1.0
   5      Pretest       29        25         4         44.0           45.0          -1.0
   6      Pretest       31        25         6         35.0           39.0          -4.0
   7      Test          22        22         0         20.0           20.0           0.0
   8      Test          21        23        -2         26.0           23.0           3.0
   9      Test          29        25         4         30.0           33.0          -3.0
  10      Test          29        32        -3         33.0           27.0           6.0
  11      Test          46        42         4         44.5           44.0           0.5
  12      Test          40        35         5         34.0           32.0           2.0
  13      Test          38        29         9         27.0           29.0          -2.0
  14      Test          53        38        15         43.0           38.0           5.0
  15      Test          47        34        13         38.0           41.0          -3.0
  16      Test          45        26        19         52.0           43.0           9.0
  17      Test          65        38        27         54.0           55.0          -1.0
  18      Test          40        47        -7         51.0           57.0          -6.0
Figure 2.10 Time sequence graphs of average sales volume by period (Panel A and Panel B)
Figure 2.11 Time series graphs of market share by period (Panel A and Panel B)
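Dot plots of monthly differences of A and B for volume and market share are shown in Figure 2.12. For each response we test $H_0: \mu_A - \mu_B = 0$ against $H_1: \mu_A - \mu_B > 0$, using the 12 test-period differences and the statistic $\bar{d}/(s_d/\sqrt{n})$, which is compared with the $t(11)$ distribution. For volume, the statistic is $7/(9.98/\sqrt{12}) = 2.43$; for market share, it is $0.88/(4.32/\sqrt{12})$, or approximately 0.7.
Figure 2.12 Dot plots of monthly differences (A - B) for volume and market share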
There is evidence that the increased advertising has increased the volume. The average increase of seven units is statistically significant; a 90% confidence interval for the mean increase extends from $7 - (1.7959)(9.98)/\sqrt{12} = 1.83$ to $7 + (1.7959)(9.98)/\sqrt{12} = 12.17$. An increase of seven units over the average for panel B with standard marketing (which is 32.58 units) represents a 21.5% increase in sales. However, the lower limit of a 90% confidence interval for the percent increase in sales amounts to only $100(1.83/32.58) = 5.6\%$.
It appears from the graph in Figure 2.10 that the extra advertising has done very little during the first 6 months of the test period. It is only during periods 13 through 17 that we notice appreciable differences. The last period is also quite remarkable, in that the benefit of the extra advertising has disappeared completely. In summary, while we see some increase in volume due to the increased advertising, it is doubtful that this strategy meets management's goal of a 15% sales increase that can be established with minimum 90% confidence.
A conclusion that increased advertising has affected market share is even less convincing; the small average increase of 0.88 percentage points is not statistically significant.
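The paired analysis of the volume data can be retraced with a few lines of Python. This is a minimal sketch: the twelve test-period differences are the Volume A - B values as read from Table 2.3 (an assumption of this example), the critical value 1.7959 and the panel B average of 32.58 are the figures quoted in the text, and all variable names are illustrative.

```python
import math
import statistics

# Volume differences (panel A minus panel B) for test periods 7-18,
# as read from Table 2.3; treated here as an assumption of the sketch.
volume_diff = [0, -2, 4, -3, 4, 5, 9, 15, 13, 19, 27, -7]

n = len(volume_diff)
d_bar = statistics.mean(volume_diff)    # average monthly difference
s_d = statistics.stdev(volume_diff)     # standard deviation of the differences
std_err = s_d / math.sqrt(n)            # standard error of the average difference
t_stat = d_bar / std_err                # paired t statistic for H0: mean difference = 0

T_CRIT_90 = 1.7959                      # 95th percentile of t(11), quoted in the text
ci_low = d_bar - T_CRIT_90 * std_err
ci_high = d_bar + T_CRIT_90 * std_err

PANEL_B_MEAN = 32.58                    # panel B test-period average, quoted in the text
pct_low = 100 * ci_low / PANEL_B_MEAN   # lower limit expressed as a percent increase

print(f"mean = {d_bar:.2f}, sd = {s_d:.2f}, t = {t_stat:.2f}")
print(f"90% CI: {ci_low:.2f} to {ci_high:.2f}; lower limit = {pct_low:.1f}%")
```

Because the lower limit of the percent increase falls well below 15%, the sketch leads to the same cautious conclusion about management's goal.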
2.7
What is now called the normal distribution first appeared in 1733 in a paper by the French mathematician Abraham de Moivre. (For a discussion of the paper, see Anders Hald (1986), History of Probability and Statistics and Their Applications Before 1750.) At the time, games of chance such as tossing coins or rolling dice were very popular, and both gamblers and mathematicians were interested in knowing the probabilities of various outcomes. The binomial distribution was well known, but calculating binomial probabilities was extremely difficult computationally if the number of trials n was fairly large, and impossible if n was very large. In his paper, de Moivre derived the equation that would later be called the normal density function as an approximation to the binomial when the number of trials is very large. Later, Laplace in 1783 and Gauss in 1809 made important contributions by developing theoretical arguments to support the normal distribution as a model of errors of measurement, in particular for errors in the observations of heavenly bodies. Over time, interest in this probability distribution continued to grow, with noteworthy contributions made by the Belgian social statistician Adolphe Quetelet (1796-1874), who used the normal distribution to describe data on measurements of physical characteristics. Quetelet used the normal distribution to measure variations about the "average man." The name normal was first applied to the distribution in the 1870s by Galton and several others, reflecting the fact that the distribution described the normal or natural variation in many observed phenomena. (See Chapter 22 of Stigler, 1999, Statistics on the Table, for an interesting discussion of how the normal got its name.)
In 1908, in the paper "The Probable Error of a Mean," William Gosset derived the probability distribution that became known as the t-distribution. Gosset, a young chemist and statistician, was studying quality problems at the Guinness brewery in Dublin. He was interested in calculating the probability that a population mean was within a specified distance of a sample average. The approach had been to use $s/\sqrt{n}$ as an estimate of $\sigma/\sqrt{n}$, the standard deviation of the distribution of sample averages, and to calculate the probabilities using the normal distribution. Gosset knew this worked well for large samples where $s$ would be close to $\sigma$. But he realized that when $n$ was small, calculated values of $s$ would vary greatly and therefore so would the estimate $s/\sqrt{n}$. As a consequence, errors in the calculated probabilities would be large. This led him to derive a theoretical density function for the random variable $t = (\bar{y} - \mu)/(s/\sqrt{n})$ and to compare it to the density function for $z = (\bar{y} - \mu)/(\sigma/\sqrt{n})$, the density function for the standard normal distribution. He showed that because of the variability in the estimate $s/\sqrt{n}$, his new distribution was more likely than the normal to take on values in the tails. Gosset's employers at Guinness viewed their quality efforts as proprietary, and as a result, he published his papers under the name Student. His distribution became known as Student's t-distribution.
Students in introductory statistics courses are often puzzled by the fact that the sample variance $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/(n - 1)$ divides the sum of squares by $n - 1$, and not by $n$. The division by $n - 1$ is a consequence of having to calculate the sum of squares around the sample mean $\bar{y}$ instead of the unknown population mean $\mu$. If the population mean $\mu$ were known, the quantity $\sum_{i=1}^{n} (y_i - \mu)^2/n$ would be an unbiased estimate of the population variance $\sigma^2$. It is easy to show that the sum of squares $\sum_{i=1}^{n} (y_i - \mu)^2$ is smallest if $\mu = \bar{y}$.
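The effect of the divisor can be checked exactly on a toy case. The sketch below is only an illustration under stated assumptions: a hypothetical three-value population, samples of size 2 drawn with replacement, and exact fractions so that no rounding enters. Averaging over every possible sample, the divisor n - 1 reproduces the population variance, while the divisor n falls short.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, chosen only for illustration
n = 2                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((y - mu) ** 2 for y in population) / len(population)

def sum_sq_about_mean(sample):
    """Sum of squared deviations of a sample around its own average."""
    y_bar = Fraction(sum(sample), len(sample))
    return sum((y - y_bar) ** 2 for y in sample)

# All equally likely ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n_minus_1 = sum(sum_sq_about_mean(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_about_mean(s) / n for s in samples) / len(samples)

print("population variance:     ", sigma2)
print("average using n - 1:     ", avg_div_n_minus_1)
print("average using n:         ", avg_div_n)
```

The sum of squares around the sample average is never larger than the sum of squares around the population mean, which is why dividing it by n underestimates the variance on average.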
Based on the central limit theorem, we stated that for reasonably large sample sizes, the distribution of the sample average will be approximately normal regardless of the population distribution of the individual values. How large does the sample size n have to be? Many introductory textbooks specify or at least suggest that n should be at least 30. But in most cases of practical interest that figure is too high. For example, in statistical process control, sample averages are plotted on X-bar control charts, sometimes called Shewhart charts for their originator, Walter Shewhart. Typically, samples of size 4 or 5 are used, and the control limits are based on sample averages following a normal distribution. In his classic book, Economic Control of Quality of Manufactured Product, originally published in 1931, Shewhart presented the results of his experiments taking 1,000 sample averages of size 4 from populations that were rectangular (uniform) and triangular. In both cases, the sample averages were well approximated by a normal distribution. As Shewhart said, "The closeness of fit is striking and illustrates the rapid approach of the distribution to normality as the sample size is increased. Such evidence ... leads us to believe that in almost all cases in practice we may establish sampling limits for averages of four or more upon the basis of normal law theory." In some instances, sample sizes greater than 30 may be needed if the distributions of population values are highly skewed with long tails (sampling from an exponential distribution would be one example). But in the experiments we consider in this book, these situations would be highly unlikely.
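Shewhart's experiment with the rectangular population is easy to imitate by simulation. The following is a rough sketch rather than a replication of his procedure: it assumes a continuous uniform distribution on the interval from 0 to 1, uses Python's standard random number generator with an arbitrary seed, and judges normality only by the share of averages within two standard deviations of 0.5.

```python
import math
import random
import statistics

random.seed(1931)        # arbitrary seed so the run is repeatable

NUM_SAMPLES = 1000       # number of sample averages, as in Shewhart's experiment
SAMPLE_SIZE = 4          # observations per sample

averages = [
    statistics.mean([random.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_averages = statistics.mean(averages)
sd_of_averages = statistics.stdev(averages)

# Theory for a uniform(0, 1) population: mean 1/2 and variance 1/12,
# so an average of 4 values has standard deviation sqrt(1/12) / sqrt(4).
theory_sd = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

# Under a normal law about 95% of the averages fall within two standard deviations.
share_within_2sd = sum(abs(a - 0.5) <= 2 * theory_sd for a in averages) / NUM_SAMPLES

print(f"mean = {mean_of_averages:.3f} (theory 0.500)")
print(f"sd   = {sd_of_averages:.3f} (theory {theory_sd:.3f})")
print(f"share within 2 sd = {share_within_2sd:.3f}")
```

A histogram of `averages` would show the bell shape directly; the summary figures printed here are a compact stand-in for that picture.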
As we discussed in this chapter, for confidence intervals and hypothesis tests on the population proportion $\pi$, large samples are typically needed before the central limit theorem takes effect. In these cases the normal distribution is approximating the binomial distribution, which for $\pi = 0.5$ is symmetric. As the value of $\pi$ moves away from 0.5, the binomial distribution becomes increasingly skewed, and larger samples are needed before the normal approximates it well. A useful rule of thumb is that the normal distribution is a good approximation if $n\pi > 10$ for $\pi \le 0.5$, and $n(1 - \pi) > 10$ for $\pi \ge 0.5$. With this rule, a sample size of 100 will be large enough (for the normal to be a good approximation) as long as $\pi$ is no smaller than 0.1 and no larger than 0.9.
success proportions is detected with reasonably large power. A planning value for the common success proportion ($\pi$) and a meaningful detectable difference ($\delta$) of the two success proportions need to be specified. Information on the success rate is usually available from prior experiments, and worthwhile changes are determined with economic considerations in mind. In our illustration, this has led us to the values $\pi = 0.03$ and $\delta = 0.005$.
The discussion of the sample size in Section 2.5.6 is different in two respects. First, it focuses on the one-sample situation, not on comparative experiments. Second, it determines the sample size that is needed to achieve a certain precision of the estimate (either a single mean or a single proportion), but does not address the power of detecting a certain meaningful difference.
EXERCISES
Exercise 1 The file contribution summarizes the 2004 contributions to a selective private liberal arts college. Refer to Section 2.3 for a description of the data set.
(a) Confirm the information in Table 2.1 and Figures 2.6 and 2.7. Use available computer software such as Excel or Minitab.
(b) Consider some of the other factors that were not used in Section 2.3. In particular, assess the effect of gender, marital status, graduation status (graduated/not graduated), and major on the likelihood of donating and the donation amount.
Exercise 2 Consider the data in Section 2.6 on AdTel. Recreate the information in Figures 2.10 through 2.12 and the results of the hypothesis test discussed in this section.
Exercise 3 Search the Web for useful statistics applets. You can do this by searching for expressions such as "applets for central limit effect," "applets for confidence intervals," "applets for hypothesis testing," "applets for visualization of statistical concepts," or "applets for sample size." Experiment with these applets. These applets will reinforce the concepts discussed in Section 2.5. They will demonstrate the central limit effect by drawing repeated samples of a certain specified size. They show through simulations how the variability of sample statistics such as the sample mean decreases with increasing sample size. They show through simulations that 95% confidence intervals for a mean or a proportion cover the population (process) mean and proportion in 95% of the cases. Applets for the correlation coefficient illustrate the connection between scatter plots and correlation coefficients, and they show how the correlation changes if observations are changed.
Exercise 4 John, in charge of custodial services at the business school, installed brand new lightbulbs into the offices of the marketing faculty. He kept track of burned-out bulbs and the times when he had to replace them. After 12 months, he had to replace 25 of the 30 bulbs. The length of life (in weeks) for the 25 bulbs is given below:
33 19 11 22 22 15 37 10
38 19 20 23 50 30 22 10
15 37 15 22 40 22 46 45
to reject the null hypothesis that $\mu = 1.00$ lb, in favor of the alternative that $\mu > 1.00$ lb?
(c) Assume that the distribution of weights is normal; furthermore, assume that the sample average and standard deviation are good estimates of the corresponding population characteristics $\mu$ and $\sigma$. Calculate the proportion of loaves that are underweight (i.e., weigh less than 1.0 pound).
(d) Predict the weight of a single loaf from this morning's production. Obtain an approximate 95% prediction interval.
Exercise 8 Prior studies showed that the standard deviation among individual measurements of a certain air pollutant is 0.6 parts per million (ppm). You are planning on using the information from a random sample (i.e., the sample average) to estimate the unknown process (population) mean $\mu$. How large do you have to select the sample size if you want to be 95% certain that your sample average is within plus or minus 0.2 ppm of the unknown process mean?
Exercise 9 Thirty lightbulbs were selected randomly from among a very large production batch, and they were put on test to determine the time until they burn out. The average failure time for these 30 bulbs was 1,080 hours; the sample standard deviation was 210 hours. The lightbulbs are advertised as having a mean life length of 1,200 hours. Test this hypothesis against the alternative that the mean life length of this batch is actually smaller than 1,200 hours.
Exercise 10 A new emergency procedure was developed to reduce the time that is required to fix a certain manufacturing problem. Past data under the old system were available ( n = 25). The staff was trained under the new procedure, and the response times for
the next 15 occurrences of this manufacturing problem were recorded.
Old Procedure
4.3
6.5
4.6
4.3
6.4
4.8
5.1
6.8
4.9
4.5
5.1
7.3
5.0
4.6
7.0
5.1
3.8
5.2
4.1
5.7
4.6
5.9
3.1
6.2
6.0
3.3
New Procedure
6.2
4.0
3.3
4.5
2.3
3.0
3.2
3.7
4.5
5.3
4.0
5.4
4.3
3.8
Compare the response times under the old and the new procedure. Are there differences in
the mean responses? Discuss using appropriate graphs, summary statistics, and statistical
tests. Would you switch to the new procedure?
Exercise 11 Two different fabrics are tested on a wear tester. A wear tester is a mechanical
device that rubs the attached fabric against a fixed object. This particular machine has two
separate attachments that allow us to compare two pieces of fabrics in the same run. The
(a) Obtain a dot diagram and calculate the mean, median, and standard deviation of the 25 observations. Calculate the 90th percentile.
(b) We would like to obtain the mean and the median life length of all 30 lightbulbs. However, by the end of the 12 months, five bulbs had not yet burned out. Can you calculate the median of the 30 observations without waiting until the five remaining bulbs fail? Can you calculate the mean of the 30 observations without waiting until the five remaining bulbs fail? If not, what can you say about this mean?
Exercise 5 The data set given below lists the annual 2005 salary (in $1,000) and the educational achievement of each employee.
3
4
6
7
9
10
II
12
13
Education
Salary
1'.mployee
Education
Sabi y
16
12
12
I6
18
15
11
12
II
52.J
43.7
39.5
47.8
53.0
49.0
33.7
J2. I
lI
15
16
17
18
19
20
21
4<J.4
45.4
4U
37.(,
9.8
22
20
37.7
26.3
22.0
27.0
23
24
17
16
13
12
12
19
16
17
16
12
]6
16
I')
16
16
25
.lU
M.8
'ilJ."
54.5
27.3
14.8
2 l .7
33.~
Construct a scatter diagram of salary against educational achievement. Calculate the correlation coefficient.
Exercise 6 In an NYT/CBS poll, 56% of 2,000 randomly selected voters in New York City said that they would vote for the incumbent in a certain two-person race. Calculate a 95% confidence interval for the population proportion. Discuss its implications. Carefully discuss what is meant by the population, how you would carry out the random sampling, and what other factors could lead to differences between the responses in the survey and the actual votes.
Exercise 7 A sample of n = 50 bread loaves is taken from the sizable production that left our bakery this morning. We find that the average weight of the 50 loaves is 1.05 pounds, and the standard deviation is s = 0.06 pound.
(a) Obtain a 95% confidence interval for the mean weight of this morning's production.
(b) One of the employees claims that the current process produces loaves that are heavier than one pound, on average. Is there enough information in our sample
Fahnl
)\
36
l:l
3\1
26
2>
31
35
38
28
17
22
2\1
42
_JI
39
21
_\ 2
Analyze the data and determine whether the mean wear of fabric A is different from that of fabric B. If it is different, how does it differ?
Discuss why the design of assigning both fabrics to each run is preferable to a design that assigns fabric A to both positions of runs 1-4, and fabric B to both positions of runs 5-8.
Exercise 12 In the past, the sign-up rate for your credit card has been around 6%. Your marketing team wants to decide between two different sets of promotional materials that it plans to send to potential customers: a traditional set that is very similar to the one that has been used in the past, and a new, bolder set that is expected to increase the sign-up rate. Before switching to the new materials, your company wants to run a comparative experiment that evaluates the two sign-up rates. Assuming a significance level of 0.05, determine the common sample size for the two groups that can detect a 1% increase in the sign-up rate with power of 0.95. How does the sample size change if you require less power (say, 0.90 or 0.80)? How does the sample size change if you want to detect a difference of one-half of a percent? You may want to use computer software to carry out the calculations.
Exercise 13 Using computer software of your choice, perform Shewhart's experiment of drawing random samples of size 4 from a continuous uniform distribution between 0 and 1. You may use the Minitab function "Calc > Random Data > Uniform" or the Excel command RAND(). Generate four columns of 1,000 random numbers, and calculate 1,000 sample averages from samples of size 4. Construct a histogram of the 4,000 individual observations, and calculate their mean and standard deviation. Construct a histogram of the 1,000 sample averages, and calculate their mean and standard deviation. Compare the two histograms and the two mean and standard deviation estimates. Are these results what you expected to see? Explain.
because each of the three promotions is assigned to every block. The design is called a randomized complete block design as, within a given block, promotions are randomly assigned to weeks. In some situations, the number of treatments is greater than the size of the block and a complete block experiment is not possible. In those situations, which we will not discuss, it is possible to construct incomplete block experiments (see Box, Hunter, and Hunter, 2005).
The blocking approach has potential advantages. Suppose one particular store has some characteristic that would make its sales volume particularly high under all three promotions A, B, and C. In the earlier completely randomized design, a store is assigned to a single promotion, and it is due to chance whether the well-performing store becomes part of promotion A, B, or C. If the store were assigned to A, then A would benefit. However, this benefit is not due to the treatment, but due to the store effect. If the store happened to be assigned to group C, then C would benefit. Through "blocking," we control for the "luck of the draw" by assigning all three treatments to every store (block). In the randomized block design, we can focus on the relative changes within the block, thus canceling out possible block effects. Consequently, any differences in results will be due to the treatments, not the stores. If there is an actual block effect, the randomized complete block design will increase our ability to detect differences in the treatments.
By blocking on stores, we have eliminated a possible store effect. But sales might also be affected by a time (week) effect. The three 1-week periods might not be homogeneous; expected sales might vary from week to week. Randomizing the assignment of the promotions across the three 1-week periods of each block is important because it spreads a possible week effect across the three treatments. But it is possible to do better, by blocking the experiment with respect to weeks as well as stores. Experiments that block on two factors (here, stores and weeks) are called Latin square designs. We discuss this type of design in Chapter 7 and in Case 11 of the case study appendix.
Our discussion of the randomized complete block experiment is a natural extension of the material covered in Chapter 2. There, in comparing two means, we discussed the difference between the completely randomized experiment in which two treatments are assigned to two groups of different experimental units, and the paired comparison (blocked experiment) in which each unit receives both treatments. We illustrated the paired comparison approach in a test of a blood pressure medication. We showed that measuring blood pressure on the same patient before and after treatment eliminated the variation in pressure among patients, increasing the precision of the test and hence its ability to detect differences in the two treatments. In this chapter, in the randomized complete block experiment, we will apply this idea to the comparison of more than two treatments.
3.2 THE COMPLETELY RANDOMIZED EXPERIMENT
A firm planned and executed a test of the effectiveness of three different product displays. Fifteen stores were available for the test, and each display was used in five different stores. To make the results comparable and to minimize bias, displays and stores were randomly assigned. Sales volume for the week during which the display was present was measured and compared to the base sales of that store. Percentage changes were calculated, and they are given in Table 3.1.

TABLE 3.1
                              Display
                       1         2         3
                      9.5       8.5       7.7
                      3.2       9.0      11.3
                      4.7       7.9       9.7
                      7.5       5.0      11.5
                      8.3       3.2      12.4
Sample size            5         5         5
Sample mean           6.64      6.72     10.52
Sample variance       6.82      6.28      3.43

TABLE 3.2
                          TREATMENT GROUPS
                       1             2            ...      k
                    $y_{11}$      $y_{21}$        ...   $y_{k1}$
                    $y_{12}$      $y_{22}$        ...   $y_{k2}$
                      ...           ...           ...     ...
Sample size         $n_1$         $n_2$           ...   $n_k$
Sample mean         $\bar{y}_1$   $\bar{y}_2$     ...   $\bar{y}_k$
Sample variance     $s_1^2$       $s_2^2$         ...   $s_k^2$
In this particular example, the observations come from three treatment groups. In general, there are k treatment groups with observations $y_{ij}$. The first subscript denotes the treatment group ($i = 1, 2, \ldots, k$), while the second subscript $j$ denotes the replication. A listing of the observations for the case of k treatment groups is shown in Table 3.2.
Note that the number of observations in the k treatment groups need not be the same. Let us denote the number of observations by $n_1, n_2, \ldots, n_k$, and the total number of observations by $N = \sum_{i=1}^{k} n_i$. We call the study balanced if the sample sizes in the k groups are the same. There are advantages to having equal (or nearly equal) sample sizes. Balanced designs allow us to estimate the treatment means with uniform precision, and they maximize the power of the test procedure (the F-test) that is discussed in this section.
Table 3.2 also lists summary statistics for the k treatment groups. The sample mean and the sample variance for treatment group $i$ are given by
$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} \quad \text{and} \quad s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}.$$
See Section 2.3. There are k = 3 groups in Table 3.1, with equal sample sizes $n_1 = n_2 = n_3 = 5$ and total sample size N = 15. You may want to check the sample means and the sample variances that are given in Table 3.1, using a calculator or a statistical software package.
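The suggested check of the summary statistics takes only a few lines. The sketch below assumes the observations as read from Table 3.1; the dictionary name `displays` and the layout of `summary` are illustrative choices, not part of the original text.

```python
import statistics

# Percentage changes in sales by display, as read from Table 3.1
# (an assumption of this sketch).
displays = {
    1: [9.5, 3.2, 4.7, 7.5, 8.3],
    2: [8.5, 9.0, 7.9, 5.0, 3.2],
    3: [7.7, 11.3, 9.7, 11.5, 12.4],
}

# For each treatment group: sample size, sample mean, sample variance.
# statistics.variance divides the sum of squares by n - 1.
summary = {
    group: (len(values), statistics.mean(values), statistics.variance(values))
    for group, values in displays.items()
}

for group, (size, mean, variance) in summary.items():
    print(f"display {group}: n = {size}, mean = {mean:.2f}, variance = {variance:.2f}")

total_n = sum(size for size, _, _ in summary.values())
print("N =", total_n)
```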
We assume that the samples were randomly drawn from normal populations with possibly different means $\mu_1, \mu_2, \ldots, \mu_k$, but each with the same variance $\sigma^2$. Given the sample results, we wish to test the null hypothesis that the k population means are equal against the alternative that at least one of the means is different. Formally, we have
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ against $H_1$: at least one of the means is different.
In the following sections we will discuss a statistical test for this null hypothesis. Initially, we assume that the sample sizes in the k treatment groups are the same ($n_1 = n_2 = \cdots = n_k = n$), because this will make it easier for us to motivate the procedure. Later we will relax this assumption, considering the more general case when sample sizes are different.
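For treatment group $i$, the sample variance is an estimate of the common population variance $\sigma^2$:
$$s_i^2 = \frac{\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{n - 1} = \frac{SS_i}{n - 1}.$$
The numerator of the sample variance is the sum of the squared deviations of the observations from their mean, and we denote it by $SS_i$. The denominator $n - 1$ is the number of degrees of freedom that is associated with this sum of squares.
In our example of three groups ($k = 3$) of five observations each ($n = 5$), we have
$$s_1^2 = \frac{SS_1}{4} = \frac{(9.5 - 6.64)^2 + (3.2 - 6.64)^2 + (4.7 - 6.64)^2 + \cdots + (8.3 - 6.64)^2}{4} = \frac{27.27}{4} = 6.82,$$
$$s_2^2 = \frac{SS_2}{4} = \frac{(8.5 - 6.72)^2 + \cdots + (5.0 - 6.72)^2 + (3.2 - 6.72)^2}{4} = \frac{25.11}{4} = 6.28,$$
$$s_3^2 = \frac{SS_3}{4} = \frac{(7.7 - 10.52)^2 + \cdots + (12.4 - 10.52)^2}{4} = \frac{13.73}{4} = 3.43.$$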
Each sample variance is an estimate of the common population variance $\sigma^2$, and this is true whether or not the population means are the same. The average of these three variances, $(s_1^2 + s_2^2 + s_3^2)/3$, is a pooled estimate of $\sigma^2$.
In the general case with k treatments (groups) and varying sample sizes, the pooled estimate of the population variance $\sigma^2$ is given by
$$s_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{n_1 - 1}{N - k}s_1^2 + \frac{n_2 - 1}{N - k}s_2^2 + \cdots + \frac{n_k - 1}{N - k}s_k^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{N - k}.$$
The numerator in this equation is called the sum of squares within groups (SSW). Its degrees of freedom are given by the sum of the degrees of freedom of the individual sums of squares, $(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = (\sum_{i=1}^{k} n_i) - k = N - k$. The pooled estimate of the population variance $s_W^2$ is the ratio of the sum of squares SSW and its degrees of freedom $N - k$; it is called the mean square error within groups.
In our example, $N - k = 15 - 3 = 12$, and
$$s_W^2 = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \frac{SS_1 + SS_2 + SS_3}{12} = \frac{27.27 + 25.11 + 13.73}{12} = 5.51.$$
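The pooled estimate needs only the group sizes and the group variances. A small sketch follows, with a hypothetical helper name `pooled_variance`; the balanced case uses the figures from Table 3.1, while the unequal sample sizes in the second call are invented purely to show that the weights then differ from a simple average.

```python
def pooled_variance(sizes, variances):
    """Weighted average of group variances with weights (n_i - 1) / (N - k)."""
    ssw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))   # sum of squares within
    df_within = sum(sizes) - len(sizes)                          # N - k
    return ssw / df_within

variances = [6.82, 6.28, 3.43]          # sample variances from Table 3.1

# Balanced case of the example: the pooled estimate equals the simple average.
balanced = pooled_variance([5, 5, 5], variances)
simple_average = sum(variances) / len(variances)

# Hypothetical unequal sample sizes: larger groups receive more weight.
unbalanced = pooled_variance([10, 5, 3], variances)

print(f"balanced: {balanced:.2f}, simple average: {simple_average:.2f}")
print(f"unbalanced (hypothetical sizes): {unbalanced:.2f}")
```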
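Recall that the average $\bar{y}$ of $n$ observations from a population with variance $\sigma^2$ has variance $\sigma_{\bar{y}}^2 = \sigma^2/n$ and standard deviation $\sigma_{\bar{y}} = \sigma/\sqrt{n}$.
Assume that the sample sizes of the k treatment groups are the same and suppose that the null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$ is true. Then the k group averages are realizations from the same distribution with a common mean and variance $\sigma^2/n$. The sample variance of the k group averages $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k$,
$$s_{\bar{y}}^2 = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{k - 1},$$
is an estimate of $\sigma^2/n$. Hence an estimate of $\sigma^2$ is given by
$$s_B^2 = n s_{\bar{y}}^2 = \frac{\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2}{k - 1}.$$
The numerator $\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2$ measures the variation of the k group means from the grand mean $\bar{y} = \sum_i \sum_j y_{ij}/N$; it represents the variability between the groups and is called the sum of squares between groups (SSB). The denominator $k - 1$ is the number of degrees of freedom that is associated with this sum of squares; note that there are k means and one restriction. The estimate $s_B^2$ is called the mean square between groups; hence the subscript B.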
In the example in Table 3.1 with $n = 5$, $\bar{y}_1 = 6.64$, $\bar{y}_2 = 6.72$, $\bar{y}_3 = 10.52$, and $\bar{y} = (6.64 + 6.72 + 10.52)/3 = 7.96$,
$$s_{\bar{y}}^2 = \frac{(6.64 - 7.96)^2 + (6.72 - 7.96)^2 + (10.52 - 7.96)^2}{2} = 4.92$$
and
$$s_B^2 = \frac{\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2}{k - 1} = (5)(4.92) = 24.6.$$
The within-sample estimate $s_W^2$ is an estimate of the population variance $\sigma^2$, and this is true whether or not the population means are equal. The between-sample estimate is also an estimate of $\sigma^2$, but only if the population means are the same. If they are not, the between-sample estimate is inflated, in that it also reflects the differences between the population means.
A test of the null hypothesis that the population means are equal examines the ratio of the between-sample and the within-sample variance estimates, $s_B^2/s_W^2 = 24.6/5.51 = 4.46$. Under the null hypothesis of equal population means, the two estimates of $\sigma^2$ will be similar in magnitude, and the ratio will be close to 1. If the null hypothesis is false, the numerator in this ratio will be larger than the denominator, and the ratio will be greater than 1.
How large does the ratio have to be before one can reject the null hypothesis that the population means are equal? The answer is given by the F-distribution, which was introduced in Section 2.2.2. The F-distribution is used to test the equality of two variances and arises in the following way. Suppose we take two independent samples from a normal distribution with variance $\sigma^2$: one sample of size $n_1$ and the other of size $n_2$. Then the ratio of the two sample variances, $s_1^2/s_2^2$, follows an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
Applying this result to our problem, a test of the null hypothesis that the k population means are equal is given by the ratio of the between-sample and the within-sample variance estimates,
$$F = \frac{SSB/(k - 1)}{SSW/(N - k)} = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2/(k - 1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2/(N - k)}.$$
Under the null hypothesis of equal population means, this F-ratio follows an F-distribution with $k - 1$ and $N - k$ degrees of freedom. In our example, $F = s_B^2/s_W^2 = 24.6/5.51 = 4.46$. The numerator has $k - 1 = 3 - 1 = 2$ degrees of freedom, while the denominator has $N - k = 15 - 3 = 12$ degrees of freedom. The probability value is the probability of obtaining the value 4.46 or larger from this F-distribution. It is given by $P[F(2, 12) \ge 4.46] = 0.036$. This is smaller than the usually adopted 5% significance level and gives us reason to reject the null hypothesis. We conclude that there are differences among the three population means.
Excel or any other statistical software package can be used to find the probability value. For example, the Excel command FDIST(4.46, 2, 12) returns the probability value 0.036. Alternatively, we obtain the 95th percentile of the F(2, 12) distribution and use it as the critical value for the test. The Excel command FINV(0.05, 2, 12) returns 3.89. Our test statistic exceeds this critical value.
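The F-ratio, its probability value, and the critical value can also be obtained without a statistics package. The sketch below rests on two assumptions: the observations are those read from Table 3.1, and the tail probability uses a closed form that holds only when the numerator has 2 degrees of freedom, namely $P[F(2, \nu) \ge f] = (1 + 2f/\nu)^{-\nu/2}$. With more than three treatment groups a library routine would be needed instead. The function names are illustrative.

```python
import statistics

# Observations by display, as read from Table 3.1 (an assumption of this sketch).
groups = [
    [9.5, 3.2, 4.7, 7.5, 8.3],
    [8.5, 9.0, 7.9, 5.0, 3.2],
    [7.7, 11.3, 9.7, 11.5, 12.4],
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

# Between-groups and within-groups sums of squares.
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ssw = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups)

msb = ssb / (k - 1)        # mean square between groups
msw = ssw / (N - k)        # mean square error within groups
f_ratio = msb / msw

def f_upper_tail_df1_2(f, df2):
    """P[F(2, df2) >= f]; this closed form is valid only for numerator df = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value of F(2, df2), obtained by inverting the tail formula."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

p_value = f_upper_tail_df1_2(f_ratio, N - k)
critical_value = f_critical_df1_2(0.05, N - k)

print(f"F = {f_ratio:.2f}, p = {p_value:.3f}, 5% critical value = {critical_value:.2f}")
```

The printed figures can be compared with the values 4.46, 0.036, and 3.89 quoted from Excel in the text.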
TESTING DIFFERENCES AMONG SEVERAL MEANS

TABLE 3.3

Source of         Sum of Squares                                                  Degrees of     Mean Squares     F-Ratio
Variation         SS                                                              Freedom df     MS
Between groups    $SSB = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$                $k - 1$        $SSB/(k - 1)$    $F = \frac{SSB/(k - 1)}{SSW/(N - k)}$
Within groups     $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$    $N - k$        $SSW/(N - k)$
Total             $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$      $N - 1$
(ANOVA) table; see Table 3.3. It can be shown that the total sum of the squared deviations of the observations from their common mean, $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$, can be partitioned as
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$
that is, SST = SSB (Between Groups) + SSW (Within Groups).
The total sum of squares (SST) is shown in the last row of Table 3.3. It measures the variability of the observations around their common mean $\bar{y} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}/N$, and its degrees of freedom are $N - 1$.
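The partition of the total sum of squares can be confirmed numerically for the display data. This is a brief check, again assuming the observations as read from Table 3.1; the helper `mean` is defined here only to keep the sketch self-contained.

```python
# Observations by display, as read from Table 3.1 (an assumption of this sketch).
groups = [
    [9.5, 3.2, 4.7, 7.5, 8.3],
    [8.5, 9.0, 7.9, 5.0, 3.2],
    [7.7, 11.3, 9.7, 11.5, 12.4],
]

def mean(values):
    return sum(values) / len(values)

all_obs = [y for g in groups for y in g]
grand_mean = mean(all_obs)

sst = sum((y - grand_mean) ** 2 for y in all_obs)              # total
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # between groups
ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)         # within groups

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSW = {ssw:.2f}, SSB + SSW = {ssb + ssw:.2f}")
```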
Programs for calculating the ANOVA table and for testing the hypothesis that the population means are the same are part of most statistical software packages. To illustrate, we use Minitab, a popular and useful statistical software package. Commands in Minitab carry out statistical analyses of data that are entered into columns and rows of a spreadsheet. In this example, one enters the response (percent changes) and the treatment identifier (1, 2, 3) into two columns, say, columns 1 and 2. There are 15 rows in each column because there are 15 stores. The first row has 9.5 in column 1 and 1 in column 2; the second row has 3.2 in column 1 and 1 in column 2; ... ; the last row has 12.4 in column 1 and 3 in column 2. The consecutive arrangement of the 15 stores is arbitrary given that there is no particular order to the stores. The Minitab command "ANOVA > One-Way" provides the ANOVA table and the confidence intervals that you see at the bottom of Table 3.4. Other statistical software packages such as JMP and SPSS work in pretty much the same way.
TABLE 3.4

Source     DF      SS      MS      F      P
Display     2   49.17   24.58   4.46  0.036
Error      12   66.11    5.51
Total      14  115.28

Individual 95% CIs For Mean Based on Pooled StDev

Display    N     Mean   StDev
1          5    6.640   2.611
2          5    6.720   2.505
3          5   10.520   1.853

[Plot of the three 95% confidence intervals; axis from 5.0 to 12.5]

Pooled StDev = 2.347
Table 3.4 shows the F-statistic and the probability value that we had calculated earlier.
The test result provides fairly strong evidence that the means are different. Once we have
decided that there are differences among the population means, we look at how they differ.
This can be done by displaying the data graphically. Dot diagrams, separate for each group
but shown on the same scale, are very informative because they show differences in the levels
as well as differences in the variability (which, for our test to work, should be about the
same). The sample means and their 95% confidence intervals shown in Table 3.4 are also
very informative. The mean square error within the groups, $s_W^2 = SSW/(N - k) = 5.51$,
estimates the variance of individual observations by pooling the variability across the k groups.
Its square root gives the pooled standard deviation $s_W = \sqrt{5.51} = 2.347$, which is shown in the
Minitab output. We use it to calculate confidence intervals of the population means. The
95% confidence interval for $\mu_i$ is given by $\bar{y}_i \pm t(0.975; N - k)\, s_W/\sqrt{n_i}$, where
$t(0.975; N - k)$ is the 97.5th percentile of the t-distribution with N - k degrees of freedom,
which are the degrees of freedom that are associated with the pooled estimate.
The confidence intervals show how the means differ. The third display is more effective
than the first two. Also, there is not much difference between displays 1 and 2.
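The entries of Table 3.4 can be reproduced from the group sizes, means, and standard deviations alone. The following is a minimal sketch under stated assumptions, not the Minitab procedure itself: the variable names are ours, and the t percentile 2.179 (97.5th percentile with 12 degrees of freedom) is taken from a standard t-table rather than computed.

```python
import math

# Summary statistics as reported in the Minitab output (Table 3.4).
sizes = [5, 5, 5]
means = [6.640, 6.720, 10.520]
std_devs = [2.611, 2.505, 1.853]

k = len(sizes)   # number of groups
N = sum(sizes)   # total number of observations
grand_mean = sum(n * m for n, m in zip(sizes, means)) / N

# Between-group and within-group sums of squares.
ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
ssw = sum((n - 1) * s ** 2 for n, s in zip(sizes, std_devs))

ms_between = ssb / (k - 1)
ms_within = ssw / (N - k)
f_ratio = ms_between / ms_within
pooled_sd = math.sqrt(ms_within)

# Assumption: tabulated 97.5th percentile of the t-distribution with N - k = 12 df.
T_975_DF12 = 2.179

intervals = []
for n, m in zip(sizes, means):
    half_width = T_975_DF12 * pooled_sd / math.sqrt(n)
    intervals.append((m - half_width, m + half_width))

print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_ratio:.2f}, pooled SD = {pooled_sd:.3f}")
for display, (low, high) in enumerate(intervals, start=1):
    print(f"display {display}: 95% CI from {low:.2f} to {high:.2f}")
```

Because the standard deviations are rounded to three decimals, the within-group sum of squares agrees with the table only up to rounding.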
3.3
A large supermarket chain plans to test three different versions of an in-store promotion.
The firm identifies 15 stores in one region to participate in the experiment. A test of a
particular version of the promotion in a store will run for one week. The company wants to run
the entire experiment over a consecutive 5-week period and plans to run three tests per
week. Initially, the marketing director decided to use the following completely randomized
design. Each promotion strategy would be randomly assigned to five stores. Then three of
the 15 stores would be randomly chosen for the first week, three other stores would be
randomly chosen for the second week, and so forth.
A young analyst in the marketing department, fresh from a course in experimental
design, suggests an alternative. She realizes that in a completely randomized design there
is a chance that all three stores selected for week 1 could have been assigned promotion
TABLE 3.5
Results of a Blocked Experiment with Three Different In-Store Promotions

                                      Block (Week)
Treatment               Week 1   Week 2   Week 3   Week 4   Week 5   Treatment Average
Version 1                 52       47       44       51       42          47.2
Version 2                 60       55       49       52       43          51.8
Version 3                 56       48       45       44       38          46.2
Block (week) Average      56       50       46       49       41          48.4
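TABLE 3.6
Responses from the Randomized Complete Block Experiment (the General Case)

                                Block
Treatment        1          2        ...        b        Treatment Average
1             $y_{11}$   $y_{12}$    ...     $y_{1b}$    $\bar{y}_{1\cdot}$
2             $y_{21}$   $y_{22}$    ...     $y_{2b}$    $\bar{y}_{2\cdot}$
...
k             $y_{k1}$   $y_{k2}$    ...     $y_{kb}$    $\bar{y}_{k\cdot}$
Block Average $\bar{y}_{\cdot 1}$  $\bar{y}_{\cdot 2}$  ...  $\bar{y}_{\cdot b}$  $\bar{y}_{\cdot\cdot}$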
(treatment) A, while in week 2, none of the stores would be testing A. She points out that
the effectiveness of a promotion might depend on the week in which it is run, due to
differences in the weather and other conditions that might vary from week to week. She
explains, "In this case there are two sources of variation, the promotions themselves and the
week. To eliminate the week as a source of variation, we should run the experiment in
'blocks' with each of the three versions of the promotion tested in every week." The
marketing director is impressed and decides to follow the analyst's advice. The design and
results are shown in Table 3.5. Based on the percentage increase in sales, the firm measures
the effectiveness of a promotion on a scale from 0 to 100.
In the general randomized complete block design, there are k treatments and b blocks,
and n = bk observations. The observations $y_{ij}$, for i = 1, 2, ..., k (treatments) and j = 1,
2, ..., b (blocks), are arranged in Table 3.6. The observations in the second column represent
the k treatment responses on the first block; the observations in the third column are
the responses on the second block, and so forth; and the penultimate column contains the
responses on block b. The treatment means (averaged over the blocks) are denoted by
$\bar{y}_{i\cdot}$, for i = 1, 2, ..., k. The block means (averaged over the k treatments) are given by
$\bar{y}_{\cdot j}$, for j = 1, 2, ..., b.
Observe our dot notation for averages. A dot in place of an index expresses the fact
that we have averaged the observations over that index. For example, $\bar{y}_{\cdot j}$ is the mean
for block j (the jth block), averaging over all treatments in that block. The overall average
$\bar{y}_{\cdot\cdot}$ is the result of averaging over both indexes i and j.
We model the observation $y_{ij}$ in treatment group i and block j through an overall level
and additive effects of treatment i and block j. Using the summary data in Table 3.6, we
estimate the observation $y_{ij}$ by $\bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})$. The first term is an estimate
of the grand mean; the second term is an estimate of the incremental effect of treatment i;
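TABLE 3.7

Source of Variation   Sum of Squares SS   Degrees of Freedom df   Mean Squares MS   F-Ratio
Treatment             SS(TR)              k - 1                   MS(TR)            MS(TR)/MS(error)
Block                 SS(BL)              b - 1                   MS(BL)            MS(BL)/MS(error)
Error                 SS(error)           (k - 1)(b - 1)          MS(error)
Total                 SST                 kb - 1

where $SS(TR) = b\sum_{i=1}^{k}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$, $SS(BL) = k\sum_{j=1}^{b}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$,
$SS(error) = \sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$, and $SST = \sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{\cdot\cdot})^2$.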
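and the last term is an estimate of the incremental effect of block j. The difference between
the observation and this estimate is the error component,

$$y_{ij} - [\bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})] = y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}$$

This model allows us to express the deviation of each observation from the grand mean as

$$y_{ij} - \bar{y}_{\cdot\cdot} = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})$$

The first component on the right-hand side compares the treatment mean to the overall
mean; the second component compares the block mean to the overall mean; and the last
component measures the error after correcting the observation for its treatment average
and block average.
Squaring the left- and right-hand sides of the equation, summing over both indexes
i and j, and using the fact that all sums of cross-product terms are zero lead to the sum of
squares decomposition:

$$\sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{\cdot\cdot})^2 = b\sum_{i=1}^{k}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + k\sum_{j=1}^{b}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$$

SST = SS(TR) + SS(BL) + SS(error)

with degrees of freedom kb - 1 = (k - 1) + (b - 1) + (k - 1)(b - 1).
The sums of squares and their degrees of freedom become part of the ANOVA table in
Table 3.7.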
Here, we are testing whether the treatment means differ and whether the block means
differ; thus, there are two F-ratios. The relevant statistic for testing whether there are
statistically significant treatment effects is given in the last column. The F-statistic
MS(TR)/MS(error) is compared with the F(k - 1, (k - 1)(b - 1)) distribution; a test
at significance level 0.05 concludes that there are significant treatment effects if this
F-statistic is larger than the 95th percentile of that distribution. The second F-statistic,
MS(BL)/MS(error), tests for block effects and is compared to the 95th percentile of the
F(b - 1, (k - 1)(b - 1)) distribution.

TABLE 3.8

Source          DF      SS      MS      F      P
Treatment        2    89.2   44.60   7.62  0.014
Block (Week)     4   363.6   90.90  15.54  0.001
Error            8    46.8    5.85
Total           14   499.6
In our example, there are k = 3 promotions and b = 5 blocks (weeks). The observations in
Table 3.5, together with the treatment and block means, can be used to calculate the sums
of squares in Table 3.7. We have done this for illustration, even though statistical computer
software will be used in practice.

$SST = (52 - 48.4)^2 + (47 - 48.4)^2 + \cdots + (38 - 48.4)^2 = 499.6$

$SS(TR) = 5[(47.2 - 48.4)^2 + (51.8 - 48.4)^2 + (46.2 - 48.4)^2] = 89.2$

$SS(BL) = 3[(56 - 48.4)^2 + (50 - 48.4)^2 + (46 - 48.4)^2 + (49 - 48.4)^2 + (41 - 48.4)^2] = 363.6$

and

$SS(error) = SST - SS(TR) - SS(BL) = 499.6 - 89.2 - 363.6 = 46.8$
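The hand calculation can be retraced in code. This is a minimal sketch rather than the Minitab analysis: the variable names are ours, and the 15 responses are entered as we read them from Table 3.5 (they reproduce the treatment averages, block averages, and SST quoted in the text).

```python
# Effectiveness ratings: one row per promotion version, one column per week (block).
ratings = [
    [52, 47, 44, 51, 42],  # Version 1
    [60, 55, 49, 52, 43],  # Version 2
    [56, 48, 45, 44, 38],  # Version 3
]

k = len(ratings)      # number of treatments
b = len(ratings[0])   # number of blocks

grand_mean = sum(sum(row) for row in ratings) / (k * b)
treatment_means = [sum(row) / b for row in ratings]
block_means = [sum(column) / k for column in zip(*ratings)]

# Sum of squares decomposition: SST = SS(TR) + SS(BL) + SS(error).
sst = sum((y - grand_mean) ** 2 for row in ratings for y in row)
ss_tr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_bl = k * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = sst - ss_tr - ss_bl

ms_tr = ss_tr / (k - 1)
ms_bl = ss_bl / (b - 1)
ms_error = ss_error / ((k - 1) * (b - 1))

f_treatment = ms_tr / ms_error
f_block = ms_bl / ms_error

print(f"SST = {sst:.1f}, SS(TR) = {ss_tr:.1f}, SS(BL) = {ss_bl:.1f}, SS(error) = {ss_error:.1f}")
print(f"F(treatment) = {f_treatment:.2f}, F(block) = {f_block:.2f}")
```

The same three-step pattern (means, sums of squares, mean squares) applies to any randomized complete block layout with one observation per treatment-block cell.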
We use Minitab for the calculations. The ANOVA table in Table 3.8 was obtained with
the Minitab command "ANOVA > Two-Way." The spreadsheet
consists of three columns: column 1 contains the response (effectiveness rating); column 2,
the treatment (promotion) indicators (1, 2, 3); and column 3, the blocking groups (weeks
1 through 5). There are 15 rows to each column. Row 1 contains 52 in column 1, 1 in
column 2, and 1 in column 3. Row 2 contains 47 in column 1, 1 in column 2, and 2 in column 3,
and so forth; the 15th row contains 38 in column 1, 3 in column 2, and 5 in column 3.
The treatment effects are statistically significant (F-statistic 7.62, with probability value
0.014), with promotion 2 scoring significantly higher than promotions 1 and 3, which are
about the same.
Table 3.8 shows that there is considerable variability among the weeks (F-statistic
15.54, with probability value 0.001), and that it is important to incorporate this variability
into the analysis when comparing the three treatments. What would happen if the blocking
effect were ignored? Combining SS(block) + SS(error) = 363.6 + 46.8 = 410.4 with
4 + 8 = 12 degrees of freedom, and calculating the test statistic that is appropriate for the
completely randomized design in Section 3.2, would have led to the F-statistic

$$\frac{MS(TR)}{MS(error)} = \frac{89.2/2}{(363.6 + 46.8)/12} = 1.30$$

with probability value P[F(2, 12) > 1.30] = 0.31. An analysis that ignores the week effects
would have made the error of accepting the null hypothesis and concluding that there are
no differences among the treatments.
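The contrast between the blocked and the unblocked analysis can be put side by side numerically. The sketch below is illustrative only: it starts from the sums of squares in Table 3.8, and it uses the closed-form tail probability $(1 + 2f/v)^{-v/2}$, which holds only because the treatment F-ratio has 2 numerator degrees of freedom. The function and variable names are our own.

```python
def f_upper_tail_df2(f_value: float, den_df: int) -> float:
    """P[F(2, den_df) >= f_value]; valid only for 2 numerator degrees of freedom."""
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)


# Sums of squares and degrees of freedom from Table 3.8.
ss_tr, df_tr = 89.2, 2
ss_bl, df_bl = 363.6, 4
ss_error, df_error = 46.8, 8

ms_tr = ss_tr / df_tr

# Blocked analysis: week-to-week variation is removed from the error term.
f_blocked = ms_tr / (ss_error / df_error)
p_blocked = f_upper_tail_df2(f_blocked, df_error)

# Unblocked analysis: block and error sums of squares are pooled.
df_pooled = df_bl + df_error
f_pooled = ms_tr / ((ss_bl + ss_error) / df_pooled)
p_pooled = f_upper_tail_df2(f_pooled, df_pooled)

print(f"blocked:   F = {f_blocked:.2f}, probability value = {p_blocked:.3f}")
print(f"unblocked: F = {f_pooled:.2f}, probability value = {p_pooled:.2f}")
```

Pooling the week effect into the error term inflates the denominator from 5.85 to 34.2, which is why the treatment differences disappear from view.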
Note. The analysis in this section assumes that we have exactly one observation at each
factor-block combination. The Minitab command "ANOVA > Two-Way" fails if there are
any missing observations. In this situation, one needs to use the general linear (regression)
model approach (Minitab command "ANOVA > General Linear Model") to analyze the
data; this will be discussed in Chapter 8.
3.4
CASE STUDY
This case is adapted from Clarke (1987). Additional relevant details and further analyses are
discussed in Case 11 of the case study appendix.
Researchers at the United Dairy Industry Association (UDIA) evaluated the results of a
recent field experiment to test the impact of varying levels of advertising on the sales of cheese.
The principal objective of the study was to measure the retail sales response (pounds of cheese
sold) to varying levels of advertising. Four test markets, selected from different geographic
regions, were used in this study. Executives determined the levels of advertising to be tested
in the experiment. It was believed that the levels should be distinct enough to generate
measurable differences in the results. They decided to test the impact of four levels of advertising:
0 cents (level A), 3 cents (B), 6 cents (C), and 9 cents (D), all expressed on a per-capita basis.
The 6-cents per-capita level represents a national campaign costing approximately $12 million
(in 1973). The principal medium for advertising was television, with point-of-purchase
display materials in stores and newspaper ads playing a secondary role. Each of the four
levels of advertising was implemented within each test market during a 3-month period between
May 1972 and April 1973; see Table 3.9. The sequence in which the advertising levels were
tested was selected so that each advertising level was used in only one test market during any
one time period. You can check that each letter in Table 3.9 (A, B, C, D) appears only once in
each column and each row. Such an arrangement is referred to as a Latin square design. In
Case 11 of the case study appendix, we will discuss the analysis of observations that originate
from a Latin square design, and we will illustrate that this design can be used to further isolate
a possible time effect. However, for the purpose of this illustration, we ignore the time effect
and assume that the observations are from a blocked experiment that studies four treatments
(A - D) on each of four blocks (test markets).
Within each market, UDIA executives obtained the cooperation of approximately 30
supermarkets in obtaining quarterly audits of cheese sales. The average cheese sales (in pounds
per store) during 3-month periods between May 1972 and April 1973 are listed in Table 3.9.
The ANOVA table and the treatment and block means are given in Table 3.10. The results
show that there are large differences between test markets (with highest sales in Rockford
TABLE 3.9
UDIA Study: Test Markets, Treatments, Test Periods, and Results (Average Sales per Store)
Jl"il
!'v1a1 I uh 72
Au lkt 72
~\n
Jan . , ~
leb Apr 7.\
l<Dd,JorJ
.\lbuquL'llJUL'
( .h,lltdllD0.1
Ji
/)
H
\
A
Ii
;\
/J
/)
Ii
\
/J
~alc>/~tor~
Treatme111
Binghamton
RocklorJ
Albuquerque
7,360
7,364
8,049
9,010
I :l,15.l
I I ,258
I .l,880
I l,147
I I ,852
12,089
I l,KOtl
11,15()
c
[)
("hat t.tnoua
7.557
7,900
8,)01
7,77fl
TABLE 3.10
ANOVA Table: UDIA Study

Source            DF         SS         MS      F      P
Treatment          3    1917416     639139   1.31  0.329
Block (Market)     3   79308210   26436070  54.31  0.000
Error              9    4380871     486763
Total             15   85606498

Treatment      Mean        Block           Mean
A            9980.5        Binghamton     7945.8
B            9652.8        Rockford      12859.5
C           10557.5        Albuquerque   11797.8
D           10345.8        Chattanooga    7933.5
and Albuquerque), and that sales increase with the amount of advertising. However, the
advertising effects are not statistically significant (probability value 0.329). We will revisit this
in Case 11 of the case study appendix and investigate whether the use of time as an additional
blocking variable changes our conclusions on the significance of the treatment effects.
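The mean squares and F-ratios in Table 3.10 follow directly from the sums of squares and their degrees of freedom. The short check below is a sketch with our own variable names; it recomputes only the two F-ratios and does not attempt the probability values.

```python
# Sums of squares and degrees of freedom from Table 3.10 (UDIA study).
ss_treatment, df_treatment = 1917416, 3
ss_block, df_block = 79308210, 3
ss_error, df_error = 4380871, 9

ms_treatment = ss_treatment / df_treatment
ms_block = ss_block / df_block
ms_error = ss_error / df_error

# Both F-ratios use the error mean square in the denominator.
f_treatment = ms_treatment / ms_error
f_block = ms_block / ms_error

print(f"MS(TR) = {ms_treatment:.0f}, MS(BL) = {ms_block:.0f}, MS(error) = {ms_error:.0f}")
print(f"F(treatment) = {f_treatment:.2f}, F(block) = {f_block:.2f}")
```

The market-to-market F-ratio is roughly forty times the advertising F-ratio, which matches the reading that markets differ strongly while advertising levels do not differ significantly.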
3.5
The term analysis of variance (ANOVA) gives no indication that the procedure is about
comparing means. But as we have seen in this chapter, we test whether several means differ
by comparing variances. The F-distribution plays the key role in the analysis, and perhaps
not surprisingly, F stands for Fisher, the most important statistician of the 20th century. But
Fisher did not invent this distribution; it was derived by George Snedecor, who named it F
to honor Fisher.
In the randomized complete block experiment, the term block comes from the origins of
this design in agricultural studies. Blocks were created by aggregating contiguous parcels
that were homogeneous in terms of soil composition and hence fertility. Fisher described
these kinds of experiments in his book Statistical Methods for Research Workers, which was
published in 1925.
EXERCISES
Exercise 1 Consider the data from a completely randomized experiment (Table 1.2). Express
the deviation of the observation from the overall mean as

$$y_{ij} - \bar{y} = (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$$

The first component on the right-hand side compares the treatment mean to the overall
mean; the second component expresses the within-sample variation. Take the square of the
expression, sum the squares over both indexes i (i = 1, 2, ..., k) and j (j = 1, 2, ..., $n_i$),
and prove the sum of squares decomposition in Section 3.2.4:

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

SST (Total Sum of Squares) = SSB (SS Between Groups) + SSW (SS Within Groups)

Show that the sum of the cross products is zero; that is,

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{y}_i - \bar{y})(y_{ij} - \bar{y}_i) = 0$$
Exercise 2 Consider the data from a randomized complete block experiment (Table 3.6).
Express the deviation of the observation from the overall mean as

$$y_{ij} - \bar{y}_{\cdot\cdot} = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})$$

The first component on the right-hand side compares the treatment mean to the overall
mean; the second component compares the block mean to the overall mean; and the last
component measures the error after correcting the observation for its treatment average
and block average. Prove the sum of squares decomposition in Section 3.3:

$$\sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{\cdot\cdot})^2 = b\sum_{i=1}^{k}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + k\sum_{j=1}^{b}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$$

Total Sum of Squares = SS Treatment + SS Block + SS Error
Exercise 3 You study the monthly amounts seventh- and eighth-grade boys and girls
spend on entertainment such as movies, music CDs, and candy. Representative samples of
children within the Iowa City school district were selected, and children were asked about
their spending habits. The following results were obtained.
~.llllJ'lc Sill
1\lc,111 f SI
~land.ml dt'\'i.111011 (~
(a)
.'10
20.1
11.0
2'i
23.2
.'10
19.6
5..'l
S.6
Hlh Crade
(;1r1,
25
25.0
7.0
Test whether or not the four groups differ with respect to their mean spending
amounts.
(b) Follow up on your analysis in (a) if you find differences. In particular, assess
whether there are differences in the mean spending amounts of seventh- and
eighth-grade boys and in the mean spending amounts of seventh- and eighth-grade
girls. Test whether the yearly changes in the mean spending amounts differ
between boys and girls.
Exercise 4
(a) To avoid program overlap, you select four different market regions, and you assign
one of the four TV spots to each region. The programs are aired for one month,
and sales of the advertised product are recorded in 16 stores in each of the four
markets. Store-specific sales for the previous month are also available. Discuss how
you would analyze the data to learn which of the four TV spots is preferable. What
additional assumption do you need to make to infer that the winning ad would
also work best in future months?
(b) Assume that all your stores are in a single market, and that all four TV spots must
be aired in this single market. You decide to run the spots in four consecutive
months. You collect sales data on 16 stores, with each store being observed under
all four TV spots. Discuss how you would analyze the data to learn which of the
four TV spots is preferable. Discuss the differences to your earlier strategy in (a).
(c) Discuss the advantage and the danger of your design in (b). For example, would
your analysis be affected if sales were seasonal? Discuss ways of incorporating
known seasonality into your analysis. Discuss ways of blocking the experiment
with respect to stores as well as months.
Exercise 5
----
Fabric
A
B
JO
--
.\ 6
26
31
J8
28
J7
n.
_1,1
25
JU
.l9
40
27
28
35
34
42
43
_\J
39
21
37
34
30
39
22
J6
28
27
33
Exercise 6 The female cuckoo lays her eggs into the nests of foster parents. The foster
parents are usually deceived, probably because of the similarity in the sizes of the eggs. Lengths
of cuckoo eggs (in millimeters) found in the nests of hedge sparrows, robins, and wrens are
shown below.
22.0, .23.LJ, 20.LJ, 2))1, 2'i.O, 24.0. 21.7, 23.8, 22 .X, 23.1, 23.1, 2.L'i.
2.HI , 21.0
Roh111:
21.~.
23.0, 2.U, 22.4. 23.0, 23.0, 23.0, 22.4, 21.LJ, 22.3, 22.0, 22.fi,
\ \' ren:
1 LJ.i-:.
22.1, 21.5, 20.9. 22.0, 21.0. 22.3, 21.0, 20. \ 20.9, 22.0,
::io.o,
20.8, 21.2, 2 I .0
It is believed that the size of the egg influences the female cuckoo in her selection of the
foster parent. Do the data support this hypothesis? Test whether or not the mean lengths of
cuckoo eggs found in nests of the three foster-parent species are the same.
Exercise 7 The plant manager wants to investigate the productivity of three groups of
workers: those with little, average, and considerable work experience. Since the productivity
depends to some degree on the day-to-day variability of the available raw materials, which
affects all groups in a similar fashion, the manager suspects that the comparison should be
blocked with respect to day. The results (productivity, in percent) from five production days
are given in the following table:
[l,\Y
J .XptT\l'llU..'
57
I\
60
(.1 J
62
04
f>O
M
69
Are there differences in the mean productivity among the three groups?
Exercise 8
C rroup
1\
II
(a) Are there differences in the mean weight gains of the three feed supplements?
(b) If you could start the experiment over, would you suggest improvements that
would help make the comparisons more precise? What about a blocking arrangement
on initial weight?
the fancier font works better with the blue background? These are the kinds of issues we will
address in this chapter.
In this and the next two chapters, we discuss experiments where each factor is studied at
just two levels. In Chapters 7 and 8, we consider the more general case when a factor may
have more than two levels, for example, four different levels for ad copy, three background
colors, and two different fonts.
4.1.1 Basic Terms
We start by defining some important terms.
The factors are the variables whose effects are being studied. In the advertising
experiment discussed earlier, the factors are the ad copy, the font, and the background color. In
an industrial experiment, the factors might be temperature, pressure, and the type of chemical
catalyst. In an agricultural experiment, the factors might be type of seed, type of fertilizer,
and the amount of water, whereas in a marketing experiment, the factors might be the
color of the box, the price, and the dollars spent on advertising.
The levels are the specified values of each factor. As noted earlier, initially we will focus
on 2-level designs; that is, each factor is set at one of two possible levels. For example, in the
marketing experiment, the box is either red or blue.
The response variable is the performance measure, the dollar sales in the marketing
experiment, or the number of bushels of corn in the agriculture experiment.
A run is a particular experiment with each factor at a specified level.
Each factor may be continuous or categorical, and as we will show in this chapter, the
distinction is important. Factors such as temperature, pressure, amount of fertilizer, price, and
dollars spent on advertising are continuous. Factors such as the type of ad copy, font,
background color, color of the box, and catalyst are categorical.
4.2
The following example illustrates some basic, important concepts. A company manufactures
clay pots that are used to hold plants. For one of their newest products, the company
has been experiencing an unacceptably high percentage of pots that crack during the
manufacturing process. Company production engineers have identified three key factors they
believe will affect cracking, and they decide to run an experiment to learn about the most
important factor(s). The factors studied are the peak temperature in the kiln, the rate at
which pots are cooled after being heated to the peak temperature, and a coefficient that
describes the expansion of the clay pot. A higher peak temperature can be expected to reduce
the percentage of cracks, but the higher temperature also increases operating costs. Cooling
the pots at a faster rate would mean an increase in the number of pots produced per hour,
but it could also increase the percentage of cracked pots. The coefficient of expansion
depends on the composition of the clay. A supplier has offered the company a new clay mix
that it asserts has a lower coefficient of expansion. The mix is being offered at the same price
as the raw material that is currently used. A lower coefficient of expansion should decrease
the incidence of cracks.
TABLE 4.1
The Three Factors and Their Levels

                                      Levels
Factors                          -          +
Rate of cooling (R)              Slow       Fast
Temperature (T)                  2,000      2,060
Coefficient of expansion (C)     Low        High
The firm wants to determine how changes in these three factors would affect the
percentage of cracked pots, and the production engineers decide to experiment with each
factor at two levels. The current settings are lower peak temperature, slower cooling rate, and
higher coefficient of expansion. In experimental design, we use a standard notation with the
low level of each factor denoted by a minus (-) and the high level of each factor denoted by a
plus (+). Later in this chapter, we will discuss how these low and high levels would be
determined in particular situations. Table 4.1 lists the three factors and their levels.
a time. Such experiments typically start with the current settings of the factors and begin by
changing the level of the one factor that is considered the most important. The responses at
the low and high settings of this one factor are compared while keeping all other factors
fixed, and if there is a difference, the level at which the response is best is locked in for the
next stage. The factor that is considered second most important is varied next. Again,
responses at the low and high levels of this factor are compared, and the best level of this
factor, if there is a difference, is locked in for all subsequent runs. This process continues
until the last factor is reached.
Factor C (coefficient of expansion of the clay) was considered most critical. Applying this
approach, we first set R = slow and T = 2,000, then carry out 4 runs with C low (-) and
4 runs with C high (+). Each run is the usual production batch of 100 pots. Suppose that
the proportions of cracked pots were 5, 8, 3, and 8% for the 4 runs at the low (-) level
and 15, 12, 11, and 16% for the 4 runs at the high (+) level of C, resulting in averages of 6
and 13.5%, respectively. The (positive) difference of 7.5% indicates that it is better to set
factor C at its low level. But it would be premature to conclude that the new clay mix with a
low coefficient of expansion (factor C) decreases cracking in general. At this point, all we
can say is that we observed a difference of 7.5% at a particular temperature, T = 2,000 (-),
and a particular cooling rate, R = slow (-). We don't know if the same result would hold
for temperature T = 2,060 or cooling rate R = fast.
Temperature was considered the second most critical factor. Next, we set the coefficient
of expansion C at its low (best) level, and fix the cooling rate R at slow (-). We need to
compare 4 runs with T = 2,000 and 4 runs with T = 2,060. We already have 4 runs with
T = 2,000. So, we do 4 runs with T = 2,060. Suppose we found T = 2,060 to result in the
better response (fewer cracked pots).
Finally, we fix T (2,060) and C (low) at their better settings, then compare 4 runs with
R slow to 4 runs with R fast. We already have 4 runs with R slow, so we add four runs
with R fast.

TABLE 4.2

Run                 Rate of       Temperature     Coefficient of
(standard order)    Cooling R     T               Expansion C
1                   -             -               -
2                   +             -               -
3                   -             +               -
4                   +             +               -
5                   -             -               +
6                   +             -               +
7                   -             +               +
8                   +             +               +
This approach of changing one factor at a time requires 16 runs. As a result of these
16 runs, we would only know the effect of each factor at one particular combination of
settings of the other two. We would not know anything about interactions among the factors.
For example, we would not know whether the effect of changing temperature from low to
high depends on the level of the cooling rate. If such interactions are present, the
experiment of changing one factor at a time could lead to the wrong conclusions, because it might
not identify the best settings for the factors. We discuss the shortcomings of this approach
in more detail in Appendix 4.1.
With three factors at two levels each (as in our example), a factorial design requires just
8 runs, fewer than the 16 in the earlier approach of changing one factor at a time. In addition
to the economy of fewer runs, the factorial design provides estimates of possible interactions
and thus produces more information.
Table 4.2 shows the 8 runs of the factorial design with 3 factors at 2 levels each. We use
minus (-) and plus (+) signs to represent the low and high levels of each factor.
The 8 runs are listed in the so-called standard order. For example, in run 1, all three
factors are at their low levels, while in run 8, all three factors are at their high levels. The design
matrix in standard order is easy to construct. We start with the first factor (in this case, R)
with a minus sign and alternate the signs until we complete the column. For the second
factor (T), we start with two minus signs and alternate the signs in groups of two. For the third
factor (C), we start with four minus signs and alternate the signs in groups of four. This
procedure gives us all $2^3 = 8$ factor-level combinations. These 8 factor-level combinations can
be represented as the vertices of a cube; see Figure 4.1.
A factorial design with only two factors, A and B, has two columns, one for each factor,
[Figure 4.1: Graphical representation of the 8 runs as the vertices of a cube, with axes
Factor 1 (R), Factor 2 (T), and Factor 3 (C), each at 2 levels]
This method of generating the runs is easily extended to any number of factors. With
four factors, the design consists of $2^4 = 16$ runs. For the first factor (A) we start with a
minus sign and alternate signs until the levels of all 16 runs have been specified. Factor B starts
with two minus signs and alternates signs in groups of two. Factor C starts with four minus
signs and alternates signs in groups of four, and factor D has eight minus signs followed by
eight plus signs.
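The sign-alternation rule can be written as a short routine. The sketch below is one possible way to generate the design matrix, with a function name of our own choosing; levels are coded -1 for minus and +1 for plus.

```python
from itertools import product


def standard_order(num_factors: int) -> list[tuple[int, ...]]:
    """Return all 2**num_factors runs in standard order.

    The first factor alternates sign every run, the second every two runs,
    the third every four runs, and so on.
    """
    # itertools.product varies its LAST position fastest, so each tuple is
    # reversed to make the FIRST factor the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=num_factors)]


runs_3 = standard_order(3)  # factors R, T, C
for run_number, levels in enumerate(runs_3, start=1):
    signs = " ".join("+" if level == 1 else "-" for level in levels)
    print(run_number, signs)

runs_4 = standard_order(4)  # factors A, B, C, D: 16 runs
```

Calling `standard_order(k)` for any k gives the $2^k$ runs of the full factorial design; the printed rows for k = 3 follow the pattern described for the cracked pots example.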
This method of generating the runs in standard order is useful as it helps ensure that no
combinations are missed. However, in carrying out the experiment, it is essential to
perform the runs in random order. This randomization is important because there may be
additional factors not included in the experiment that could influence the results. For
example, there may be a day-of-week effect or other unknown factors that change with time.
By randomizing, we ensure that the effects of these lurking or noise variables are distributed
randomly across the factors. A simple way to do so is to put slips of paper into a box
(numbered from 1 to 8 in the case of three factors) and draw them randomly, carrying out the
runs in the order in which the slips were drawn.
4.3
In this and the next two chapters, we focus on 2-level designs. In Chapters 7 and 8, we will
extend the methods developed here to cases where factors have more than two levels.
We use the notation $2^k$ to designate a factorial design with k factors, each having two
levels. Such an experiment requires a total of $2^k$ distinct runs: $2^2 = 4$ runs for k = 2 factors,
$2^3 = 8$ runs for k = 3 factors, $2^4 = 16$ runs for k = 4 factors, and so on.
TABLE 4.3
A $2^3$ Factorial Design Matrix and Results for the Cracked Pots Problem

Run                                     Percentage of
(standard order)    R     T     C       Cracked Pots
1                   -     -     -            6
2                   +     -     -           12
3                   -     +     -            6
4                   +     +     -           16
5                   -     -     +           16
6                   +     -     +           34
7                   -     +     +           14
8                   +     +     +           34
Let's return to nur discussion of the ceramic pot examp le, Assume that a 2 1 factorial experiment has been run, resulting in the data shown in T:1ble 4.3.
age percent of cracked pots at the slow cooling rate. That is,
cooli11g rate effect: R
l 2 + 16 +- 34 + 34
6+6+16+14
24 -
l O.S
I "'1.S
In T;1hlc 4. ), notice that 12, 16, 34, and 34 ;ire the responses when R is -+ (runs 2, 4, 6, and
H, rL'Sf1ectivcl; ), \\'hilc h, h, 16, ,incl 14 arc the responses when R is - (runs I, 3, S, .rnd 7,
respectivelv).
The main effect of temperature T is the average percent of cracked pots at the high (+)
level of factor T minus the average percent at the low (-) level of T,

$$\text{temperature effect: } T = \frac{6 + 16 + 14 + 34}{4} - \frac{6 + 12 + 16 + 34}{4} = 17.5 - 17 = 0.5$$
The main effect of the coefficient of expansion C is the average percent of cracked pots at
the high (+) level minus the average percent at the low (-) level,

$$\text{expansion effect: } C = \frac{16 + 34 + 14 + 34}{4} - \frac{6 + 12 + 6 + 16}{4} = 24.5 - 10 = 14.5$$
Note that we are using the same notation for the factors (R, T, C) and their estimated
main effects. In most cases, it will be clear whether we are referring to the factor or to its
estimated effect. In cases where there is the possibility of confusion, we will introduce separate notation for the estimated effects by putting parentheses around the factors, such as
(R), (T), and (C).
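The three main-effect calculations can be checked with a short script. This is an illustrative sketch rather than the authors' procedure; the responses are the eight percentages quoted in the text, listed in standard order, and the helper name `main_effect` is our own.

```python
# Design in standard order: each tuple holds the levels of (R, T, C)
design = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
          (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]
# Percentage of cracked pots for runs 1..8
y = [6, 12, 6, 16, 16, 34, 14, 34]

def main_effect(col):
    """Average response at the + level minus average response at the - level."""
    high = [resp for levels, resp in zip(design, y) if levels[col] == 1]
    low = [resp for levels, resp in zip(design, y) if levels[col] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effects = {name: main_effect(i) for i, name in enumerate("RTC")}
print(effects)
```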
expansion. There is an interaction between these two factors, and we denote it by RC. We estimate
the interaction between the cooling rate and the coefficient of expansion by comparing the
effect of cooling rate (factor R) at the two levels of the coefficient of expansion (factor C).
With coefficient of expansion at +, the change in the response when the
cooling rate is changed from its low (-) to high (+) setting is

$$\frac{34 + 34}{2} - \frac{16 + 14}{2} = 34 - 15 = 19,$$

where 34 and 34 are the responses at R+ and C+ (runs 6 and 8), and 16 and 14 are the responses at R- and C+ (runs 5 and 7). With coefficient of expansion at its low (-) level, the corresponding change is

$$\frac{12 + 16}{2} - \frac{6 + 6}{2} = 14 - 6 = 8.$$

If the coefficient of expansion is high, we expect more cracks when the cooling rate is increased from the slow rate to the fast rate.
By convention, the interaction between the two factors is defined as one-half of the difference
between the average cooling rate effect with coefficient of expansion at + and the
average cooling rate effect with coefficient of expansion at -. Thus the interaction between
factors R and C, denoted by RC, is given by $(19 - 8)/2 = 5.5$. Later in this chapter, we will
show how to determine whether an effect is statistically significant. For now, let us assume
that this interaction is statistically significant and not the result of random variation.
The square diagram in Figure 4.2 and the interaction diagram in Figure 4.3 represent
convenient ways to compute and display an interaction. The square diagram lists the average
response at each of the four possible combinations of settings of factors R and C. Each
of the four numbers is the average of two responses. For example, when both the coefficient
of expansion (C) and cooling rate (R) are at their + levels (runs 6 and 8), the average response is $(34 + 34)/2 = 34$. When the coefficient of expansion C is at the low (-) level, the
effect of the cooling rate R is $14 - 6 = 8$, but when C is at the high (+) level, the effect of
the cooling rate R is $34 - 15 = 19$.
The interaction is shown graphically in the interaction diagram in Figure 4.3. Here, we
connect the average response at the low and high levels of the cooling rate, and we do this
separately for each level of the coefficient of expansion. Notice that the two lines have different slopes, reflecting the fact that there is an interaction. If there were no interaction, the
two lines would be parallel or nearly so.
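The entries of the square diagram and the RC interaction can be reproduced as follows. This is a minimal sketch using the eight responses of the cracked pots example; the dictionary layout and the function names are our own choices.

```python
from statistics import mean

# Responses keyed by the levels of (R, T, C)
y = {(-1, -1, -1): 6, (1, -1, -1): 12, (-1, 1, -1): 6, (1, 1, -1): 16,
     (-1, -1, 1): 16, (1, -1, 1): 34, (-1, 1, 1): 14, (1, 1, 1): 34}

def cell_average(r, c):
    """Average response at one corner of the R-by-C square diagram."""
    return mean(resp for (rr, _, cc), resp in y.items() if rr == r and cc == c)

def effect_of_R_at(c):
    """Effect of cooling rate R with coefficient of expansion C held at level c."""
    return cell_average(1, c) - cell_average(-1, c)

rc_interaction = (effect_of_R_at(1) - effect_of_R_at(-1)) / 2
print(effect_of_R_at(1), effect_of_R_at(-1), rc_interaction)
```

Unequal conditional effects (19 versus 8 here) are what produce the non-parallel lines in an interaction diagram.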
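Figure 4.2 [square diagram: average percentage of cracked pots at the four combinations of cooling rate (R) and coefficient of expansion (C); the averages are 6, 14, 15, and 34]
Figure 4.3 [interaction diagram: average percentage of cracked pots plotted against cooling rate (R), with separate lines for expansion at high (+) level and expansion at low (-) level]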
RT Interaction
The square diagram for this interaction is shown in Figure 4.4. With temperature T at +,
the effect of changing the cooling rate R from - to + is

$$\frac{16 + 34}{2} - \frac{6 + 14}{2} = 25 - 10 = 15.$$

With temperature T at -, the effect of changing the cooling rate from - to + is

$$\frac{12 + 34}{2} - \frac{6 + 16}{2} = 23 - 11 = 12.$$

The RT interaction is one-half of the difference, which is $(15 - 12)/2 = 1.5$.
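Figure 4.4 [square diagram: average percentage of cracked pots at the four combinations of cooling rate (R) and temperature (T)]
Figure 4.5 [square diagram: average percentage of cracked pots at the four combinations of temperature (T) and coefficient of expansion (C)]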
TC Interaction
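The square diagram for this interaction is shown in Figure 4.5. With temperature T at +,
the effect of changing the coefficient of expansion from - to + is

$$\frac{14 + 34}{2} - \frac{6 + 16}{2} = 24 - 11 = 13.$$

With temperature T at -, the effect of changing the coefficient of expansion from - to + is

$$\frac{16 + 34}{2} - \frac{6 + 12}{2} = 25 - 9 = 16.$$

The TC interaction is one-half of the difference, which is $(13 - 16)/2 = -1.5$.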
2-factor interaction between any two of the factors depends on the level of the third
factor. Equivalently, it means that the effect of changing a particular factor from - to +
depends on the levels of the other two factors. The 3-factor interaction is calculated as
follows:
Find the RT interaction with the coefficient of expansion C at +:
With T at + (and C at +), the effect of R is $34 - 14 = 20$.
With T at - (and C at +), the effect of R is $34 - 16 = 18$.
The RT interaction with C at + is $(20 - 18)/2 = 1$.
Find the RT interaction with the coefficient of expansion C at -:
With T at + (and C at -), the effect of R is $16 - 6 = 10$.
With T at - (and C at -), the effect of R is $12 - 6 = 6$.
The RT interaction with C at - is $(10 - 6)/2 = 2$.
The 3-factor interaction is defined as one-half the difference of these two 2-factor
interactions, $RTC = (1 - 2)/2 = -0.5$. Here we calculated the 2-factor interaction RT with C at + and the 2-factor interaction RT with C at -,
then took one-half the difference. Note that the choice of
which 2-factor interaction to use in the calculation is arbitrary. We would have obtained the
same result by taking half the difference of the 2-factor interaction RC with T at + and the
2-factor interaction RC with T at -, or by taking half the difference of the 2-factor interaction
TC with R at + and the 2-factor interaction TC with R at -.
If there were a fourth factor (say, factor D), we could also calculate a 4-factor interaction.
The 4-factor interaction RTCD is one-half of the difference between the 3-factor interaction RTC calculated when the fourth factor D is at its high (+) level and the 3-factor interaction RTC when D is at its low (-) level.
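The following sketch, which is not from the original text, verifies that the 3-factor interaction is the same whether it is built from the conditional RT interactions or from the conditional RC interactions. The helper names are hypothetical.

```python
# Responses of the cracked pots example keyed by the levels of (R, T, C)
y = {(-1, -1, -1): 6, (1, -1, -1): 12, (-1, 1, -1): 6, (1, 1, -1): 16,
     (-1, -1, 1): 16, (1, -1, 1): 34, (-1, 1, 1): 14, (1, 1, 1): 34}

def rt_interaction_at(c):
    """RT interaction computed only from the runs with C fixed at level c."""
    effect_r_t_high = y[(1, 1, c)] - y[(-1, 1, c)]    # effect of R with T at +
    effect_r_t_low = y[(1, -1, c)] - y[(-1, -1, c)]   # effect of R with T at -
    return (effect_r_t_high - effect_r_t_low) / 2

def rc_interaction_at(t):
    """RC interaction computed only from the runs with T fixed at level t."""
    effect_r_c_high = y[(1, t, 1)] - y[(-1, t, 1)]    # effect of R with C at +
    effect_r_c_low = y[(1, t, -1)] - y[(-1, t, -1)]   # effect of R with C at -
    return (effect_r_c_high - effect_r_c_low) / 2

rtc_from_rt = (rt_interaction_at(1) - rt_interaction_at(-1)) / 2
rtc_from_rc = (rc_interaction_at(1) - rc_interaction_at(-1)) / 2
print(rtc_from_rt, rtc_from_rc)
```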
design matrix shown in Table 4.4. We have added four so-called calculation columns that
allow us to estimate the interactions simply and directly.
The signs in the added columns (RT, RC, TC, RTC) were found by multiplying the signs
in the design columns (R, T, C) row by row. For example, in the RT column, the sign in the
first row (+) is the product of the first row of the R column, which is -, and the first row
of the T column, which is also -. Similarly, observe that the sign in the seventh row of
column RTC is -, the product of - (for R), + (for T), and + (for C).
TABLE 4.4
Table of Signs for Calculating Effects in the $2^3$ Factorial Design: Cracked Pots Example

                                               Percentage of
Run    R    T    C    RT    RC    TC    RTC    Pots with Cracks
 1     -    -    -    +     +     +     -             6
 2     +    -    -    -     -     +     +            12
 3     -    +    -    -     +     -     +             6
 4     +    +    -    +     -     -     -            16
 5     -    -    +    +     -     -     +            16
 6     +    -    +    -     +     -     -            34
 7     -    +    +    -     -     +     -            14
 8     +    +    +    +     +     +     +            34
We obtain the main effect of R by applying the signs in column R to the responses in the
last column. We have

$$\text{main effect of } R = \frac{-6 + 12 - 6 + 16 - 16 + 34 - 14 + 34}{4} = \frac{54}{4} = 13.5$$

The minus and plus signs in the numerator of this expression correspond to the signs in the
first (R) column. We divide the linear combination of the responses in the numerator by the
number of plus signs, which in this case is 4. This expression is equivalent to taking the average
of the four responses with R at + and subtracting from it the average of the four responses
with R at -, which is how we defined and calculated the main effect previously.
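Similarly, for the two other main effects we have

$$\text{main effect of } T = \frac{-6 - 12 + 6 + 16 - 16 - 34 + 14 + 34}{4} = \frac{2}{4} = 0.5$$

$$\text{main effect of } C = \frac{-6 - 12 - 6 - 16 + 16 + 34 + 14 + 34}{4} = \frac{58}{4} = 14.5$$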
The RT interaction is obtained by applying the signs in the RT column to the responses
in the last column and dividing the result by 4, the number of plus signs in that column.
We have

$$RT \text{ interaction} = \frac{6 - 12 - 6 + 16 + 16 - 34 - 14 + 34}{4} = \frac{6}{4} = 1.5$$

Similarly,

$$RC \text{ interaction} = \frac{6 - 12 + 6 - 16 - 16 + 34 - 14 + 34}{4} = \frac{22}{4} = 5.5$$

$$TC \text{ interaction} = \frac{6 + 12 - 6 - 16 - 16 - 34 + 14 + 34}{4} = \frac{-6}{4} = -1.5$$

$$RTC \text{ interaction} = \frac{-6 + 12 + 6 - 16 + 16 - 34 - 14 + 34}{4} = \frac{-2}{4} = -0.5$$
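The table-of-signs method lends itself to a compact implementation. The sketch below is illustrative only; the function name `estimate_effects` is hypothetical. It builds each calculation column as the row-by-row product of the design columns, applies the signs to the responses, and divides by the number of plus signs.

```python
from itertools import combinations, product
from math import prod

factors = "RTC"
# Design in standard order (first factor alternates fastest)
design = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=3)]
y = [6, 12, 6, 16, 16, 34, 14, 34]   # percentage of cracked pots, runs 1..8

def estimate_effects(design, y, factors):
    """Return all main effects and interactions using the table of signs."""
    n_plus = len(y) / 2                      # number of plus signs per column
    effects = {}
    for size in range(1, len(factors) + 1):
        for idx in combinations(range(len(factors)), size):
            name = "".join(factors[i] for i in idx)
            # Sign column = product of the chosen design columns, row by row
            signs = [prod(row[i] for i in idx) for row in design]
            effects[name] = sum(s * r for s, r in zip(signs, y)) / n_plus
    return effects

effects = estimate_effects(design, y, factors)
for name, value in effects.items():
    print(name, value)
```

The same function works unchanged for a $2^4$ design if `factors` has four letters and `design` and `y` hold 16 runs in standard order.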
4.4
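A 2-level factorial design with $k$ factors leads to many estimated main effects and interactions.
For $k = 7$, for example, there are 7 main effects,
$\frac{7!}{2!(7 - 2)!} = 21$ two-factor interactions,
$\frac{7!}{3!(7 - 3)!} = 35$ three-factor interactions,
$\frac{7!}{4!(7 - 4)!} = 35$ four-factor interactions, and
so on.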
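The number of effects of each order is a binomial coefficient, which can be tabulated directly. A brief sketch, taking $k = 7$ as an example value:

```python
from math import comb

k = 7  # number of factors (example value)
# Number of effects involving exactly `order` factors (order 1 = main effects)
counts = {order: comb(k, order) for order in range(1, k + 1)}
total = sum(counts.values())   # all main effects and interactions together

print(counts)
print(total)
```

The total always equals $2^k - 1$, one less than the number of runs, because one degree of freedom goes to the overall average.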
Fortunately, we can expect the great majority of these effects to be negligible. Experience
has shown that the Pareto principle is generally at work here, with a small number of effects
constituting what the Pareto principle calls "the vital few" and the remainder comprising
the "trivial many." The phrase "effect sparsity" is also used to convey the same idea. In
addition, there tends to be a hierarchical ordering of effects with main effects larger in magnitude
and hence more important than 2-factor interactions, 2-factor interactions larger
than 3-factor interactions, and so forth. In experiments for which 4-factor and higher-order
interactions can be estimated, they are almost certain to be negligible. In most cases,
the 3-factor interactions will be negligible as well.
There is substantial empirical evidence of this hierarchical ordering principle based
on the accumulation of experimental results in numerous settings over many years. In the
case of continuous factors, there is also theoretical support. Smooth response functions
can be approximated by their Taylor series expansions, with first-order terms (main effects),
second-order terms (2-factor interactions), and so on, with higher-order terms corresponding to higher-order effects diminishing in magnitude.
The calculated effects are estimates that are subject to uncertainty. Repeating a particular
experimental run would invariably result in a response that is somewhat different from the
original one, due to experimental error. For example, in the cracked pots experiment there
are numerous sources of experimental error: differences in clay composition from batch to
batch, variability in actual peak kiln temperature around each of the two target settings, differences in how pots are handled by workers, and so forth. As a consequence, a second
independent execution of the entire experiment would result in calculated effects that would
differ from the estimates obtained before. In the light of this variability, the experimenter
needs to determine which estimates are statistically significant. In assessing the statistical significance of an estimated effect, the question is whether the evidence is strong enough for
the experimenter to conclude, beyond a reasonable doubt, that the true (or mean) effect is
not equal to zero.
There are four approaches to determining the statistical significance of effects in a factorial experiment, which are discussed in the next sections:
1. Replicating all or part of the design (i.e., multiple runs under the same experimental
conditions),
2. Using prior information about $\sigma^2_{\text{run}}$, the variance of the experimental error,
3. Assuming higher-order interactions are negligible so that their estimates represent
noise (experimental error), and
4. Normal probability plots
and the results are shown in Table 4.5. For a particular combination of factor settings (e.g.,
+ + +), the difference in the two percentages is due to experimental error. With eight distinct
combinations of factor settings (the 8 runs in standard order), we can calculate eight
separate estimates of the variance of the experimental error, with each estimate having one
degree of freedom. These estimates are shown in Table 4.5.
We average the eight estimates to obtain the pooled estimate

$$s_p^2 = \frac{8 + 2 + 18 + 2 + 18 + 8 + 8 + 2}{8} = \frac{66}{8} = 8.25$$

The pooled estimate $s_p^2$ estimates the variance of the response
of an individual run; it has 8 degrees of freedom (the sum of the degrees of freedom
of the eight separate estimates). Imagine repeatedly carrying out a particular run, (say) run
3 (R -, T +, C -). Because of experimental error, the outcomes would vary from
run to run, with $s_p = 2.87$ estimating the variability in these responses.
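The pooling step can be sketched in a few lines. The eight variance estimates below are the values that sum to 66 in the pooled calculation; their order is an assumption and does not affect the result. The pair (19, 13) is used only to illustrate how a single estimate with one degree of freedom arises from two replicates.

```python
from math import sqrt
from statistics import variance

# A variance estimate from two replicated runs has 1 degree of freedom.
# Example pair (illustrative): responses 19 and 13 around their average 16.
single_estimate = variance([19, 13])      # (19-16)**2 + (13-16)**2 = 18

# The eight separate estimates, one per factor-level combination
# (order assumed; only the values matter for pooling).
estimates = [8, 2, 18, 2, 18, 8, 8, 2]

# With equal degrees of freedom, pooling is a simple average.
pooled_variance = sum(estimates) / len(estimates)
pooled_sd = sqrt(pooled_variance)
degrees_of_freedom = len(estimates) * 1   # 1 df from each estimate

print(single_estimate, pooled_variance, round(pooled_sd, 2), degrees_of_freedom)
```

If the estimates had unequal degrees of freedom, a weighted average with the degrees of freedom as weights would be needed instead.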
Each estimated effect is the difference of two averages: the average of eight responses at
the + level and the average of eight responses at the - level. The responses are uncertain
(random variables), and hence each estimated effect is a random variable as well, with a certain
unknown mean. Once the experiment is carried out, we obtain a particular value for
each estimated effect. For example, we calculated that the estimated main effect of cooling
rate is 13.5. If we repeated the entire experiment and recalculated the main effect of R from
the 16 new runs, we would obtain another estimate for the effect of the cooling rate. Because
of the variability in the 16 individual responses, this second calculated estimate would almost
certainly be different from the first. If we repeated the experiment many times and averaged
the estimates of R calculated each time, we would obtain something close to the long-run
average or mean effect of R.
To test whether the estimate R = 13.5 is statistically significant, we ask the following
question: If the mean effect were actually zero, how likely is it that the estimate R = 13.5
would occur by chance? To answer this question, we will construct a 95% confidence interval
on the mean effect of cooling rate. If the confidence interval does not include 0, we will
reject the hypothesis that the true mean effect of the cooling rate is 0 and conclude that the
estimate 13.5 is statistically significant. For each of the estimated effects, we will follow the
same procedure to determine which effects are significant.
TABLE 4.5
Results of the Cracked Pots Example with Replicated Runs
R111
I""
I 1t.1r1d.ird
i'crLcnlagc of ( raLkcd
(1nd1v1dual cxpcrimcnlsl
II
order)
1\vcr.1gc
l'crcrn t.1~,. ol
C:r<ickcd l'nt1
I
I.'
11
9
17
19
I)
(i
I(>
12
In
.n
11
12
L~umatcd
(8
variance.'.
h) +- { 1
(>r
\ 19
In) " ( I\
16 ) 2
('l t>
14 ) + (32
34 )'
'1
12 ) 2 + ( J l
(II
12)
2
6 )2
(9
~ (3
n)'
",,
18
'
lh )
( I"
'i
'
i l'i
s,
IhI
'
'"
18
- 8
14 ) 1 + ( 16
( 12
14 ) 2
8
\.1 ~
( )l
" (:lS
>4 l'
Let $\sigma_{\text{effect}}$ be the standard deviation (standard error) of an effect, and let $\sigma^2_{\text{run}}$ be the
variance of the response of a single run. Each estimated effect (main effect as well as interaction) is the difference of two averages,
$\bar{y}_+ - \bar{y}_-$. With two runs at
each of the eight distinct factor-level combinations and 16 runs in total, each average is an
average of the results from 8 runs. Let $N$ be the total number of runs in the experiment, and
let $n = N/2$. Recall that the variance of a sample average $\bar{y}$ from a sample of size $n$ is given by
$\sigma^2/n$, where $\sigma$ is the population standard deviation. Also, we know that the variance of
the difference of two independent random variables is the sum of the variances. Hence
we find

$$\mathrm{var}(\text{effect}) = \mathrm{var}(\bar{y}_+ - \bar{y}_-) = \mathrm{var}(\bar{y}_+) + \mathrm{var}(\bar{y}_-) = \frac{\sigma^2_{\text{run}}}{n} + \frac{\sigma^2_{\text{run}}}{n} = \frac{4}{N}\,\sigma^2_{\text{run}}$$

Replacing $\sigma^2_{\text{run}}$ by its estimate $s_p^2$ leads to

$$\text{estimated } \mathrm{var}(\text{effect}) = s^2_{\text{effect}} = \frac{4}{N}\, s_p^2$$
TABLE 4.6
Confidence Intervals for Main Effects and Interactions: Cracked Pots Example

                                    Estimate    95% Confidence Interval
MAIN EFFECTS
  Cooling rate R                      13.5          13.5 ± 3.3
  Temperature T                        0.5           0.5 ± 3.3
  Coefficient of expansion C          14.5          14.5 ± 3.3
TWO-FACTOR INTERACTIONS
  RT                                   1.5           1.5 ± 3.3
  RC                                   5.5           5.5 ± 3.3
  TC                                  -1.5          -1.5 ± 3.3
THREE-FACTOR INTERACTION
  RTC                                 -0.5          -0.5 ± 3.3
With $N = 16$ and $s_p^2 = 8.25$, we obtain

$$s^2_{\text{effect}} = \frac{4}{16}(8.25) = 2.0625 \quad \text{and} \quad s_{\text{effect}} = \sqrt{2.0625} = 1.44.$$

A 95% confidence interval is obtained by adding to and subtracting from the estimate the appropriate percentile of the t-distribution times $s_{\text{effect}}$; it is an interval for the
mean effect. In our example we use the 97.5th percentile (0.025 tail probability) of the
t-distribution with 8 degrees of freedom (2.306), because the variance estimate was pooled
from
eight separate variances. Suppose in this example, there were three replications at each combination
of settings for a total of 24 runs. Each of the eight separate variance estimates would
have 2 degrees of freedom, and the pooled estimate would have 16 degrees of freedom. The
confidence intervals for the seven effects are given in Table 4.6. Significant effects are shown
in boldface. They are the main effects of cooling rate R and coefficient of expansion C, and
the RC interaction.
Instead of constructing confidence intervals, equivalently, we can compare the
t-ratios
(estimated effects divided by their standard error) with ±2.306. Effects with t-ratios larger than 2.306 in
absolute value are considered significant.
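Both versions of the test, the confidence interval and the t-ratio, can be sketched as follows. This is an illustration under the numbers quoted in the text: $N = 16$ runs, pooled variance 8.25, and the t-percentile 2.306 for 8 degrees of freedom, which is entered as a constant rather than computed.

```python
from math import sqrt

N = 16                 # total number of runs
pooled_variance = 8.25 # s_p^2, with 8 degrees of freedom
t_975 = 2.306          # 97.5th percentile of t with 8 df (value from the text)

se_effect = sqrt(4 / N * pooled_variance)   # standard error of each effect
half_width = t_975 * se_effect              # half-width of the 95% interval

estimates = {"R": 13.5, "T": 0.5, "C": 14.5,
             "RT": 1.5, "RC": 5.5, "TC": -1.5, "RTC": -0.5}

for name, est in estimates.items():
    t_ratio = est / se_effect
    low, high = est - half_width, est + half_width
    flag = "significant" if abs(t_ratio) > t_975 else "not significant"
    print(f"{name:>3}: {est:5.1f}  [{low:6.2f}, {high:6.2f}]  t = {t_ratio:6.2f}  {flag}")

significant = [name for name, est in estimates.items()
               if abs(est / se_effect) > t_975]
```

The two criteria always agree: an interval excludes zero exactly when the absolute t-ratio exceeds the percentile.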
Interpretation of Results
The estimated main effect of temperature T is very small and not statistically significant.
Also, the factor T does not interact with either of the other factors. Since increasing the temperature of the kiln would be more costly, it is clear from these results that it would be best
to keep the current kiln temperature. The main effects of cooling rate R and coefficient of
expansion C, and the interaction between these two factors are statistically significant. The
main effect of a factor should be looked at by itself only if the factor does not have a statistically significant interaction with another factor. Because of the RC interaction, we should
and will examine the two factors jointly.
From the square diagram in Figure 4.2 we observe the nature of the interaction. With coefficient
of expansion C at + (the current material), increasing the cooling rate from - to +
increases
the percentage of cracked pots from 15% to 34%. With coefficient of expansion
both coefficient of expansion C and cooling rate R at their low (-) levels. Since the cost of
the new material with the lower coefficient of expansion is the same as the cost of the current material, it is clear from the experiment that the new material is preferred. The question of the best cooling rate would require additional analysis. The issue is whether the cost
savings from lowering the proportion of broken pots from 14% to 6% is greater than the
cost of the decrease in productivity that would result from operating at the current (slower)
cooling rate compared to the faster rate.
4.4.2 Prior Information about $\sigma^2_{\text{run}}$, the Variance of the Experimental Error
Sometimes, based on previous extensive experiments, the experimenter can assume that
the variance of the response from a single run, $\sigma^2_{\text{run}}$, is known. In this case, we substitute this
value into the equation for the standard error shown in the previous subsection. Also, we
replace the percentiles of the t-distribution (t-values) by the percentiles of the normal distribution
(z-values).
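$$s^2_{\text{effect}} = \frac{\sum (\text{estimate} - 0)^2}{5} = \frac{8.1875}{5} = 1.6375,$$

where the sum runs over the estimates of the five 3-factor and 4-factor interactions.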
TABLE
4.7
Run
[)
1'1
16
K
22
19
l'crccntJge ol
Cracked Pots
\7
211
7
K
IX
I
9
10
J J
+-
12
JI)
12
jj
14
15
16
\()
I\
_\()
fstwwted ejjects:
l'.flect
Average
R
T
c
D
RT
RC
IW
0.500
-0.250
0.5UO
1.11011
l.:>00
1.750
0.250
U.750
1.500
'](_
nJ
( /)
/(/(.
/(//)
RC:IJ
rcu
/ff CD
No TE.
17.625
12.500
j .000
14.500
-8.250
1.250
5.250
$s_{\text{effect}} = \sqrt{1.6375} = 1.28$
The 97.5th percentile of a t-distribution with 5 degrees of freedom is given by 2.571. Hence,
the 95% confidence interval for each effect is obtained by adding and subtracting (2.571) ×
(1.28) = 3.29 from the estimate. Any effect with absolute value greater than 3.29 is statistically
significant. We find that R, C, D, and RC are significant, and they are shown in boldface
in Table 4.7. The rubberized carrier (D) reduces the percentage of cracked pots by 8.25 percentage points.
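A minimal sketch of this calculation follows. It assumes that the five magnitudes 1.500, 1.750, 0.250, 0.750, and 1.500 are the estimates of the 3-factor and 4-factor interactions; their signs are not needed because the estimates are squared. The effects tested at the end are only those quoted in the text (R, C, D, RT, RC), not the full set of 15.

```python
from math import sqrt

# Magnitudes of the five 3- and 4-factor interaction estimates (signs omitted;
# assumed to correspond to RTC, RTD, RCD, TCD, and RTCD).
higher_order = [1.500, 1.750, 0.250, 0.750, 1.500]

# If these interactions are truly zero, each estimate is pure noise around 0,
# so the average squared estimate gives the variance of an effect (5 df).
var_effect = sum(e ** 2 for e in higher_order) / len(higher_order)
se_effect = sqrt(var_effect)

t_975 = 2.571                      # 97.5th percentile of t with 5 df (from the text)
threshold = t_975 * se_effect      # effects beyond this are significant

# A few of the estimated effects quoted in the text
quoted = {"R": 12.5, "C": 14.5, "D": -8.25, "RT": 1.25, "RC": 5.25}
significant = [name for name, est in quoted.items() if abs(est) > threshold]

print(round(var_effect, 4), round(se_effect, 2), round(threshold, 2), significant)
```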
that the response does not depend on any of the factors, and that observations vary around
a constant level. Then all main and interaction effects, which are linear combinations of the
responses, should vary around zero. Furthermore, because of the central limit effect, their
distribution should be approximately normal, because estimated effects average the observations (with half of the weights at -1 and half of the weights at +1). A dot plot or a histogram of the effects is useful, as such plots can highlight big effects that do not follow a normal distribution around zero. However, with seven estimated effects (in the factorial with
k = 3 factors) or 15 estimated effects (in the factorial with k = 4), the construction of a histogram and a check of whether it is bell-shaped around zero are not very reliable, given that
there are just too few observations. A normal probability plot of the effects, on the other
hand, provides a useful tool.
Denote the $m$ estimated effects by $f_1, f_2, \ldots, f_m$. In general, there will be $m = 2^k - 1$
effects. For example, for $k = 3$, there will be seven estimated effects. The procedure
is the following. First, we order the effects from smallest to largest. Next, as described below,
we plot the observed (empirical) cumulative probabilities associated with the estimated
effects against the estimated effects. The x-axis represents the effects, and the y-axis represents
the cumulative probabilities. The scale of the y-axis is constructed in such a way that
if the data points follow a normal distribution, the cumulative probabilities will plot
as a straight line. For effects that are from a normal distribution with mean zero, the plot
of the effects should approximate a straight line with the line passing through the point
(x = 0, y = 0.5). Significant effects much different from 0 will fall away from this line. Effects that are unusually small or large and fail to follow the straight-line pattern are judged
to be significant. Statistical software packages can readily construct such normal probability plots.
To illustrate the procedure, consider the $2^3$ design in Table 4.4, from which we estimate
the seven effects shown in Table 4.6. We order the seven effects from small to large: TC =
-1.5, RCT = -0.5, T = 0.5, RT = 1.5, RC = 5.5, R = 13.5, C = 14.5. The smallest among
the seven effects (TC = -1.5) represents a cumulative probability between 0 and 1/7 and is
assigned a cumulative probability (y-value) at the midpoint of that interval, which is 0.0714.
The second smallest among the seven effects (RCT = -0.5) represents a cumulative probability between 1/7 and 2/7 and is assigned a y-value at the midpoint of that interval, which
is 0.2143. The third smallest effect is assigned a cumulative probability at the midpoint of
the interval 2/7 to 3/7, and so forth. In general, with $m$ effects, the $i$th smallest effect is plotted at a cumulative probability of $(i - 0.5)/m$.
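The assignment of cumulative probabilities can be sketched as follows, using the seven effects of the $2^3$ cracked pots example. The variable names are our own.

```python
effects = {"R": 13.5, "T": 0.5, "C": 14.5,
           "RT": 1.5, "RC": 5.5, "TC": -1.5, "RTC": -0.5}

m = len(effects)
# Order the effects from smallest to largest
ordered = sorted(effects.items(), key=lambda item: item[1])

# The i-th smallest effect is plotted at the midpoint of ((i-1)/m, i/m)
plot_points = [(name, value, (i - 0.5) / m)
               for i, (name, value) in enumerate(ordered, start=1)]

for name, value, prob in plot_points:
    print(f"{name:>3}  effect = {value:5.1f}  cumulative probability = {prob:.4f}")
```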
Now let us return to the cracked pots example in which four factors were tested, the original
three factors plus factor D, the type of carrier. Table 4.7 lists the data and the m = 15
effects that can be estimated from this unreplicated experiment. Which effects are significant? As noted above, a simple dot plot of the 15 effects (there are 15 effects in a $2^4$ design)
could highlight big effects that are far from zero, but a better approach is to construct a
normal probability plot. Figure 4.6 shows the normal probability plot created by Minitab.
Note that Minitab uses a slightly different definition of empirical percentiles; the $i$th smallest
effect is plotted at a cumulative probability (y-value) of $(i - 0.3)/(m + 0.4)$, instead of at
$(i - 0.5)/m$. However, this change is of little practical significance. Observe that the scale
on the y-axis (cumulative probability, in percent) in Figure 4.6 is not linear. This is where
the normal distribution comes into play. With $m$ effects, the $i$th smallest effect is associated
with a cumulative probability of $(i - 0.3)/(m + 0.4)$. The percentile of order $100(i - 0.3)/(m + 0.4)$
implied by the standard normal distribution is called the normal score or z-score
of the $i$th smallest effect. A normal probability plot is a plot of normal scores against the estimated effects. For illustration, the normal score of the smallest effect D = -8.25 with cumulative probability 0.0455 is given by -1.690 (the 4.55th percentile from the standard
normal). The normal score of the second smallest among the 15 effects (RTD = -1.75) with
cumulative probability 0.1104 is approximately -1.22. The y-axis of the plot is labeled with the
cumulative probabilities multiplied by 100. Minitab makes it easy to distinguish between insignificant effects (with mean 0) and significant effects by fitting a straight line to the middle
portion of the graph. It also adds information on Lenth's (1989) PSE, a method that we explain in Appendix 4.3. The effects R, C, D, and RC are significant, which is consistent with
the results we obtained by assuming that 3- and 4-factor interactions are zero.

Figure 4.6
Normal Plot of Effects in the Cracked Pots Problem: 16 Runs, with 4 Factors at
2 Levels Each
[normal probability plot: x-axis: Effect; y-axis: cumulative probability in percent; points marked as not significant or significant; Lenth's PSE = 1.5]
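Normal scores of this kind can be computed with the Python standard library. The sketch below is ours and is not Minitab's implementation; it applies the $(i - 0.3)/(m + 0.4)$ plotting position and the inverse of the standard normal distribution function, assuming $m = 15$ effects as in a $2^4$ design.

```python
from statistics import NormalDist

def normal_scores(m):
    """Return (cumulative probability, z-score) for i = 1..m.

    Uses the plotting position (i - 0.3)/(m + 0.4) described in the text.
    """
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    scores = []
    for i in range(1, m + 1):
        p = (i - 0.3) / (m + 0.4)
        scores.append((p, standard_normal.inv_cdf(p)))
    return scores

scores = normal_scores(15)
p1, z1 = scores[0]   # smallest effect
p2, z2 = scores[1]   # second smallest effect
print(round(p1, 4), round(z1, 2))
print(round(p2, 4), round(z2, 2))
```

Plotting these z-scores against the ordered effects on ordinary linear axes gives the same picture as plotting percentages on a normal probability scale.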
TABLE 4.8

Factor                          (-) Control    (+) New Idea
A  Annual fee                   Current        Lower
B  Account-opening fee          No             Yes
C  Initial interest rate        Current        Lower
D  Long-term interest rate      Low            High
effects of interest rates and fees, using the four factors shown in Table 4.8. These factors are
the annual fee, a fee for opening an account, the initial interest rate, and the long-term
interest rate. The company wanted to test the effects of lowering the annual fee and initiating an account-opening fee. Although the account-opening fee was likely to reduce the response, one manager thought the fee would give an impression of exclusivity that would
mitigate the magnitude of the response decline. The team also wanted to test the effect of a
small increase in the long-term interest rate. At the same time, they wanted to test the effect
of two different initial interest rates, both lower than the long-term rate.
To study interactions along with all main effects, the consultant recommended a $2^4$
design. The marketing team used columns A-D of the test matrix in Table 4.9 to create the
16 mail packages. Many advertisements had to be sent to targeted customers as the average
response rate to such mail ads is only in the 2-3% range. Each of the 16 different mailings
was sent to 7,500 customers, requiring a total of 120,000 mailings. The numbers of orders
received and the order rates are given in Table 4.9. In total, 2,837 orders were received,
resulting in an overall order rate of 100(2,837/120,000) = 2.364%.
The +/
statistical analysis of the results. The 15 main effects and interactions (the six 2-factor interactions, the four 3-factor interactions, and one 4-factor interaction) are obtained by calculating linear combinations of the response rates using the weights (±1) in the design and
interaction columns, and dividing the results by 8. For example, the main effect of factor A
is given by

$$(A) = [-2.45 + 3.36 - 2.16 + 2.29 - 2.49 + \cdots - 2.04 + 2.03]/8 = 0.4075.$$

The (ABC) interaction is obtained by first forming the product column ABC (which is given
by -, +, +, -, +, -, -, +, -, +, +, -, +, -, -, +) and calculating

$$(ABC) = [-2.45 + 3.36 + 2.16 - 2.29 + 2.49 - \cdots - 2.04 + 2.03]/8 = -0.0525.$$

We have put parentheses around the effects to distinguish them from the factors and calculation columns. The estimated effects are shown in Table 4.10. Significant effects exceeding
two standard errors are indicated in boldface. The calculation of standard errors is explained in Section 4.6.4.
Figure 4.7 graphs the effects in the order of their absolute magnitudes. The broken line
indicates estimates that are larger than two standard errors. For simplicity, we use the factor 2 to approximate the 97.5th percentile of the standard normal distribution (1.96).
TABLE 4.10
Estimated Effects
lffrLI %1
2.16112
l '011_..,1;111!
( A)
(fl)
((")
( /))
0.405
-0.518
0.252
- 0.498
-0.302
0.002
0.108
0.048
0.102
0. 158
(ARI
(\("I
1:\/)\
l!iC)
(fl/))
(CD)
I A IJ( '
( AJS/J)
0.052
(l.088
0.008
0.108
(11(,/))
(fl({))
\/Will
O.O'i2
No T F. Fst1m,1!cs
(2)(0.08""
Figure 4.7
[bar chart of the 15 estimated effects in order of absolute magnitude, beginning with B: Account-opening fee (-0.518); horizontal axis from 0.000 to 0.500; a broken line marks two standard errors]
As shown in Figure 4.7, all four main effects and one (and perhaps a second) interaction
(the AB and the CD interactions) are significant. Note that the CD interaction is just slightly
smaller than 2 times the standard error.
B -: No account-opening fee. One manager thought that charging an initial fee would
give the impression of exclusivity. However, this fee had the largest negative effect,
reducing the response rate by 0.518 percentage points.
initial rate has a large impact, with the response changing from 1.91% to 2.32%. In contrast
to the main effects that suggest both interest rates should be low, these results, followed by
additional analysis using the company's financial models, showed that a lower long-term
rate coupled with the current (higher) initial rate would be the most profitable.
Overall, these interactions give important insight into the true relationship among the
factors and help to better quantify their effects on profitability. By combining all main effects and significant interactions into one model, the marketing team could analyze different combinations and estimate response rates and profits more accurately.
4.6
FACTORIAL EXPERIMENTS
In this section we discuss some additional important issues related to factorial designs. We
begin with a discussion of how to set factor levels such as the plus and minus values for kiln
temperature. Next, we show how a factorial design can be represented as a regression
model, and how this model can be used to predict the response as a function of the factor
settings. We then define and discuss an important mathematical property, orthogonality,
which 2-level factorial designs have, and which leads to the independent estimation of effects. Also in this section, we consider the special characteristics of experiments such as direct mail, where the response variable is the fraction of people who respond. We show how
to determine the needed sample size and we explain how to assess which effects are statistically
significant. Two-level factorial designs are linear models that will be inadequate if the
relationship between the response variable and one or more factors is nonlinear. We end the
section by showing how to check for curvature by adding runs to the original design.
be overwhelmed by the inherent variability. On the other hand, the levels should not be too
different, either, because then the effect might no longer be linear over the studied range.
Also, note that in 2-level experiments it is not possible to detect nonlinearity, since with just
two levels only a straight line can be fit to the responses. In Section 4.6.5, we discuss how to
use additional runs to check for nonlinearity.
In general, these considerations do not apply to unordered categorical factors such as
type of ad copy, font, color, and catalyst, because levels in-between the categories are not
tions and leads to the standard error of the estimates. This is equivalent to our approach in
Section 4.4.3 of assuming that higher-order interactions are negligible.
The regression approach has some advantages. It is more flexible for analyzing experiments that include missing observations. Also, the regression approach is needed when factors are continuous with more than two levels, and if one wishes to model the functional
relationship between the response and the factors. Imagine an experiment that assesses the
relationship between the sale of a magazine and its cover price. Experiments at four different price levels ($1.0, $1.5, $2.0, and $3.0)
may have been conducted, and one may wish
to model the functional relationship between sales and price, determine whether the sales-price
relationship is linear or quadratic, and find the price at which sales are maximized. For
that, one needs regression.
4.6.3 Orthogonality
Definition. A design is orthogonal if, for any two design factors, each factor-level combination has the same number of runs.
The $2^k$ factorial design is an orthogonal design. In the $2^k$ factorial design each pair of design factors is studied at four possible combinations, and at each of these combinations, $2^{k-2}$ runs are carried out. Consider the $2^3$ design for factors A, B, and C shown in Table 4.11. The four level combinations of factors A and B, for example, are (-1, -1), (+1, -1), (-1, +1), and (+1, +1), and 2 runs are conducted at each combination. The same is true for the other two pairs: factors A and C, and factors B and C.
Now, ignore the response column, and consider any two columns (design, as well as calculation columns) in the matrix in Table 4.11. Multiply the entries in each row of the two selected columns, and sum the products. It will give zero for any pair of columns. For illustration, take the product of columns C and ABC; you obtain the sum +1 - 1 - 1 + 1 + 1 - 1 - 1 + 1 = 0. This characteristic is a property of orthogonal 2-level designs.
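The two checks described above (equal run counts for each pair of factor levels, and zero sums of row-wise products) can be sketched in a few lines. The sketch assumes the $2^3$ design is written in standard order, with A changing fastest; the row order of Table 4.11 is not reproduced here, but the column sums do not depend on it.

```python
from itertools import combinations, product

# 2^3 design in standard order (assumed): A changes fastest, then B, then C.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Design columns plus the calculation (interaction) columns.
columns = {
    "A": [a for a, b, c in runs],
    "B": [b for a, b, c in runs],
    "C": [c for a, b, c in runs],
    "AB": [a * b for a, b, c in runs],
    "AC": [a * c for a, b, c in runs],
    "BC": [b * c for a, b, c in runs],
    "ABC": [a * b * c for a, b, c in runs],
}


def dot(u, v):
    """Sum of the row-wise products of two columns."""
    return sum(p * q for p, q in zip(u, v))


# Sum of products for every pair of distinct columns (21 pairs).
pair_sums = {(m, n): dot(columns[m], columns[n]) for m, n in combinations(columns, 2)}

# Number of runs at each of the four (A, B) level combinations.
ab_counts = {}
for a, b, c in runs:
    ab_counts[(a, b)] = ab_counts.get((a, b), 0) + 1

print(pair_sums[("C", "ABC")])
print(ab_counts)
```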
Because of this orthogonal design structure, effects are estimated independently; for example, the main effect of A does not depend on the main effect of B because the corresponding columns are uncorrelated. It is easy to see why this is the case. Whenever A is at +1, B is equally likely to be at +1 or -1. Similarly, whenever A is at -1, B is equally likely to be at +1 or -1. As a result, any change in one effect is canceled out in the estimate of any other effect. For example, suppose whenever A is +1 the response is increased by some amount Δ. In calculating the main effect of B, the amount Δ will be added for each of the
TWO-LEVEL FACTORIAL EXPERIMENTS
When the Response Is a Proportion
In the case study of Section 4.5, the total sample size was 120,000. A fundamental question in problems of this type is: how large a sample size is needed? Statistical packages, including Minitab and JMP, provide useful software for making this determination. Appendix 2.1 discusses the theory behind their calculations. Suppose that, based on prior experience in mailings, the financial services company described in the case study estimated that the overall response rate would be about 0.025 or 2.5%. Further, suppose the firm decided that a change in response of 0.25% was economically meaningful. Hence, the firm wanted to be able to detect such a change (either an increase from 0.025 to 0.0275, or a decrease from 0.025 to 0.0225) with high probability.
Here we illustrate Minitab's power and sample size routines. Minitab includes a function for determining the sample size in the comparison of two proportions ("Stat > Power and Sample Size > 2 Proportions"). In the case study, the total sample size was 120,000 with 7,500 people receiving the package mailing defined by each of the 16 runs. Each effect is the difference in two sample proportions ($p_+ - p_-$), with 60,000 people exposed to the + level and 60,000 people exposed to the - level. This means that in estimating each effect we are comparing two independent samples of size 60,000 each. We enter 60,000 for the sample size, 0.025 for proportion 1, and "not equal" in the options for the alternative hypothesis. This setup tests the null hypothesis $H_0\colon \pi_1 = \pi_2 = 0.025$ or $\pi_1 - \pi_2 = 0$ against the alternative hypothesis $H_1\colon \pi_1 - \pi_2 \neq 0$. We use a 0.05 significance level when testing the null hypothesis that the two (population) proportions equal 0.025, and that there is no difference between them.
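For readers without access to Minitab, the power of this comparison can be approximated with the usual normal approximation for a two-sided test of two proportions. The sketch below is only that approximation, not Minitab's own routine, so small differences from the software's output should be expected; the function name `power_two_proportions` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of the two-sided z-test of pi1 = pi2, with n observations per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)         # standard error under H0 (pooled)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)  # standard error under the alternative
    diff = abs(p2 - p1)
    upper = 1 - z.cdf((z_crit * se_null - diff) / se_alt)
    lower = z.cdf((-z_crit * se_null - diff) / se_alt)   # negligible far-tail contribution
    return upper + lower


# Two samples of 60,000 each, baseline response rate 0.025, change of 0.0025 either way.
power_up = power_two_proportions(0.025, 0.0275, 60_000)
power_down = power_two_proportions(0.025, 0.0225, 60_000)
print(round(power_up, 3), round(power_down, 3))
```

The power for detecting a decrease comes out slightly higher than for an increase because proportions closer to zero have smaller variance.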
case of continuous factors, the effects are assumed to be linear, and there is no way to check whether this assumption is reasonable without adding runs to the design. To illustrate, consider a single continuous factor with two levels. With two response averages (one at the low and one at the high level) we can fit a straight line perfectly, but we cannot check whether the linear model is appropriate.
If we want to fit models with linear and quadratic effects, we need at least three factor levels. However, often this is costly in terms of the number of required runs. At the initial stage, where usually one starts with many factors that may or may not have an effect on the response, this is not a practical approach. Factorials with factors at three or more levels may be appropriate at a later stage, after the experimenter has reduced the number of factors to a smaller set.
At the initial screening stage, only a simple check for nonlinearity is needed. This can be achieved by adding to the factorial design one or more runs at the center point. The center point of a 2-level factorial experiment sets each factor equal to the average of its low and high levels. In coded units, it is the run with $x_1 = x_2 = \cdots = x_k = 0$. This check is appropriate for experiments with continuous factors, but not for categorical factors where an in-between level has no meaning.
Assume that we collect $n_c$ observations at the center point, with average $\bar{y}_c$ and standard deviation $s_c$. The standard error of the average of the $n_c$ observations at the center point is given by $s_c/\sqrt{n_c}$.
Next, consider the average $\bar{y}$ of the measurements at the $2^k$ factor-level combinations. If the response function is linear, then this average will also be an estimate of the level (mean) at the center point. However, this is not the case if the response function is nonlinear.
The difference between the two averages $\bar{y}$ and $\bar{y}_c$ is a measure of the curvature in the response function. A large nonzero difference points to a nonlinear relationship. The standard error of the difference is needed to assess the statistical significance. It is reasonable to assume that the variability of individual responses at the factorial design points is similar to the variability at the center point. Hence, the standard error of $\bar{y}$, the average of the observations at the $2^k$ factor-level combinations, is given by $s_c/\sqrt{2^k}$. Because of the independence of the two averages, the standard error of their difference is given by
$$\text{standard error}(\bar{y} - \bar{y}_c) = s_c\sqrt{\frac{1}{2^k} + \frac{1}{n_c}}$$
Comment. Observe that the calculation of the standard error of the difference requires information on the variability of individual responses. Often, as is done above, the standard deviation is obtained from independent replications at the center point. Sometimes, as shown in the following example, the standard deviation is obtained from replications at all design points.
Example. Case 2 (Magazine Price Test) is used as an illustration. You should refer to the case study appendix for a detailed discussion of the experiment and the resulting data. A 2^3 factorial experiment with three continuous factors, cover price ($3.99 and $5.99), subscription price ($1 and $3), and number of newsstand copies (1/3 less than current, and 1/3 more than current), was carried out with the objective of assessing the impact of these factors on magazine sales. A center point, with a $4.99 cover price, a $2 subscription price, and the currently used number of newsstand copies, was also considered. Each of the nine combinations was run over a 5-week period, and the resulting average weekly percent changes in sales are shown in Table 4.12. The week-to-week variation was used as a measure of experimental error. Variances among weekly percent changes, calculated for each of the nine runs, were averaged, resulting in an estimated standard deviation of the change in weekly sales. This standard deviation was calculated to be 5%.
[Table 4.12. Columns: Cover Price, Subscription Price, Copies on Newsstand, Percent Change in Sales]
Main effects and interactions are estimated in the case, and the main effects plots shown there illustrate mostly linear relationships between sales and the three studied factors. The average of the responses at the eight factorial points is $\bar{y} = 3.63$, and its distance to the response at the center point is $3.63 - 2.08 = 1.55$. The percent changes listed in Table 4.12 are averages of $n = 5$ weekly observations with standard deviation 5, and hence their standard deviation is given by $5/\sqrt{5} = 2.236$. This becomes the estimate $s_c$ from the earlier discussion. With a single response at the center point ($n_c = 1$), the standard error of this difference is
$$\text{standard error}(\bar{y} - \bar{y}_c) = 2.236\sqrt{\frac{1}{8} + \frac{1}{1}} = 2.37$$
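The arithmetic of this curvature check is easy to script. The sketch below uses only the numbers quoted in the example (factorial average 3.63, center-point response 2.08, weekly standard deviation 5, five weeks per run); the function name `curvature_check` is an assumption made for this illustration.

```python
from math import sqrt


def curvature_check(y_factorial_mean, y_center_mean, s, n_factorial, n_center):
    """Return the curvature estimate, its standard error, and their ratio."""
    diff = y_factorial_mean - y_center_mean
    se = s * sqrt(1 / n_factorial + 1 / n_center)
    return diff, se, diff / se


# Each tabled response is an average of 5 weekly changes with standard deviation 5.
s = 5 / sqrt(5)
diff, se, ratio = curvature_check(3.63, 2.08, s, n_factorial=8, n_center=1)
print(round(diff, 2), round(se, 2), round(ratio, 2))
```

A ratio this far below 2 in absolute value would not usually be read as evidence of curvature at the 5% level, which agrees with the mostly linear main effects plots mentioned above.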
4.7
As discussed in this chapter, experience has shown that there is a hierarchical ordering of effects with main effects larger than 2-factor interactions, 2-factor interactions larger than 3-factor interactions, and so forth. We noted that for continuous factors the Taylor series expansion of a continuous response function provides theoretical support for this finding. But this is only true for continuous factors with smooth response functions. With categorical factors, there is no such theoretical justification. In some cases, for example, it may be true that a 3-factor interaction is just as large as (or even larger than) main effects and 2-factor interactions. To illustrate how this might occur, consider plant growth as the response (a continuous variable), and the three categorical factors: water (no/yes), fertilizer (no/yes), and temperature (0 degrees/25 degrees). Only one of the eight factor-level combinations (water, fertilizer, and temperature of 25 degrees) leads to plant growth, resulting in a large 3-factor interaction among water, fertilizer, and temperature. This and similar examples do not mean that hierarchical ordering does not apply to qualitative factors at all. Experience has shown that in general it does. It just means that when the factors under investigation are qualitative (red background/blue background, headline 1/headline 2, and so forth) some caution is needed before one automatically assumes that 3-factor (and higher-order) interactions will be negligible.
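The plant growth illustration can be checked numerically. In the sketch below the growth value of 10 at the single favorable combination is an assumed number chosen for illustration (the text gives none); every other combination is set to zero growth.

```python
from itertools import product

# Eight combinations of (water, fertilizer, temperature), coded -1 (no / 0 degrees)
# and +1 (yes / 25 degrees), in standard order.
runs = [(w, f, t) for t, f, w in product((-1, 1), repeat=3)]

# Hypothetical response: growth only when all three factors are at their + level.
growth = [10 if run == (1, 1, 1) else 0 for run in runs]


def effect(signs, y):
    """Average response at the +1 level minus average response at the -1 level."""
    half = len(y) / 2
    return sum(s * v for s, v in zip(signs, y)) / half


effects = {
    "water": effect([w for w, f, t in runs], growth),
    "fertilizer": effect([f for w, f, t in runs], growth),
    "temperature": effect([t for w, f, t in runs], growth),
    "water x fertilizer": effect([w * f for w, f, t in runs], growth),
    "water x temperature": effect([w * t for w, f, t in runs], growth),
    "fertilizer x temperature": effect([f * t for w, f, t in runs], growth),
    "water x fertilizer x temperature": effect([w * f * t for w, f, t in runs], growth),
}
for name, value in effects.items():
    print(name, value)
```

Under this assumed response all seven effects have the same size, so the 3-factor interaction is exactly as large as each main effect, and no hierarchy of effects is visible.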
In Section 4.4.1, we described how replicated runs are used to estimate the variance of the experimental error (variance of a run) and to find the standard error of an effect to determine which effects are statistically significant. It is essential that the repeated runs at a particular combination of factor settings be genuine independent replications. For example, in the cracked pots example a production batch consisted of 100 pots. Simply following one batch with a second would not constitute a genuine independent replication. The variation in the percentage of broken pots between these two batches is very likely to underestimate the experimental error. A true replicate requires that each setup procedure in the process be done independently before each run. This would mean carrying out the steps needed to set the peak temperature in the kiln, setting the conveyor speeds that determine the cooling rate, and choosing a batch of clay material from the appropriate supply (low or high coefficient of expansion) in a fashion that reflects the variability that exists in this raw material. In general, a common mistake in manufacturing processes is to take repeated measurements from the same run and treat each as a replicate. But this only captures measurement error, which is typically only a small part of the total experimental error.
Randomization is important in experimentation and no less so when replicates are included in the experiment. Carrying out 2 runs in succession at the same factor settings would likely lead to underestimating the experimental error because these responses would be more alike than if the order of the 2 runs were determined randomly.
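One simple way to obtain such a random order is to list every run, replicates included, and shuffle the whole list. The sketch below assumes a fully replicated $2^3$ design (16 runs, as in the cracked pots example); the seed value is arbitrary and is fixed only so that the sketch is reproducible.

```python
import random
from itertools import product

design = list(product((-1, 1), repeat=3))                      # 8 factor-level combinations
runs = [(combo, rep) for combo in design for rep in (1, 2)]    # 2 replicates of each

rng = random.Random(2024)   # arbitrary seed, used here only for reproducibility
run_order = runs[:]
rng.shuffle(run_order)      # replicates are scattered rather than run back to back

for position, (combo, rep) in enumerate(run_order, start=1):
    print(position, combo, "replicate", rep)
```

Each run in the shuffled list still requires its own independent setup; randomizing the order does not by itself make two runs genuine replicates.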
In the cracked pot example, the 8-run factorial design was entirely replicated, resulting in a total of 16 runs. For a factorial design with 5 factors or even 4, replicating all of the runs might be uneconomical. For example, in a 4-factor, 16-run factorial design, the experimenter might randomly choose (say) 8 of the 16 runs to replicate. In that case, the calculation of effects and their standard errors would have to be done using regression because only half of the 16 experimental conditions would be run twice. Also, there would be 8 degrees of freedom associated with the standard error of an effect compared to the 16, if all 16 runs were repeated. The resulting confidence intervals for the effects would be wider, because a t-value from a distribution with 8 degrees of freedom would be larger.
Designed experiments are part of a scientific learning process. The goals of this process are to confirm or refute prior knowledge and to suggest new hypotheses for future study. Clearly, it is important that this experimental approach be efficient and lead to the right answers. We showed that factorial designs where factors are varied simultaneously are more efficient and provide more information than experiments that vary each factor one at a time.
[Figure 4.A.1. Factorial Experiment]
Starting with ($x_1 = +1$, $x_2 = +1$), the approach of changing one factor at a time would set the first factor at its low level (80 is larger than 70). Locking in the low level of factor 1 and varying factor 2, one would select the low level of factor 2 (90 is larger than 80). However, the combination ($x_1 = -1$, $x_2 = -1$) with $y = 90$ has not located the overall maximum $y = 110$ at ($x_1 = +1$, $x_2 = -1$). The reason why the approach of changing one factor at a time fails is because of the interaction between factors 1 and 2.
It is not possible to estimate an interaction with the data generated under the approach of changing one factor at a time. We cannot compare the main effects of a factor at 2 levels of the other factor if we don't have data at all four factor-level combinations.
Furthermore, the main effects estimates from the approach of changing one factor at a time cannot be generalized because they are main effects at specific levels of the other factors. Since we are uncertain whether there is an interaction, we cannot generalize these effects to other levels of the factors.
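The failure of the one-factor-at-a-time search can be traced step by step. The sketch assumes the four responses of the example are 70 at (+1, +1), 80 at (-1, +1), 90 at (-1, -1), and 110 at (+1, -1); this assignment of responses to corners is a reading of the damaged text and should be treated as an assumption.

```python
# Assumed responses at the four corners (x1, x2) of the 2^2 design.
response = {(+1, +1): 70, (-1, +1): 80, (-1, -1): 90, (+1, -1): 110}

# One factor at a time, starting at (+1, +1).
x1, x2 = +1, +1
x1 = max((-1, +1), key=lambda level: response[(level, x2)])  # vary factor 1, keep x2
x2 = max((-1, +1), key=lambda level: response[(x1, level)])  # lock x1, vary factor 2
ofat_point = (x1, x2)

# The full factorial looks at all four corners.
best_point = max(response, key=response.get)


def contrast(signs):
    """Effect estimate from all four runs: signed sum of the responses divided by 2."""
    return sum(signs(a, b) * y for (a, b), y in response.items()) / 2


main_1 = contrast(lambda a, b: a)
main_2 = contrast(lambda a, b: b)
interaction = contrast(lambda a, b: a * b)

print(ofat_point, response[ofat_point])
print(best_point, response[best_point])
print(main_1, main_2, interaction)
```

The search stops at a corner with response 90 even though a corner with 110 exists, and the nonzero interaction estimate is exactly the quantity that the one-factor-at-a-time data cannot provide, since that path never visits the fourth corner.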
The same points can be made in the context of a $2^3$ factorial design, which can be visualized as the vertices of a cube; see Figure 4.A.1. Four runs at each of the low and the high settings of each factor are used to estimate main effects. The approach that changes one factor at a time does not consider all 8 level combinations, but only the 4 that are outlined in Figure 4.A.1. Again, we learn the following: (1) The approach of changing one factor at a time is inefficient in terms of the number of runs. For the same precision, we need 8 runs for establishing the main effect of the first factor and the best level of the first factor; we need 4 more to establish the level of the second factor; and 4 more for the third factor. Hence, we need a total of 16 runs to estimate the main effects with the same precision that is achieved by the factorial design. (2) It is not possible to estimate interaction terms. (3) In the presence of interaction, the procedure of changing one factor at a time can miss the optimum. The objective in Figure 4.A.1 is to find the maximum. Here, we start by varying factor A first,
this factor amounts to 2.5. Moreover, there is no guarantee that this procedure will find the
optimum.
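$$s^2 = \frac{\sum_{i=1}^{n}\left[y_i - (\hat\beta_0 + \hat\beta_1 x_i)\right]^2}{n - p} \qquad (4A.2)$$
where $p = 2$ is the number of estimated coefficients ($\hat\beta_0$ and $\hat\beta_1$).
Another important result specifies the covariance matrix of the regression estimates $\hat\beta$:
$$V(\hat\beta) = \begin{bmatrix} V(\hat\beta_0) & \mathrm{Cov}(\hat\beta_0, \hat\beta_1) \\ \mathrm{Cov}(\hat\beta_0, \hat\beta_1) & V(\hat\beta_1) \end{bmatrix} = s^2 (X'X)^{-1} \qquad (4A.3)$$
The variances of the regression estimates are in the diagonal of this matrix. Their square roots, the standard errors of the estimates, are used to construct confidence intervals for the regression coefficients. A diagonal covariance matrix implies that the estimates are uncorrelated.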
A nice feature of these matrix expressions is the fact that they work for general models. Consider the general linear regression model with k regressors (factors); that is,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Define the observations of the ith case (the ith experimental run) as $y_i$ (for the response), and $x_{i1}, x_{i2}, \ldots, x_{ik}$ (for the studied values of the k factors). Note that the first subscript in this double subscript notation is the index for the run (1, 2, ..., n), and the second subscript is the index for the factor (1, 2, ..., k). The $n \times (k + 1)$ matrix X is given by
$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$
The equations for the least squares estimates in equation (4A.1) (now there are $p = k + 1$ estimates, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$) and for the covariance matrix in equation (4A.3) (which is now a $(k + 1) \times (k + 1)$ matrix, with variances in the diagonal) carry over to the general case. The estimate of the variance is
$$s^2 = \frac{\sum_{i=1}^{n}\left[y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik})\right]^2}{n - k - 1}$$
The sum of the squared residuals, $SSE = \sum_{i=1}^{n}\left[y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik})\right]^2$, is called the error sum of squares; it measures the variability that is not explained by the model. The sum of the squared distances of the observations from their sample mean, $SST = \sum_{i=1}^{n}[y_i - \bar{y}]^2$, is called the total sum of squares. It expresses the variability in the observations, without any adjustment for the explanatory variables. The difference between the total sum of squares and the error sum of squares expresses the sum of squares that is explained by the regression. It is called the regression sum of squares,
$$SSR = SSR(x_1, \ldots, x_k) = \sum_{i=1}^{n}[y_i - \bar{y}]^2 - \sum_{i=1}^{n}\left[y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik})\right]^2$$
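For the simple linear regression model with a single regressor, the $n \times 2$ matrix X has a first column of ones and a second column with the entries $x_1, x_2, \ldots, x_n$, so that
$$X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$
Furthermore,
$$X'y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} \quad\text{and}\quad (X'X)^{-1} = \frac{1}{n\sum (x_i - \bar{x})^2}\begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}$$
where all sums run from $i = 1$ through $n$. After some algebra, the least squares estimates are
$$\hat\beta_1 = \frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2} \quad\text{and}\quad \hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}$$
Substituting the inverse $(X'X)^{-1}$ shown above into the expression $V(\hat\beta) = s^2(X'X)^{-1}$ leads, again after some algebra, to the variances of the least squares estimates
$$V(\hat\beta_1) = \frac{s^2}{\sum (x_i - \bar{x})^2} \quad\text{and}\quad V(\hat\beta_0) = \frac{s^2\sum x_i^2}{n\sum (x_i - \bar{x})^2}$$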
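The closed-form expressions for a single regressor are short enough to compute directly. The sketch below uses five made-up data points (they do not come from the book) and evaluates the estimates, the variance estimate with $n - 2$ degrees of freedom, and the entries of $s^2(X'X)^{-1}$.

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical regressor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares estimates of slope and intercept.
b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residual variance with n - p = n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)

# Entries of the covariance matrix s^2 (X'X)^{-1}.
var_b1 = s2 / sxx
var_b0 = s2 * sum(xi ** 2 for xi in x) / (n * sxx)
cov_b0_b1 = -s2 * sum(x) / (n * sxx)
se_b0, se_b1 = sqrt(var_b0), sqrt(var_b1)

print(b0, b1, s2, se_b0, se_b1, cov_b0_b1)
```

Note that the covariance between the two estimates vanishes only when the regressor values sum to zero, which is the situation created by the coded -1/+1 columns of a 2-level factorial design.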
The orthogonality of the factorial design (see the discussion in Section 4.6.3) implies a diagonal $X'X$ matrix. You can check this with the X matrix in Table 4.11 that results from the $2^3$ design. The multiplication of the transpose $X'$ with the matrix X leads to an 8 x 8 diagonal matrix with 8 in the diagonal.
For the general $2^k$ factorial design, the entries in the diagonal of $X'X$ are all equal to $2^k$, and all off-diagonal elements of $X'X$ are zero. The diagonal structure of $X'X$ implies that $(X'X)^{-1}$ is diagonal with diagonal elements $1/2^k$. Hence the estimate of an element of $\beta$ is given by
$$\hat\beta = \frac{1}{2^k}\sum_{i=1}^{2^k} (\pm 1)\,y_i = \frac{1}{2}\,(\text{effect})$$
where the weights, +1 or -1, are the elements in the corresponding design/calculation column. Apart from the different normalization, these estimates coincide with our previous definition of main and interaction effects in Section 4.3. The only difference is the factor 1/2. The definition of effects in Section 4.3 looks at the difference in the average responses at the high and low settings of a factor. The regression estimates cut this into half; the coefficients in the regression equation represent the slope or the change in the response per unit change of the factor.
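Both statements, the diagonal $X'X$ matrix and the factor 1/2 between regression coefficients and effects, can be verified for the $2^3$ design. The eight responses in the sketch are hypothetical and stand in for the response column of Table 4.11, which is not reproduced here.

```python
from itertools import product

# 2^3 design in standard order (assumed), with hypothetical responses.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
y = [60, 72, 54, 68, 52, 83, 45, 80]

names = ["I", "A", "B", "C", "AB", "AC", "BC", "ABC"]
X = [[1, a, b, c, a * b, a * c, b * c, a * b * c] for a, b, c in runs]
cols = list(zip(*X))  # columns of X

# X'X: 8 on the diagonal and 0 elsewhere.
xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]

# Because (X'X)^{-1} is diagonal with entries 1/8, each estimate is (column . y) / 8.
coef = {name: sum(s * v for s, v in zip(col, y)) / len(y) for name, col in zip(names, cols)}


def effect(col):
    """Average response at +1 minus average response at -1."""
    high = [v for s, v in zip(col, y) if s == 1]
    low = [v for s, v in zip(col, y) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {name: effect(col) for name, col in zip(names[1:], cols[1:])}

for name in names[1:]:
    print(name, coef[name], effects[name])
```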
The orthogonality of the design has several fortunate consequences as far as estimation is concerned.
1. The estimates of the effects are uncorrelated. The diagonal $X'X$ matrix implies a diagonal covariance matrix $V(\hat\beta)$.
2. The estimates do not change when we omit factors from the model. Let us illustrate this with data from the factorial experiment that includes 3 factors and the 8 runs in Table 4.4. Let us ignore the third factor and consider the regression model with just factors 1 and 2,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$
In this case, the X matrix has fewer columns (only four as compared to the eight if factor 3 is included). The matrix $X'X$ is still diagonal, although of smaller dimension (4 x 4), and its diagonal elements are still 8. The inverse $(X'X)^{-1}$ is diagonal with diagonal elements 1/8, and the estimates in $\hat\beta = (X'X)^{-1}X'y$, consisting of $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_{12}$, are the same as before.
3. In orthogonal designs, the joint (combined) regression sum of squares of the effects can be partitioned into the sum of the regression sums of squares of the individual regressors,
$$SSR(x_1, x_2, x_{12}) = SSR(x_1) + SSR(x_2) + SSR(x_{12})$$
The regression sums of squares are additive. The regression sums of squares from regressing the response vector y on each single column x of the design matrix separately can be added to obtain the regression sum of squares of the complete model. This decomposition does not work for nonorthogonal designs with nondiagonal $X'X$ matrices.
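Consequences 2 and 3 can be checked with a small general-purpose least squares routine, so that nothing about the orthogonal structure is assumed in the computation itself. The responses are hypothetical (the data of Table 4.4 are not reproduced here), and the helper names `solve`, `least_squares`, and `ssr` are assumptions made for this sketch.

```python
from itertools import product


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(columns, y):
    """Least squares estimates from the normal equations (X'X) beta = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def ssr(columns, y):
    """Regression sum of squares, SST - SSE, for a model whose columns include the intercept."""
    beta = least_squares(columns, y)
    fitted = [sum(b * c[i] for b, c in zip(beta, columns)) for i in range(len(y))]
    y_bar = sum(y) / len(y)
    sst = sum((v - y_bar) ** 2 for v in y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return sst - sse


runs = [(x1, x2, x3) for x3, x2, x1 in product((-1.0, 1.0), repeat=3)]
y = [60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0]  # hypothetical responses

one = [1.0] * 8
c1, c2, c3 = ([r[j] for r in runs] for j in range(3))
c12 = [a * b for a, b in zip(c1, c2)]
c13 = [a * b for a, b in zip(c1, c3)]
c23 = [a * b for a, b in zip(c2, c3)]
c123 = [a * b for a, b in zip(c12, c3)]

# Consequence 2: dropping factor 3 leaves the remaining estimates unchanged.
full = least_squares([one, c1, c2, c3, c12, c13, c23, c123], y)
reduced = least_squares([one, c1, c2, c12], y)

# Consequence 3: the joint SSR equals the sum of the single-regressor SSRs.
joint = ssr([one, c1, c2, c12], y)
separate = ssr([one, c1], y) + ssr([one, c2], y) + ssr([one, c12], y)

print(reduced, joint, separate)
```

Replacing one of the design columns by a column that is correlated with the others would break both equalities, which is the sense in which the decomposition fails for nonorthogonal designs.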
EXERCISES
Exercise 1
Consider Case
(a) Assume that there is one factor that increases average store sales by $100. You want
to be 80% confident that a 5% significance test can detect such a large increase.
Determine the sample size. Use computer software such as Minitab or JMP to
check your calculations.
What if you wanted to be 70% confident to detect a change as large as $100?
Hint: Use the approach outlined in Appendix 2.1. However, note that here the
effect compares two averages, instead of two proportions. Assume that the standard
deviation of individual sales is given by σ. The variance σ² replaces π(1 - π), the
variance of the 0/1 random variable in Appendix 2.1. This substitution leads to the
expression for the required sample size that is shown here:
$$n = \frac{2\sigma^2\,(z_{\alpha/2} + z_{\beta})^2}{\Delta^2}$$
Also note that n is the sample size of each of the two groups. The sample size of the
factorial experiment is obtained by multiplying the above expression by 2.
(b) Now that you know the sample size, discuss the advantages of a multifactor experiment over the approach of changing one factor at a time.
(c) Eagle Brands wants to learn about the effects of six factors. A full 2^6 factorial in
64 runs could be considered. Discuss the advantages and disadvantages of such
a design.
(d) Discuss the protocol that you would use to carry out the experiment.
(e) Discuss whether one should analyze absolute or relative (proportional) changes
in sales.
Exercise 2
Consider Case 2 (Magazine Price Test) from the case study appendix.
(a) Consider sales. Estimate the main effects (A, B, C) and the interaction effects (AB,
AC, BC, ABC), and construct a normal probability plot. Assess the significance of
the estimated effects. Note that Minitab will calculate Lenth's (1989) PSE (see Appendix 4.1) and help you with the assessment.
Note: This amounts to assessing the significance of effects from an unreplicated
design. You will notice that A, C, and AC are large and significant.
(b) Obtain the coefficients in the regression model of sales on main effects and interactions and convince yourself that the coefficients are one half of the estimated
effects. Run the regression twice: once with the eight factorial responses and once
with all 9 runs including the response at the center point. You will notice that all
coefficients in these two regressions, except the intercept, are the same. Explain
these findings.
Hint: Use the regression formulation in Appendix 4.4. The intercept is the
average response; hence, the intercept in the regression that includes the center
point is 8/9 (average response from factorial runs) + 1/9 (response at center
point).
(c) Consider subscriptions. Estimate the main effects (A, B, C) and the interaction effects (AB, AC, BC, ABC), and construct a normal probability plot. Assess the significance of the estimated effects.
Note: You will notice that A, B, and AB are large and significant.
(d) Recreate the main and interaction plots that are given in this case.
(e) Averages in the table are calculated from five weekly percent changes. The case also
provides an estimate of the standard deviation of weekly percent changes: 5% for
sales changes, and 15% for subscription changes. Use these estimates to obtain
standard errors of the estimated effects for both sales and subscriptions (see Section 4.4.2). Check whether these standard errors change the conclusions you
reached in (a) and (b). Discuss the assumptions that one makes when using week-to-week changes to estimate the variability.
(f) Use the standard deviations of weekly percent changes to test for curvature in both
9. 12
2n, 22
I;, 19
.l2, 27
<lS-
(c) Test the hypothesis that the two main effects are the same.
Hint: Use the fact that this design is orthogonal and that the estimates are statistically independent. This implies that var(effect 1 - effect 2) = var(effect 1) +
var(effect 2).
Exercise 4
Consider three categorical factors at two levels each. Assume that only one of
the eight experimental conditions has an effect on the response (response is 10 at (+, +, +)),
and interaction effects. Discuss our comment in Section 4.7 that effect sparsity and effect
hierarchy, which are useful design principles for experiments that involve continuous factors, may have less applicability for categorical factors.
Exercise 5
nitride etch process on a single wafer plasma etcher. The etching process uses C2F6
(perfluoroethane) as the reactant gas. Four factors can be varied: the gas flow, the power applied to the cathode, the pressure in the reaction chamber, and the gap between the anode
and the cathode. The response variable is the etch rate for silicon nitride (in angstroms per
minute). Each factor is varied at a high- and a low-level setting. The objective is to find the
factor-level settings that maximize the etch rate. The levels for gap (factor A) are 0.8 and
1.2 cm; the levels for pressure (factor B) are 450 and 550 mTorr; the levels for the C2F6 flow
(factor C) are 125 and 200 sccm (standard cc/minute); the levels for power (factor D) are
275 and 325 watts. For further background on the etching process and details of the experiment, you can consult the original source for this exercise, Yin and Jillie (1987).
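Run   A (gap)   B (pressure)   C (flow)   D (power)   Response (etch rate)
 1      -1          -1            -1         -1             550
 2      +1          -1            -1         -1             669
 3      -1          +1            -1         -1             604
 4      +1          +1            -1         -1             650
 5      -1          -1            +1         -1             633
 6      +1          -1            +1         -1             642
 7      -1          +1            +1         -1             601
 8      +1          +1            +1         -1             635
 9      -1          -1            -1         +1           1,037
10      +1          -1            -1         +1             749
11      -1          +1            -1         +1           1,052
12      +1          +1            -1         +1             868
13      -1          -1            +1         +1           1,075
14      +1          -1            +1         +1             860
15      -1          +1            +1         +1           1,063
16      +1          +1            +1         +1             729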
Analyze the results of the 2^4 factorial experiment. Find the important main effects and interactions. Assess their significance by using normal probability plots and/or Lenth's (1989) PSE
approach. How would you select the factor-level settings so that you achieve a high etch rate?
Exercise 6 Meredith Corporation, the publisher of Ladies' Home Journal magazine, sends
more than a million letters each year to potential subscribers hoping to secure as many subscriptions as possible. The marketing team looks for the right mix of promotional materials, and it experiments constantly with various aspects of the brochure, order card, enclosed
testimonials, and offers. The June 2005 campaign, for example, tested different versions of
the front page of the brochure, and different messages on the front and the back side of the
order card.
Front side of brochure. One version (level -1) shows a radiant-looking Kelly Ripa (the
star of the ABC show Live with Regis and Kelly), while the other version (level +1)
features Dr. Phil (known from his nationally syndicated TV show and publications
Front side of the order card. Level 1 (-1) highlights the message "Double our Best Offer," while level 2 (+1) draws attention to the message "We never had a bigger sale."
Back side of the order card. Level 1 (-1) emphasizes "Two extra years free," while level
2 (+1) features magazine covers of previous issues.
The results (number of letters sent and the number of orders that were received) are shown
below.
Brochure   Order Card Front   Order Card Back   Letters Sent   Orders   Proportion
I 'i,042
I 'i,01:'
'.)/)
61'1
'i6.l
I ,,!M2
I 'i,1112
J 'i,lll)
nln
O.OJHOlJ.l
0.042RI l
0.0.\7428.
0.0109'i20
;().t
o.or.J<J'>o
'i'i()
I 'i,042
I 'i,1112
'; ""'.';
0.0\i>'iM l
ll.0.lH226 l
0.0.lh7n.F
';_\
Analyze the data. Estimate main and interaction effects. Display the effects graphically
through main effects and interaction plots.
Assess the significance of the effects, using the approach discussed in Section 4.6.1.
Summarize your conclusions.
Exercise 7 This exercise applies the general regression results in Appendix 4.4 to the special model without intercept, y = βx + ε, that relates a response vector y to a single regressor vector x. Show that
(a)
/3
(h)
\\F(x)
~x 1 ;,
~x;
'\'
~\
\',
/3x
/3x,)
Exercise 8 This exercise was inspired by a real problem described in the article, "Strategic
Testing Stops Leaky Litter Cartons in Their Tracks" (Packaging Digest, August 2001). The
exercise resembles what the actual company did, but the data are not real.
The makers of "Cats Love It" cat litter are facing a serious problem. Retail customers are
reporting that cartons of the firm's premium brand cat litter are leaking the product onto
store shelves. The company realizes that while cat lovers are used to cleaning stray sprays of
litter tracked through the house, they are not willing to put up with cartons that leak on the
way home.
Management has determined that the problem is with the carton-sealing process. Cartons are filled and sealed on a production line run by 20 workers. The company decides to
perform a 3-factor factorial experiment. A run consists of filling and sealing 200 cartons.
The factors to be tested and levels of each are shown below. Factor A is line speed with the
minus level at 22 cartons per minute and the plus level at 30 cartons per minute. Factor B is
the pressure applied by the gluing machine, with the minus level being lower pressure and
the plus level being higher pressure. Factor C is the amount of glue used, with the plus level
being the current amount and the minus level being 40% less glue.
The design matrix and the estimated effects are shown below. The response is the
proportion of cartons that leak.
                         LEVEL
Factor                   -         +
A: Line speed            Slow      Fast
B: Glue pressure         Lower     Higher
C: Amount of glue        Less      More

Run   A   B   C   Response
8
I'
+
47
JO
,,
8
10
II
8
hll111utiu11 results:
Average
25.875
A
A Ii
(<11
3.2'1
1.25
14.75
AC
Ji(
0.75
'\Ji(
) :"
--)
(a) What is the estimated main effect of factor A? What is the estimated AC
interaction?
(b) Suppose each response is the average of 2 replicated runs (note that the numbers
have been rounded). Suppose the pooled estimate of the variance of the response
of an individual run is equal to 16. Based on 95% confidence intervals for the effects, which effects are significant?
(c) Based on the results of this experiment, what levels would you recommend for
each factor?
(d) What is the regression prediction equation, and what is the predicted response
(proportion of leaking cartons) if your recommended settings are used?
Exercise 9 A 2 1 factorial experiment is to be conducted. The variance of the response of
an individual run is known to be equal to 4 from previous experiments. Suppose that we
want the width of a <)5% confidence 1ntl'.rval for the mean uf an effect tu be \.Hor smalh:r.
How many runs need to be made for each test condition, and how many runs are needed in
total? Assume we have the same number of runs for each test condition.
TABLE 5.1
Factor                      -          +
1  Initial temperature      Lower      Higher
2  Flame temperature        Lower      Higher
3  Color                    Lighter    Darker
4  Supplier                 Current    New
5  Machine                  Current    New
The 5 factors and levels are shown in Table 5.1. For initial temperature, based on experience and the operating ranges recommended by equipment makers, the levels are set an
equal distance above and below the normal setting. For flame temperature, the minus level
is the minimum temperature required to roast the beans, while the plus level is the highest
temperature in the operating range. In practice, the chief roaster varies the color of roasted
beans depending on the variety of the coffee, but favors lighter roasting, recognizing that
coffee roasted too dark will have a bitter and burnt taste. For the color factor, the minus
level corresponds to his normal color for Kenya AA, while the plus level is considerably
darker. The two suppliers for the test are the existing one and a well-regarded competitor.
The last factor is the roasting machine. The company has a small (5-pound capacity) roasting machine that it uses to test new sources of green beans. The chief roaster wants to evaluate another small machine made by a different manufacturer and sees this experiment as
an opportunity to do so.
Suppose the company is willing and able to do only 16 runs rather than the 32 runs of a
full 2^5 factorial design. What is the best design for doing so, and what is lost by carrying out
only 16 runs?
[Table 5.2: 16-run design matrix for factors 1, 2, 3, and 4, with the interaction columns 12, 13, 14, 23, 24, 34, 123, 124, 134, 234, and 1234]
was at its plus level, factor 1 would be plus as well. In this case, the average of the responses
when 5 (= 1) is at the plus level ($\bar{y}_+$) minus the average of the responses when 5 (= 1) is at
the minus level ($\bar{y}_-$) is actually an estimate of the main effect of 5 plus the main effect of 1.
With this arrangement, the two main effects are said to be confounded, and it is not possible
to separate them. The main effect of factor 5 and the main effect of factor 1 are called aliases
of each other. The calculated effect ($\bar{y}_+ - \bar{y}_-$) might be due to the main effect of factor 5,
the main effect of factor 1, or some combination of the two main effects. Confounding two
main effects in this way would be a poor choice since main effects tend to be the largest and
most important effects.
The best choice is to confound the main effect of factor 5 with an effect that is least likely
to be important, which is the 4-factor interaction 1234. Therefore we set 5 = 1234, confounding the main effect of 5 with the 1234 interaction. Taking the average of the responses
when 5 (= 1234) is at the plus level ($\bar{y}_+$) minus the average of the responses when 5 (= 1234)
is at the minus level ($\bar{y}_-$) estimates the main effect of factor 5 plus the 4-factor interaction
1234. Since 4-factor interactions are almost certain to be negligible, we are left with an estimate of the main effect of factor 5.
Effects Are Confounded in Pairs. By setting 5 = 1234, we not only confound these two
effects, but all other effects become confounded in pairs as well. For example, consider in
Table 5.2 the column of signs representing the 12 interaction. Writing this column as a row
to save space, we have
12:    + - - + + - - + + - - + + - - +
Now multiply the signs for factors 3, 4, and 5 to obtain a column representing the 345
interaction. It is
345:   + - - + + - - + + - - + + - - +
The two columns are identical; the effects 12 and 345 are confounded. Taking the average
of the responses when the signs in column 12 (= 345) are plus ($\bar{y}_+$) minus the average of the
responses when the signs in column 12 (= 345) are minus ($\bar{y}_-$) estimates
the sum of the 2-factor interaction 12 and the 3-factor interaction 345.
The entire confounding pattern can be found in the same way, by multiplying columns
of signs for every interaction and identifying pairs of columns that are identical. However,
this brute force approach is not necessary; in Section 5.2.3 we will present a much simpler
method for determining which columns have identical signs.
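The brute force comparison of sign columns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16 runs are taken in standard order (factor 1 alternating fastest), factor 5 is set by the generator 5 = 1234, and the names `runs`, `column`, and `alias_sets` are hypothetical helpers, not notation from the text.

```python
from itertools import combinations, product

# Build the 16-run design: factors 1-4 form a full 2^4 factorial in standard
# order (factor 1 alternates fastest); factor 5 is set by the generator 5 = 1234.
runs = []
for x4, x3, x2, x1 in product((-1, 1), repeat=4):
    runs.append({1: x1, 2: x2, 3: x3, 4: x4, 5: x1 * x2 * x3 * x4})

def column(effect):
    """Sign column of an effect such as (1, 2) or (3, 4, 5)."""
    signs = []
    for run in runs:
        s = 1
        for factor in effect:
            s *= run[factor]
        signs.append(s)
    return tuple(signs)

# Every main effect and interaction among the 5 factors (31 effects in all).
effects = [e for k in range(1, 6) for e in combinations(range(1, 6), k)]

# Group effects whose sign columns are identical; the members of a group are aliases.
groups = {}
for e in effects:
    groups.setdefault(column(e), []).append("".join(map(str, e)))

alias_sets = sorted(groups.values())
for g in alias_sets:
    print(" = ".join(g))
```

Fifteen of the printed groups are pairs such as `12 = 345`, which matches the claim that effects are confounded in pairs. The remaining group holds only 12345, whose column is all plus signs.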
5.2.2 The Design Matrix, Confounding Pattern,
of roasted beans
is
using the same coffee maker. A blind taste test is carried out, with each sample rated on a
scale from 1 (lowest quality) to 10 (highest quality). The last column in Table 5.3 shows the
quality ratings of the brewed coffee that resulted from the various roasts.
The lower part of the table shows the 15 effects that are independently estimated and the
confounding patterns that arise from this design. Each main effect is confounded with a
4-factor interaction, and each 2-factor interaction is confounded with a 3-factor interaction. In showing the confounding pattern, we introduce some new notation.
Notice that the column of signs associated with each estimate is identical to one of 15 columns in the $2^4$ factorial design. To calculate each effect, we apply the signs in its column to
the observations and divide the result by the number of plus signs, 8. Since each is a linear
function of the observations and compares two averages (response averages at the plus and
the minus levels of that column), we refer to the estimated effect as a linear contrast. We use
the letter $l$ to denote the estimate ($l$ for linear), and we use the column label as a subscript
to identify the column that is involved. For example, the estimate $l_5$ applies the signs in column 5 to the responses, obtains the sum, and divides the sum by 8. The estimated effect
$$l_5 = \bar{y}_+ - \bar{y}_- = 5.5 - 7.5 = -2.0$$
is a difference (contrast) between the two averages at the plus and minus levels of column 5.
We use the notation $l_5 \to 5 + 1234$ to indicate that $l_5$ estimates the sum of the main effect of 5 and
the 1234 interaction. The arrow means "estimates." Similarly, to calculate $l_{12}$, for example,
we apply the signs in column 12 to the responses, sum them, and divide by 8. This contrast
estimates 12 + 345, and $l_{12} \to 12 + 345$.
If interactions of order three and higher are negligible, which
is very likely, we are left with clear estimates of all main effects and 2-factor interactions.
[Table 5.3: design matrix, taste ratings (response), estimated effects, and confounding pattern for the 16-run coffee roasting experiment]
chine). A change in the color of the roasted beans from lighter to darker increases the taste
rating by 4 points on average, while a change to the new machine decreases the rating by
2 points.
The average of the 16 responses is 6.5. Given this average and our estimates of the two
significant effects, the implied regression prediction equation is $\hat{y} = 6.5 + 2x_3 - 1x_5$. At
the best settings, when factor 3 is at + (darker color) and factor 5 is at - (current machine),
the predicted taste rating is $\hat{y} = 6.5 + 2(+1) - 1(-1) = 9.5$.
The conclusions from this experiment are clear: Keep the current machine and, most important, roast the coffee to the darker color. The chief roaster realized immediately that he
had been roasting the Kenya AA too light, and from that point on, he began roasting it to
the darker color.
This story began when a competitor's coffee was judged superior in a blind taste test.
All concerned parties at the roasting company were extremely pleased when, in the next
blind taste test against a fresh batch of the competitor's coffee, the chief roaster's experimentally designed Kenya AA was judged best.
[Figure 5.1: normal probability plot of the estimated effects (percent versus effect), with Lenth's PSE = 0.375; effects are marked as significant or not significant]
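The step from estimated effects to the regression prediction equation can be sketched as follows. The numbers are the ones quoted in the text (average 6.5, color effect +4, machine effect -2); the names `grand_mean`, `coefficients`, and `predict` are hypothetical and serve only this illustration.

```python
# Estimated effects quoted in the text: color (factor 3) = +4, machine (factor 5) = -2.
grand_mean = 6.5
effects = {"x3": 4.0, "x5": -2.0}

# A regression coefficient is half the effect, because moving a coded
# variable from -1 to +1 is a change of two units.
coefficients = {name: effect / 2 for name, effect in effects.items()}

def predict(settings):
    """Predicted taste rating for coded settings (-1 or +1) of the significant factors."""
    return grand_mean + sum(coefficients[name] * settings[name] for name in coefficients)

best = predict({"x3": +1, "x5": -1})  # darker color, current machine
print(best)
```

Only the two significant factors enter the equation; the remaining factors are left out on the assumption that their effects are indistinguishable from noise.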
1. Multiplying a column (of plus and minus signs) by itself results in I. Multiplying a plus entry by itself
yields a plus, and multiplying a minus entry by itself yields a plus, also.
2. Multiplying a column by I leaves the column unchanged. This is analogous to multiplication by 1 in ordinary arithmetic.
3. When multiplying columns together, the order of multiplication does not matter.
For example, 2123 = 2213 = 2132.
Now we proceed to find the confounding pattern for the 5-factor 16-run design with generator 5 = 1234. Multiplying both sides of the generator by column 5, we obtain
5 × 5 = 1234 × 5
I = 12345
We use the defining relation to find the confounding pattern among the 15 independent
effect estimates (linear combinations of the responses) that can be calculated in this design.
For example, to find what is confounded with 1 (the main effect of factor 1), we multiply
both sides of the defining relation by column 1:
1(I) = 1(12345) = (11)2345 = 2345
so that 1 = 2345, because 1(I) = 1 and (1)(1) = I. The signs in column 1 and in the 2345 column are identical, and the main effect of factor 1 is confounded with the 2345 interaction.
To check that this is correct, multiply the signs of columns 2, 3, 4, and 5 and show that they
are identical to the signs in column 1.
Similarly, multiplying both sides of the defining relation by (column) 34 results in
34 = 125
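The three multiplication rules lend themselves to a very small sketch: a column label is treated as a set of factor digits, and multiplying two labels keeps the digits that appear an odd number of times. The function names `multiply` and `alias` are hypothetical, chosen for this illustration only.

```python
def multiply(*words):
    """Multiply sign columns labeled by words such as "34" or "12345".

    A factor that appears an even number of times cancels, because a column
    multiplied by itself gives I (rule 1); order does not matter (rule 3).
    """
    result = set()
    for word in words:
        result ^= set(word)  # symmetric difference drops squared factors
    return "".join(sorted(result)) or "I"

defining_word = multiply("5", "1234")  # generator 5 = 1234 gives I = 12345

def alias(effect):
    """Effect confounded with `effect` in the half-fraction defined by I = 12345."""
    return multiply(effect, defining_word)

print(alias("1"), alias("34"), alias("12"))
```

Because the defining relation here has a single word, each effect has exactly one alias, and applying `alias` twice returns the original effect.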
5.3
Let us return to the cracked pots example in Chapter 4. In that problem we examined 4 factors in a $2^4$ factorial design. The factors were cooling rate, temperature of the kiln, coefficient of expansion, and carrier (metal or rubberized). The results of that experiment are
shown in Table 4.7 of Chapter 4. We found very large main effects for cooling rate and coefficient of expansion and a significant interaction between these two factors. We also found
a significant main effect for carrier.

TABLE 5.4
The Cracked Pots Problem: Design Matrix, Estimated Effects, and Confounding Pattern
for a 4-Factor Experiment in 8 Runs

Factors: 1 = Cooling Rate, 2 = Temperature of Kiln, 3 = Coefficient of Expansion, 4 (= 123) = Carrier
Interaction columns for the calculation of remaining effects: 12, 13, 14 (= 23)
Response: Percentage of Cracked Pots

Run    1    2    3    4 = 123    12    13    14 (= 23)    Response
1      -    -    -       -       +     +        +            15
2      +    -    -       +       -     -        +             8
3      -    +    -       +       -     +        -             3
4      +    +    -       -       +     -        -            18
5      -    -    +       +       +     -        -            12
6      +    -    +       -       -     +        -            34
7      -    +    +       -       -     -        +            21
8      +    +    +       +       +     +        +            27

Effects that may be estimated and their confounding pattern:
$l_0 = 17.25 \to$ average
$l_1 = 9.0 \to 1 + 234$
$l_2 = 0.0 \to 2 + 134$
$l_3 = 12.5 \to 3 + 124$
$l_4 = -9.5 \to 4 + 123$
$l_{12} = 1.5 \to 12 + 34$
$l_{13} = 5.0 \to 13 + 24$
$l_{14} = 1.0 \to 14 + 23$
shown in the last column of the table. We calculate each effect estimate from its column of
signs. For example we have
$$l_1 = \frac{8 + 18 + 34 + 27}{4} - \frac{15 + 3 + 12 + 21}{4} = 21.75 - 12.75 = 9.0$$
$$l_{13} = \frac{15 + 3 + 34 + 27}{4} - \frac{8 + 18 + 12 + 21}{4} = 19.75 - 14.75 = 5.0$$
Suppose that based on previous experiments, the company is confident that the variance
of the response of a run is 8.5. In Section 4.4.1 of Chapter 4, we showed that the estimated
variance of an effect is $s^2_{\text{effect}} = (4/N)s^2$, where N is the total number of runs, which in this case
is 8. Thus we have that $s^2_{\text{effect}} = (4/8)(8.5) = 4.25$ and $s_{\text{effect}} = \sqrt{4.25} = 2.06$. A 95% confidence interval for the mean of each effect is given by the estimated effect ± 2.06(1.96). Effects larger in absolute value than 2.06(1.96) =
4.04 are statistically significant. There are four significant effects: (1 + 234), (3 + 124), (4 + 123), and (13 + 24). Assuming that 3-factor interactions
are zero, we would conclude that the first three significant effects are estimates of the main
effect of 1 (cooling rate), the main effect of 3 (coefficient of expansion), and the main effect
of 4 (carrier), respectively. The three estimates have values that are close to what we found
in the $2^4$ 16-run experiment of Chapter 4.
In the full factorial experiment, there was no confounding of course, and we found that
there was a significant interaction between cooling rate and coefficient of expansion, here
labeled as the 13 interaction. But in this 8-run design, there is some uncertainty. The 13 interaction is confounded with the 24 interaction (an interaction between kiln temperature
and carrier). If this 8-run experiment had been run rather than the 16-run full factorial,
would we have been able to identify with confidence the interaction between cooling rate
and coefficient of expansion (the 13 interaction as we have labeled it here)? It is hard to say
for sure. The fact that the two main effects, cooling rate and coefficient of expansion, are
large would lead us to believe that the significant effect (13 + 24) is due to the 13 interaction, but an interaction between kiln temperature and carrier (factors 2 and 4) is also conceivable. We have gained by cutting the required number of runs in half, but we have introduced confounding and thus some uncertainty in the interpretation of the results.
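The effect estimates and the significance check for the 8-run cracked pots design can be reproduced with a short sketch. Two assumptions are made loudly here: the eight responses are listed in standard run order (factor 1 alternating fastest), which should be checked against Table 5.4, and the variance of a single run is taken as the value 8.5 used in the calculation above. The helper names are hypothetical.

```python
import math
from itertools import product

# Percentage of cracked pots for runs 1-8, ASSUMED to be in standard order
# (factor 1 alternating fastest); verify against Table 5.4 before relying on them.
y = [15, 8, 3, 18, 12, 34, 21, 27]

# Full 2^3 factorial in factors 1, 2, 3; factor 4 is set by the generator 4 = 123.
design = [(x1, x2, x3, x1 * x2 * x3) for x3, x2, x1 in product((-1, 1), repeat=3)]

def contrast(*factors):
    """Average response at the plus level of a column minus the average at its minus level."""
    total = 0
    for run, response in zip(design, y):
        sign = 1
        for f in factors:
            sign *= run[f - 1]
        total += sign * response
    return total / (len(y) / 2)

estimates = {
    "1 + 234": contrast(1), "2 + 134": contrast(2), "3 + 124": contrast(3),
    "4 + 123": contrast(4), "12 + 34": contrast(1, 2), "13 + 24": contrast(1, 3),
    "14 + 23": contrast(1, 4),
}

variance = 8.5                               # assumed variance of a single run
s_effect = math.sqrt(4 / len(y) * variance)  # sqrt((4/N) s^2)
threshold = 1.96 * s_effect                  # half-width of the 95% interval

significant = [name for name, value in estimates.items() if abs(value) > threshold]
print(estimates)
print(round(s_effect, 2), round(threshold, 2), significant)
```

The dictionary keys carry the alias of each contrast as a reminder that, for example, the value labeled `13 + 24` cannot be attributed to the 13 interaction alone without further assumptions.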
5.4 DESIGN RESOLUTION
We have discussed two fractional factorial designs, one with 5 factors and another with 4.
The full factorial $2^5$ design requires 32 runs, and there is no confounding among the estimated effects. The half-fraction $2^{5-1}$ design is more economical with only 16 runs, but it induces confounding: Main effects are confounded with 4-factor interactions, while 2-factor
interactions are confounded with 3-factor interactions. The $2^{4-1}$ design is a half-fraction of
the $2^4$ = 16-run full factorial design. Compared to the $2^{5-1}$ design, its confounding pattern
is worse. For this 8-run design for 4 factors, main effects are confounded with 3-factor interactions, and 2-factor interactions are confounded with other 2-factor interactions.
The resolution R of a fractional design is an index (usually written as a roman numeral)
that expresses the degree of confounding.
1. A design of resolution R = III confounds main effects with 2-factor interactions.
expresses the fact that there is a single generator. The subscript "V" denotes its resolution.
The 8-run design for 4 factors with generator 4 = 123 has resolution IV. It confounds main
effects with 3-factor interactions, and each 2-factor interaction with another 2-factor interaction. We denote it by $2^{4-1}_{IV}$.
In constructing our $2^{5-1}$ design, we could have used any interaction or main effect column to accommodate the 5th factor. We chose the generator 5 = 1234, which yields the
half-fraction with the highest possible resolution. (Similarly, in the $2^{4-1}$ design we set 4 =
123.) Suppose we had set 5 = 123 instead. Then the defining relation would be I = 1235,
and the design would be resolution IV. In general, with k factors, the generator k = 123 ...
(k - 1) produces a half-fraction with highest possible resolution. For example, with k = 3,
the best generator would be 3 = 12.
The resolution of a design can be determined directly from its defining relation. Each
term in the defining relation to the right of I is called a "word." For example, for the $2^{5-1}$
design with generator 5 = 1234, the defining relation I = 12345 consists of the single
word 12345. For the $2^{4-1}$ design with generator 4 = 123, the defining relation I = 1234
consists of the single word 1234. Shortly, we will see examples of designs with defining relations that consist of more than one word. The resolution of a design is the length of
the shortest word in the defining relation. In the first case (I = 12345), the length of the
single word is 5 (it consists of 5 numbers), and the design is resolution V. In the second case
(I = 1234), the length of the (shortest) word is 4 (it consists of 4 numbers), and the design
is resolution IV.
In choosing a design, the experimenter must consider both design resolution and cost.
Higher-resolution designs have more attractive confounding patterns but require more
runs and are therefore more costly. Resolution V designs are especially useful because we
can obtain unambiguous estimates of main effects and 2-factor interactions if we assume
that interactions of order three and higher are negligible. In instances where the costs associated with each run are relatively small, these designs are particularly attractive, because the
experimenter is confident beforehand of achieving unambiguous results.
5.5 FRACTIONAL DESIGNS IN 8 RUNS
Table 5.5 shows fractional designs for 4, 5, 6, and 7 factors. All 8-run fractional designs are
constructed from the same building block: the design matrix and the calculation (interaction) columns of the $2^3$ factorial design. In each case, the design matrix starts with columns
A, B, and C (the darker shaded area in Table 5.5) and is completed by associating each additional factor in the design with a column of signs from the lighter shaded area containing
the four interaction columns. The column for each generator is set equal to one of the four
interaction columns (AB, AC, BC, or ABC).
The lower part of the table has one row for each design. The first column identifies the
design and its resolution, the second column shows the generator(s), and the last seven columns show the seven effects that are independently estimated in the design.
ACE
But since ABD = I and ACE = I, it is also true that their product equals I; that is, I = ABD ×
ACE = BCDE. Hence the defining relation is given by
I = ABD = ACE = BCDE
It consists of three words. The length of the shortest word is three (letters), which shows that
this is a resolution III design.
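Building the full defining relation from several generators, and reading off the resolution as the length of the shortest word, can be sketched as below. The functions `multiply`, `defining_relation`, and `resolution` are hypothetical helpers written for this illustration; generators are passed as a mapping from the added factor to the column it is set equal to.

```python
from itertools import combinations

def multiply(*words):
    """Product of columns; letters appearing an even number of times cancel."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))

def defining_relation(generators):
    """All words of the defining relation for generators such as {"D": "AB"}."""
    # Each generator D = AB contributes the word ABD (multiply both sides by D).
    base = [multiply(factor, column) for factor, column in generators.items()]
    words = set()
    # The remaining words are products of the generator words taken 2, 3, ... at a time.
    for k in range(1, len(base) + 1):
        for subset in combinations(base, k):
            words.add(multiply(*subset))
    return sorted(words, key=lambda w: (len(w), w))

def resolution(generators):
    """Length of the shortest word in the defining relation."""
    return min(len(word) for word in defining_relation(generators))

words = defining_relation({"D": "AB", "E": "AC"})
print(words, resolution({"D": "AB", "E": "AC"}))
```

With p generators the relation has 2^p - 1 words, so the same function covers the single-generator half-fractions (one word) as well as designs with three or four generators.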
To find the confounding pattern for this or any other 8-run fractional design, those in
the table or designs that use other generators, we always follow the same procedure. We multiply the defining relation by each of the seven columns A, B, C, AB, AC, BC, ABC that
make up the design matrix and the calculation columns of the building block, the $2^3$ factorial design. For example, multiplying by A, we have
AI = AABD = AACE = ABCDE, so that A = BD = CE = ABCDE
Similarly, multiplying by BC, we have
BCI = BCABD = BCACE = BCBCDE, so that BC = ACD = ABE = DE
TABLE 5.6
Design with Generators D = AB and E = AC

Run    A    B    C    D = AB    E = AC    Response
1      -    -    -      +         +         y1
2      +    -    -      -         -         y2
3      -    +    -      -         +         y3
4      +    +    -      +         -         y4
5      -    -    +      +         -         y5
6      +    -    +      -         +         y6
7      -    +    +      -         -         y7
8      +    +    +      +         +         y8

Effects that may be estimated and their confounding pattern (3-factor and higher-order interactions
are assumed to be zero):
$l_0 \to$ average
$l_A \to A + BD + CE$
$l_B \to B + AD$
$l_C \to C + AE$
$l_{AB} \to D + AB$
$l_{AC} \to E + AC$
$l_{BC} \to BC + DE$
$l_{ABC} \to BE + CD$
The confounding pattern for each design (assuming that 3-factor and higher-order interactions are zero) is given in Table 5.5, with each estimated effect shown with its aliases. For the
$2^{5-2}_{III}$ design, the seven estimated effects are shown as (A + BD + CE), (B + AD), (C + AE),
(D + AB), (E + AC), (BC + DE), and (BE + CD).
The design matrix, estimated effects, and confounding pattern for this 5-factor design in
8 runs are shown in Table 5.6. Each of the seven contrasts uses one of the columns of signs in
the $2^3$ building block. We emphasize that important fact in labeling the contrasts.
/i(.
lation is analogous to the procedure used when there are two generators. We multiply both
sides of the first generator by D, both sides of the second by E, and both sides of the third
by F. The result is that each right-hand side is equal to I, forming the first three terms (or
words) in the defining relation,
I = ABD = ACE = BCF
The length of the shortest word is three; hence this is a resolution III design. The confounding pattern shown in Table 5.5 is found by multiplying the defining relation by each
of the seven columns A, B, C, AB, AC, BC, ABC that make up the design matrix and the
and
ABC
ABCUC'F
5.6 FRACTIONAL DESIGNS IN 16 RUNS
Table 5.7 shows a set of 16-run fractional factorial designs with the number of factors ranging from 5 to 15. The table is constructed in the same fashion as Table 5.5. In that table for
8-run designs, the building block was the 8-run $2^3$ factorial design with its four interaction
columns. Here, the building block is the 16-run $2^4$ factorial design and its 11 interaction
columns. For each 16-run fractional design, the design matrix starts with columns A, B, C,
and D (the darker shaded area in Table 5.7) and is completed by associating each additional
factor in the design with a column of signs from the lighter shaded area containing the 11
interaction columns.
The confounding patterns shown in Table 5.7 ignore 3-factor and higher-order interactions. Table 5.7 shows the generators for each of these designs, but to save space, we have
omitted the confounding patterns for designs with 10 through 14 factors. Software such as
Minitab and JMP provide the generators and the confounding patterns for all of the designs
in Tables 5.5 and 5.7 automatically. The user simply enters the number of factors and the
number of runs. In Section 8.3 of Chapter 8, we discuss the capabilities of these software
programs in more detail.
I = ABCE = BCDF = ACDG
= ADEF = BDEG
= ABFG = CEFG
The last four words are obtained by forming all products of the first three words of the defining relation. The length of the shortest word is four; hence, this is a resolution IV design.
The 15 estimated effects and their confounding pattern are shown in the lower part of
Table 5.7. To find the confounding pattern for this or any 16-run fractional design, we always follow the same procedure. We multiply its defining relation by each of the 15 effect
columns A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, and ABCD that make
up the design matrix and the calculation columns of the $2^4$ factorial design. For example,
multiplying by ABD we have
ABD = CDE = ACF = BCG = BEF = AEG = DFG = ABCDEFG
The contrast $l_{ABD}$ estimates ABD plus the sum of six other 3-factor interactions plus a
7-factor interaction. However, none of these interactions are visible in Table 5.7 because
aliases of order 3 and higher are not shown. Since 3-factor interactions are usually negligible, this contrast is an estimate of the noise.
The design matrix, estimated effects, and confounding pattern for this design are shown
in Table 5.8. We have labeled each contrast to emphasize the fact that each effect uses one of
the 15 columns of signs of the $2^4$ building block. By using this approach and listing the effects in the order of the 15 columns of the building block, we also provide a systematic way
to identify each of the 15 effects.
5.6.2 Testing 15 Factors in 16 Runs: The $2^{15-11}$ Saturated
5.7
FACTORIAL DESIGNS
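TABLE 5.8
A $2^{7-3}_{IV}$ Design with Generators E = ABC, F = BCD, and G = ACD

[Design matrix for factors A through G, runs 1-16]

Effects that may be estimated and their confounding pattern (3-factor and higher-order interactions
are assumed to be zero):
$l_A \to A$
$l_B \to B$
$l_C \to C$
$l_D \to D$
$l_{AB} \to AB + CE + FG$
$l_{AC} \to AC + BE + DG$
$l_{AD} \to AD + CG + EF$
$l_{BC} \to BC + AE + DF$
$l_{BD} \to BD + CF + EG$
$l_{CD} \to CD + AG + BF$
$l_{ABC} \to E$
$l_{ABD} \to$ 3-factor and higher-order interactions only
$l_{ACD} \to G$
$l_{BCD} \to F$
$l_{ABCD} \to AF + BG + DE$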
will be measured by final exam scores. (Note that there have been some major experiments
examining the relationship between class size and learning. But except for these studies, in
searching the literature we found few examples of statistical experiments in education and
even fewer that studied more than one factor. We believe there are many opportunities to
use experimental design methods to improve the effectiveness of education. The example in
this section is not an actual one, but it demonstrates the kind of experiments that could be
performed.)
The Factors and Levels. Factor A is the textbook, and Pesky wants to compare a new book
to the one he has been using. He also wants to see if additional readings would improve performance (factor B). Factor C is the amount of homework, and the two levels are the current 5 hours per week (which students have complained about) and a less demanding
3 hours per week. Factor D's two levels compare a new software package to the existing one,
while factor E is the number of lectures. Currently there are 4 one-hour video lectures per
week, but Professor Pesky is under pressure from the administration to cut back the number to reduce costs. His superior Dean Takahashi believes that three sessions per week would
be just as effective. Finally, the last two factors will test two other changes to the course,
adding an online review session for the final (factor F) and the addition of a set of lecture
notes (factor G).

TABLE 5.9

Factor               -             +
A  Textbook          Current       New
B  Readings          No            Yes
C  Homework          5 hours       3 hours
D  Software          Current       New
E  Sessions          4 per week    3 per week
F  Review            No            Yes
G  Lecture notes     No            Yes
The Design. The professor decides to use the 8-run $2^{7-4}_{III}$ saturated design shown in
Table 5.10. The generators are D = AB, E = AC, F = BC, G = ABC, and the defining relation consists of 15 words.
I = ABD = ACE = BCF = ABCG = AFG = BEG = CDG = DEF
= ABEF = ACDF = ADEG = BCDE = BDFG = CEFG = ABCDEFG
The first four words in the defining relation correspond to the four generators. The other
words were determined by multiplying these four words, taking two at a time (six combinations), three at a time (four combinations), and all four together. The confounding pattern (ignoring 3-factor and higher-order interactions) is shown in the last row of Table 5.5.
This is a resolution III design with main effects confounded with 2-factor interactions.
Each of the 8 runs defines the characteristics of a section, and 20 students are randomly
assigned to each of the eight sections. At the end of the course, each student takes a final
exam. The response variable shown in Table 5.10 is the average score for each section of
20 students.
Which Effects Are Significant? The sample variance calculated from the test scores of
the same section provides an estimate of the variability of an individual test score. The variances can be averaged across the eight sections to obtain an even better estimate of the variability, resulting in the pooled estimate $s_p^2$ with (8)(19) = 152 degrees of freedom. Section
4.4.1 of Chapter 4 showed that the estimated variance of an effect is given by $s^2_{\text{effect}} = (4/N)s_p^2$,
where N = (8)(20) = 160 is the total number of students in the experiment.
Professor Pesky performs this calculation and finds that $s_p^2 = 106.4$. Hence the variance
of an effect is $s^2_{\text{effect}} = (4/N)s_p^2 = (4/160)(106.4) = 2.66$, and $s_{\text{effect}} = \sqrt{2.66} = 1.63$. A 95% confidence interval for the mean of an effect is the estimated
effect ± (1.96)(1.63); an estimated effect is statistically significant if its absolute value is
greater than (1.96)(1.63) = 3.19.
[Table 5.10: design matrix, average score for each section (response), estimated effects, and confounding pattern for the 8-run $2^{7-4}_{III}$ design]
Suppose we (and Professor Pesky) assume that in (A + BD + CE + FG) and (E + AC +
BG + DF), the six 2-factor interactions are negligible. Then we would have estimates of the
two main effects: A and E. With this interpretation the best levels are + for factor A (textbook), and - for factor E (sessions per week). Changing to the new book increases the average score by almost 11 points, while reducing the number of sessions per week from the
current four sessions to three sessions decreases the average score by nearly 6 points. A graduate student leaks the news to Dean Takahashi, who is not pleased to hear it.
But what if the 2-factor interactions are not negligible? Then the observed estimates
might be due to one or more 2-factor interactions rather than the main effects. Because of
these uncertainties, Professor Pesky decides to do a second set of 8 runs designed to clarify
the initial results.
The experimental design course is about to be offered again. Pesky creates eight sections
by randomly assigning 20 students to each run in this second design, and at the end of the
course, he calculates the final exam averages for each section.
A Second 8-Run Experiment Switching the Signs of Column A. Table 5.11 shows the original design matrix (runs 1-8) followed by the design matrix for the second experiment
(runs 9-16). This second set of 8 runs was constructed from the original design matrix by
switching the signs in column A while leaving the other columns unchanged.
Reversing the signs in column A means that the signs for each interaction column involving A are reversed as well. For example, as Table 5.10 shows, in the original design the
signs of columns B, AD, CF, and EG are identical, and $l_B$ (the average of the responses when
B is at the plus level minus the average of the responses when B is at the minus level) estimates (B + AD + CF + EG). In the second design, the signs for columns B, CF, and EG are
still identical, but the signs for column AD are now reversed. As a result, the average of
the responses when B is at the plus level minus the average of the responses when B is at
the minus level, which we denote by $l_B^f$ (with superscript f for "follow up"), estimates
(B - AD + CF + EG). As shown in the table, each of the other five contrasts ($l_C^f$ through
$l_G^f$) that include a 2-factor interaction involving A are changed in the same manner.

[Table 5.11: The $2^{7-4}_{III}$ Design (Runs 1-8) Joined by the Design That Switches the Signs of Factor A (Runs 9-16): Online Learning Example; response: average score]

Now, consider column A. In the original design the signs of columns A, BD, CE, and
FG are identical, and $l_A$ estimates (A + BD + CE + FG). In the second design, the signs for
columns BD, CE, and FG are still identical. But now in every run, the signs in these three
2-factor interaction columns are the opposite of the sign in column A. As a result, $l_A^f$ (the
average of the responses when A is plus minus the average of the responses when A is
minus) estimates (A - BD - CE - FG).
Combining the Estimates from the Two 8-Run Experiments. We use two simple algebraic
operations (addition and subtraction) to combine the two sets of estimated effects and to
reveal the confounding pattern for the entire 16-run experiment. Consider $l_A$ and $l_A^f$. From
Table 5.11
$$l_A \to A + BD + CE + FG$$
$$l_A^f \to A - BD - CE - FG$$
Hence
$$\tfrac{1}{2}(l_A + l_A^f) \to \tfrac{1}{2}\left[(A + BD + CE + FG) + (A - BD - CE - FG)\right] = A$$
$$\tfrac{1}{2}(l_A - l_A^f) \to \tfrac{1}{2}\left[(A + BD + CE + FG) - (A - BD - CE - FG)\right] = BD + CE + FG$$
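The addition and subtraction of the two contrasts can be checked numerically with a sketch that uses made-up, noise-free responses. Everything in the response model below is hypothetical (a main effect of A equal to 10, a BD interaction of 4, and a CE interaction of -2); it is chosen only so that the algebra is visible and is not the data of the online learning example.

```python
from itertools import product

# Original 8-run design: full 2^3 in A, B, C with generators D=AB, E=AC, F=BC, G=ABC.
def make_run(a, b, c):
    return {"A": a, "B": b, "C": c, "D": a * b, "E": a * c, "F": b * c, "G": a * b * c}

original = [make_run(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Follow-up design: switch the signs of column A only.
follow_up = [{**run, "A": -run["A"]} for run in original]

# HYPOTHETICAL noise-free model (coefficients are half the effects):
# main effect of A = 10, BD interaction = 4, CE interaction = -2, FG interaction = 0.
def response(run):
    return 70 + 5 * run["A"] + 2 * run["B"] * run["D"] - 1 * run["C"] * run["E"]

def contrast_A(design):
    """Average response with A at plus minus average response with A at minus."""
    plus = [response(r) for r in design if r["A"] == 1]
    minus = [response(r) for r in design if r["A"] == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

l_A = contrast_A(original)     # estimates A + BD + CE + FG
l_A_f = contrast_A(follow_up)  # estimates A - BD - CE - FG

main_effect_A = (l_A + l_A_f) / 2     # isolates the main effect of A
interaction_sum = (l_A - l_A_f) / 2   # isolates BD + CE + FG
print(l_A, l_A_f, main_effect_A, interaction_sum)
```

In this made-up case neither 8-run contrast equals the true main effect of A, but their average does, which is the point of running the follow-up design.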
In Table 5.11, we perform these two operations repeatedly to combine the estimates from
the two 8-run experiments. The result is not only an estimate of the main effect of A that is
JlO
I 0.5 and AC -
ubt~1i11
the cstimatcs
directly from the combined 16-run design. For example, to estimate AC, we determine the
signs for the AC column by multiplying the signs in columns A and C and then apply these
signs to the responses. We have
l1t
63.6
76.8
60.3 71 .3 + 74.3
58.7
6-1. I
72.7
-r-
70.2
6':!.4
65.4
Dean Takahashi will be happy to learn, the results show that there is actually no difference
in student performance if three rather than four lectures are given each week.
[Figure 5.2: interaction plot of average score versus C (homework: 5 hours, 3 hours), with separate lines by textbook]
BD
CE
design, switching the signs of all columns leaves the signs of every 2-factor interaction
column unchanged, and now the signs in column A are the opposite of those in the three
interaction columns. As a result $l_A^f$ estimates A - BD - CE - FG. Using the same procedure as in Table 5.11, we combine the two estimates: $(1/2)(l_A + l_A^f) \to A$ and $(1/2)(l_A - l_A^f) \to BD + CE + FG$.
00
ABC.
The defining relation for th is design consists of the following seven words:
Notice that these arc 7 of the 5 words in the defining relation for the original 2i114 design,
which we showed at the beginning of this section. They remain in the defining relation for
the combined design, whereas the other eight words in the original defining relation drop
134
TABLE 5.12
The 2^(7-4)_III Design (Runs 1-8) Joined by the Design That Switches the Signs
of All Columns (Runs 9-16)

[Design matrix: runs 1-16, factors A through G]

Estimates from runs 1-8:
l_A -> A + BD + CE + FG
l_B -> B + AD + CF + EG
l_C -> C + AE + BF + DG
l_D -> D + AB + CG + EF
l_E -> E + AC + BG + DF
l_F -> F + AG + BC + DE
l_G -> G + AF + BE + CD

Estimates from runs 9-16:
l'_A -> A - BD - CE - FG
l'_B -> B - AD - CF - EG
l'_C -> C - AE - BF - DG
l'_D -> D - AB - CG - EF
l'_E -> E - AC - BG - DF
l'_F -> F - AG - BC - DE
l'_G -> G - AF - BE - CD

Combined estimates:
(1/2)(l_A + l'_A) -> A          (1/2)(l_A - l'_A) -> BD + CE + FG
(1/2)(l_B + l'_B) -> B          (1/2)(l_B - l'_B) -> AD + CF + EG
(1/2)(l_C + l'_C) -> C          (1/2)(l_C - l'_C) -> AE + BF + DG
(1/2)(l_D + l'_D) -> D          (1/2)(l_D - l'_D) -> AB + CG + EF
(1/2)(l_E + l'_E) -> E          (1/2)(l_E - l'_E) -> AC + BG + DF
(1/2)(l_F + l'_F) -> F          (1/2)(l_F - l'_F) -> AG + BC + DE
(1/2)(l_G + l'_G) -> G          (1/2)(l_G - l'_G) -> AF + BE + CD
out. Why? The seven words that stay in all have four letters (an even number), while the
eight words that drop out such as ABD and ACE have an odd number of letters (seven have
three letters each while one, ABCDEFG, has seven letters). Switching the signs of all columns leaves the signs of the four-letter words unchanged, and they remain in the defining
relation of the foldover design.
But in the defining relation of the foldover design the eight words with an odd number
of letters appear with their signs changed. For example, I = -ABD. In the combined design
ABD is no longer equal to I (ABD = I in runs 1-8, but ABD = -I in runs 9-16). As a result,
ABD and the other words with an odd number of letters drop out of the defining relation
for the combined design.
From the defining relation for the combined 16-run design we can find the entire confounding
pattern in the usual way. For example, multiplying the defining relation by AD
and ignoring interactions of order 3 or higher, we have that AD = CF = EG, which is identical to what we found by combining the two 8-run designs as shown in Table 5.12. In estimating the effects for the combined design, we can combine the estimates from the two
8-run designs as shown in Table 5.12. But the simplest approach is to estimate the effects
directly from the combined 16-run design. For example, for AD (= CF = EG) we multiply
the signs in columns A and D to determine the signs for the AD column. Then we calculate
l_AD, the difference between the average response when AD is at the plus level and the average response when AD is at the minus level. This is an estimate of AD + CF + EG.
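The sign arithmetic behind the foldover can be checked numerically. The following is a minimal sketch, assuming the generators D = AB, E = AC, F = BC, and G = ABC (an assumption that is consistent with the confounding patterns quoted above) and an arbitrary run order; the variable names are illustrative only.

```python
import itertools
import numpy as np

# Full 2^3 factorial in A, B, C for runs 1-8 (run order is arbitrary here).
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T
# Assumed generators: D = AB, E = AC, F = BC, G = ABC.
first = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])

# Foldover: runs 9-16 repeat the design with the signs of all columns switched.
combined = np.vstack([first, -first])
col = {name: combined[:, i] for i, name in enumerate("ABCDEFG")}

ad = col["A"] * col["D"]
cf = col["C"] * col["F"]
eg = col["E"] * col["G"]
bd = col["B"] * col["D"]

# AD, CF, and EG share one sign column in the combined design ...
same_alias_string = bool(np.array_equal(ad, cf) and np.array_equal(ad, eg))
# ... while A is now orthogonal to BD (it was identical to BD in runs 1-8).
a_dot_bd_first = int(first[:, 0] @ (first[:, 1] * first[:, 3]))
a_dot_bd_combined = int(col["A"] @ bd)
print(same_alias_string, a_dot_bd_first, a_dot_bd_combined)
```

An effect such as l_AD would then be computed by applying the `ad` signs to the 16 responses and subtracting the average at the minus level from the average at the plus level.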
Suppose we had followed this complete foldover procedure in the online learning
example of the last section. What would the result have been? In the original 8 runs, there
were two significant estimates: l_A = 10.8 -> A + BD + CE + FG and l_E = -5.8 ->
E + AC + BG + DF. Switching the signs of column A gave us clear estimates of the main effect of A
and the AC interaction, which were the two significant effects.
A complete foldover would have given us a clear estimate of the main effect of A. In addition, it would have separated E from (AC + BG + DF). The estimate of the main effect of
E would have been small, while the estimate of (AC + BG + DF) would have been similar
to -5.8. Given that the main effect of A was large, we might have guessed that the AC interaction was responsible for this significant estimate, but we would not have been certain.
In this case, switching the sign of column A turned out to be a better choice.
Implementing the Sequential Approach. We have discussed two distinct methods for resolving the ambiguities in a resolution III design: switching the signs of one column and
switching the signs of all columns (foldover). A foldover increases the resolution of the design from resolution III to resolution IV. The decision regarding which method to use
should be made after the results of the initial experiment have been obtained. If the initial
experiment shows that a number of estimated effects are significant, a foldover is probably
the preferred choice. On the other hand, if the initial experiment shows that only one estimate (as in our online learning example) or perhaps two are significant, switching the signs
of the factor associated with the largest estimate is likely to be the best choice because this
will unconfound all the effects involving that seemingly important factor.
Decisions about the appropriate follow-up design should be made after the results of the
initial experiments have been analyzed. For example, it does not make much sense to decide on a foldover before the results are known, since in that case the experimenter could
have implemented a 16-run resolution IV design in the first place.
A complete foldover does not make sense if the initial experiment is already a resolution
IV design since in this case the main effects are already isolated. Switching the column signs
for one factor, however, may be useful as it will unconfound all interactions involving that
factor.
5.8
With consistent growth and solid profit from their stores, an office supplies company that
we call ABC decided to expand an e-mail program that directs potential small-business customers to its Web site. After brainstorming ideas and trimming the list to the boldest ideas,
the marketing team identified 8 factors and selected two different versions of each factor to
TABLE 5.13
Factor                        Control (-)               New Idea (+)
A  Link to online catalog     No                        Yes
B  Design of e-mail           Simple                    Stronger brand image
C  Partner promotions         None                      Two partner offers
D  Navigation bar             Current                   Additional buttons
E  Background color           White                     Blue
F  Discount offer             15% off                   No discount
G  Subject line               Exclusive e-mail offer    Special offer
H  Free gift                  None                      Free gift
test. For each factor, the company refers to the current setting as the control. Table 5.13
summarizes the control and the new idea to be tested for each factor.
A: Link to online catalog. The e-mail included a "Shop our catalog online" button near
the bottom of the message. The team felt that an obvious link to the Web site would
encourage customers to browse through the selection of products.
B: Design of e-mail. In the past, e-mails used a basic font with a small company logo at
the top. The team wanted to test a stronger brand image, with a larger logo, more
stylized font, and greater use of the company's brand colors.
C: Partner promotions. The marketing team believed that promoting several well-known brand-name products would encourage customers to make a purchase. They
decided to promote two specific brands in two bright boxes under "Offers from our
partners" at the bottom of the e-mail.
D: Navigation bar on side. E-mails currently went out with a sidebar similar to the navigation bar of the company Web site, but with a shorter list of links. The company
decided to test the current navigation bar versus one with more choices.
E: Background color. All e-mails were sent with dark text on a white background. The
creative director thought that changing to a blue background might help the e-mail
stand out.
F: Discount offer. The Internet director had gone back and forth between offering a
special e-mail discount or not. He thought the discount helped but never had
quantified whether it generated enough sales to justify the lower margin.
G: Subject line. The Internet director had been testing different e-mail subject lines.
Until this test, "Exclusive e-mail offer from ABC" was the winner. Since he knew the
subject line was important, he wanted to test another version, "Special offer for our
best customers."
H: Free gift. They had never before offered a free gift with online orders. They knew
other companies were doing so, and decided it was worth trying, selecting
an
attrac
the order rate. Each version of the e-mail was sent to 1,000 addresses randomly chosen from
TABLE 5.14
[Design matrix: 16 runs, columns A (Link), B (Design), C (Partner), D (Navigation Bar), E (Color), F (Discount), G (Subject Line), H (Gift); response: Purchase Rate (%)]

Effect                      Estimate
Average                      1.8363
A                            0.0550
B                            0.0850
C                           -0.2775
D                           -0.0275
E                            0.0325
F                           -0.5675
G                            0.0450
H                            0.2450
AB + CE + DF + GH            0.0425
AC + BE + DG + FH            0.1650
AD + BF + CG + EH            0.0300
AE + BC + DH + FG            0.0250
AF + BD + CH + EG            0.0600
AG + BH + CD + EF           -0.0325
AH + BG + CF + DE            0.0875
a list that the firm had purchased. The response variable, the proportion of customers that
ordered from the Internet site, is given in the last column. The average response was 1.84%.
The first four columns of the design represent a full 2^4 factorial in the factors A, B, C, and
D. The levels of the remaining four factors in the adjacent columns are obtained from the
four generators
E = ABC
F = ABD
G = ACD
H = BCD
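The construction from the four generators can be sketched in a few lines. This is an illustrative sketch only: the run order produced by `itertools.product` is an assumption and need not match the order of the runs in Table 5.14. It builds the eight sign columns and groups the 28 two-factor interactions into strings that share the same column of signs.

```python
import itertools
import numpy as np

letters = "ABCDEFGH"
# Full 2^4 factorial in A, B, C, D (16 runs; run order is arbitrary here).
base = np.array(list(itertools.product([-1, 1], repeat=4)))
A, B, C, D = base.T
# Generators from the text: E = ABC, F = ABD, G = ACD, H = BCD.
design = np.column_stack([A, B, C, D, A * B * C, A * B * D, A * C * D, B * C * D])
col = dict(zip(letters, design.T))

# Two-factor interactions with identical sign columns are confounded.
strings = {}
for x, y in itertools.combinations(letters, 2):
    key = tuple(col[x] * col[y])
    strings.setdefault(key, []).append(x + y)

alias_strings = sorted(strings.values())
for s in alias_strings:
    print(" + ".join(s))
```

Each printed line is one string of four confounded 2-factor interactions, so the 28 interactions collapse into 7 estimable sums.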
In Section 4.6.4 of Chapter 4 we showed how to find the standard error of an estimated
effect when the responses are proportions, as is the case in this example. The standard error
is given by

$$\text{standard error(effect)} = \sqrt{\frac{p(1-p)}{N/2} + \frac{p(1-p)}{N/2}} = \sqrt{\frac{4p(1-p)}{N}},$$

where p is the overall response proportion and N is the total number of e-mails sent.
This results in

$$\text{standard error(effect)} = \sqrt{\frac{4(0.00184)(1 - 0.00184)}{16{,}000}} = 0.000678 \text{ or } 0.068\%.$$

At the 5% significance level, an estimated effect is significant if its absolute value is greater
than 0.068(1.96) = 0.133%.
The factors C (partner promotion), F (discount offer), and H (free gift) are significant.
Making available offers from two partner companies reduces the response rate by 0.28 percentage
points. The team theorized that additional offers may have confused the message and
given customers too many disjointed offers to choose from. Not offering a 15% discount decreases
the response by 0.57 percentage points. The team calculated that the loss of margin by
offering a 15% discount is more than covered by the increase in the number of orders. A free
pen-and-pencil gift set increases the purchasing rate by 0.25 percentage points. Analyzing
profitability, the cost of the gift was easily covered by the increase in orders.
The fourth largest effect, l_AC = 0.165, is an estimate of AC + BE + DG + FH. These four
2-factor interactions are confounded, and we only know that their sum is 0.165. However,
we do know that factors F and H are significant. Experience has shown that significant
2-factor interactions tend to involve factors that have significant main effects, an experimental
design principle that is called effect heredity. Since F and H were identified as significant
main effects, effect heredity suggests that the estimated effect is most likely due to an
interaction between F (discount) and H (free offer).
The interaction diagram for factors F and H is shown in Figure 5.3. The interaction supports
the main effects: the 15% discount (F-) increases the response over no discount, and
the free gift (H+, the top line) increases the response over no free gift. The interaction can
be understood by comparing both points on the left with both points on the right. On the
right (with no discount), offering the free pen-and-pencil set gives a large jump in response
versus offering no free gift; the response changes by 0.41 percentage points from 1.35% to
1.76%. In contrast, the points on the left show that, with the 15% discount (F-), the free
gift increases response only slightly (from 2.08% to 2.16%). Overall, this interaction shows
that the 15% discount is great, the free gift is good, but both together may be overkill: the
free gift adds little to the benefit of the discount offer. These data helped the marketing team
gain deeper insight into customer behavior, showing that one strong incentive is valuable,
but additional incentives are probably unnecessary.
The company decided to offer the 15% discount and avoid the partner promotions. In
addition, they planned to run further experiments to study the interaction between smaller
discounts and the free gift offer.
[Figure 5.3: Interaction diagram for F (discount rate: 15% discount, no discount) and H (free gift: yes, no); vertical axis: response (%), 1.3 to 2.1]
5.11
higher-order interactions and could be used to estimate the variance of an effect, but this estimate
would only have 1 degree of freedom. Similarly, in the 2^(6-2)_IV design shown in Table 5.7,
two of the 15 estimates are strings of higher-order interactions and could be used to estimate
the variance of an effect with 2 degrees of freedom. With 32-run experiments, there are more
opportunities to use this approach. For example, a 2^(6-1) 32-run experiment with generator
6 = 12345 would be resolution VI with main effects confounded with 5-factor interactions
and 2-factor interactions confounded with 4-factor interactions. These leave each 3-factor
interaction confounded with one other 3-factor interaction. There are 10 such pairs, which
could be used to estimate the variance of an effect with 10 degrees of freedom.
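The count of 10 pairs can be confirmed by brute force. In this short sketch each interaction is a set of factor labels, and multiplying by the defining word 123456 (from the generator 6 = 12345) amounts to taking the symmetric difference of the two sets.

```python
from itertools import combinations

factors = "123456"
defining_word = frozenset(factors)  # I = 123456, from the generator 6 = 12345

# All 3-factor interactions, each represented as a set of factor labels.
three_fi = [frozenset(c) for c in combinations(factors, 3)]

# The alias of a word is its product with the defining word (symmetric difference),
# so each 3-factor interaction is paired with the complementary 3-factor interaction.
pairs = {frozenset({w, w ^ defining_word}) for w in three_fi}
print(len(three_fi), len(pairs))
```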
In the 2^(5-2)_III fractional design we gave the generators as D = AB and E = AC. A natural
question is, Why not set one of the generators equal to the column of signs of the 3-factor
interaction ABC? For example, we might use D = AB and E = ABC. If you work out the confounding
pattern for this choice of generators you will see that it is equivalent to the pattern
for the generators that we have given. For the 2^(6-2)_IV design we gave the generators as
E = ABC and F = BCD. In this case, you might ask, Why not E = ABCD? The defining relation with
E = ABCD rather than E = ABC is I = ABCDE = BCDF = AEF. The shortest word has three
letters, so this design would have resolution III, while the one we listed is resolution IV.
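These defining relations can be worked out mechanically. The sketch below uses hypothetical helper names (`defining_relation`, `resolution`) and represents each generator by its word, for example E = ABC becomes ABCE; multiplying words cancels repeated letters, which is a symmetric difference of letter sets.

```python
from itertools import combinations

def defining_relation(generator_words):
    """All products of the generator words; repeated letters cancel."""
    gens = [frozenset(w) for w in generator_words]
    words = set()
    for r in range(1, len(gens) + 1):
        for combo in combinations(gens, r):
            product = frozenset()
            for w in combo:
                product = product ^ w
            words.add("".join(sorted(product)))
    return sorted(words, key=lambda w: (len(w), w))

def resolution(generator_words):
    """Resolution is the length of the shortest word in the defining relation."""
    return min(len(w) for w in defining_relation(generator_words))

listed = ["ABCE", "BCDF"]        # E = ABC, F = BCD
alternative = ["ABCDE", "BCDF"]  # E = ABCD, F = BCD
print(defining_relation(listed), resolution(listed))
print(defining_relation(alternative), resolution(alternative))

# The two 2^(5-2) choices give the same word lengths (3, 3, 4).
print(defining_relation(["ABD", "ACE"]), defining_relation(["ABD", "ABCE"]))
```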
In general, we want to choose generators to achieve the highest possible resolution, but
there may be a number of choices that lead to designs with the same resolution. How do we
choose among them? For example, consider a 2^(7-2) design for 7 factors in 32 runs. Two of
the choices are design 1 with generators 6 = 123 and 7 = 1245 and design 2 with generators
6 = 123 and 7 = 145. For design 1 the defining relation is I = 1236 = 12457 = 34567, while
for design 2, it is I = 1236 = 1457 = 234567,
so these designs are both resolution IV, with 2-factor interactions confounded with other
2-factor interactions. But design 2 has six pairs of confounded 2-factor interactions: (12, 36),
(13, 26), (16, 23), (14, 57), (15, 47), (17, 45), whereas design 1 has only three pairs that are
aliased: (12, 36), (13, 26), (16, 23). As a result, design 1 is preferable to design 2. Notice that
design 2 has two 4-letter words in its defining relation, whereas design 1 has only one. For design 1, the 4-letter word is 1236, and only pairs of 2-factor interactions involving these 4 factors
will be confounded. But design 2 has a second 4-letter word, 1457, and pairs of 2-factor
interactions involving these four factors will be confounded as well. So the tie breaker between
these two (or more) resolution IV designs is the number of 4-letter words in each defining
relation. Both designs are resolution IV designs, so they each must have at least one
4-letter word, but design 1 has the fewest 4-letter words and is called the minimum aberration
design. Note that all of the designs in Tables 5.5 and 5.7 are minimum aberration designs.
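The comparison of the two designs can be reproduced with a small enumeration. This is a sketch with hypothetical helper names; two 2-factor interactions are aliased exactly when their product is a word in the defining relation.

```python
from itertools import combinations

def words(generators):
    """All words in the defining relation generated by the given generator words."""
    gens = [frozenset(g) for g in generators]
    out = []
    for r in range(1, len(gens) + 1):
        for combo in combinations(gens, r):
            w = frozenset()
            for g in combo:
                w = w ^ g
            out.append(w)
    return out

def aliased_2fi_pairs(generators):
    """Pairs of 2-factor interactions whose product is a word in the defining relation."""
    ws = set(words(generators))
    two_fi = [frozenset(c) for c in combinations("1234567", 2)]
    return [(a, b) for a, b in combinations(two_fi, 2) if (a ^ b) in ws]

design1 = ["1236", "12457"]  # 6 = 123, 7 = 1245
design2 = ["1236", "1457"]   # 6 = 123, 7 = 145

summary = {}
for name, gens in {"design 1": design1, "design 2": design2}.items():
    four_letter = sum(len(w) == 4 for w in words(gens))
    summary[name] = (four_letter, len(aliased_2fi_pairs(gens)))
print(summary)  # (number of 4-letter words, number of aliased 2-factor pairs)
```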
In Chapter 3, we introduced the important idea of blocking. The same concept applies to
factorial and fractional factorial designs. For example, in a 2^3 experiment (with factors labeled
A, B, C), it might be necessary to perform 4 runs on one day and 4 runs on another.
Randomizing the assignment of runs to days will result in a valid experiment, but if there is
a day effect, it will lead to an increase in experimental error. An alternative is to perform the
experiment in two blocks of 4 runs each. The runs in the first block (day 1) would be those
for which the signs of the ABC column are plus, with the runs in the second block (day 2)
being the runs for which ABC is minus. This arrangement will confound the (possible) day
effect with the 3-factor interaction that is very likely to be zero and can be ignored. In the
online experiment that we discussed in Section 5.7.1, the two 8-run experiments were run
at different times, which also introduces a block effect, as the students might perform better during one time period compared to the other. But in this case, the same 8-run experiment is repeated in the second block. It may well be that the scores in one block are higher
than the scores in the other, but any block effect would be added to each of the 8 runs in the
block and would have no influence on the estimated effects. The difference between this and
the first example is that in the first, a block effect would influence only 4 of the runs. The
first situation would be analogous to the online experiment if the complete 2^3 experiment
were repeated on the second day.
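A small simulation illustrates why blocking on the ABC column protects the other estimates. The responses and the size of the day effect below are hypothetical values chosen only for illustration; the point is how much each estimated effect changes once a constant is added to every run on day 2.

```python
import itertools
import numpy as np

runs = np.array(list(itertools.product([-1, 1], repeat=3)))  # columns A, B, C
A, B, C = runs.T
abc = A * B * C
day = np.where(abc > 0, 1, 2)  # day 1: ABC is plus, day 2: ABC is minus

def effect(signs, y):
    """Average response at the plus level minus average response at the minus level."""
    return y[signs > 0].mean() - y[signs < 0].mean()

rng = np.random.default_rng(0)
y = rng.normal(50, 2, size=8)                  # hypothetical responses without a day effect
y_shifted = y + np.where(day == 2, 5.0, 0.0)   # hypothetical day effect: +5 on day 2

contrasts = {"A": A, "B": B, "C": C, "AB": A * B, "AC": A * C, "BC": B * C, "ABC": abc}
change = {name: effect(s, y_shifted) - effect(s, y) for name, s in contrasts.items()}
for name, delta in change.items():
    print(f"{name:>3}: {delta:+.3f}")
```

Only the ABC estimate absorbs the day effect; the main effects and 2-factor interactions are unchanged because each of their plus and minus levels contains two runs from each day.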
EXERCISES
E = ABC and F = BCD. In a passionate discussion, Bill argues that replacing the generator
E = ABC with the generator E = ABCD will result in a better confounding pattern. "We
would confound E with a 4-factor interaction so this pattern is obviously going to be better." Karen replies, "No Bill, you are wrong, the pattern is going to be equivalent." Comment on the opinions expressed by these two lovebirds.
Exercise 3
Karen Donegan, a staffer at the business school, is thinking about buying some
new golf equipment to improve her game in preparation for the annual golf tournament.
One day she comes across some notes on experimental design that she finds on top of a file
cabinet near her office. "Hmm," she thinks to herself as she begins to read them, "this looks
interesting. Maybe this will help me decide what equipment to purchase."
Karen has been thinking about replacing her current steel-shaft, small-head driver. The
company that makes her clubs has three other drivers that she is considering. The first has
the same steel shaft but a newly designed very large head. The second has a graphite shaft
with the same small head that is on her current driver, while the third has the graphite shaft
with the newly designed very large head. She is also wondering whether a new pair of golf
shoes might help, as well as switching to an expensive golf ball (at $3.00 per ball) rather than
sticking with her discount store special (at $0.75 per ball). She also has her eye on a rather
expensive new golf sweater, which should put her in the right frame of mind to hit some really long drives. Finally, she does not wear a golf glove when she plays, and her husband
pointed out to her that she is no Fred Couples, one of the few golf professionals who does
not wear a glove. After spending an evening reading the notes, Karen concludes that "this is
pretty clear" and decides to perform the experiment whose design matrix is shown below.
She includes 6 factors and decides to do 16 runs, all to be performed at the local driving
range. Each run is performed in random order.
Karen buys the necessary golf balls and the glove, but is able to borrow the shoes and golf
clubs from a local store. She has her own steel-shaft, small-head driver, and she has 10 days
to return the sweater for a full refund.
The design matrix and the results of the experiment are shown below. Each run is a single
drive (shot), and the response variable is the distance the shot carries in yards. Karen's goal
is to find the equipment that will maximize her distance, but she doesn't want to spend
money on anything that will not help in that regard.
                LEVEL
Factor    -                 +
A         Steel shaft       Graphite shaft
B         Small head        Very large head
C         Cheap ball        Expensive ball
D         No glove          Glove
E         Old shoes         New shoes
F         Old sweater       Expensive new sweater
Run
---
I:
L)
--F
Response
182
157
155
22h
184
166
177
218
+
+
8
9
IU
11
178
152
12
1.l
14
15
JJS
232
+
+
+
173
156
15,1
223
lb
IA - 24.00 ---7 A
lu = 2 1.50 ---7 B
4.25 ---7 c
le
~
/AH =
45.50
IA( = -5 .25
IAJJ
6.75
---7
---7
---7
L'
AH+ CF
AC+ BE
/AIJC
11
/AHi!
().75
---7
.\-factor interactions
+ I:F
/,\UJ
0.50
---7
l-foctor
AlJ
9.25
---7
intcr~1ctions
14.>
BF
(a) Verify the estimated effects and their confounding patterns (assume interactions of
order 3 or higher are negligible).
(b) Find all the effects including higher-order interactions that are confounded with C,
with CD, with AB.
(c) Suppose that based on the length of her drives, Karen has determined that the
standard deviation of a single drive is 13 yards. What is the 95% confidence interval
for an effect? Which effects are significant?
(d) Based on your analysis of the results, what are the most reasonable and likely
conclusions you can draw without doing any additional runs? What should Karen
do? Show square diagrams, if appropriate. What is the regression prediction
equation?
(e) How much additional yardage could Karen expect to get if she followed your
advice?
(f) Suppose instead of doing all 16 runs on one day, Karen decides to do 8 of the runs
on one day and the other 8 runs the next day. Karen's husband Harold suggests she
do the even-numbered runs on one day and the odd-numbered runs on the other.
Is this a good idea? Explain. A professor at the business school suggests she do runs
1, 3, 6, 8, 10, 12, 13, and 15 on one day and the others on the next day. Karen asks,
why? The professor says, "Look at the column of signs for the ACD interaction.
The runs with - signs should be run on one day and the runs with + signs should
be run on the other day. That's how I picked the runs for each day." Explain why
the professor's suggestion is better than Harold's. It may help to think of the day as
the "seventh factor" in the experiment.
Exercise 4 The Natural Delight Food Company makes a range of frozen foods. One product is the Natural Delight Soy Burger, a nonmeat product. The company is interested in testing a number of factors related to this product.
Factor A: Location. The choice is between the natural foods freezer case and the freezer
case where beef hamburgers are sold. The supermarket has offered the same amount
of shelf space at either location, and the company has to decide which location to
choose. In the past, the product has been sold in the natural foods case. The company wonders whether the higher customer flow past the hamburger freezer case
might lead to higher sales.
Factor B: Package color. The existing package is green for the environment. A market-
Factor C: An in-store special display located halfway between the two alternative freezer
locations. The display would show a happy person eating a Natural Delight Soy
Burger. The company logo would be displayed, and the following words boldly
shown: "Oh boy, you will love our Soy Burger!"
Factor D: Free samples. The brand manager feels strongly that setting up a table next to
the location of the display, with an extremely attractive store employee cooking and
offering shoppers samples of the product, would be very helpful. "If we can just get
people to try our product, they will buy it," he says.
Factor E: Sticker. The company has been adding a fancy sticker to the package with the
words "stay healthy." They would like to test whether or not the sticker has an effect
on sales. The sticker would match the package color.
Factor F: Package lettering. Currently, the lettering on the package uses a modern
font. A summer intern, the marketing manager's nephew, suggests testing a more
traditional-style font.
The company has identified a group of 16 stores all of very similar size, with about the
same weekly sales of the Soy Burger. The experiment will be run over a 4-week period from
mid-July to mid-August. The response variable is dollar sales of the product over the 4-week
period. As shown below, the design matrix consists of 16 runs, each a specific setting of each
of the 6 factors. Each of the 16 runs is randomly assigned to one of the 16 stores.
Based on an extensive analysis of a very large amount of historical data for the test stores,
the company estimates that the variance of the response of a single run (the variance of dollar sales at a store over the 4-week period) is $800.
The design matrix and the experimental results are shown below.
LLV El
Factor
A
LJ
/)
f;
No d"play
Nu free 'ampks !able
"Stay healthy" sticker
IZun
J)
4
5
-+
7
8
9
10
11
12
13
14
15
16
+
+
+
+
1ZesprJ11't'
l,l 10
l I 120
970
l ,o25
I ,000
980
l I 125
1,095
%(1
Y75
9.\()
'15U
9.HI
Y7~
+-
960
+-
back~round
950
I
(a) Obtain the estimated effects and determine the confounding patterns (see
Exercise 3).
(b) Based on a 95% confidence interval, which effects are significant?
(c) What settings of the factors would you recommend? Explain why you picked these
settings.
(d) What is the regression prediction equation and what are the predicted 4-week sales
given your answer to part c?
(e) What additional comments, suggestions, or observations, if any, do you have?
Excrci!>c 5 C.on-,1dl'r I Xl'rl 1,e I in C'ha pt er 4. I .1glc Brands studic.., the effect" of 6 f,1ctor ....
1'
( "ons1der the 2''
, and 2" ' fractional foctonal designs. Discuss the confounding p.11-
Consider Case
I Mother Jones
(a ) Analv7e the results ot the experiment. Which effects arc statistically signiticant at
the 5% level' At the 10% level'
(h )
r L l \\'h.it 1.., the rl'gres..,ion prediction equation and the predicted response if signifiL.1111
l.ict<ir..,
f: xcrci!>e 7
(a)
.HL'
~upplics
In the section on Planning the Test, it is stated that with 35,000 names and an average response rate of 1%, an effect would have to change the response by about 20%
(from 1.00 to 1.20) to have a 50:50 chance of being found significant. Confirm
this statement by applying the appropriate sample size calculations. Use computer
software such as Minitab or JMP. What magnitude of change could you detect if
you wanted to be 80% confident?
(b) Ignoring the three customer segments, discuss the advantages/disadvantages of the
16-run 2 11
,,
and 32 run 2 11
lutions ,rnd the implied confounding patterns. l.1se design software such as
or JM!' if,n"1ilahle.
~l1nit.1h
(c) Explore the differences among the three customer segments. Can you conclude
whether or not the effects of the 13 studied factors depend on the customer
segment?
(d) Which design (and which run size) would give you unconfounded estimates of
all 2-factor interactions among the 13 factors? Is there a smaller design you
could use if you were only interested in one specific interaction (say, the KL
interaction)?
(e) Obtain the standard errors of the estimated effects, using the approach described
in Section 4.6.4.
(e1) Note that the sample sizes are not the same. Nevertheless, suppose that they
were and assume that the N
32 cells.
(e2) Use the variance var(p_i) = p_i(1 - p_i)/n_i in deriving the variance of the estimated effects. Use the fact that an estimated effect is the difference of average
proportions (of size 16) at the low and high settings.
(e3) Compare your results in (e1) and (e2) with those given in the case.
(f)
Exercise 8
Consider Case 7 [Peak Elcctron1cs: lhe Broken lent Problem (B)J (mm the
Me
genl!nc prcdill which variables would be -,1gn1ticanl1 Whal 1-, the regression pred1<..t1on equation fur the number l>f broken te11h 011 Lill' p.111cl~ ht1111ate l11 '1011
llluch the number of broken tcnh 11ould he reduLcd h1 using the best -,ett1ng-. .is
mm pared lo the current settings.
Exercise 9 Discuss applications of factorial and fractional factorial designs to questions
that arise in your field of study (marketing, operations management, management information
systems, economics, engineering, etc.). Discuss the factors, the response, and the
process that you would follow to conduct such experiments.
Search the literature in your field of study and find applications where these design methods
have been used.
Exercise 10 Paper helicopter experiment. This experiment was first recommended by
George Box. A discussion of this experiment and construction guidelines for paper helicopters
are given in Ledolter and Swersey (1997). Also, there are numerous references to
this experiment on the Web.
Construct a paper helicopter by varying the length and the width of the blades. Vary the
weight of the helicopter by using different paper stock and/or adding paper clips to the helicopter.
Drop the helicopter from a high location (say, 12 feet), and determine the flying
time. The objective is to maximize the flying time. Carry out the experiment, preferably
with replications. Analyze the data and discuss your findings.
Exercise 11 Consider the 8-run 2^3 factorial design. Suppose that you conduct the 8 runs
in standard order. However, it turns out that only 4 runs can be conducted on a single day.
You are concerned that the experimental conditions change from day to day, and that the
mean levels of runs carried out on different days are not the same.
(a)
Would tlfr,
I.id
e~pcriment?
!or ex
l'XJ)t'J
.\011kingfl11id: water (
'111/1111tv: no s.ilt (
) or beer ( +)
) or ,,1Jt ( + )
) or room temperature ( +)
, or 6 hours ( +)
Carry out the following experiment. Select a pinto bean, measure its "size," put the bean
into a soaking fluid, and after a certain amount of elapsed time measure its size again.
Use five tablespoons of soaking fluid to soak each bean. Make sure that the liquid covers the
bean. Use regular beer because light beer might act like water. For salt, add 1/4 teaspoon to
the soaking fluid. For vinegar, add 1 teaspoon to the soaking fluid.
(a) Discuss how you measure the "size of a pinto bean" and its "expansion." Give a detailed description of your measurement procedure, so that it can be carried out by
other people; that is, give an operational definition.
(b) Design, set up, and execute a 2^(5-1) fractional factorial experiment. Conduct two
replications, which may be run concurrently. Analyze the effects of the 5 factors.
Write a short report that summarizes your findings. Support your findings with appropriate graphs and calculations. What have you learned? What was the most difficult
part of your experiment? If you had to do it over again, what would you change?
Note: In carrying out this experiment, you need (16)(2) = 32 small paper containers
for soaking the beans.
Exercise 13
Eibl, Kess, and Pukelsheim (1992) discussed how the response variable, paint
coat thickness, depends on a set of 6 input factors. Their objective was to find factor settings
that achieve a desired target value for paint coat thickness of 0.8 mm.
They consider the following 6 input factors A through F (listed here in decreasing order
of their assumed importance): belt speed, tube width, pump pressure, paint viscosity, tube
height, and heating temperature. All factors could be varied continuously. Level 0 stands for
the standard operating condition. All factors were scaled so that levels between -3 and +3
were technically feasible, without increasing cost.
(a) The first experiment varied the factor levels between
that this experiment could detect the linear effect of these changes. The table given
below lists the observed paint thickness (in mm) for a 2-level fractional factorial
experiment with four replications at each factor-level combination. The order of
the 32 experiments was fully randomized.
(a1) Show that this design is a 2^(6-3) fractional factorial design and determine the
confounding patterns.
Hint: Notice that factors A, B, and D form a full 2^3 factorial design. Write
out the calculation columns, and discuss how the levels of the remaining
factors C, E, and F were selected. You will find that C = BD, E = AD, and
F = AB.
(a2) Calculate the averages from the replications, and analyze the averages. Find
the important effects, calculate their standard errors, and interpret your
findings.
A
Ii
('
[)
.~J
U.~b
- I
I 49
I 12
I (1'>
1.29
l ..SI
1.41
U.7 ..!
0.98
2V
I.' l
2.17
I 46
1.4~
U.81
1.71
1.04
I .S'i
11.7'!
2.36
1.42
I .SY
U.88
1.76
Ul
I .4ll
U.8J
2.12
1.40
(b) A follow-up experiment focused on the first 4 factors (factors A through D). The
results are given below. The levels of the factors were changed because of the findings in the initial experiment. When analyzing the data, you can transform the fac-
+ J; the -1 in
-1.5 on the original scale; the + 1 in
I WO
I I VI I
/)
lHM I ICl'IAI
()
I >l
I 71
I.I'>
I 71
ll.YI
()
()
DI
<,J(;N~~
_ __
I h1cknc'"
l.'>I
O.hI
I ,.J
II
lACTORIAI
()
()
I IK
{) 78
1.<JH
1.06
l.Hl
1.2'!
I hi
1..lll
(L) Another follow up experiment was conducted with just the first 3 factors. Tlw re-
I II
I II
I
I.II
I 'i
1.0
l.'i
< .akula!c
;~
0.;
()
11. 'i I
n.hh
0.h2
II "I
ll.h'l
II l'l
o. ')
0. iH
0.M
I IH
tl.74
0.; l,
11.-9
11.78
Calculate the averages from the replications, and analyze the averages. Find the important effects, calculate their standard errors, and interpret your findings.
(d) Summarize your findings from all three experiments. Can you find factor settings that achieve the desired target value for paint coat thickness (0.8 mm)?
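TABLE 6.1
Plackett-Burman Design in 12 Runs

              FACTOR
Run    1   2   3   4   5   6   7   8   9  10  11
  1    +   +   -   +   +   +   -   -   -   +   -
  2    -   +   +   -   +   +   +   -   -   -   +
  3    +   -   +   +   -   +   +   +   -   -   -
  4    -   +   -   +   +   -   +   +   +   -   -
  5    -   -   +   -   +   +   -   +   +   +   -
  6    -   -   -   +   -   +   +   -   +   +   +
  7    +   -   -   -   +   -   +   +   -   +   +
  8    +   +   -   -   -   +   -   +   +   -   +
  9    +   +   +   -   -   -   +   -   +   +   -
 10    -   +   +   +   -   -   -   +   -   +   +
 11    +   -   +   +   +   -   -   -   +   -   +
 12    -   -   -   -   -   -   -   -   -   -   -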
6.2
Table 6.1 shows the design matrix for the 12-run Plackett-Burman design. The rows in this matrix represent the runs (N = 12) and the columns represent up to 11 factors. As in all Plackett-Burman designs, the entire design matrix is constructed from an initial row of plus and minus signs that is given in Appendix 6.1. The last entry in row 1 (-) is placed in the first position of row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. The third row is generated from the second row using the same method, and the process continues until the next to the last row is filled in. A row of all minus signs is then added to complete the design.
The design is orthogonal, since for any two factors (columns) the number of runs at each of the four factor-level combinations (- -), (- +), (+ -), (+ +) is the same (3 runs). Because the design is orthogonal, each of the 11 linear contrasts is independently estimated. We obtain each estimate in the usual way, by taking the average of the responses when the column entry is at the plus sign (ȳ+) minus the average of the responses when the column entry is at the minus sign (ȳ-).
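The construction and the orthogonality check described above can be sketched in a few lines of Python. This is a minimal sketch, not the book's own procedure: the generator row `GEN12` is assumed to be the standard 12-run Plackett-Burman row (Appendix 6.1 is not reproduced here), and the helper names `plackett_burman` and `pair_counts` are hypothetical.

```python
# Assumed generator row for N = 12 (standard Plackett-Burman row; an assumption,
# since Appendix 6.1 is not reproduced in this section).
GEN12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])  # last entry moves to the front
    rows.append([-1] * len(gen))
    return rows

def pair_counts(design, c1, c2):
    """Number of runs at each of the four level combinations of two columns (0-based)."""
    counts = {(-1, -1): 0, (-1, +1): 0, (+1, -1): 0, (+1, +1): 0}
    for row in design:
        counts[(row[c1], row[c2])] += 1
    return counts

design12 = plackett_burman(GEN12)

# Runs at which column 1 is at its plus level.
plus_runs_col1 = [i + 1 for i, row in enumerate(design12) if row[0] == +1]

# Orthogonality: every pair of columns has 3 runs at each of the four combinations.
all_balanced = all(
    set(pair_counts(design12, a, b).values()) == {3}
    for a in range(11) for b in range(a + 1, 11)
)

print(plus_runs_col1)
print(all_balanced)
```

Under the assumed generator, column 1 is at its plus level in the same runs that the text later uses for the factor 1 estimate, which is a useful consistency check on the assumption.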
in column 26. Writing each column as a row to save space, and listing the run numbers:

Run:         1   2   3   4   5   6   7   8   9  10  11  12
Column 1:   +1  -1  +1  -1  -1  -1  +1  +1  +1  -1  +1  -1
Column 26:  +1  +1  -1  -1  -1  -1  +1  +1  -1  -1  +1  +1
Both columns have the same number of plus and minus signs, and they add up to zero. Furthermore, the sum of the squares of the entries in each column is 12, the number of runs N. The columns are correlated. In 8 of the 12 runs, the column signs match; whereas for 4 runs, they are opposite. The correlation between these two mean zero columns (let us call them x and z) is given by
$$\rho = \frac{\sum_i x_i z_i}{\sqrt{\sum_i x_i^2 \sum_i z_i^2}} = \frac{8 - 4}{12} = \frac{1}{3}$$
This correlation of 1/3 indicates that there is some linkage between the signs of these two columns, but the correlation is fairly weak.
In fractional factorial designs, if two effects are confounded, the correlation between column entries is either ρ = +1 or ρ = -1. In Plackett-Burman designs, if two effects are confounded, the absolute value of the correlation between column entries is strictly less than 1. One says that the effects are partially confounded or partially aliased.
Calculating the correlation between each main effect column and each 2-factor interaction column, we find that for each factor its main effect is partially confounded with all 2-factor interactions not involving that factor. In addition, we find that for all 11 factors, the correlation between column signs for a factor's main effect and each of its confounded interactions is either +1/3 or -1/3.
The correlation between each main effect column and each 2-factor interaction column that includes the main effect factor is zero. You can see this, for example, by looking at the column for factor 1 and the interaction column 12 that are listed below.
Run:         1   2   3   4   5   6   7   8   9  10  11  12
Column 1:   +1  -1  +1  -1  -1  -1  +1  +1  +1  -1  +1  -1
Column 12:  +1  -1  -1  -1  +1  +1  -1  +1  +1  -1  -1  +1
The column signs match in half of the runs, and the correlation between these two columns is given by
$$\rho = \frac{6 - 6}{12} = 0$$
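The correlations quoted above can be enumerated for all 11 factors at once. The sketch below again assumes the standard 12-run generator row (an assumption, as Appendix 6.1 is not shown here); `corr` is a hypothetical helper that relies on every column having mean zero and sum of squares N, so that the correlation reduces to the sum of products divided by N.

```python
from fractions import Fraction
from itertools import combinations

# Assumed N = 12 generator row (standard Plackett-Burman row; an assumption here).
GEN12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design = plackett_burman(GEN12)
cols = [[row[j] for row in design] for j in range(11)]  # cols[0] is factor 1

def corr(i, j, k):
    """Correlation of main-effect column i with the j-by-k interaction column (0-based)."""
    total = sum(x * y * z for x, y, z in zip(cols[i], cols[j], cols[k]))
    return Fraction(total, len(design))

containing, not_containing = set(), set()
for i in range(11):
    for j, k in combinations(range(11), 2):
        (containing if i in (j, k) else not_containing).add(corr(i, j, k))

print(corr(0, 1, 5))           # factor 1 versus interaction 26
print(corr(0, 0, 1))           # factor 1 versus interaction 12
print(sorted(not_containing))  # values for interactions not involving the factor
```

Exact fractions are used instead of floats so that 1/3 and -1/3 can be compared without rounding.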
Table 6.2 lists all confounding (correlation) coefficients among main effects and 2-factor interactions in the N = 12 Plackett-Burman design with 8 factors. The first eight columns in Table 6.1 are used for the levels of the 8 factors, but any other set of eight columns could have been taken. In showing the confounding coefficients, we limit ourselves to 8 factors, because this makes it easier to display the confounding coefficients in a compact table.
responses at the plus level of column 1 (runs 1, 3, 7, 8, 9, 11) with the average responses at the minus level of column 1 (runs 2, 4, 5, 6, 10, 12). The first row of Table 6.2 shows that it is an estimate given by
$$1 + \tfrac{1}{3}(26 + 34 + 37 + 45 + 46 + 68 + 78) - \tfrac{1}{3}(23 + 24 + 25 + 27 + 28 + 35 + 36 + 38 + 47 + 48 + 56 + 57 + 58 + 67)$$
is an estimate of the main effect plus the weighted sum of the 2-factor interactions that are confounded with that main effect (we are ignoring 3-factor and higher-order interactions). The weight applied to each 2-factor interaction is the correlation (shown in Table 6.2) between the entries in the main effect column and the entries in that interaction column. (In fractional factorial designs, the weights applied to each confounded 2-factor interaction are also the correlations, which are either +1 or -1.)
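A small noiseless example may help to see the weighting at work. The sketch below assumes the standard 12-run generator row and uses two hypothetical effect sizes (`EFFECT_1` and `EFFECT_26` are illustration values, not data from the book). Only the main effect of factor 1 and the 26 interaction are active, so the usual estimate for factor 1 should equal the main effect plus one-third of the interaction.

```python
# Assumed N = 12 generator row (standard Plackett-Burman row; an assumption here).
GEN12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design = plackett_burman(GEN12)

EFFECT_1 = 4.0    # hypothetical main effect of factor 1
EFFECT_26 = 3.0   # hypothetical interaction between factors 2 and 6

# Noiseless responses; an effect is twice the corresponding regression coefficient.
y = [0.5 * EFFECT_1 * r[0] + 0.5 * EFFECT_26 * r[1] * r[5] for r in design]

# Usual estimate: average at the plus level minus average at the minus level.
plus = [yi for yi, r in zip(y, design) if r[0] == +1]
minus = [yi for yi, r in zip(y, design) if r[0] == -1]
estimate_1 = sum(plus) / len(plus) - sum(minus) / len(minus)

print(estimate_1)                 # estimate for factor 1
print(EFFECT_1 + EFFECT_26 / 3)   # main effect plus (1/3) x interaction
```

The two printed values agree, which illustrates that the bias is the interaction multiplied by the correlation of +1/3 between column 1 and column 26.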
very few 2-factor interactions to be important (effect sparsity) and those that are to be smaller in magnitude compared to the main effects (hierarchical ordering). In many situations, especially in the early stages of an experimental investigation, it is appropriate to ignore 2-factor interactions and assume that each of the 11 estimated effects represents a main effect. This is reasonable as long as the magnitudes of 2-factor interactions are relatively small compared to main effects. But what if this is not the case?
Suppose we analyze the results of the experiment under the assumption that each of the
founded with two different main effects is easy to show. Assume that a 2-factor interaction (say, 12) were confounded with two main effects (say, 3 and 4). Then the defining relation would contain the words 123, 124, and (123)(124) = 34, making the fractional factorial a resolution II design. But in this case, the bias in the main effect estimate will equal the full magnitude of the interaction, not just a fraction as in the Plackett-Burman design.
Because of the extensive partial confounding in Plackett-Burman designs, the option of a complete foldover can be especially useful. Adding 12 more runs to the existing design by switching the signs in all factor columns will result in a 24-run resolution IV design with main effects no longer confounded with 2-factor interactions.
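The foldover claim can be checked numerically. The following sketch assumes the standard 12-run generator row (an assumption, since Appendix 6.1 is not shown here), appends the sign-switched runs, and verifies that every main-effect column is uncorrelated with every 2-factor interaction column in the resulting 24 runs.

```python
from itertools import combinations

# Assumed N = 12 generator row (standard Plackett-Burman row; an assumption here).
GEN12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design = plackett_burman(GEN12)

# Complete foldover: repeat every run with the signs of all factors switched.
folded = design + [[-x for x in row] for row in design]

# Largest absolute sum of products between a main-effect column and a
# 2-factor interaction column; zero means no confounding at all.
largest = max(
    abs(sum(r[i] * r[j] * r[k] for r in folded))
    for i in range(11)
    for j, k in combinations(range(11), 2)
)

print(len(folded), largest)
```

The triple products in the second half are the negatives of those in the first half, so they cancel run by run; this is the reason the folded design reaches resolution IV.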
Given the complex confounding patterns of resolution III Plackett-Burman designs, it may seem at first glance that they would not provide any useful information about 2-factor interactions. But as we show in the following example, under some circumstances it may be possible to estimate one or more 2-factor interactions from the results of a Plackett-Burman design.
6.3
= 20 RUNS:
A leading Fortune 500 financial products and services firm uses direct mail to reach new customers. [For a detailed discussion, see Case 9 (Experimental Design on the Front Lines of Marketing: Testing New Ideas to Increase Direct Mail Sales) in the case study appendix.] But as competition increased over the years, response to the firm's offers had declined steadily. In an effort to reverse this trend, the company hired a consultant to help with the planning and execution of a large mailing of a credit card offer.
The marketing team identified 19 factors to test (Table 6.3), and the consultant specified the 20-run Plackett-Burman design shown in Table 6.4. Here, as in the 12-run design, the entire design matrix is generated from the first row listed in Appendix 6.1. The procedure is exactly the same as before. The last entry in row 1 (-) is placed in the first position of row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. This process continues until the next to the last row is filled in. A row of all minus signs is then added to complete the design.
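The same shifting procedure can be sketched for 20 runs. The generator row `GEN20` below is assumed to be the standard 20-run Plackett-Burman row; it is not taken from Appendix 6.1, which is not reproduced here. The check confirms that every pair of the 19 columns has 5 runs at each of the four factor-level combinations.

```python
# Assumed generator row for N = 20 (standard Plackett-Burman row; an assumption,
# since Appendix 6.1 is not reproduced in this section).
GEN20 = [+1, +1, -1, -1, +1, +1, +1, +1, -1, +1,
         -1, +1, -1, -1, -1, -1, +1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design20 = plackett_burman(GEN20)

def pair_counts(design, c1, c2):
    """Number of runs at each of the four level combinations of two columns (0-based)."""
    counts = {(-1, -1): 0, (-1, +1): 0, (+1, -1): 0, (+1, +1): 0}
    for row in design:
        counts[(row[c1], row[c2])] += 1
    return counts

all_pairs_have_five = all(
    set(pair_counts(design20, a, b).values()) == {5}
    for a in range(19) for b in range(a + 1, 19)
)

print(len(design20), len(design20[0]), all_pairs_have_five)
```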
Factors A-F were approaches aimed at getting more people to look inside the envelope, while the remaining factors related to the offer inside. Factor G (sticker) refers to the peel-off sticker at the top of the letter to be applied by the customer to the order form. The firm's marketing staff believed that a sticker increases involvement and is likely to increase the number of orders. Factor N (product selection) refers to the number of different credit card images that a customer could choose from, and the term "buckslip" (factors Q and R) describes a small separate sheet of paper that highlights product information.
A total of 100,000 people, randomly chosen from a list of potential customers, participated in the experiment. Each of the 20 runs in Table 6.4 describes a test package that was sent to 5,000 people. The response variable is the fraction of people who respond to the credit card offer.
PLACKETT-BURMAN DESIGNS
TABLE 6.3
B
('
[)
L:
I
G
II
Factor
( - ) Control
(+)New Idea
Envelope teaser
Return address
"Offinal" ink-stamp on envelope
Postage
Additional graphic on envelope
Pnce graphic Oil letter
Sticker
Personalize letter copy
C:opy message
Letter headline
I .ist o 1 benefits
Postscript oil letter
General offer
Blind
Yes
Preprinted
Yes
1-'roduct-specil1c uffer
Add company name
,\-/
Signature
Product selection
Value of free gift
!Zeplv envelope
lnlurn1atio11 on buckslip
Second lrnckslip
1n terest rate
u
jJ
CJ
/(
,)
:\o
Stamp
Nu
l.arge
No
Yes
Small
Yes
No
Targeted
Headline I
~ t,1 JHIJ rd IJ:)'U u t
CeneriL
Headline 2
CreatJve !.1yout
Control \er~1un
New po-;tscnpt
Managc1
Mauy
~l'llJOr l'Xl'CUllVt'
re-... .
High
Luv.
(,u11lrol
Nl'V\ '>1Vk
Product ink
No
J.ree l''ft
Yes
1 ligh
Low
TABLF
11110
6.4
-- --FAl I UR
----
+
+
t+
+
+
+
+
'
L:
+
+
+
+
+
+
+
+
Responses
----
+
+
}J
Ii
+
t
+
t-
134
104
-+
60
+
+
+
+
+
+
+
+
+
t
-I
-I
122
57
30
8f>
I 14
ll.bU
:!, 16
0.78
0.80
0.98
U.74
L98
1.72
4.\
(),86
47
0.\14
104
2.08
49
37
99
+
+
2,08
1.20
Ub
j')
0.84
2.68
61
40
L04
0.76
bK
108
+
+
1urs,000J l\Jte
52
38
42
+
+
+
+
+
-t
The resulting response rates are shown in Table 6.4. The estimated effects, which are differences between average responses at the plus and minus levels of the factor columns, are listed in Table 6.5. A Pareto chart, where estimated effects are ordered according to their absolute magnitudes, is shown in Figure 6.1. Significance of each effect is determined by comparing the estimate with (twice) its standard error. In Chapter 4, Section 4.6.4, we showed how to calculate the standard error of an estimated effect when the response is a proportion.
$$\text{StdError(effect)} = 100\sqrt{\frac{4(0.01298)(0.98702)}{100{,}000}} \approx 0.0717 \text{ (percent)}$$
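The arithmetic can be reproduced with a short calculation. This is a sketch under the assumption that the standard error has the form sqrt(4 p(1 - p)/N), with p the overall response proportion (0.01298) and N the total number of people mailed (100,000); the variable names are hypothetical.

```python
import math

p_bar = 0.01298      # overall response proportion (1.298 percent)
n_total = 100_000    # total number of people in the experiment

# An effect is the difference of two averages, each based on n_total / 2 people,
# so its variance is 2 * p(1 - p) / (n_total / 2) = 4 * p(1 - p) / n_total.
se_effect = math.sqrt(4 * p_bar * (1 - p_bar) / n_total)
se_percent = 100 * se_effect   # expressed in percentage points

print(round(se_percent, 3))
```

The result is roughly 0.072 percentage points, in line with the value quoted in the text. Twice this value, about 0.14 percentage points, is then the yardstick against which each estimated effect is compared.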
S-: Low interest rate. Increasing the credit card interest rate reduces the response by 0.864 percentage points. In addition, it was very clear based on the firm's financial models that the gain from the higher rate would be much less than the loss due to the decrease in the number of customers.
R-: No second buckslip. A main effect interpretation shows that adding another buckslip reduces the number of buyers by 0.304 percentage points. One explanation offered for this surprising result was that the buckslip added unnecessary information and obscured the simple "buy now" offer. A more compelling explanation that we discuss in the next section is that the significant effect is not the result of the main effect of factor R, but is due to an interaction between two other factors.
I+: Generic copy message. The targeted message (I-) emphasized that a person could choose a credit card design that reflected his or her interests, while the generic message (I+) focused on the value of the offer. The creative team was certain that appealing to a person's interests would increase the response, but they were wrong. The generic message increased the response by 0.296 percentage points.
J-: Letter headline #1. The result showed that all "good" headlines were not equal. The best wording increased the response by 0.192 percentage points.
6.3.2 Evidence of a 2-Factor Interaction Between
Each 2-factor interaction appears in the confounding pattern of 17 main effects. For 16 of these main effects, the correlation with this interaction is -0.2 or +0.2, whereas for a single main effect the correlation is -0.6. This implies that a large 2-factor interaction will create a large bias (0.6 times the value of the interaction) in the estimate of one particular main effect.
Factors S (interest rate) and G (presence of a sticker) are by far the largest effects in Table 6.5. The correlation between the main effect of R (second buckslip) and the SG interaction is -0.6. You can check this by calculating the correlation between the column entries, as we did in the previous section. Hence, a significant SG interaction would bias the estimate of the main effect of R by 0.6 times the value of the interaction. This suggests that it may not be the main effect of factor R that is important, but the 2-factor interaction between S and G. This interpretation is supported by the principle of effect heredity, as the main effects of S and G are the most important factors. As one might expect, at the high interest rate the effect of having a sticker is small (a change from 0.776% to 0.956% is implied by the results in Table 6.4), but at the low interest rate, the effect of having the sticker is much larger (a change from 1.264% to 2.196%). The sticker is most effective when the customer receives a more attractive offer.
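The four response rates quoted above can be recovered from the fitted model with S, G, and SG. The sketch below uses the coefficients reported in Table 6.6(b) and assumes the coding S = +1 for the high interest rate and G = +1 for no sticker; that coding is inferred from the signs of the effects and is an assumption, as is the helper name `fitted`.

```python
# Coefficients of the regression on S, G, and SG (Table 6.6(b)).
B0, B_S, B_G, B_SG = 1.298, -0.432, -0.278, 0.188

def fitted(s, g):
    """Fitted response rate (percent) at coded levels s and g, each -1 or +1."""
    return B0 + B_S * s + B_G * g + B_SG * s * g

# Assumed coding: s = +1 high interest rate, g = +1 no sticker.
for s, rate in [(+1, "high interest rate"), (-1, "low interest rate")]:
    no_sticker = fitted(s, +1)
    sticker = fitted(s, -1)
    print(rate, round(no_sticker, 3), round(sticker, 3), round(sticker - no_sticker, 3))
```

Because any two columns of the 20-run design have 5 runs at each level combination, this four-term model reproduces the four cell averages exactly. The sticker adds 0.18 percentage points at the high rate and 0.932 percentage points at the low rate.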
TABLE 6.6
Three Regression Models Relating the Response Rate to Factors S (Interest Rate), G (Sticker), R (Second Buckslip), I (Copy Message), and J (Letter Headline) (Minitab Output)

(A) REGRESSION OF RESPONSE RATE ON S, G, R, AND THEIR INTERACTIONS
Predictor       Coef   SE Coef       T       P
Constant     1.32500   0.06602   20.07   0.000
S           -0.38625   0.06602   -5.85   0.000
G           -0.32000   0.06602   -4.85   0.000
R           -0.06125   0.06602   -0.93   0.372
SG           0.15125   0.06602    2.29   0.041
SR          -0.07000   0.06602   -1.06   0.310
GR           0.07625   0.06602    1.16   0.271
SGR          0.04500   0.06602    0.68   0.508
S = 0.236185   R-Sq = 90.2%   R-Sq(adj) = 84.6%

(B) REGRESSION OF RESPONSE RATE ON S, G, AND SG
Predictor       Coef   SE Coef       T       P
Constant     1.29800   0.05245   24.75   0.000
S           -0.43200   0.05245   -8.24   0.000
G           -0.27800   0.05245   -5.30   0.000
SG           0.18800   0.05245    3.58   0.002
S = 0.234585   R-Sq = 87.2%   R-Sq(adj) = 84.8%

(C) REGRESSION OF RESPONSE RATE ON S, G, SG, I, AND J
Predictor       Coef   SE Coef       T       P
Constant     1.29800   0.04407   29.46   0.000
S           -0.43200   0.04407   -9.80   0.000
G           -0.27800   0.04407   -6.31   0.000
SG           0.15130   0.04594    3.29   0.005
I            0.11774   0.04501    2.62   0.020
J           -0.06574   0.04501   -1.46   0.166
S = 0.197073   R-Sq = 92.1%   R-Sq(adj) = 89.3%
find that the three significant effects are S, G, and SG, confirming that it is the SG interaction, not the main effect of R, that is significant.
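The link between the apparent effect of R and the SG interaction can be made explicit with the coefficients of Table 6.6(a). The sketch below assumes that the only nonzero correlation between the R column and the other model columns is the -0.6 with the SG column, as stated in the text; the variable names are hypothetical.

```python
# Regression coefficients from Table 6.6(a).
COEF_R = -0.06125
COEF_SG = 0.15125

RHO_R_SG = -0.6   # correlation between the R column and the SG column

# An effect is twice a regression coefficient. The simple main-effect estimate
# of R picks up the SG interaction, weighted by the correlation.
own_part = 2 * COEF_R
borrowed_part = RHO_R_SG * 2 * COEF_SG
apparent_effect_R = own_part + borrowed_part

print(round(own_part, 4), round(borrowed_part, 4), round(apparent_effect_R, 4))
```

The sum reproduces the estimate of -0.304 percentage points reported for the second buckslip, and more than half of it is contributed by the SG interaction rather than by factor R itself.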
Table 6.6(a) shows the Minitab output when regressing the response rate on the main and interaction effects of the three factors S, G, and R. The standard errors of the estimated regression coefficients use the pooled variance from the eight factor-level combinations, assuming that the other factors have no effect on the response. The t-ratios and the probability values of the regression coefficients listed in this table indicate that S, G, and SG are significant, while all other effects (including the main effect of factor R) are insignificant.
Table 6.6(b) lists the results of the regression on the significant effects S, C, and SC. The re-
mates of the four main effects and the six 2-factor interactions involving these four factors can be obtained when their higher order (3- and 4-factor) interactions are negligible. Having eliminated factor R, we apply Cheng's finding and consider a model that includes the four factors that were significant in our initial main effects analysis: S, G, I, and J, together with their six 2-factor interactions. The result of this regression shows that all 2-factor interactions except SG are insignificant, leading to a model with the four main effects and the SG interaction. The fitting results for the model with S, G, SG, and the two main effects of I and J are shown in Table 6.6(c). The five effects explain 92.1% of the variation, a rather modest improvement over the 87.2% that is explained by S, G, and SG. It is clear that factors S (interest rate) and G (sticker) and their interaction SG are the main drivers of the response rate.
11 ~
)
-
2h
+ [I
3
- 26 ]
+ 26
3
columns x_jr. This implies that the usual main effect estimate of factor i is an estimate of
$$\beta_i + \sum_{j<r} \rho_{i(jr)}\,\beta_{jr} \qquad \text{(A6.4)}$$
The confounding coefficient between the main effect of factor i and the 2-factor interaction among factors j and r is given by ρ_i(jr), the correlation coefficient between the design vector x_i and the interaction (calculation) column x_jr.
Discussion
1. The result implies that a main effect is unconfounded with all interactions that contain the main effect factor. The column of products of the elements in a design column x_i and an interaction (calculation) column x_ir containing the main effect is identical to the column x_r (as the product of a column with itself leads to a column of ones). The sum of such a column is zero, implying ρ_i(ir) = 0.
2. A main effect in most Plackett-Burman designs is confounded with all other interactions that do not contain the main effect as a factor. Depending on the run size of the Plackett-Burman design and the particular main and interaction effect being considered, this correlation can take on various values.
Consider the N = 12 Plackett-Burman design in Table 6.1, and the design column x_1 (for factor 1) and the interaction column x_23 (which one gets by row-wise multiplication of the columns for factors 2 and 3). The correlation between x_1 and x_23 is -1/3, and the correlation between x_1 and x_26 is +1/3. Only main effects and interactions that contain the main effect are uncorrelated. All other confounding coefficients are either -1/3 or +1/3.
For the N = 20-run Plackett-Burman design, the confounding coefficients are either -0.2, +0.2, or -0.6. For example, the confounding coefficient between the main effect of A and the BC interaction in Table 6.4 is -0.2; it is +0.2 for the main effect of A and the BD interaction, and -0.6 for the main effect of R and the SG interaction. Only main effects and interactions that contain the main effect are uncorrelated.
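These three coefficients can be checked by direct calculation. The sketch below assumes the standard 20-run generator row (Appendix 6.1 is not reproduced here) and assumes that factors A through S occupy columns 1 through 19 in that order; `confounding` is a hypothetical helper.

```python
from fractions import Fraction

# Assumed N = 20 generator row (standard Plackett-Burman row; an assumption here).
GEN20 = [+1, +1, -1, -1, +1, +1, +1, +1, -1, +1,
         -1, +1, -1, -1, -1, -1, +1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design = plackett_burman(GEN20)

def column(letter):
    """Column of the factor with the given letter (assumed order A..S = columns 1..19)."""
    idx = ord(letter) - ord("A")
    return [row[idx] for row in design]

def confounding(main, interaction):
    """Correlation between a main-effect column and a 2-factor interaction column."""
    x, u, v = column(main), column(interaction[0]), column(interaction[1])
    return Fraction(sum(a * b * c for a, b, c in zip(x, u, v)), len(design))

print(confounding("A", "BC"), confounding("A", "BD"), confounding("R", "SG"))
print(confounding("A", "AB"))   # interaction that contains the main-effect factor
```

Under these assumptions the calculation returns -1/5, 1/5, and -3/5 for the three pairs named in the text, and zero for an interaction that contains the main-effect factor.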
For the N = 24-run Plackett-Burman design, the nonzero correlations are either -1/3 or +1/3. In contrast to the N = 12- and N = 20-run designs, not every main effect is confounded with interactions that do not contain that main effect; some correlations are in fact zero.
Projectivity Properties
Plackett-Burman designs are useful in screening situations where the objective is to identify important factors for more detailed study. The principle of "effect sparsity" suggests that, most likely, only a few factors among a large pool of potential factors are important. When choosing a design for factor screening, it is important to consider projections of the design into small subsets of factors.
Box and Tyssedal (1996) define a design to be of projectivity p if it produces, for any subset of p factors, a complete factorial (possibly with some combinations replicated).
Box and Tyssedal (1996) show that (most) Plackett-Burman designs attain projectivity 3. This is true for the N = 12 and N = 20 designs considered in this chapter. Exceptions to this rule, and projectivity less than 3, are the Plackett-Burman designs in N = 40 and N = 56 runs.
Convince yourself of this fact by considering the first 3 factors in the 12-run Plackett-Burman design in Table 6.1. Consider the eight factor-level combinations of these 3 factors, and show that each one has at least one run; four of the eight factor-level combinations (the ones at (- - -), (+ + -), (+ - +), (- + +)) have two runs. The general result by Box and Tyssedal (1996) implies that this projectivity holds for any subset of 3 factors, not just the first 3.
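The projection argument can be verified by brute force over all subsets of 3 factors. The sketch below assumes the standard 12-run generator row (an assumption, since Appendix 6.1 is not shown here); `projection_counts` is a hypothetical helper that counts the runs at each factor-level combination of the chosen columns.

```python
from collections import Counter
from itertools import combinations

# Assumed N = 12 generator row (standard Plackett-Burman row; an assumption here).
GEN12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman(gen):
    """Cyclic right shifts of gen, followed by a final row of minus signs."""
    rows = [list(gen)]
    for _ in range(len(gen) - 1):
        rows.append([rows[-1][-1]] + rows[-1][:-1])
    rows.append([-1] * len(gen))
    return rows

design = plackett_burman(GEN12)

def projection_counts(columns):
    """Runs at each level combination when the design is projected onto the columns."""
    return Counter(tuple(row[c] for c in columns) for row in design)

# First three factors: which combinations occur twice?
first_three = projection_counts((0, 1, 2))
doubled = sorted(combo for combo, n in first_three.items() if n == 2)

# Projectivity 3: every subset of 3 columns contains all 8 combinations.
projectivity_3 = all(
    len(projection_counts(triple)) == 8 for triple in combinations(range(11), 3)
)

print(doubled)
print(projectivity_3)
```

Each projection onto 3 factors is a full 2^3 factorial plus a half fraction, which is why exactly four combinations are replicated.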
Plackett-Burman designs are main effects designs, and they should be avoided if we are concerned about possibly large interactions. Nevertheless, the projectivity of these designs allows the investigator to identify, in certain circumstances, selected 2-factor interactions. Assuming that there are no more than 3 active factors, one can estimate the main effects and the interactions among these 3 factors. Furthermore, while the Plackett-Burman design in N = 20 runs will not project into full 2^4 factorials, Cheng (1995) shows that the projection of this design onto any 4 factors has the property that all main effects and 2-factor interactions of these 4 factors can be estimated when their higher order (order 3 and 4) interactions are assumed negligible. This is a remarkable result, as it shows that one can estimate all main effects and 2-factor interactions of 4 factors, and one can do so without specifying a priori which 4 factors are important.
In terms of their projectivity, Plackett-Burman designs have an advantage over resolution III fractional factorial designs that have projectivity of only 2. Consider, for example, the 2^(15-11) design generated by associating factors 5-15 with the interaction columns of a full factorial in factors 1-4. It is easy to check that the 16 runs of factors 1, 2, and 5 = 12 in Table 5.7 will not generate runs at all eight factor-level combinations in these three factors.
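The failure of the fractional factorial to project onto these three factors is easy to see numerically. The sketch below builds only the part of the design that matters here: a full factorial in factors 1-4 and a fifth column set equal to the product of columns 1 and 2, as described in the text. The remaining generators of Table 5.7 are not needed for this check and are not reproduced.

```python
from itertools import product

# Full 2^4 factorial in factors 1-4 (16 runs).
base_runs = list(product([-1, +1], repeat=4))

# Factor 5 is assigned to the 12 interaction column.
runs_125 = [(x1, x2, x1 * x2) for (x1, x2, x3, x4) in base_runs]

distinct = sorted(set(runs_125))
print(len(distinct))
print(distinct)
```

Only 4 of the 8 factor-level combinations appear, each of them 4 times, so this design has projectivity less than 3 for the factors 1, 2, and 5.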
Because of their complicated alias structures, experimenters have sometimes been reluctant to use Plackett-Burman designs for experimentation (see Draper, 1985). However, the interesting projective properties of Plackett-Burman designs provide a compelling rationale for their use.
EXERCISES
Exercise 1 Consider Case 8 (Experiments in Retail Operations: Design Issues and Application) from the case study appendix.
(a) In this case, we study 10 factors through a 24-run design that consists of a 12-run Plackett-Burman design and its foldover. The 2^(10-5) fractional factorial design in 32 runs would be another potential design. Is it possible to achieve a resolution IV design in 32 runs? If so, discuss the generators and the confounding patterns (assuming that interactions of order 3 or higher are negligible).
(b) Confirm the estimated effects in the test result section.
(c) Obtain standard errors of the estimated effects, using the following approaches:
(c1) Use the percent changes of week 1 and week 2 as independent replications,
calculate the 24 variances, and substitute the pooled variance estimate into the equation for the standard error in Section 4.4. Check whether the observations for weeks 1 and 2 are uncorrelated.
Hint: Calculate the number of runs for which week 2 has larger sales than week 1. Under the null hypothesis, you expect 24/2 = 12 runs. You can use the binomial distribution with N = 24 and π = 0.5 (expressing the fact that there is a 50:50 chance of increasing sales) to assess the probability value of the sample result.
(c2) Determine the significance of the estimated effects through normal probability plots and Lenth's PSE.
(c3) Explain how regression can be used to calculate the standard errors of the estimated effects. Hint: The regression relates the 24 x 1 vector of responses to a constant and the three design vectors A, D, and F. This leaves 24 - 4 = 20 degrees of freedom for the standard deviation of the error, s = 0.1092. This estimate is used in the covariance matrix V(β̂) = s^2 (X'X)^(-1). The square roots of the diagonal elements are one-half the standard errors of the estimated effects. Note that none of these calculations are needed when using computer software to execute the regression.
(c4) Discuss whether our earlier conclusion about the significance of the effects
I'>
Ill
orthogonal. Consider any two factors, say factors A and F, and verify that the design includes 5 runs at each of the four factor-level combinations. Repeat this check for other pairs of factors.
(b) Reanalyze the data from the Plackett-Burman and the factorial designs in Tables A9.2 and A9.6.
(c) A binomial approximation (see Section 4.6.4) was used to determine the standard errors of the estimated effects. Use alternative approaches:
(c1) Construct a normal probability plot of the estimated effects and determine the significance of the factors that way.
(c2) Use Lenth's PSE approach to assess the significance. Discuss the similarities and differences among these three approaches.
Note: The underlying true sign-up proportion π depends on the advertisement scenario for that particular run. Decisions (yes/no) by individuals in that group are the results of Bernoulli trials with success probability π. This implies that the variance of the sample proportion calculated from the n individuals in that group is given by var(p) = π(1 - π)/n.
This derivation assumes that the same proportion π applies to all subjects in the group, an assumption that may not be correct. Sign-up rates may vary across subjects, π_i = π + ε_i, where ε_i expresses the subject variability. One can show that the heterogeneity across subjects increases the variance of a proportion and that var(p) > π(1 - π)/n. Discuss the effect of heterogeneity on the standard error of an estimated effect.
Table A9.2. Follow the approach in Appendix 6.2. Assume a model with 2-factor interactions, and determine the bias of the main effect estimates. For example, consider the estimate that corresponds to factor A (column 1). Consider the interaction between factors G and H, represented by the column of their products (column 2). The correlation between these two columns expresses the confounding factor between GH and the main effect of A. Of course, many interactions will confound the main effect of A. You can use the approach in Appendix 6.2 to determine the complete confounding pattern. Alternatively, using an Excel spreadsheet, you can enumerate the correlations between each main effect and its confounded 2-factor interactions.
(f) In the section on key metrics and sample size, we claim that an overall sample size of 100,000 (and a sample size of 5,000 in each of the 20 cells) implies a certain power (80%) of detecting a 17% shift (from 1% to 1.17%) in the average response rate. How was this determined? Recreate the steps that are involved in this analysis. Use computer software such as Minitab or JMP.
(g) Consider a 32-run 2^(19-14)
(g2) If only resolution III is possible, discuss the advantages and disadvantages of the 20-run Plackett-Burman and the 32-run 2^(19-14) fractional factorial designs.
(g3) Is it possible to construct a resolution IV design in N = 40 runs? Discuss.
(a) Which of the two fractions has the more preferable confounding pattern? Discuss.
(b) Consider the projectivity properties of the two fractions. Assume that you have reasons to believe that only 3 factors are important, but you do not know which 3 factors. Does either fraction result in a full 3-factor factorial of any 3 factors? Discuss.
Consider the study in Section 6.3. Suppose instead of a 20-run Plackett-Burman design for 19 factors, the experimenters had eliminated 4 factors and had chosen the 2^(15-11) fractional factorial design shown in Table 5.7 of Chapter 5. Show that in contrast to the 20-run Plackett-Burman design, which has projectivity 3, this fractional factorial design has projectivity less than 3. To do so, you need to show that not all choices of 3 factors have at least one run at each of the eight factor-level combinations.
Exercise 4
analyses when factors are continuous and explains how to test the linearity of the response relationship. Section 7.5 considers an experiment with three continuous factors where two factors are studied at two levels each, and one factor is studied at three levels.
1-.xperiments with k
factorial experiments, for example, require 3^k runs; 9 runs are required for k = 2 factors; 27 runs, for k = 3 factors; 81 runs, for k = 4 factors. Such experiments become nonparsimonious very quickly, because they require more runs than the experimenter is usually able or willing to carry out. Hence, it is important that the experimenter has screened the factors beforehand and has reduced the number of factors so that a few important ones can be studied at more than two levels. If there are still too many factors, one can reduce the number of runs in 3-level designs by considering orthogonal fractions. We introduce fractional 3-level factorial designs in Section 7.6.
7.2
FACTORIAL EXPERIMENT
Consider 2 factors A and B. Factor A is studied at a levels, while factor B is studied at b levels. We assume that the same number of runs, n, is carried out at all ab factor-level combinations. Factor-level combinations are sometimes referred to as cells. The experiment results in a total of abn responses, y_ijk, for
a different seeds and b different fertilizers, the analysis explained in this section, which assumes equal precision and independence of the observations will, most likely, not be appropriate. One can expect the responses to be more alike within each whole plot, and less alike from one whole plot to another. While the correct analysis of such data is straightforward, it is more complicated than the analysis we describe in this section.
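We write the response as
$$y_{ijk} = \bar{y}_{...} + (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{.j.} - \bar{y}_{...}) + (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}) + (y_{ijk} - \bar{y}_{ij.})$$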
where $\bar{y}_{...}$ is the overall mean of the abn observations, $\bar{y}_{i..}$ is the mean of the bn responses when factor A is at its ith level, $\bar{y}_{.j.}$ is the average of the an observations when factor B is at its jth level, and $\bar{y}_{ij.}$ is the average of the n observations in the (i, j) cell. Recall the dot notation introduced in Section 3.3 of Chapter 3. Dots in place of subscripts indicate that we average the observations over these subscripts.
The difference $(\bar{y}_{i..} - \bar{y}_{...})$ measures the effect of factor A, while averaging over all levels of factor B. There is no main effect of factor A if the differences $(\bar{y}_{1..} - \bar{y}_{...}), (\bar{y}_{2..} - \bar{y}_{...}), \ldots, (\bar{y}_{a..} - \bar{y}_{...})$ are zero, or nearly so. Large differences imply a main effect of factor A.
The difference $(\bar{y}_{.j.} - \bar{y}_{...})$ measures the effect of factor B, while averaging over all levels of factor A.
The third component in the expression on the right-hand side, $\bar{y}_{ij.} - [\bar{y}_{...} + (\bar{y}_{i..} - \bar{y}_{...}) + (\bar{y}_{.j.} - \bar{y}_{...})] = (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})$, measures the interaction between factors A and B. It is the difference between the observed mean response at the (i, j) factor-level combination and the predicted mean response that is implied by the main effects of factors A and B alone. Interaction is negligible if these differences are small. Large differences imply an interaction between factors A and B.
The last component $(y_{ijk} - \bar{y}_{ij.})$ represents the error. It is the deviation of the observation from its respective cell mean.
Similar to our discussion of the analysis of variance (ANOVA) in Chapter 3, we can partition the total sum of squares into several (four) components: the sum of squares of factor A (main effect of A), the sum of squares of factor B (main effect of factor B), the sum of squares of the interaction AB, and the sum of squares of error.
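The partition of the total sum of squares can be sketched directly from the definitions of the cell, row, column, and overall means. The data below are hypothetical (a = 2 levels of A, b = 3 levels of B, n = 2 runs per cell) and serve only to show that the four components add up to the total; `two_way_sums_of_squares` is a hypothetical helper for the balanced case.

```python
def mean(values):
    return sum(values) / len(values)

def two_way_sums_of_squares(y):
    """y[i][j] is the list of n responses at level i of A and level j of B (balanced)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]
    row = [mean(cell[i]) for i in range(a)]                       # means for levels of A
    col = [mean([cell[i][j] for i in range(a)]) for j in range(b)]  # means for levels of B
    grand = mean(row)
    cells = [(i, j) for i in range(a) for j in range(b)]
    return {
        "A": b * n * sum((row[i] - grand) ** 2 for i in range(a)),
        "B": a * n * sum((col[j] - grand) ** 2 for j in range(b)),
        "AB": n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2 for i, j in cells),
        "error": sum((obs - cell[i][j]) ** 2 for i, j in cells for obs in y[i][j]),
        "total": sum((obs - grand) ** 2 for i, j in cells for obs in y[i][j]),
    }

# Hypothetical responses: 2 levels of A, 3 levels of B, 2 runs per cell.
data = [
    [[10, 12], [14, 16], [18, 20]],
    [[11, 13], [17, 19], [25, 27]],
]
ss = two_way_sums_of_squares(data)
print(ss)
print(ss["A"] + ss["B"] + ss["AB"] + ss["error"], ss["total"])
```

The sketch relies on equal cell sizes; with unequal numbers of runs per cell the row and column means are no longer simple averages of cell means and the components need not add up in this way.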
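$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{...})^2 = bn\sum_{i=1}^{a}(\bar{y}_{i..} - \bar{y}_{...})^2 + an\sum_{j=1}^{b}(\bar{y}_{.j.} - \bar{y}_{...})^2 + n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{ij.})^2$$
$$SS(\text{total}) = SS(A) + SS(B) + SS(AB) + SS(\text{error})$$
The degrees of freedom partition in the same way:
$$(abn - 1) = (a - 1) + (b - 1) + (a - 1)(b - 1) + ab(n - 1)$$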
These entries are displayed in the ANOVA in Table 7.1. The mean squares are obtained by dividing the sums of squares by their respective degrees of freedom. The F-ratios in the ANOVA table, MS(AB)/MS(error), MS(A)/MS(error), and MS(B)/MS(error), are used to test the presence of interaction, main effect of A, and main effect of B. The probability
I fl R l le. 0 R .VI 0 IU
I I \' I I S
I All I I ; . I
A NOVA 'la hie jvr the I wv-1-actor h1aonul /.' pcmncnt
~ollr(l'
\kJll ~qUML"
.\f.\
"
SS(lactor /\)/(a
I)
I)
(a
'iS
!actor \
11
lt1((tll"
lnteraLt1<111
IH
~qu.11n
iStintt..'rd . . t1011
1)(11
II
r rro1
)\(L'rror;
S)I total,
I< Hai
l'ru,Jbd1I\
\'.dl!L
l>l'i(l'l'L'' <>I
heedornd/
Sum ol
Pl
\'ar1luio11
ub(n
I)
I),
\.1S(fa,tor !\)!
\1'i L'rror ,
.\ht l.tdlli Ii
\.1\( L'rror)
\/)( llllL'!.lcllOll;'
\1'>( L'rrnr)
Proh valul'
l<1<.tor 1\
P1t1h \.tlul
tador Hi
dhll
value., of the !-'-ratios, which computer programs usually list 111 the lasl column, exprc",s the
statistical significance of these components. The probability \alues Ire obtained from the
I- distributions as follows:
l )( b
P[F(a- l,ab(n
I, a/!(11
l ), 11b( ti
I))
>
,\JS(A8)/ M.S(error)
,\f.'i(li)/M'i(crrnr)]
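The sums of squares, degrees of freedom, and F-ratios of the two-factor ANOVA can be sketched in a few lines of Python. This is a minimal illustration for a balanced design, not a replacement for statistical software; the function name `two_way_anova` and the small 2 x 2 data set are hypothetical and are not taken from the text.

```python
from itertools import product

def two_way_anova(cells):
    """cells[(i, j)] holds the n replicate responses at level i of A and level j of B."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))  # balanced design assumed: same n in every cell

    def mean(values):
        return sum(values) / len(values)

    grand = mean([y for ys in cells.values() for y in ys])
    a_mean = {i: mean([y for j in b_levels for y in cells[i, j]]) for i in a_levels}
    b_mean = {j: mean([y for i in a_levels for y in cells[i, j]]) for j in b_levels}
    cell_mean = {ij: mean(ys) for ij, ys in cells.items()}

    ss = {
        "A": b * n * sum((a_mean[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((b_mean[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum((cell_mean[i, j] - a_mean[i] - b_mean[j] + grand) ** 2
                      for i, j in product(a_levels, b_levels)),
        "error": sum((y - cell_mean[ij]) ** 2 for ij, ys in cells.items() for y in ys),
        "total": sum((y - grand) ** 2 for ys in cells.values() for y in ys),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in df}
    f = {k: ms[k] / ms["error"] for k in ("A", "B", "AB")}
    return ss, df, f

# Hypothetical 2 x 2 data with n = 2 replicates per cell (not from the text)
cells = {(0, 0): [1.0, 3.0], (0, 1): [4.0, 6.0], (1, 0): [5.0, 7.0], (1, 1): [2.0, 4.0]}
ss, df, f = two_way_anova(cells)
print(ss)
print(df)
print(f)
```

In this made-up data set the two lines of an interaction plot would cross, so almost all of the variation between cells lands in the interaction sum of squares.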
be considered for the 2-level designs in the previous chapters. Main effects are displayed by graphing response averages against the levels of factors A and B. Neither the ANOVA calculations nor the main effects and interaction plots need to be carried out by hand, as commonly available statistical software such as Minitab include convenient functions that perform
7.3
Results of a 3^2 factorial experiment in a study on how to best bake a cake are given in Table 7.2. A commercially available cake mixture is baked by varying the baking temperature (factor A) and the baking time (factor B). Three different temperatures are chosen: level 0 represents the temperature recommended by the instructions on the package, whereas levels -1 and +1 represent temperatures 10% below and 10% above the recommended level. Three different times are chosen, with level 0 representing the recommended time setting, and levels -1 and +1 representing the time settings 10% below and 10% above the recommended level. Three independent replications are made at each temperature and time combination, and the finished cakes are tasted by experts and rated on a 0-6 quality scale.

The ANOVA table is shown in Table 7.3. The entries in the table can be calculated from the equations in Section 7.2. Although this is not particularly difficult, the calculations are tedious given that they involve calculating factor A averages, factor B averages, and average ratings for each of the ab factor level combinations, as well as the calculation of the various sums of squares. Fortunately, computing software is available for the calculations. Statistical software such as Minitab and JMP includes routines that compute the ANOVA table and conduct tests of significance for main effects and interaction. All that is needed is for the user to enter the information into a spreadsheet: responses into one column (say, column 1), the levels of factor A into a second (column 2), and the levels of factor B into a third (column 3). In this example, there are 27 rows. The first row contains 0 (the response), -1 (level of A), and -1 (level of B). The second row contains 0, -1, -1; ...; the last (27th) row contains 2, 1, and 1. The particular order of the rows does not matter.
There is quite a strong interaction effect between time and temperature. The F-statistic for interaction, F

TABLE 7.2
Cake Ratings
Temperature   Time   Response
0, 0, 'l
0, 2, 4
()
I
I
0
I
I
0
4, 5,6
0
2, .\ 4
3,6,6
()
I, 2, 3
4,5,6
I, 3, 5
0, I, 2
TABLE 7.3

Figure 7.1 Interaction diagram (response plotted against baking time, with one line for each temperature level: Temp = -1, Temp = 0, Temp = +1)
distribution; the probability value (0.001) is small, much smaller than the usually adopted significance level of 0.05. The interaction diagram shown in Figure 7.1 demonstrates the nature of the interaction. For a temperature lower than the recommendation (level -1), the quality of the cake is increased by increasing the baking time. For a higher than recommended temperature (level +1), the baking time, not surprisingly, should be reduced. At the recommended temperature, quality suffers if the baking time is either lower or higher than the recommended value. The cake turns out best if the recommendations for time and temperature are followed. But the cake mix is "robust" in the sense that two other settings (lower temperature, but longer time; and higher temperature, but shorter time) are equally acceptable. A cake baked at a lower temperature has to be baked longer, whereas a cake baked at a higher temperature requires a shorter baking time.
7.4
... ($\mu_{-1} = \mu_0 = \mu_1$). This amounts to testing whether the three differences between the group means and the overall mean $\bar{\mu} = (\mu_{-1} + \mu_0 + \mu_1)/3$ are zero; that is, $\mu_{-1} - \bar{\mu} = \mu_0 - \bar{\mu} = \mu_1 - \bar{\mu} = 0$. Since deviations from a mean always sum to zero, two zero differences are sufficient to establish that the three means are the same. This fact explains the two degrees of freedom in the main effects test in Table 7.3. The restrictions among the means that are tested by the F-test (here there are two, $\mu_{-1} - \bar{\mu} = 0$ and $\mu_{1} - \bar{\mu} = 0$ ...
Three group means can be represented in many different, equivalent ways. One can express them by their differences from the overall average (this was done above). Or, one can express them by their differences from the mean of a reference group (e.g., the group where the factor is at its low level). Or, one can express the means in a way that helps us test whether the functional relationship between the means and the continuous factor is linear. Statisticians ...

The parameter $\beta_0 = \mu_{-1}$, the mean of the low group, becomes the standard against which the other means are compared. The parameters $\beta_1$ and $\beta_2$ are the differences between $\mu_0$ and $\mu_1$ and this standard. A test of the equality of the three means amounts to testing whether $\beta_1 = \beta_2 = 0$. Here the low group is taken as the reference group, but any other group could ...
... the equations that relate the means to the alpha coefficients. The alpha coefficients have a nice interpretation if the factor is continuous, and when the distances between the levels are meaningful. The coefficient $\alpha_0 = (\mu_{-1} + \mu_0 + \mu_1)/3$ is the overall mean, and $\alpha_1$ and $\alpha_2$ represent the linear and quadratic components of the relationship among the mean response and the coded factor levels. [The coded levels (-1, 0, 1) ... spaced levels in the original metric, say, temperatures at 1,500, 2,000, and 3,000 degrees.]

One can see this as follows. Assume a quadratic model between the mean and the factor levels, $\mu_x = a + bx + cx^2$ for $x = -1, 0, 1$, so that $\mu_{-1} = a - b + c$, $\mu_0 = a$, and $\mu_1 = a + b + c$. The coefficient

$$\alpha_1 = (\mu_1 - \mu_{-1})/2 = [(a + b + c) - (a - b + c)]/2 = b$$

is the linear component, while the coefficient

$$\alpha_2 = (\mu_{-1} - 2\mu_0 + \mu_1)/6 = [(a - b + c) - 2(a + 0 + 0) + (a + b + c)]/6 = c/3$$

is proportional to the quadratic component. Factor A has no effect if $\alpha_1 = \alpha_2 = 0$.
The second representation of the group means in terms of linear and quadratic components is useful when describing the relationship between the mean response and the levels of a continuous factor. It has the additional advantage that the columns in the matrix that relates the means $\mu_{-1}, \mu_0, \mu_1$ to the coefficients $\alpha_0, \alpha_1, \alpha_2$,

$$\begin{pmatrix} \mu_{-1} \\ \mu_0 \\ \mu_1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & -2 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix}$$

are orthogonal; you can check that all pairwise vector products formed with the three columns of the matrix are zero. [The vector product of the first two columns is (1)(-1) + (1)(0) + (1)(1) = 0; of the first and third columns, (1)(1) + (1)(-2) + (1)(1) = 0; and of the second and third columns, (-1)(1) + (0)(-2) + (1)(1) = 0.] The columns (-1, 0, 1) and (1, -2, 1) are known as the orthogonal linear and quadratic polynomials for a factor with three levels.
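The orthogonality of the three columns, and the way the linear and quadratic coefficients pick up the slope and curvature of the means, can be checked numerically. The sketch below is illustrative only; the helper names and the quadratic with a = 10, b = 2, c = 3 are assumptions chosen for the example.

```python
# Columns of the matrix relating (mu_-1, mu_0, mu_1) to (alpha_0, alpha_1, alpha_2)
const = (1, 1, 1)
lin = (-1, 0, 1)    # orthogonal linear polynomial for three levels
qua = (1, -2, 1)    # orthogonal quadratic polynomial for three levels

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def alpha_coefficients(means):
    """Because the columns are orthogonal, each coefficient is a simple projection."""
    return tuple(dot(col, means) / dot(col, col) for col in (const, lin, qua))

pairwise = [dot(const, lin), dot(const, qua), dot(lin, qua)]

# Hypothetical means that follow an exact quadratic a + b*x + c*x^2 (values are made up)
a, b, c = 10.0, 2.0, 3.0
means = tuple(a + b * x + c * x * x for x in (-1, 0, 1))
alpha0, alpha1, alpha2 = alpha_coefficients(means)

print(pairwise)                 # all three vector products
print(alpha0, alpha1, alpha2)   # overall mean, linear and quadratic coefficients
```

For means that lie exactly on a quadratic, the linear coefficient equals the slope b and the quadratic coefficient equals c/3, so both vanish only when the three means are equal.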
Here, we have discussed the situation when each factor has three levels, and we partition the effect into a linear and a quadratic component. Orthogonal polynomials for continuous factors with 4 and 5 levels are shown in Appendix 7.1. For 4 levels, one parameterizes the 4 means in terms of an overall mean and linear, quadratic, and cubic components.

... orthogonal linear and quadratic polynomials to partition the sum of squares of each factor into a linear and a quadratic component, each with one degree of freedom. This becomes useful for testing whether the relationship between the response and the levels of a continuous factor is linear.

For that, we construct columns for the linear and quadratic components. The column of the linear component, A(lin), is assigned the value -1 when A is at the low level, the value 0 when A is at the middle (0) level, and the value +1 when A is at the high level; these are the coefficients in the linear polynomial. The column of the quadratic component, A(qua), is assigned the coefficients in the quadratic polynomial: the value +1 when A is either at the low or high level, and the value -2 when A is at the middle level. The same procedure is used to construct B(lin) and B(qua), the linear and quadratic components for factor B. The four columns are listed in Table 7.4. The length of these columns is determined by the number of observations (runs).

The AB interaction sum of squares has four degrees of freedom. This sum of squares can also be partitioned into four orthogonal components: the linear by linear, the linear by quadratic, the quadratic by linear, and the quadratic by quadratic interaction components.
TABLE 7.4
Regression Formulation of the 3^2 Factorial Design, with Linear and Quadratic Main and Interaction Effects

Design factors   Main effect of A   Main effect of B   AB interaction
  A     B        A(lin)  A(qua)     B(lin)  B(qua)     lin x lin  lin x qua  qua x lin  qua x qua
 -1    -1          -1       1         -1       1           1         -1         -1          1
  0    -1           0      -2         -1       1           0          0          2         -2
  1    -1           1       1         -1       1          -1          1         -1          1
 -1     0          -1       1          0      -2           0          2          0         -2
  0     0           0      -2          0      -2           0          0          0          4
  1     0           1       1          0      -2           0         -2          0         -2
 -1     1          -1       1          1       1          -1         -1          1          1
  0     1           0      -2          1       1           0          0         -2         -2
  1     1           1       1          1       1           1          1          1          1
We create the columns AB(lin × lin), AB(lin × qua), AB(qua × lin), and AB(qua × qua). The elements in column AB(lin × lin) are the products of the elements in A(lin) and B(lin), the elements in the column AB(lin × qua) are the products of the elements in A(lin) and B(qua), and so on. These columns are also shown in Table 7.4.

We consider a regression model that relates the response vector to these orthogonal columns. That is,

$$y = \beta_0 + \beta_1 A(\text{lin}) + \beta_2 A(\text{qua}) + \beta_3 B(\text{lin}) + \beta_4 B(\text{qua}) + \beta_5 AB(\text{lin} \times \text{lin}) + \beta_6 AB(\text{lin} \times \text{qua}) + \beta_7 AB(\text{qua} \times \text{lin}) + \beta_8 AB(\text{qua} \times \text{qua}) + \varepsilon$$

Standard regression software can be used to obtain the estimates and the regression sums of squares that are explained by each of these regressor columns. The orthogonality of the regressor columns has important consequences. We pointed out in an appendix to Chapter 4 that each regression estimate and the regression sum of squares of each column are not affected by the presence of other components in the model, and that the individual regression sums of squares are additive.
Exercise 7 in Chapter 4 shows how to calculate the regression sum of squares that is explained by a single column, say, x with entries $x_1, x_2, \ldots, x_n$. The regression sum of squares is given by $SSR(x) = \left(\sum x_i y_i\right)^2 / \sum x_i^2$. Given the responses $y_1, y_2, \ldots, y_n$, it is easy to calculate the regression sum of squares for each of the regressor columns in Table 7.4. For example, the column A(lin) is used to calculate SSR(A(lin)), and the column A(qua) is used to calculate SSR(A(qua)). The regression sum of squares of the main effect of factor A, SS(A), which is calculated in Sections 7.2 and 7.3, turns out to be the sum of these two components: SS(A) = SSR(A(lin)) + SSR(A(qua)). This shows how much of the sum of squares of the main effect of A can be attributed to the linear association, and how much to the quadratic one.
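The single-column regression sum of squares and its additivity can be illustrated with a short Python sketch. The nine responses below are hypothetical (an unreplicated 3 × 3 layout invented for the example), and the names `ssr`, `LIN`, and `QUA` are illustrative only.

```python
def ssr(x, y):
    """Regression sum of squares explained by a single column x: (sum x*y)^2 / sum x^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy ** 2 / sxx

LIN = {-1: -1, 0: 0, 1: 1}   # coefficients of the linear orthogonal polynomial
QUA = {-1: 1, 0: -2, 1: 1}   # coefficients of the quadratic orthogonal polynomial

# Hypothetical unreplicated 3 x 3 experiment: (level of A, level of B, response)
runs = [(-1, -1, 4.0), (0, -1, 6.0), (1, -1, 5.0),
        (-1, 0, 7.0), (0, 0, 9.0), (1, 0, 11.0),
        (-1, 1, 6.0), (0, 1, 10.0), (1, 1, 14.0)]
y = [r[2] for r in runs]
a_lin = [LIN[r[0]] for r in runs]
a_qua = [QUA[r[0]] for r in runs]

# Direct main-effect sum of squares of A: (runs per level) * sum((level mean - grand mean)^2)
grand = sum(y) / len(y)
level_means = {lv: sum(r[2] for r in runs if r[0] == lv) / 3 for lv in (-1, 0, 1)}
ss_a = 3 * sum((m - grand) ** 2 for m in level_means.values())

print(ssr(a_lin, y), ssr(a_qua, y), ss_a)
```

The two single-degree-of-freedom pieces add up to the main-effect sum of squares of A, which is what the orthogonality of the columns guarantees.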
The same procedure can be applied to the main effect of B, as well as the interaction between A and B. We find that SS(B) = SSR(B(lin)) + SSR(B(qua)), and SS(Interaction) = SSR(AB(lin × lin)) + SSR(AB(lin × qua)) + SSR(AB(qua × lin)) + SSR(AB(qua × qua)).
The following example makes use of this decomposition.
the sales of several store brand products. Products with stable sales (i.e., no trends) and limited seasonality were selected for this study. Here we focus on the sales of the store brand apple juice.
A complete 3^2 factorial experiment is carried out to assess the effects of price and display. Three price levels are considered: the cost price (low level -1), which is the cost to the supermarket; the regular price (high level +1), which is the recommended retail price to customers as listed in the regional warehouse price manual; and the reduced price (level 0), which is the price halfway between the recommended retail price and the cost to the supermarket. The three display choices are normal display space (level 0), as determined at
TABLE 7.5
()
4U 8
44.2
91.5
\2.0
S0.2
85.7
9.0
24.9
2'1 9
1S. LJ
ll
0
()
()
il.'I
\4.9
:i
)l)
18.U
')
play (level 1), which amounts to twice the normal display area.
With three display options and three price levels, the design calls for nine treatment combinations. The design is replicated once. Eighteen weeks are needed for this study, and the time arrangement of the experimental conditions is randomized. Furthermore, each experimental week is preceded and followed by a base week (which is a week where the product is priced at its regular price and displayed at the normal shelf position). For this reason, and because of holiday weeks that are not used, the experiment spans roughly 40 consecutive weeks. The response is the number of units (divided by 10) that sold between Wednesday noon and Sunday 9 p.m. of each experimental week. The design and the observations are shown in Table 7.5.
The interaction plot and the two main effects plots are given in Figure 7.2. The interaction plot reveals very little interaction because the lines connecting average sales for different prices but from the same display are almost parallel. The absence of an interaction makes it appropriate to study the main effects. The sales effects of both price and display are (roughly) linear.
The graphical analysis is supported by the ANOVA in Table 7.6. The sums of squares for display, price, interaction, and error can be obtained from the expressions in Table 7.1 and Section 7.2. The calculations are tedious. It is much simpler to obtain the ANOVA through the Minitab "ANOVA > Two-Way" command. For this, one enters the data into three columns ...
... components automatically. For that one needs to construct the spreadsheet shown in Table 7.7. It contains the regressor columns of Table 7.4 and the responses of Table 7.5. The data for week 1 are in rows 1-9, while the data for week 2 are in rows 10-18. Rows 10-18 are identical to rows 1-9, except that the responses are different.
Below, we illustrate the calculation of SSR(D(qua)), the regression sum of squares that is explained by the quadratic component of display. D(qua) is the fourth column in Table 7.7,
and its elements are used in the following calculation. We find:

$$\sum x_i^2 = 12\,(1)^2 + 6\,(-2)^2 = 36 \quad \text{(the 18 entries of the column: twelve equal to 1 and six equal to } -2\text{)}$$

$$\sum x_i y_i = 95$$

$$SSR(D(\text{qua})) = \left(\sum x_i y_i\right)^2 / \sum x_i^2 = (95)^2/36 = 250.7$$

This is the number that is reported at the bottom of Table 7.7 and in the ANOVA in Table 7.6. The sums of squares of the other components can be obtained in a similar fashion.

TABLE 7.7
Regression Formulation: Sales of Apple Juice (columns: design factors D and P; regressors D(lin), D(qua), P(lin), P(qua), DP(lin × lin), DP(lin × qua), DP(qua × lin), DP(qua × qua); response; regression sums of squares in the bottom row)
The F-statistic for interaction in Table 7.6, F = (130.1/4)/(1,079.7/9) = 0.27, is small and insignificant (probability value 0.889), confirming what we had seen in the interaction plot. Price (F = 10.94 with probability value 0.004) and display (F = 19.32 with probability value 0.001) are both highly significant. Furthermore, the ANOVA shows insignificant quadratic components for both price (F = (213.6/1)/(1,079.7/9) = 1.78 and probability value 0.215) and display (F = 2.09 and probability value 0.182). We conclude that the relationships between sales and price and between sales and display are linear.
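The arithmetic behind these F-ratios is easy to reproduce. The sketch below uses only the sums of squares and degrees of freedom quoted in the text (error 1,079.7 on 9 degrees of freedom, interaction 130.1 on 4, price quadratic 213.6, and the sums 95 and 36 for the D(qua) column); the variable names are illustrative.

```python
# Values quoted in the text for the apple juice experiment
ss_error, df_error = 1079.7, 9
ms_error = ss_error / df_error

sum_xy, sum_xx = 95.0, 36.0              # sum of x*y and sum of x^2 for the D(qua) column
ssr_display_qua = sum_xy ** 2 / sum_xx   # regression sum of squares of D(qua)

f_interaction = (130.1 / 4) / ms_error   # interaction has 4 degrees of freedom
f_price_qua = (213.6 / 1) / ms_error     # single-degree-of-freedom component
f_display_qua = ssr_display_qua / ms_error

print(round(ssr_display_qua, 1), round(f_interaction, 2),
      round(f_price_qua, 2), round(f_display_qua, 2))
```

Each single-degree-of-freedom component is compared with the same error mean square, so a component's F-ratio is just its regression sum of squares divided by MS(error).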
7.5
Two-level and 3-level designs can be combined. Consider, for example, the experiment where factors A and B have two levels each, while factor C is studied at three levels. We call this a 2^2 3^1 factorial experiment.
Such a 2^2 × 3 design was used to improve the consistency of a bottle-filling operation (Montgomery, 2005, p. 184). Process engineers controlled three factors: the operating pressure in the filler (factor A), the number of bottles produced per minute (line speed, factor B), and the percent carbonation (factor C). All three factors are continuous. For the purposes of this experiment, the engineer selected 2 levels for pressure [25 and 30 pounds/(inch)^2], 2 levels for line speed (200 and 250 bottles/minute), and 3 levels for carbonation (10, 12, and 14%). The response is the (average) deviation of the actual fill height from the targeted fill height. Positive deviations represent fill heights above the target, and negative numbers are fill heights below the target. The average deviation is calculated from all bottles within the same production run. The experiment was replicated once.
The design and the resulting data are given in Table 7.8. The levels of the 12 runs are shown in columns 2-4. The design is orthogonal; you can check that each level combination of any two factors is studied with the same number of runs.
With the data from such an experiment, we can obtain the sum of squares of the main effects of factors A and B (with 1 degree of freedom each), the main effect of factor C (with 2 degrees of freedom), the AB interaction (with 1 degree of freedom), the AC and BC interactions (each with 2 degrees of freedom), the ABC interaction (with 2 degrees of freedom), and, since there are replications, the sum of squares of error. We have not provided the detailed calculation equations for the sum of squares in the 3-factor experiment, as we expect you to use computer software for the computations. The ANOVA table in Table 7.9, without breaking down the 3-level factor C (carbonation) into its linear and quadratic components, can be obtained from the Minitab command "ANOVA > General Linear Model." (Here we have 3 factors, more than the 2 factors allowed in the Minitab command
TABLE 7.8
Columns: Run, Pressure (factor A), Line speed (factor B), Carbonation (factor C), Response (deviation)
- ]
- 1
- l
- I
- I
- I
- 1
0
0
0
0
I~ u 11
.1
5
6
7
-]
-]
I
- I
- I
JO
11
12
- I
I
Response
( dcvi.111011)
-3
-I
- 1
0
0
- I
I
0
2
6
5
.I
9
6
II
5
4
IO
TABLE 7.9

Source            DF        SS        MS        F        P
A: Pressure        1    45.375    45.375    64.06    0.000
B: Speed           1    22.042    22.042    31.12    0.000
C: Carbonation     2   252.750   126.375   178.41    0.000
  C(lin)           1   248.062   248.062   350.21    0.000
  C(qua)           1     4.687     4.687     6.62    0.024
AB                 1     1.042     1.042     1.47    0.249
AC                 2     5.250     2.625     3.71    0.056
  AC(lin)          1     5.063     5.063     7.15    0.020
  AC(qua)          1     0.187     0.187     0.26    0.620
BC                 2     0.583     0.292     0.41    0.671
  BC(lin)          1     0.563
  BC(qua)          1     0.021
ABC                2     1.083     0.542     0.76    0.487
  ABC(lin)         1     0.063
  ABC(qua)         1     1.021
Error             12     8.500     0.708
Total             23   336.625

"ANOVA > Two-Way." The command "ANOVA > General Linear Model" provides the ANOVA for the factorial experiment with more than 2 factors. We must specify three columns that identify the levels of the 3 factors, and request the sum of squares contributions for the main effects of A, B, C, and the interactions AB, AC, BC, and ABC.)
For the additional decomposition into linear and quadratic components, we construct the spreadsheet in Table 7.10. It contains orthogonal polynomials for the main effect and the interactions involving the 3-level factor C. Its effect on the response is expressed through the columns C(lin) and C(qua) that use the coefficients of the linear (-1, 0, 1) and the quadratic (1, -2, 1) polynomials. The interactions that involve the 3-level factor C can be parameterized similarly. The columns that represent the linear and quadratic components of the 2-factor interactions AC and BC are obtained by multiplying the elements in columns A and B with the elements in C(lin) and C(qua). The columns representing the linear and quadratic components of the 3-factor interaction are obtained by multiplying column A with BC(lin) and BC(qua). The columns in Table 7.10 are of length 24 as there are 24 observations. The procedure employed in the previous section is used to obtain the regression sums of squares that are associated with each of these columns. The regression sums of squares are listed at the bottom of the table, and they have been added to the ANOVA in Table 7.9.
The 3-factor interaction (ABC) and the three 2-factor interactions (AB, AC, BC) are insignificant. The largest 2-factor interaction is between pressure and carbonation (AC, with F = 3.71 and borderline significant probability value = 0.056). The lines in the interaction plot in Figure 7.3 that connect averages for different carbonation (C) arising under identical pressure (A) are almost parallel. This is another indication that the AC interaction is negligible. The main effects of all 3 factors are highly significant. Increased pressure, speed, and carbonation increase the average deviation from the target fill height. Since carbonation is studied at 3 levels, we can assess whether the effect is linear or whether a quadratic component is needed. The main-effects plots in Figure 7.3 show that the effect of carbonation is
roughly linear; this assessment is confirmed by the very large and highly significant linear component of factor C (F = 350, with probability value = 0.000). The quadratic component of C is much smaller and not nearly as significant (F = 6.62 and probability value = 0.024).

Figure 7.3 (interaction plot of carbonation and pressure; main effects plot of carbonation)
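The internal consistency of the ANOVA in Table 7.9 can be checked with a few lines of Python. The sketch uses only the sums of squares and degrees of freedom listed in the table; the dictionary layout and variable names are illustrative choices.

```python
# Sums of squares and degrees of freedom as listed in Table 7.9
ss = {"A": 45.375, "B": 22.042, "C": 252.750, "AB": 1.042,
      "AC": 5.250, "BC": 0.583, "ABC": 1.083, "Error": 8.500}
df = {"A": 1, "B": 1, "C": 2, "AB": 1, "AC": 2, "BC": 2, "ABC": 2, "Error": 12}
# (linear, quadratic) regression sums of squares for the terms involving C
components = {"C": (248.062, 4.687), "AC": (5.063, 0.187),
              "BC": (0.563, 0.021), "ABC": (0.063, 1.021)}

ms_error = ss["Error"] / df["Error"]
f_ratio = {k: (ss[k] / df[k]) / ms_error for k in ss if k != "Error"}
f_c_lin = components["C"][0] / ms_error   # single-degree-of-freedom components
f_c_qua = components["C"][1] / ms_error

total = sum(ss.values())
# Differences between each term and the sum of its two components (rounding only)
gaps = {k: abs(sum(parts) - ss[k]) for k, parts in components.items()}

print(round(total, 3), round(f_c_lin, 2), round(f_c_qua, 2))
print({k: round(v, 2) for k, v in f_ratio.items()})
print(gaps)
```

The linear and quadratic pieces of C, AC, BC, and ABC reproduce the corresponding 2-degree-of-freedom sums of squares up to rounding in the third decimal.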
7.6
The number of runs in 3-level factorial experiments grows quite rapidly with the number of factors k. Even with only 3 factors, 27 runs are required. Orthogonal fractions of 3^k factorial experiments can be constructed. This reduces the number of runs but, depending on the particular fraction, confounds certain main and interaction effects. We describe a few simple 3^(k-p) ...
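TABLE 7.11
Graeco-Latin Square Design (Latin letter: level of factor C; Greek letter: level of factor D)

                      Factor B
Factor A      Level -1   Level 0   Level 1
Level -1         aα        bβ        cγ
Level 0          bγ        cα        aβ
Level 1          cβ        aγ        bα

TABLE 7.12
The 3^(4-2) and 3^(3-1) Fractional Factorial Designs

  A     B     C     D
 -1    -1    -1    -1
  0    -1     0     1
  1    -1     1     0
 -1     0     0     0
  0     0     1    -1
  1     0    -1     1
 -1     1     1     1
  0     1    -1     0
  1     1     0    -1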
A 3^(4-2) fractional factorial in 9 runs can be constructed from the Graeco-Latin square design in Table 7.11. The design involves 4 factors (A, B, C, and D), at 3 levels each. The rows and columns in this table represent the coded factor levels (-1, 0, +1) for factors A and B. Factor levels for factor C are given by the Latin letters a, b, and c, where a represents level -1, b represents level 0, and c represents level 1. The factor levels for factor D are given by the Greek letters α, β, γ, where α corresponds to level -1, β corresponds to level 0, and γ corresponds to level 1. Graeco-Latin square designs have the property that each letter (Latin, as well as Greek) appears exactly once in each row and each column. Also, each Latin letter appears with each Greek letter exactly once.
The 9 runs of the 3^(4-2) fractional factorial design are listed in Table 7.12. The first two columns list the runs in a full 3^2 factorial experiment. The levels for factors C and D result from the Graeco-Latin letters in Table 7.11. For example, the first run with letter combination aα is described by (A = -1, B = -1, C = -1, D = -1); the next run with letter combination bγ is (A = 0, B = -1, C = 0, D = +1); ...; the last run with letter combination bα is (A = +1, B = +1, C = 0, D = -1). The 3^(3-1) fractional factorial design in Table 7.12 is obtained by omitting the factor D. Both designs are orthogonal. You can check that each level combination of any two factors is studied with the same number of runs (namely, one).
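The construction of the 9 runs and the orthogonality check can be sketched in Python. The two index formulas below are an assumption: they generate one pair of orthogonal 3 × 3 Latin squares that reproduces the three runs quoted in the text (aα first, bγ second, bα last); the function names are hypothetical.

```python
from itertools import combinations
from collections import Counter

LEVELS = (-1, 0, 1)

def graeco_latin_runs():
    """Nine runs (A, B, C, D) from superimposing two orthogonal 3 x 3 Latin squares."""
    runs = []
    for j, b in enumerate(LEVELS):          # B changes slowest
        for i, a in enumerate(LEVELS):      # A changes fastest
            c = LEVELS[(i + j) % 3]         # Latin letter: a, b, c -> -1, 0, +1 (assumed rule)
            d = LEVELS[(2 * i + j) % 3]     # Greek letter: alpha, beta, gamma -> -1, 0, +1 (assumed rule)
            runs.append((a, b, c, d))
    return runs

def is_orthogonal(runs):
    """True if every level combination of every pair of factors occurs equally often."""
    k = len(runs[0])
    for p, q in combinations(range(k), 2):
        counts = Counter((r[p], r[q]) for r in runs)
        if len(counts) != 9 or len(set(counts.values())) != 1:
            return False
    return True

runs = graeco_latin_runs()
for r in runs:
    print(r)
# Orthogonality of the 4-factor design and of the 3-factor design obtained by dropping D
print(is_orthogonal(runs), is_orthogonal([r[:3] for r in runs]))
```

With 9 runs and 9 level combinations per pair of factors, orthogonality here means that each combination of any two factors appears exactly once.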
The 9 runs of the 3^(4-2) design in Table 7.12 allow us to estimate the sums of squares of all four main effects. Each sum of squares has 2 degrees of freedom, because comparisons of 3 levels are involved. Since the design is orthogonal, one can obtain the sum of squares of a factor by averaging over the other 3 factors. (Minitab's "Stat > ANOVA > General Linear Model" provides the ANOVA table. Replications are needed to obtain an error sum of squares and test the significance of the four main effects.) Of course, the main effects are
confounded with 2-factor interactions, which limits the usefulness of such designs to initial screening experiments, where the aim is to identify important factors for further study. A 3^(4-1) ... design that does not confound the four main effects with 2-factor interactions. This resolution IV design is shown in Table 7.13.

TABLE 7.13
A 3^(4-1) ... (27 runs; columns: levels of factors A, B, C, and D)

Similar to 2^(k-p) fractional ... down a full 3-level factorial in (k - p) factors and generating the remaining p columns of the design matrix from specified generators. The generators imply a defining relationship, and from the defining relationship, we can obtain the confounding structure. However, the steps are more complicated in 3-level designs, because they involve modulus-3 arithmetic.
Modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value, the modulus. Two integers a, b are said to be the same modulo 3 if their difference is divisible by 3. In this case, we write a = b (mod 3). Consider the sum S of the (coded) levels of the 3 factors A, B, and C in the full 3^3 factorial design in Table 7.13. The sum can take any one of seven possible values, S = -3 (if all three factors are at -1), -2, -1, 0, 1, 2, 3 (if all three factors are at +1). We have selected the level of the fourth factor D as D = -1 if S = 0 (mod 3); that is, if S is -3, 0, or 3. We have selected ... that is, S is -1 or 2. These are the levels shown in Table 7.13.
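The modulus-3 construction can be sketched in Python. Only the rule D = -1 for S = 0 (mod 3) is stated in the text as it survives here; the assignment of D = 0 and D = +1 to the other two residues in `D_LEVEL` is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import product, combinations
from collections import Counter

# Assumed mapping from S mod 3 to the level of D; only the entry for residue 0 is stated in the text
D_LEVEL = {0: -1, 1: 0, 2: 1}

def fraction_3_4_1():
    """27 runs (A, B, C, D): full 3^3 factorial in A, B, C with D generated from S = A + B + C."""
    runs = []
    for a, b, c in product((-1, 0, 1), repeat=3):
        s = a + b + c
        # Python's % operator returns 0, 1, or 2 here even when s is negative
        runs.append((a, b, c, D_LEVEL[s % 3]))
    return runs

runs = fraction_3_4_1()
pair_counts = [Counter((r[p], r[q]) for r in runs) for p, q in combinations(range(4), 2)]
# Orthogonal main-effects structure: every pair of levels of any two factors appears 3 times
balanced = all(len(c) == 9 and set(c.values()) == {3} for c in pair_counts)
print(len(runs), balanced)
```

Any one-to-one assignment of the three residues to the three levels of D gives the same balance property; the particular assignment only relabels the levels of D.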
7.7
Designs that study continuous factors at more than two levels are useful if one wishes to explore the functional relationship between the response and the factors. Such designs allow us to fit quadratic models, which will tell us the levels of the factors that maximize (or minimize) the response. The literature refers to this area as response surface analysis.
The 3-level factorial and fractional factorial designs are useful, but other designs, such as central composite and simplex designs, have been employed. The central composite design in 2 factors, for example, adds to the four runs of the 2^2 factorial design [i.e., (-1, -1), (+1, -1), (-1, +1), (+1, +1)], a center point (0, 0) and four "star" points [i.e., (-w, 0), (w, 0), (0, -w), (0, w)]. This design studies each factor at 5 different levels, namely -w, -1, 0, 1, and w, with w specified by the experimenter. These designs are included in most experimental design software programs such as Minitab and JMP. For more on these designs, see Box, Hunter, and Hunter (2005).
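The run list of the 2-factor central composite design is short enough to write out programmatically. The sketch below is illustrative; the function name and the star distance w = 1.5 are arbitrary choices for the example, since w is specified by the experimenter.

```python
from itertools import product

def central_composite_2(w):
    """Runs of a central composite design in 2 factors with star distance w."""
    factorial = list(product((-1, 1), repeat=2))   # the four runs of the 2^2 factorial
    center = [(0, 0)]                              # center point
    star = [(-w, 0), (w, 0), (0, -w), (0, w)]      # four "star" points
    return factorial + center + star

runs = central_composite_2(w=1.5)   # w = 1.5 is an arbitrary illustrative value
levels_factor_1 = sorted({r[0] for r in runs})
print(runs)
print(levels_factor_1)   # the five distinct levels of the first factor
```

The design uses 9 runs, compared with 25 for a full 5-level factorial in 2 factors, while still covering five levels of each factor.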
had access to a set of friends? Does your procedure satisfy the assumptions in Section 7.2 that justify the analysis in Section 7.3? Can you expect that the observations are independent and of equal precision?
Exercise 2
Consider Case 10 (Piggly Wiggly) from the case study appendix.
(a) For each of the two products, complete the ANOVA table. In particular:
(a1) Specify the degrees of freedom and calculate the mean squares and the appropriate F-statistics.
(c5) Keeping track of the data: Should experimenters have kept track of the number of weekly customers, and should they have analyzed unit sales per customer? Why or why not?
(d) The experiment used just one store. How would you proceed if you had more stores available? What about if the stores were of different sizes?
(e) Discuss the practical difficulties of carrying out such an experiment. Did this study do a reasonable job?
Exercise 3
Consider Case 11 (United Dairy Industries) from the case study appendix.
(a) Consider the Latin square design for the test markets in Part 1. Show that the Latin square is orthogonal (i.e., same number of runs at each level combination of any two factors). Orthogonality implies that one can ignore time and location when obtaining the effects of advertising. Ignoring these two factors, the observations for the four advertising groups are as follows.

0 cents (A):    7360   13153   11852   7557    Ave    9981
3 cents (B):    7364   11258   12089   7900    Ave    9653
6 cents (C):    8049   13880   11800   8501    Ave   10558
9 cents (D):    9010   13147   11450   7776    Ave   10346

Use the ANOVA calculations for the completely randomized one-factor experiment in Section 3.2, and calculate the sum of squares due to Advertising. Check that it is identical to the one listed in the ANOVA table in Part 1 of the case study. Repeat this for the factor City. Ignoring advertising and time, obtain the ANOVA for the completely randomized one-factor experiment (with factor City). Show that the sum of squares due to City is the same as the one listed in the ANOVA table in Part 1. Discuss.
(b) Compare the ANOVA results in Part 1 of the case study and the results in Table 3.10 of Chapter 3. Discuss the significance of advertising. Why are the two tests different? Which test is more relevant?
(c) The analysis in Part 2 is the same as the analysis in the randomized complete block experiment in Section ... choice.
Exercise 4 Orthogonal fractiom of) level faltorial design'> .ire disu1'>Sed 111 '-,ect1on 7.6.
\Ve U'>ed Graeco Latin square arrangements lo construct the.,c dt:signs. Thi.-. "trategy L,ln he
extended to factors with more than.\ leveb.
( 'onsider factors with 5 levels. Construct orthogonal fractional foctorial desigm that al101' you to 'itudy the main effells oUour 'i lcvel faLlor'> in ju st .2'i rum.
( .1 )
Write down a table of 5 rows .ind 5 rnlumns. Add Latin letter., to the .2 5 cdb.
The first row consists of letter., a, b, L, d , ,111d e. Rearra ige the letter., LVCl1cally,
s w
I'\ p I H r \ 11 i\ I
I I II I A ( l 0 H s
T Tl I Hr[ 0 H M 0 ll I
I Fv
Fr s
191
for Greek letters α, β, γ, δ, ε. But, now shift the letters to the right, and move the letter
in the far right position of a row to the far left position of the subsequent one.
Write down the 25 runs that you obtain from this arrangement. Check that the
numbers of runs at each factor-level combination of any 2 factors are the same,
making this an orthogonal design.
(b) From this orthogonal design, you can obtain the sums of squares of 4 factors. How
many degrees of freedom are associated with each sum of squares? How do you
obtain the sum of squares of error, and how many degrees of freedom are associated
with it? Discuss how you would determine the significance of a main effect.
(c) Assume that 2-factor interactions are important. Would this affect the main-effects
analysis?
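The construction in part (a) can be checked with a few lines of code. The following is only a sketch under stated assumptions: the direction of the cyclic shift for the Latin letters is not legible in the text above, so a shift to the left is assumed here, while the Greek letters are shifted to the right as described. The names `latin`, `greek`, `runs`, and `is_balanced` are illustrative and not taken from the book.

```python
from itertools import combinations
from collections import Counter

N = 5  # number of levels of each factor

# Latin square: each row is the previous row shifted one position to the left
# (assumed direction). Levels 0..4 stand for the letters a..e.
latin = [[(col + row) % N for col in range(N)] for row in range(N)]

# Greek square: each row is the previous row shifted one position to the right,
# with the far-right letter wrapping around to the far-left position.
greek = [[(col - row) % N for col in range(N)] for row in range(N)]

# One run per cell: (row factor, column factor, Latin factor, Greek factor).
runs = [(r, c, latin[r][c], greek[r][c]) for r in range(N) for c in range(N)]


def is_balanced(design, n_levels):
    """True if every pair of factors shows each level combination equally often."""
    n_factors = len(design[0])
    for i, j in combinations(range(n_factors), 2):
        counts = Counter((run[i], run[j]) for run in design)
        if len(counts) != n_levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True


print(len(runs), is_balanced(runs, N))
```

Shifting the two squares in opposite directions is what keeps the Latin and Greek factors orthogonal to each other; shifting both in the same direction would make them identical.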
Exercise 5 l'cter \\'.'.\1. John ( 1990) dc'>crihed <Jn 18-run experiment that involves S controllable factors, each studied at three levels.
/l
I
\9.08
11.K~
>9 77
42. I 'i
2
_\
2
}
_,
I
3
I
2
3
4n.R2
4'1.05
46.28
!ti.RO
1'i.n7
\9,)()
12.hS
11 \ l
>9 'I I
l'i.21
l'i.'il
1.H7
41\.07
46.1,~
NONORTHOGONAL DESIGNS
Section 8.2 discusses an interesting case study that involves a nonorthogonal design with
many factors and different numbers of factor levels. Section 8.3 talks about useful computer
software for design construction and the analysis of the resulting data.
8.2 THE PHONEHOG CASE
We would like to thank Mark Wachen, CEO of Optimost, for providing the data and for
sharing his modeling insights with us. Optimost (www.optimost.com) is a technology and
services company based in New York that specializes in comprehensive real-time testing and
conversion-rate marketing. We would also like to thank Phil Nadel, CEO of Gulfstream Internet (the parent company of PhoneHog), for allowing us to use this case.
PhoneHog.com, owned and operated by Gulfstream Internet (http://www.phonehog.com),
is a subscription-based service through which consumers get free long-distance
phone calls. Participants sign up for the program, then earn phone minutes by visiting Internet sites, entering sweepstakes, or trying new products and services. As of November
2001, the system had more than 1 million members.
Since PhoneHog is a subscription-based service, it is important that its Web site can attract customers to sign up for its program. PhoneHog needs to learn which advertising
copies, offers, and images increase the sign-up rate of Internet surfers who come in contact
with its Web site. Experiments are run continually to determine modifications to the current standard (baseline) strategy with the hope of improving the sign-up rate. This case
focuses on 10 different areas of the PhoneHog Web site displayed in Figure 8.1: A-top and
A-bottom (image of the headline), B (subheadline), C (main copy), D (form), E (privacy
copy), F (submit button), G (how it works section), H (main image on right-hand side), and
I (footer). The choices for each area are described in Figure 8.2. For example, there are four
different choices for the top image of the headline (A-top): the baseline showing pictures of
five people making calls and the word PhoneHog written next to them, and three experimental versions (picture of a hog's head peeking through the "O" on white background;
same picture on blue background; pictures of five people calling with PhoneHog's logo to the
right). There are 10 choices for the bottom area of the headline (A-bottom): the baseline
"Let our advertisers pay for your long distance calls," and nine experimental versions. Area
H describes the main image on the right-hand side of the Web page; in addition to the baseline picture showing a rotating flash image of a woman on the telephone, three experimental
pictures are considered: a hog standing in a telephone booth extending a phone, no image
at all, and a picture showing a brief summary of the highlights of the program.
Figure 8.1 [Screenshot of the PhoneHog Web page]
A full factorial experiment that considers all level combinations of these 10 factors requires
1,658,880 runs. Of course, it is impossible to study all
combinations. Only a small fraction of the factorial experiment can be considered. An
experiment with just 45 runs was carried out. The runs (which are referred to as "creatives")
are listed in Table 8.2. Creative 45 uses the baseline level for each of the 10 areas; the other
44 runs are test versions where one or more baselines are changed to test versions.
The 44 test runs were offered to Internet users randomly and with equal probabilities,
while the probability of the baseline (creative 45) was four times as large. The 45 different
creatives were made available over a 2-week period. During this period, PhoneHog recorded
the number of distinct visitors to the PhoneHog Web site (VISITS) and the number of times
visitors clicked the subsequent page to obtain additional information (CLICKS). The
click-through rate, CLICKS/VISITS, measures the success of a run. The results are shown
in Table 8.2.
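The run counts and the traffic split described above are simple arithmetic, sketched below. The numbers of levels are the ones listed for the 10 factors in the appendix to this chapter; the variable names, the helper `click_through_rate`, and the choice of a percent scale for the rate are assumptions made for illustration.

```python
from fractions import Fraction
from math import prod

# Numbers of levels of the 10 test areas (A-bottom, A-top, B, ..., I).
levels = [6, 4, 3, 4, 6, 4, 6, 5, 4, 2]

full_factorial_runs = prod(levels)      # one run per level combination
n_creatives = 45                        # runs actually carried out
fraction_tested = Fraction(n_creatives, full_factorial_runs)

# Traffic split: 44 test creatives with weight 1, the baseline (creative 45)
# with weight 4, so that the baseline is four times as likely to be shown.
weights = [1] * 44 + [4]
shares = [Fraction(w, sum(weights)) for w in weights]


def click_through_rate(clicks, visits):
    """Click-through rate in percent (scale assumed): 100 * CLICKS / VISITS."""
    return 100.0 * clicks / visits


print(full_factorial_runs)              # size of the full factorial
print(float(fraction_tested))           # share of the factorial that was run
print(shares[0], shares[-1])            # test creative vs. baseline share
```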
Figure 8.2
Baseline and Test Versions for the 10 Test Areas on PhoneHog's Web Page
[Headline versions shown in the figure include "Join for Free and let our advertisers pay for your calls." and "How to get national advertisers to pay for your long distance calls."]
In Chapter 5 we learned how to find good orthogonal fractions of 2-level factorial designs. However, the situation is different here as factors with many different levels are involved. This makes it difficult to find orthogonal fractions that have a reasonable number of
runs. Other design criteria, different from orthogonality, must be used to determine the
fractions.
The experiment involves 10 categorical factors, with varying numbers of factor levels.
The numbers of factor levels are given in the last column of Table 8.1. Considering main
Subheadline (B): Values
"Earn free long distance calls by visiting our advertisers' sites, entering sweepstakes and trying new products and services."
"It's easy to earn free long distance calls by visiting our advertisers' sites"
"Earning free long distance calls with PhoneHog is easy. Just click, register or try a new product. Then start calling for free!"
"Get free long distance calls by visiting our advertisers' sites, entering sweepstakes and trying new products and services"
"Earn free long distance calls by visiting our advertisers' sites, entering sweepstakes and trying new products and services"
Main Copy (C): Values
"Please complete this brief form so that we can email your free calling card to you and provide you with free membership in PhoneHog.com. As a member of PhoneHog.com, you will receive many exciting opportunities to earn free long distance calls on your new calling card. Thank you."
"Please take a minute to join now"
"Complete the brief form below to join PhoneHog today for free. We'll instantly email you a free long distance calling card. As a member of PhoneHog, you will receive many exciting opportunities to earn free long distance calls on your new calling card."
"Please complete this brief form so that we can email a free long distance calling card to you and provide you with free membership in PhoneHog.com. As a member of PhoneHog.com, you will receive many exciting opportunities to earn free long distance calls on your new calling card."
"Start earning free long distance today. It takes less than 30 seconds."
Figure 8.2
Continued
effects alone, we need to estimate
1 + (6 - 1) + (4 - 1) + (3 - 1) + (4 - 1) + (6 - 1) + (4 - 1) + (6 - 1) + (5 - 1) + (4 - 1) + (2 - 1) = 35
effects. Considering creative #45
as the baseline, the regression formulation of the main-effects model includes a constant, 5
indicator variables for A-bottom (uh4
1 for version 8, ub9
'iO
I for version I 0 ),
I for \crsion 4 ),
to
get better estin1,1tes of the main effects, but the cost of too many .idd1
Form (D): Values
[Sign-up form layouts with fields such as First Name, Last Name, Email, Password, Retype Password, Zip Code, and Gender (Male/Female)]
Figure 8.2
Continued
and computer programs to construct such designs in Appendix 8.1. JMP, through its custom
design feature, is able to generate a D-optimal design for a given number of runs (see
Section 8.3).
Observe that the resulting design in Table 8.2 is no longer orthogonal, and that this fact
has consequences for the subsequent data analysis. Also, note that the main-effects model
ignores all interactions. The main-effects interpretation could be seriously wrong if interactions are present.
Privacy Copy (E): Values
"By clicking below I agree that I have read and accept the Membership & Privacy Agreement. I understand I will receive emails from PhoneHog with opportunities for more free minutes"
"By clicking below I agree that I have read and accept the Membership & Privacy Agreement. I understand I will receive emails from PhoneHog with opportunities for more free minutes" (the same wording in a smaller font)
"Send me a free calling card! By clicking below, I agree that I have read and accept the Membership & Privacy Agreement. I understand I will receive emails from PhoneHog with opportunities for free long distance calls"
"Send me a free calling card! I agree to the Membership & Privacy Agreement. I understand I will receive emails from PhoneHog with opportunities for more free minutes"
Submit Button (F): Values
[Button images]
How It Works (G): Values
[Versions of the panel "How it works: Joining PhoneHog is fast, easy and FREE. Once you're a member, we'll instantly email your free PhoneHog Calling Card to you. Make your free calls from any phone, anytime, anywhere."]
Footer (I): Values
[Versions include the link buttons Member Login, FAQ, and Contact]
Figure 8.2
Continued
The model with A-bottom and A-top explains 9.7248 of the variation, implying that the
additional contribution of A-top is 0.8484. Or, to say this differently, A-top explains 0.8484
of the variation when it is added to the model with A-bottom. The factor B explains an
additional 10.9232 when factor B is added to the model with A-bottom and A-top, and so on.
Sequential sums of squares always add up to the regression sum of squares that is explained
by the largest model with all factors; the regression sum of squares of the model that
includes all factors is SS(total) - SS(error) = 101.2383 - 7.4203 = 93.8180.
For orthogonal designs, the sum of squares of a factor does not change if other factors
are included in the model, and one can speak of a factor's
regression contribution (which is unconditional of other factors) to the total variability. In
the nonorthogonal situation this is no longer possible, as the contribution of a factor
changes with the factors that are already present in the model.
TABLE 8.3
Source      DF     Seq SS     Adj SS     Adj MS       F       P
A-bottom     5     8.8764     8.6401     1.7280     2.33   0.120
A-top        3     0.8484     0.0146     0.0049     0.01   0.999
B            2    10.9232     5.9607     2.9803     4.02   0.052
C            3     6.3376     9.3502     3.1167     4.20   0.036
D            5     7.0235     7.3990     1.4798     1.99   0.165
E            3    27.1948    14.9875     4.9958     6.73   0.009
F            5     1.3796     3.5688     0.7138     0.96   0.484
G            4     7.3321     5.6330     1.4083     1.90   0.187
H            3     4.8428     3.6671     1.2224     1.65   0.240
I            1    19.0596    19.0596    19.0596    25.69   0.000
Error       10     7.4203     7.4203     0.7420
Total       44   101.2383
S = 0.861413   R-Sq = 92.67%   R-Sq(adj) = 67.75%
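The summary figures in Table 8.3 follow from its sums of squares and degrees of freedom. The sketch below recomputes them from the table entries; it is a consistency check rather than a re-analysis of the data, and the variable names are illustrative.

```python
from math import sqrt

# (factor, degrees of freedom, sequential SS, adjusted SS) from Table 8.3
rows = [
    ("A-bottom", 5, 8.8764, 8.6401),
    ("A-top", 3, 0.8484, 0.0146),
    ("B", 2, 10.9232, 5.9607),
    ("C", 3, 6.3376, 9.3502),
    ("D", 5, 7.0235, 7.3990),
    ("E", 3, 27.1948, 14.9875),
    ("F", 5, 1.3796, 3.5688),
    ("G", 4, 7.3321, 5.6330),
    ("H", 3, 4.8428, 3.6671),
    ("I", 1, 19.0596, 19.0596),
]
ss_error, df_error = 7.4203, 10
ss_total, df_total = 101.2383, 44

# Sequential sums of squares add up to the regression sum of squares.
ss_regression = sum(seq for _, _, seq, _ in rows)

ms_error = ss_error / df_error
s = sqrt(ms_error)                                  # residual standard deviation
r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_error / (ss_total / df_total)

# F-statistic of each factor: adjusted mean square divided by the error mean square.
f_stats = {name: (adj / df) / ms_error for name, df, _, adj in rows}

print(round(ss_regression, 4), round(r_sq, 4), round(r_sq_adj, 4), round(s, 6))
for name, f in f_stats.items():
    print(name, round(f, 2))
```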
The partial (or adjusted) sum of squares (Adj SS) measures the explanatory contribution
of a factor as this factor is added last to the model. For example, the adjusted sum of squares
of factor B (5.9607) in Table 8.3 is the contribution of factor B when it is added to the model
that does not include factor B (i.e., the model with A-bottom, A-top, C through I). Because
of the nonorthogonality, the partial (adjusted) sum of squares is different from the sequential
sum of squares (which is 10.9232 for factor B).
The degrees of freedom of each factor sum of squares correspond to the number of
indicator variables that are needed to represent the levels of that factor. For a factor with u
levels, the degrees of freedom are u - 1. The adjusted mean squares are obtained by dividing
the partial (adjusted) sums of squares by their degrees of freedom.
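The difference between sequential and adjusted sums of squares can be seen in a very small nonorthogonal example. The sketch below uses made-up data (two 2-level factors in five runs, not the PhoneHog data) and exact fractions; the helper names `solve` and `regression_ss` are assumptions for illustration, not functions of any statistics package.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def regression_ss(columns, y):
    """Regression sum of squares of the model with an intercept plus the given columns."""
    n = len(y)
    x = [[1] + [col[i] for col in columns] for i in range(n)]
    p = len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum(b * t for b, t in zip(beta, xty)) - Fraction(sum(y)) ** 2 / n


# Hypothetical unbalanced (nonorthogonal) design and response.
A = [-1, -1, 1, 1, 1]
B = [-1, 1, -1, 1, 1]
y = [1, 2, 3, 5, 6]

ssr_full = regression_ss([A, B], y)
seq_a = regression_ss([A], y)                   # A entered first
seq_b_given_a = ssr_full - seq_a                # B entered second
adj_a = ssr_full - regression_ss([B], y)        # A entered last
adj_b = ssr_full - regression_ss([A], y)        # B entered last

print(float(seq_a), float(seq_b_given_a), float(adj_a), float(adj_b))
```

The sequential contributions add up to the regression sum of squares of the full model, while the contribution credited to factor A depends on whether A enters the model first or last.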
A sensible strategy for assessing the importance of the various factors is to consult the
adjusted mean squares and their associated F-statistics and probability values. We notice that
the factors A-bottom, B, C, E, and I affect the sign-up rate. The weakest component among
these five factors is A-bottom, with probability value 0.12.
We need to find out which levels of these factors are beneficial. The main-effects plots in
Figure 8.3 show that the best level for A-bottom is 4 ("Stop paying for long distance calls").
Level 3 works best for subheadline B ("Earning free long distance calls with PhoneHog is
easy. Just click, register or try a new product. Then start calling for free!"). The simple
invitation, "Please take a minute to join now," works best for the main copy (level 2 of factor C).
The small-font privacy line (level 2 of E) works better than all others, perhaps because it
does not highlight the "fine print." Level 2 of factor I (providing link buttons to get to more
information) is quite successful in enticing viewers to visit the subsequent Web pages.
Hence the best factor-level combination is given by
(A-bottom = 4, B = 3, C = 2, E = 2, I = 2).
The main-effects plots were obtained as an option in Minitab's "Stat > ANOVA > General
Linear Model" command. Minitab displays the fitted means from the main-effects regression
model with an intercept and 0/1 indicator variables for the absence/presence of the
various levels (35 coefficients in total; see Section 8.2.1). The fitted means are different from
the means that are obtained by averaging over all other factors. However, the differences are
usually minor as long as the design is not too different from an orthogonal design.
Figure 8.3 [Main-effects plots; panels include A-top and A-bottom]
The results of the main-effects regression model with the 5 identified factors are shown
in Table 8.4. Substituting (A-bottom = 4, B = 3, C = 2, E = 2, I = 2) into the estimated
equation leads to the predicted click-through rate
17 ..\671 i o."i.\.>h
1.5)14
22.56.
I hi' rtprcscn1' .1 ~; .. .,, llllJ'1t11cmcnt mcr tht cl1Lk through ralt' of till' current ha<>t' Incl
(creative# 45; cm
lo.65).
Comment. Experimentation at PhoneHog is a continual activity that tries to improve on
the current best results. Our best factor-level combination becomes a baseline for the next
set of runs. Also note that the winner among the 45 studied runs is creative #32. Its
click-through rate (22.72%) and its factor levels are quite similar to the predicted click-through
rate and the factor levels of the best strategy. It certainly makes sense to also include this
winning strategy (A-bottom = 8, B = 3, C = 5, E = 2, I = 2) as one of the creatives in the
next wave.
There is a lot of uncertainty in our analysis. There are many factors, the factors are categorical, and there are many factor levels. Our assumed main-effects model may be incorrect. It may well be that nothing works except for one single specific combination of factor
levels. If we are lucky to have this particular combination as part of the experiment, its result
will stand out as the clear winner. A situation such as the one described here introduces
TABLE 8.4
Model Coefficients: Model with A-bottom, B, C, E, and I (baseline 1 selected as reference level)
Predictor          T       P
Constant       32.35   0.000
A-bottom        1.04   0.306 (largest)
               -1.73   0.095
                0.46   0.645
               -0.97   0.338
                0.11   0.915
B               2.50   0.018 (largest)
                1.08   0.288
C               1.87   0.072 (largest)
               -0.40   0.694
               -0.41   0.687
E               3.56   0.001 (largest)
                0.47   0.641
                0.09   0.931
I               5.16   0.000 (largest)
S = 0.938145   R-Sq = 73.9%   R-Sq(adj) = 61.7%
complicated interactions, and a main-effects analysis and its implied best levels could be seriously flawed. So, it is a good idea also to include the winner as one of the creatives in the
next wave.
Alternatively, interactions may not matter and the result may be affected only by main
effects. Furthermore, because the experiment represents a very small portion of all possible
level combinations, it is very likely that the best combination is not part of the studied runs
of the experiment. In this case, our main-effects model and its implied best factor-level
combination will outperform the winner in the experiment, supporting a strategy of including the best creative in the next wave of experiments.
To be on the safe side, we recommend that both the best and the winning combinations
are included in the next stage of experimentation. Our approach to designing experiments
is powerful but not foolproof. The key is to experiment. Missing something occasionally or
including something that eventually turns out to be unnecessary will be of small consequence compared to the accumulation of insights over time.
8.3
Useful tools for the construction of designs and the analysis of the resulting data are included
in most statistical software packages. Our discussion in this section focuses on two general
statistics software programs with strong process control and design components: JMP
and Minitab. Specialized software for the construction of experimental
designs and the analysis of the resulting data is also available. Design-Ease and
Design-Expert, distributed by Stat-Ease (http://www.statease.com), share many of the features found in JMP and Minitab.
8.3.1. Minitab
Minitab makes it easy for the user to construct 2-level factorial and fractional factorial designs.
The user enters the number of factors and is then presented with a list of full and fractional designs and their run sizes. After deciding on the number of runs, the user can
construct the design either through default generators that optimize the resolution of the
design, or by stipulating specific generators. In either situation, Minitab obtains the design
columns, displays if desired a randomized arrangement of the runs, and indicates the
confounding patterns of the particular fraction that is selected. If default generators are
used, Minitab displays the adopted generators. The user can add center points, replicate the
design, block the experiment by specifying blocking generators, and modify the design by
considering foldovers. These can be complete (full) foldovers where the signs of all factors
are changed, or foldovers of individual factors. Minitab displays the confounding pattern of
the modified design.
Minitab can also construct Taguchi orthogonal array designs for designs with factors at
three or more levels. These designs include 3-level factorial and fractional factorial designs,
Latin square and Graeco-Latin square designs, and mixed-level designs such as an 8-run
design with two 2-level factors and one 4-level factor. However, Minitab does not specify the
implied confounding patterns for these designs. The user needs to understand that the Latin
and Graeco-Latin square designs as well as the mixed-level design mentioned here are orthogonal main-effects designs that leave main effects and 2-factor interactions confounded;
refer to the discussion in Section 7.6.
Minitab also constructs response surface designs (central composite and Box-Behnken
designs) and mixture designs (simplex and lattice designs). Response surface designs are
useful for fitting quadratic models that are subsequently used to determine the optimum
conditions of the response. For further discussion, see Box and Draper (1987).
Once the design has been carried out, Minitab facilitates an efficient analysis of the data.
The analysis of 2-level factorial and fractional factorial designs includes estimates of the
effects (as well as the regression coefficients, which are one half of the effects), standard
errors of the estimated effects if replications are available, and the ANOVA table. In the
unreplicated situation the user can omit model coefficients (e.g., by specifying a model that
contains only certain selected main effects and interactions). Minitab combines the omitted effects into an estimate of the variance and calculates standard errors of the estimates.
Normal probability plots and Pareto plots for assessing the importance of the effects, as well
as main-effects and interaction plots for assessing the nature of the relationships, are readily available.
Programs for determining the sample size are also available in Minitab. In addition to
the sample size determination for 2-sample comparisons of means and proportions (which
we discussed in Appendix 2.1), one can obtain the sample sizes in the one-way ANOVA
model (see Section 3.2) and 2-level factorial and Plackett-Burman designs.
8.3.2 JMP
JMP is an equally versatile and useful software package for the construction and the
analysis of a variety of experimental designs. It can construct full factorial designs for a
specified number of factors with different numbers of factor levels. The analysis of the resulting
experimental data includes the ANOVA table for testing the significance of main and interaction effects. JMP's output is very similar to Minitab's ANOVA output.
The screening designs in JMP are particularly useful. JMP allows the user to construct
2-level factorial and fractional factorial as well as Plackett-Burman designs. For a specified
number of 2-level factors, the software offers a list of available 2-level arrangements with
their implied numbers of runs. The software allows the user to review and change the
generators of the fractions and the available blocking arrangements. The software displays the
confounding structure that is implied by the selected generators. Center points can be
added, the design can be replicated, the order of the runs can be randomized, and the design or parts of it can be augmented by various foldovers.
JMP facilitates the estimation of user-specified models once the data from the experiment have become available. Similar to Minitab, the user can omit certain main effects or
interactions from the model, calculate a variance estimate by pooling the omitted effects,
and compute standard errors of the estimates. JMP has excellent graphical capabilities; the
prediction profiler allows the study of main and interaction effects, and normal probability
plots of the effects are easily constructed.
Screening designs provide the user the option to construct full and fractional 3-level, and
mixed 2- and 3-level designs. Similar to Minitab, JMP does not provide information on the
confounding structure of the 3-level (or mixed 2- and 3-level) fractional factorial designs.
JMP includes procedures for constructing and analyzing response surface designs, mixture
designs, and Taguchi designs. It includes programs for sample size determination in the
one-way ANOVA situation, but it does not determine the sample size in 2-level factorial and
Plackett-Burman designs (as is done by Minitab).
Another useful feature of JMP is the construction of custom designs. After entering the
number of factors and their levels (which can be either categorical or continuous), and after specifying the
desired model (in terms of its desired main effects and interaction
components), JMP determines the minimum number of runs that are needed to estimate the
model coefficients (this is referred to as the minimum solution). It also calculates the number of runs that are needed when combining all possible factor levels into a factorial
arrangement (this is called the grid solution). Furthermore, for a given specified number of
runs (not less than the minimum number of runs needed) JMP constructs designs that are
optimal with respect to certain optimality criteria (such as D-optimality). A D-optimal design minimizes the determinant of the covariance matrix of the estimated model coefficients; it guarantees efficient estimation of the model coefficients; see Appendix 8.1 for further discussion.
8.4
Web site design provides an ideal area for applying experimental design methods. In the
PhoneHog case, the consultant and the company decided to test 10 factors with each factor
at many levels, leading to a nonorthogonal design that required a more complex analysis
compared to 2-level fractional factorial or Plackett-Burman designs. In the online setting,
this was a sensible approach. Although it meant creating 45 different Web pages, doing so
was not prohibitively difficult or expensive. But 2-level designs would be useful also in this
setting. Many factors could be screened in a large resolution III design to identify the likely
few important ones. Then supplementary 2-level experiments could be carried out to test
additional alternatives for key factors, or the methods of Chapter 7 could be used to test
some of these factors at more than two levels while maintaining an orthogonal design.
Computer software has simplified the construction of suitable designs and the efficient
analysis of the resulting data. Computer software avoids tedious hand (or calculator) computations and it simplifies the construction of graphical displays. Use the computer to your
advantage, but do not trust its outcomes blindly. Check the reasonableness of the results, as
results depend on model assumptions that may be violated. There is no substitute for common sense.
Excel has become the standard computer software for business analysis, and you probably use it. Excel is appropriate and useful for simple analyses, but it is deficient in situations
that require a more sophisticated approach. Fortunately, good statistical software packages
(such as Minitab, SAS, JMP, SPSS, and R) are available.
Much can be learned by studying a textbook, reading case studies, and solving end-of-chapter exercises. However, it has been our experience that to really master the material,
you must apply the methods in the real world. Get out and experiment! Discovering the unexpected is more important than confirming what you know.
lllOll
important. For example, assume that one desires to study the main effects of three factors, each
with two levels (-1 and +1). It is straightforward to write down an orthogonal design in
N = 8 runs, and among 8-run designs, this design allows us to estimate the main effects with
the least variability. However, assume that one wants to estimate the main effects from the
results of just N = 6 runs. An orthogonal design in 6 runs does not exist, and one needs another criterion to determine the optimal design. One can select a D-optimal design and use
available computer software (such as the custom design feature in JMP) to determine the
levels. The note below on computer software explains the algorithms that are used by the
programs to find them. Using JMP, we obtain the following levels for the 6 runs:
R1111
J,h IPI I
'
2
1
The matrix X is obtained by adding to the three columns of levels a column of ones. Check
that the matrix X'X is no longer diagonal, and verify that the determinant of its inverse is
1/1,024. Convince yourself that any other arrangement will result in a larger determinant.
For example, switch the level of the first factor in the first run to its other value. Repeat the
matrix algebra, and you will find that the determinant of the resulting inverse of X'X is
larger than 1/1,024.
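The determinant claims for the 6-run example can be checked by brute force, because there are only 8 candidate runs for three 2-level factors. The sketch below is an illustration under stated assumptions: the 6-run design returned by JMP is not legible in the table above, so `design` is a stand-in (the 8-run factorial with two runs removed), and the function names are hypothetical. Maximizing det(X'X) is the same as minimizing the determinant of its inverse.

```python
from itertools import product, combinations_with_replacement


def det(m):
    """Determinant of a small square integer matrix by Laplace expansion."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def information_det(design):
    """det(X'X) for the main-effects model: a column of ones plus the factor levels."""
    x = [[1, *run] for run in design]
    p = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    return det(xtx)


# All 6-run designs (repeated runs allowed) built from the 8 runs of the 2^3 factorial.
candidates = list(product([-1, 1], repeat=3))
best = max(information_det(d) for d in combinations_with_replacement(candidates, 6))

# Assumed 6-run design for illustration: the 2^3 factorial without the runs
# (+1, +1, +1) and (+1, -1, -1).
design = [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, 1, -1), (-1, -1, 1), (-1, -1, -1)]
# Same design with the level of the first factor in the first run switched.
changed = [(-1, 1, -1)] + design[1:]

print(best, information_det(design), information_det(changed))
```

A larger value of det(X'X) corresponds to a smaller determinant of the inverse, so the changed design, with its smaller det(X'X), is the less precise of the two.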
Our discussion emphasizes the important role of orthogonality in design of experiments.
An orthogonal design is also a D- (and A-) optimal main-effects design. However, for certain problems and run sizes orthogonal designs may not exist, and in these situations
D- and A-optimality become useful design criteria.
Observe that A- and D-optimality criteria are model specific, as they look at the precision of the coefficients in a certain specified model. Here we discuss the main-effects model,
but extensions to models that include interactions or quadratic components of factors with
more than two levels are possible, and software for finding the optimal designs is readily
available.
Categorical Design Factors with More Than Two Levels
A main-effects design with categorical factors at more than two levels can be parameterized in terms of a regression model with an intercept term and indicator variables that express the absence/presence of the various levels of the design factors. Consider the special
situation of 2 categorical factors with 3 and 4 levels. One pair of levels (e.g., the first level of
factor 1 and the first level of factor 2) becomes the standard against which all other factor
levels are compared. The regression model includes k = (3 - 1) + (4 - 1) = 5 indicator
variables that express the presence/absence of factor levels 2 and 3 for factor 1, and factor
levels 2 through 4 for factor 2. The vector of regression coefficients β is of dimension 6,
and we need a minimum of 6 runs (N = 6) to estimate the coefficients. Of course, better estimates of the main effects could be obtained if more runs were available. A 12-run full factorial design with a single run at each level-combination would be an excellent choice. This
design is orthogonal, and it is optimal in terms of minimizing the variability of the resulting estimates. But, let us assume that our resources allow for only N = 8 runs. An orthogonal
design in 8 runs is not possible, and hence one needs another criterion to select the runs.
D-optimality is a reasonable criterion as it leads to the most precise main-effects estimates.
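The indicator-variable parameterization described above can be written out explicitly. The following sketch builds the model matrix for the 12-run full factorial in two categorical factors with 3 and 4 levels, taking level 1 of each factor as the reference; the helper `indicator_row` and the variable names are illustrative assumptions.

```python
from itertools import product


def indicator_row(run, levels):
    """Intercept followed by 0/1 indicators; level 1 of each factor is the reference."""
    row = [1]
    for value, n_levels in zip(run, levels):
        row += [1 if value == level else 0 for level in range(2, n_levels + 1)]
    return row


levels = [3, 4]

# Full factorial: one run at each of the 3 x 4 level combinations.
full_factorial = list(product(*(range(1, n + 1) for n in levels)))
X = [indicator_row(run, levels) for run in full_factorial]

# Number of regression coefficients = minimum number of runs.
n_coefficients = 1 + sum(n - 1 for n in levels)

# The same count for the ten PhoneHog factors of Section 8.2.
phonehog_levels = [6, 4, 3, 4, 6, 4, 6, 5, 4, 2]
phonehog_coefficients = 1 + sum(n - 1 for n in phonehog_levels)

for run, row in zip(full_factorial, X):
    print(run, row)
print(n_coefficients, phonehog_coefficients)
```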
Consider the PhoneHog example in Section 8.2 as a second illustrative example. There
we study 10 factors with 6, 4, 3, 4, 6, 4, 6, 5, 4, 2 levels, respectively. A full factorial is
orthogonal and hence D-optimal, but its number of runs is prohibitive. Assume that we
want to estimate the main effects of these factors as precisely as possible and are looking for
a D-optimal design that uses a certain small number of runs. We can write down a regression formulation for the main-effects model. It includes an intercept and 34 regressor
columns. One particular combination of 10 levels (one for each factor) becomes the standard
against which the other levels are compared. The 34 indicators express the absence/presence
of the other levels in the considered runs. At a minimum, we need 35 runs to estimate
the main effects. Additional runs would help estimate the parameters more precisely.
PhoneHog was looking for a design that estimated the parameters (i.e., the main effects) as
accurately as possible.
the de-,ign.
J)
Note on Software
Several iterative algorithms for determining A- and D-optimal designs are proposed in the literature and they have been implemented in easy-to-use software packages. The custom designer in JMP, for example, starts with a random design of the desired run size with each of the runs satisfying ... coordinate exchange (Meyer and Nachtsheim, 1995) is used to improve the design. Each iteration of the algorithm involves testing every value of every factor in the design to determine if replacing that value increases the optimality criterion. If so, the new value replaces the old. Iteration continues until no replacement occurs in an entire iterate. To avoid converging to a local optimum, the whole process is repeated several times using a different random start. The custom designer displays the best of these designs.
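The coordinate exchange idea described above can be sketched in a few dozen lines. This is a minimal illustration, not the JMP implementation: it assumes a main-effects model with indicator coding, uses det(X'X) as the D-criterion, and is run on a hypothetical example with two 2-level factors and one 3-level factor in 8 runs. All function names are ours.

```python
import random

def model_row(run, levels):
    """Intercept plus k - 1 indicator columns per factor (level 0 is the baseline)."""
    row = [1.0]
    for value, k in zip(run, levels):
        row.extend(1.0 if value == lev else 0.0 for lev in range(1, k))
    return row

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [r[:] for r in matrix]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def d_criterion(design, levels):
    """det(X'X) for the main-effects model matrix X of the design."""
    x = [model_row(run, levels) for run in design]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    return det(xtx)

def coordinate_exchange(levels, n_runs, n_starts=10, seed=0):
    """Keep the best design found over several random starts."""
    rng = random.Random(seed)
    best_design, best_value = None, -1.0
    for _ in range(n_starts):
        design = [[rng.randrange(k) for k in levels] for _ in range(n_runs)]
        value = d_criterion(design, levels)
        improved = True
        while improved:  # stop when a full pass makes no replacement
            improved = False
            for run in design:
                for f, k in enumerate(levels):
                    for candidate in range(k):
                        old = run[f]
                        if candidate == old:
                            continue
                        run[f] = candidate  # try the new value
                        new_value = d_criterion(design, levels)
                        if new_value > value + 1e-9:
                            value, improved = new_value, True
                        else:
                            run[f] = old  # no gain: restore the old value
        if value > best_value:
            best_design, best_value = [r[:] for r in design], value
    return best_design, best_value

levels = [2, 2, 3]  # hypothetical example, not taken from the text
design, value = coordinate_exchange(levels, n_runs=8)
print(value)
for run in design:
    print(run)
```

Recomputing the determinant from scratch after every trial exchange keeps the sketch short; production software typically uses update formulas instead.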
A recent article by Kuhfeld and Tobias (2005) describes combinatorial and heuristic optimization methods for constructing D-optimal factorial designs. The authors discuss Fedorov's approach of iteratively exchanging candidate/design pairs (where runs from the list of possible design runs are swapped) and the coordinate exchange algorithm of Meyer and Nachtsheim (1995) (where coordinates of runs are swapped), and they apply simulated annealing optimization techniques to improve the performance of these two methods. Useful SAS macros are available from the first author's Web site. An earlier paper by Kuhfeld, Tobias, and Garratt (1994) describes useful marketing applications of D-optimal designs.
EXERCISES
Exercise 1
Consider Case 12 (Almquist & Wyner) from the case study appendix.
(a) Consider the design in Table A12.2 of that case. Show that it is balanced (i.e., same number of runs at each level combination of any two factors). Show that the main effects of Message and Promotion are not confounded with the Message by Promotion interaction. Hint: Use the regression approach and show that the interaction column is orthogonal to the columns for Message and Promotion.
(b) Using statistics software of your choice, confirm the regression output in Table A12.4. Using results in Appendix 4.4 (Brief Primer on Regression), explain how the program obtains the value 0.0707 and the standard errors in Table A12.4. Fit the regression on just Message and Promotion, and explain why the regression coefficients for Message and Promotion are unchanged.
(c) Consider the design in Table A12.?. Formulate a regression model with an intercept, three main effects for the 2-level factors (Subject, Action, and Closing), and linear and quadratic components for the two 3-level factors (Salutation, Promotion); see Section 7.4. Specify the 16 X 8 design matrix. Imagine fitting two regressions. One model considers all eight regressors, while the second contains the intercept and only the three main effects of the 2-level factors. Would the estimates of the three main effects stay the same? Why or why not?
(d) Using a design software of your choice, obtain a 16-run D-optimal design with three 2-level factors and two 3-level factors. Discuss the software's approach of obtaining such designs.
Exercise 2
(a) Use Minitab or any other statistics software such as JMP or SPSS, fit the main effects model with all 10 factors, and confirm the results for the click-through rate in Table 8.3. Second, enter the 10 factors in a different order and observe that the sequential regression sums of squares change. This is a consequence of nonorthogonality. For orthogonal designs, the sums of squares would stay the same.
(b) Focus on the model with just three factors, H, E, and I. Define indicators for the levels of the 3 factors; 3 indicators for H, 4 for E, and 2 for I, for a total of 9 indicator columns, h1, h3, h6, e1, e2, e3, e4, i1, i2. Estimate the model that includes a constant and the indicators h3, h6, e2, e3, e4, and i2. The constant represents the mean response when all factors are at their baseline values (level 1 of H, level 1 of E, level 1 of I). Obtain the least squares estimates and their standard errors. Find the best levels of each factor and obtain an estimate of the click-through rate of the best factor-level combination. Compare this to the prediction (22.56) that you got in Section 8.2.2.
(c) Analyze the action rate using the same approach that we used for click-through rates.
TABLE A1.1
Regular package          Deluxe package
No in-store samples      In-store samples
No coupons               Coupons
                         Reduced fat percentage
                         Gold sticker
                         New package lettering
In the first phase of the work, the group developed an initial list of about 20 factors for possible inclusion in the experiment. Gradually the list was reduced to its final form consisting of 6 factors (Table A1.1).
Arriving at the final list was not an easy matter because each manager had personal favorites among the list of potential changes. Al Douglas (finance) felt that the current deluxe package only added to the cost without affecting sales. Bill Evans (marketing) strongly disagreed. "Al, our lunch meat line of bologna, ham, and turkey has the finest products on the market, and the classy package enhances our quality image." A heated argument ensued.
Gloria Johnson of Zip Stores, the supermarket chain, suggested simply adding a gold star to the package. "It will catch the shopper's eye, and that is important in a crowded display case."
The advertising manager felt that coupons were an important way to increase sales. Others argued that sales would increase but not enough to offset the discount provided by the coupon.
Eagle had in the past occasionally set up in-store displays where customers were offered free samples of bologna, ham, and turkey. There was general agreement that this led to more sales ("if they try it, they'll like it!"), but these free sample displays were expensive, and it was unclear if the increase in sales was sufficient to justify the cost.
Eagle had recently developed a new version of its cold cuts with a reduced level of fat. Extensive testing had shown that taste and appearance were unchanged, and the firm felt that the lower fat product would appeal to health-conscious customers, and therefore sales would increase. The downside was that the low-fat version had a higher production cost.
Several team members felt that a change in package lettering to a bolder look would increase sales. The proposed new lettering would result in a small increase in packaging costs.
HOW MANY STORES SHOULD BE INCLUDED IN THE TEST?
There was general agreement among the members of the team that the shorter the test period, the better. It was agreed that the test would be run for one week across a random sample of stores. The Zip Stores supermarket chain had approximately 500 stores all located in the Midwest. There was extensive data available on sales of the three Eagle Brand products by store and week. The team eliminated weeks from the database that were not considered average. The not-average weeks included the week of the Super Bowl and weeks with special promotional activities. For average weeks, average weekly sales per store of Eagle
Figure A2.1 Main effect of A (cover price) and main effect of C (number of newsstand copies: 1/3rd less, current, 1/3rd more)
sales change at the center point (2.1%), where each of the 3 factors is at the midlevel, falls about in line between the low and high levels; for sales, there is no appreciable curvature.
The AC interaction adds further insight to the sales analysis. It shows that the increase in sales at a lower cover price (A-) is much smaller if the number of copies is reduced (C-). Alternatively, the increase in newsstand copies (C+) has a minimal impact if the cover price is high (A+). Furthermore, combining the levels that are optimal individually (low cover price and high number of copies) increases sales by more than what can be expected by summing the two individual main effects. Depending on the team's profitability analysis (which is not discussed here), the publisher should increase the number of copies on the newsstand only if it plans to lower the cover price. Otherwise, cost savings from a reduction in newsstand copies may have little impact on sales.
CASE 2
The AB interaction shows that the two significant main effects are even greater in combination. A high cover price (A+) and a low subscription price (B-) result in more new subscribers than the implied number that is obtained by adding the two individual main effects.
Analysis of curvature leads to important insights. Significant curvature for subscriptions means that, within some range, cover and subscription prices have negligible impact. Yet beyond a certain level, price changes result in a big jump in subscriptions. However, with only one center-point test cell it is not possible to determine which factor (or perhaps both) is causing the curvature. With significant curvature, the next step for the marketing team was to run a new test design with more combinations at different price and subscription levels.
Following this test, the publisher's marketing team ran additional tests to pinpoint the optimal price points. They ended up increasing the cover price and increasing the subscription price, while maintaining the same level of newsstand copies. These price increases ultimately increased profit while maintaining the number of copies sold on the newsstand and through subscriptions.
QUESTIONS
Exercise 2 in Chapter 4
TABLE A3.1
Factors and Assigned Levels
"Act now" insert
Credit card
Hard offer / Harder offer
Strong guarantee / Stronger guarantee
Testimonials
Bumper sticker
Gutsy / Ballsy
ing firm that organized the mailings, the seven factors listed in Table A3.1 were identified for the test.
The "act now" insert, a small separate note in color, urged people to act now and to respond/pay today. The front of the insert showed the cover of the next issue and urged the recipients to act immediately to make sure they received this very special issue. The back of the insert in bullet point style and vibrant language described the articles that would appear in the next issue. If the act now insert was included, the words act now were also boldly written on the reply card.
The second factor gave people the option of paying by credit card rather than only by personal check.
The factor "hard offer/harder offer" refers to the language in the offer described in the letter from Harris, the publisher. For example, "hard" encourages the person to "get the next issue of Mother Jones," and "harder" says "to get the next issue hot off the press, send your reply and subscribe today."
"Guarantee" allows the person to cancel the subscription at any time during the first year, and receive money back for the issues not yet received, while the "stronger guarantee" means that the full subscription price would be refunded, as long as the subscription is canceled before the end of one year.
"Testimonials" refer to an insert in which typical subscribers and notable people make positive comments about the magazine.
"Gutsy/ballsy" refers to a single word that is printed on the outer envelope. Previously, Mother Jones had effectively used the word "ballsy," but had received some complaints about this language. Harris was interested in finding out whether softening the language to "gutsy" would produce similar results.
THE MAILING PROCESS
The mailing process consisted of two stages: printing and insertion. Printing produced the addressed outer envelope with either "gutsy" or "ballsy" printed on it. The testimonials were printed on a single sheet of paper, and the reply card included information on the price, whether a credit card could be used, the type of guarantee, and the phrase "act now" included or not.
A firm specializing in mailings was responsible for carrying out the logistics of the experiment. The firm's production line consisted of automated insertion equipment that
test cell," oversimplified the sample size issue and often led to a weak test with no significant results. In this case, with only 35,000 names and an average response rate of 1%, an effect would have to change the response by about 20% (from 1.0% to 1.2%) to have a 50:50 chance of being found significant.
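The "50:50 chance" statement can be checked with a rough power calculation. The sketch below is our own illustration under stated assumptions: about 35,000 names split evenly between the two levels of a factor, a two-sided z-test at roughly the 5% level, and a normal approximation. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_two_proportions(p1, p2, n1, n2, z_crit=1.96):
    """Approximate power of a two-sided z-test comparing two proportions."""
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n2))
    z = abs(p2 - p1) / se
    # Probability that the test statistic lands in either rejection region.
    return normal_cdf(z - z_crit) + normal_cdf(-z - z_crit)

# Assumed split: half of the names at each level of the factor.
power = power_two_proportions(0.010, 0.012, 17500, 17500)
print(round(power, 2))
```

Under these assumptions the power comes out a little below one half, in line with the rough 50:50 figure quoted in the text.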
The consultant offered encouragement. With the right multifactor test design and a focus on bold changes, they could create a strong test with useful results. The consultant summarized the requirements:
Use one experimental design to test numerous variables, maintaining the same test power no matter how many variables were tested.
Use all available names, but design the test so differences among segments can be quantified.
Take advantage of the flexibility of e-mail by using a higher-resolution test design with more test cells yet less confounding among the effects.
TEST FACTORS
After brainstorming ideas and trimming the list down to the boldest ideas, the marketing team identified 13 variables and selected two different versions of each variable to test. These 13 factors could be tested simultaneously in a 16-run design, but for reasons outlined below, the consultant selected a 32-run fractional factorial design instead.
The 32-run design requires greater effort for the marketing team to construct 32 different e-mails, but it has important statistical advantages. Where a 16-run design is only of resolution III (with main effects fully confounded with 2-factor interactions), the 32-run design is of resolution IV (with main effects confounded with 3-factor interactions, but independent of 2-factor interactions). Since higher-order interactions are unlikely, this design reduces the potential confounding error and also helps identify key 2-factor interactions.
The three customer segments also had to be considered. Including a 3-level factor in the test would have led to an unbalanced design, in which each factor level would not have appeared in the same number of runs. Instead, the three segments were defined as a factor with 4 levels, with segment 1 (the largest segment) taking up 2 levels. Just as a 2-level factor requires one column in the test design, a 4-level factor requires three columns. The A, B, and AB interaction columns in Table A5.1 were used to define the three segments.
After creating the test design, one of the 13 factors was eliminated. The team planned to test a search box at the top of the e-mail message, but this was too difficult to execute and column E was left empty. The remaining 12 factors plus the 4-level segment factor are listed in Table A5.2.
TABLE A5.1
Segment      Names
Segment 2    11,586
Segment 1    13,508
Segment 3     8,966
TABLE A5.2
Factor
A, B   Segment
C      Link to online catalog
D      Background color
E      (empty)
F      Design of e-mail
G      Partner promotions
H      Navigation bar
J      Special-offer starburst
K      Discount offer
L      Free gift
M      Products pictured
N      "Valued customer" copy
O      Cross-sell copy
P      Subject line
A and B: Segment
The marketing team had defined three key customer segments, based on behavioral variables and the timing of recent purchases: Segment 1 consisted of customers who had made a purchase online or in a store within the last 3 months. Segment 2 had made a purchase within the last 3-6 months. Segment 3 had made a purchase within the last 6-12 months.
C: Link to Online Catalog
The e-mail included a "Shop our catalog online" button towards the bottom of the e-mail. The team felt that a link to the Web site would encourage customers to browse the available products.
D: Background Color
All e-mails were sent with dark text on white background. The creative director thought that a blue background might help the e-mail stand out.
F: Design of E-Mail
E-mails used a basic font with a small company logo at the top. The team wanted to test a stronger brand image, with a larger logo, more stylized font, and greater use of the company's brand colors in the e-mail.
G: Partner Promotions
With brand-name products, the marketing team believed that promoting several brands could help convince customers to make a purchase. They decided to promote two specific brands in two bright boxes under "Offers from our partners" at the bottom of the e-mail.
H: Navigation Bar on Side
E-mails currently went out with a sidebar similar to the navigation bar on the company Web site, but with a shorter list of links. They didn't want to test an e-mail without any sidebar, so instead they decided to test the current navigation bar versus one with more choices.
J: Special-Offer Starburst
Since e-mails were sent to a "select group of customers," they wanted to play up the exclusivity with an eye-catching red star at the upper right stating "Special e-mail offer."
K: Discount Offer
The Internet director had gone back and forth between offering a special e-mail discount or not. He thought the discount helped, but had never quantified whether it pulled in enough sales to justify the lower margin.
L: Free Gift
They had not offered a free gift with online orders before, but wondered whether it was worth a try, as other companies were doing it. They selected an attractive, but low-cost, pen-and-pencil set that they could offer for free. At first, they wanted to offer it only for orders of $50 or more, but chose instead to be bold and offer it with every order.
M: Products Pictured
Every e-mail focused on a selection of products (with pictures and prices), but they never knew how many were best. Some people on the marketing team thought that a simple offer with just a few products would get people to respond faster. Others thought that a larger selection would give more people something of interest. They decided to test a few products versus many products. For the test, every picture was the same size, so e-mails featuring "many products" had additional rows of product pictures.
N: "Valued Customer" Copy
Their standard e-mail copy stated, "As a valued [company] customer, we would like to offer you these Internet-only specials." They tested this against a copy with a stronger message, adding a second sentence about how only their best customers get these special offers.
O: Cross-Sell Copy
The second copy change was designed to sell more products. An additional sentence was added to encourage people to order a variety of office supplies at once to lower shipping costs.
P: Subject Line
The Internet director had been testing different e-mail subject lines. Currently, "Exclusive e-mail offer from [company]" was the winner. Since he knew the subject line was important, he wanted to test another version, "Special offer for our best customers."
TEST DESIGN
The consultant developed a $2^{15-10}_{IV}$ fractional factorial test design, based on a 32-run, 5-factor, full factorial design in factors A-E, with factors F-P assigned to 10 of the 26 interaction columns using the design generators:
F = ABC, G = ABD, H = ABE, J = ACD, K = ACE, L = ADE,
M = BCD, N = BCE, O = BDE, P = CDE
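The construction can be reproduced without special software. The sketch below is our own illustration (the case itself used Minitab): it builds the 32 runs from the full factorial in A-E, derives the ten generated columns from the generators above, and checks two properties of a resolution IV design. The function names are ours.

```python
from itertools import combinations, product

BASE = "ABCDE"
GENERATORS = {"F": "ABC", "G": "ABD", "H": "ABE", "J": "ACD", "K": "ACE",
              "L": "ADE", "M": "BCD", "N": "BCE", "O": "BDE", "P": "CDE"}

def build_design():
    """Return the 32 runs as dicts mapping factor letter to -1 or +1."""
    runs = []
    for signs in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, signs))
        for factor, word in GENERATORS.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[factor] = value  # generated column = product of base columns
        runs.append(run)
    return runs

def dot(design, cols_a, cols_b):
    """Inner product of two (possibly interaction) columns of the design."""
    total = 0
    for run in design:
        a = 1
        for c in cols_a:
            a *= run[c]
        b = 1
        for c in cols_b:
            b *= run[c]
        total += a * b
    return total

design = build_design()
factors = sorted(design[0])  # the 15 design columns A through P

# Main-effect columns are mutually orthogonal.
main_orthogonal = all(dot(design, (f,), (g,)) == 0
                      for f, g in combinations(factors, 2))

# No main effect is aliased with a 2-factor interaction.
clear_of_2fi = all(dot(design, (f,), pair) == 0
                   for f in factors
                   for pair in combinations(factors, 2) if f not in pair)

print(len(design), len(factors), main_orthogonal, clear_of_2fi)
```

Both checks passing is what allows the main effects to be estimated free of 2-factor interactions, which is the advantage of the 32-run design claimed in the text.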
Minitab statistical software was used to generate the design columns. The test matrix for the 15 design columns (the 13 factors plus the 2 factors, A and B, that specify the three segments) is shown in Table A5.3. Minitab also provides the alias structure for this fractional factorial design. Ignoring 3- and higher-order interactions results in a design that can estimate the 15 main effects of factors A through P, 15 effects each containing seven 2-factor interactions, and one effect containing only 3-factor interactions.
Table A5.3 lists the sample size and response data for each test cell. Since each customer segment was randomly assigned to certain test cells based on the +/- levels in columns A and B, the numbers of customers contacted in each test cell are not the same, and the test is not completely balanced. In addition, after names were assigned to test cells, the final purge/merge (where addresses are double-checked and invalid e-mail addresses are removed) dropped some names from the test. In the end, only 34,060 names were used. Each version of the e-mail was sent to as few as ...
TABLE A5.3 (test matrix: +/- levels of design columns A through P, with the names, orders, and response rate for each of the 32 test cells)
The creative team, made up of the creative director and a single person who designed every e-mail, was somewhat tentative about the test. The thought of creating so many different e-mails for one drop was daunting. They also didn't know if all required combinations would work from an artistic standpoint.
The consultant worked to minimize their concerns and lighten their workload. First, he helped the team define clear, independent factors that could work together in any combination. Then he sat down with the team to review every required test cell, changing factor definitions to make all test cells essentially simple cut-and-paste combinations. Finally, he worked with the creative team as they developed each version; he checked everything to ensure compliance and consistency and solved any problems as they arose.
Overall, the creative work added two days to the marketing schedule. The team was surprised how smoothly things went once all factors and combinations were clearly defined.
TEST RESULTS
The test dropped on Tuesday, and initial results were analyzed after one week. Since the
team wanted to increase the number of orders, the primary metric was the response rate.
Average order size was also analyzed to help assess profitability, but this particular analysis
is not shown here.
The main effects and 2-factor interactions are shown in the two Pareto charts in Figures A5.1 and A5.2. The effects are calculated by applying the +/- signs to the response rates in the last column of Table A5.3, and dividing the resulting linear combination by 16. Alternatively, the effects can be obtained by regressing the response rate on the design and interaction columns; the only difference in the results is that the size of the effects is cut in half. As discussed below, the significance of the effects is best determined through logistic regression. The output of the standard regression and the logistic regression is summarized in Table A5.4. The Minitab statistical software is used for both regressions.
Figure A5.1 (Pareto chart of estimated effects)
Figure A5.2 (Pareto chart of estimated effects)
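The relation between the two calculations (signed sum divided by half the number of runs, versus regression coefficients that are half as large) can be illustrated on a small example. The sketch below uses a hypothetical 8-run full factorial with made-up response rates, not the data of this case; with 8 runs the divisor is 4, just as it is 16 for the 32-run design.

```python
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for j in range(c, n + 1):
                    m[r][j] -= f * m[c][j]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical 2^3 design in -1/+1 coding with made-up response rates (%).
design = [list(s) for s in product((-1, 1), repeat=3)]
y = [0.9, 1.1, 0.7, 1.0, 1.3, 1.6, 1.2, 1.4]
n = len(y)

# Effect: apply the signs to the responses and divide by N/2.
effects = [sum(run[j] * yi for run, yi in zip(design, y)) / (n / 2)
           for j in range(3)]

# Least squares through the normal equations X'X b = X'y, X = [1, x1, x2, x3].
x = [[1] + run for run in design]
xtx = [[sum(r[i] * r[j] for r in x) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(4)]
coef = solve(xtx, xty)

for j in range(3):
    print(effects[j], 2 * coef[j + 1])  # the two numbers agree
```

The factor of two arises because the regression coefficient measures the change per unit of the coded variable, while the effect compares the +1 level with the -1 level, which are two units apart.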
The standard regression with the response rate as the dependent variable has several drawbacks: (1) It ignores the different sample sizes. It treats each response rate as equally precise and analyzes the response rates the same no matter how many e-mails were sent. (2) It uses unweighted averages of the response rates in the estimation of the effects, instead of more appropriate weighted averages that adjust for the unequal precision. (3) The standard errors of the estimates are obtained by pooling smaller insignificant effects into the experimental error term, a somewhat arbitrary decision that can overstate the number of significant terms. Logistic regression represents a better approach for analyzing proportions that originate from samples of different sizes.
TABLE A5.4
Standard Regression and Logistic Regression Output of Significant Factors
Odds Ratio    95% CI Lower    95% CI Upper
0.88          0.78            0.99
1.02          0.91            1.15
0.85          0.76            0.95
0.69          0.61            0.78
1.22          1.08            1.37
0.98          0.87            1.10
1.17          1.04            1.32
The number of positive responses among the sampled cases in each run is modeled as a binomial random variable, with a success probability that depends on the design variables. Since success probabilities are always between zero and one, logistic regression models the logarithm of the odds (the ratio of the probabilities of success and failure) as a linear function of the design variables. For a detailed discussion of logistic regression and on how to interpret the coefficients in logistic regression, we refer the reader to Chapter 11 in Abraham and Ledolter (2006). Here we use the logistic regression merely to assess the significance of the regression coefficients.
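A binomial logistic regression of this kind can be fitted with a short Newton-Raphson (iteratively reweighted least squares) loop. The sketch below is our own illustration on hypothetical counts for a single coded factor, not the Minitab analysis of this case; the cell sizes, order counts, and function names are assumptions.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for j in range(c, n + 1):
                    m[r][j] -= f * m[c][j]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical test cells: (coded level x, names mailed n, orders y).
cells = [(-1, 1180, 15), (-1, 883, 10), (+1, 1180, 7), (+1, 883, 6)]

def fit_logistic(cells, iterations=50, tol=1e-10):
    """Fit log(p / (1 - p)) = b0 + b1 * x to grouped binomial data."""
    total_n = sum(n for _, n, _ in cells)
    total_y = sum(y for _, _, y in cells)
    beta = [math.log(total_y / (total_n - total_y)), 0.0]
    for _ in range(iterations):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for x, n, y in cells:
            row = (1.0, float(x))
            p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
            w = n * p * (1.0 - p)  # binomial weight reflects the cell size
            for i in range(2):
                score[i] += row[i] * (y - n * p)
                for j in range(2):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors from the inverse of the information matrix.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(2)])[j])
          for j in range(2)]
    return beta, se

beta, se = fit_logistic(cells)
z = beta[1] / se[1]
odds_ratio = math.exp(2 * beta[1])  # odds at x = +1 relative to x = -1
print(beta, se, z, odds_ratio)
```

Because each cell enters with weight n p (1 - p), larger cells count for more, which addresses the first two drawbacks of the standard regression listed above.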
SIGNIFICANT EFFECTS
The average response rate in this test was 0.916%, with just 0-21 orders for each test cell and a total of only 312 orders. This was a small sample size in an unbalanced design with low response rates, and yet the subsequent results were convincing. Significant effects include the following:
K: Discount Offer
The elimination of the 15% discount resulted in a 0.599% reduction in the response rate. The team calculated that the loss of margin from selling the product cheaper is more than covered by the increase in the number of orders.
G: Partner Promotions
The two partner offers in the e-mail reduced the response rate by 0.291%, contrary to what they had expected. The team theorized that the additional offers may have confused the message and given customers too many disjointed offers to choose from.
L: Free Gift
The free pen-and-pencil set increased the response rate by 0.264%. Analyzing profitability, the cost of the gift was easily covered by the increase in orders.
A: Segment
The significance of at least one of the three components responsible for the segment effect (A, B, AB) indicated that the three segments responded differently. The response rates for the three segments are summarized in Table A5.5.
The differences among the three response rates are small and not particularly significant. The 95% confidence intervals in Table A5.5 overlap each other. This finding could suggest that the blocking with respect to the three segments may not have been needed. Nevertheless, the significance of the factor A in the earlier analysis and the summary in Table A5.5 raises the question of why half of segment 1 (A+ B-) had a response rate lower than any other group, while the other half of segment 1 (A- B+) had the highest response rate of all. After some investigation, the problem could be traced to a simple error in the execution of the experiment. The top half of segment 1 (i.e., the best customers) had been placed in the A- B+ test cells, while the bottom half were placed in the A+ B- test cells. This pointed out the risk of a nonrandom assignment of names but also showed that the segmentation model needed some refining; perhaps the best and worst recent buyers should be in different customer segments.
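The overlap of the segment confidence intervals can be checked directly. The sketch below uses the names and response rates as we read them from Table A5.5 and assumes simple Wald intervals; the intervals in the book may have been computed differently.

```python
import math

# (names, response rate) for each segment, as read from Table A5.5.
segments = {
    "Segment 1": (13508, 0.00940),
    "Segment 2": (11586, 0.00975),
    "Segment 3": (8966, 0.00803),
}

def wald_interval(n, p, z=1.96):
    """Approximate 95% confidence interval for a proportion."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

intervals = {name: wald_interval(n, p) for name, (n, p) in segments.items()}

def overlap(a, b):
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

all_overlap = all(overlap(intervals[s], intervals[t])
                  for s in intervals for t in intervals)

for name, (low, high) in intervals.items():
    print(name, round(100 * low, 3), round(100 * high, 3))
print(all_overlap)
```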
KL Interaction
The final significant effect was the KL 2-factor interaction, with an effect of 0.171. Before explaining how this interaction affects results, it is worthwhile to take a step back and see where it came from.
Analysis of the data shows 31 independent effects: 15 main effects, 15 strings of 2-factor interactions, and one string of 3-factor interactions. In Minitab, the default is to label the interactions with the first of all confounded interactions. For example, the labeling of the significant interaction in Figure A5.2 starts with AJ because A is the factor that is listed first.
ignoring interactions of order 3 or higher)? Briefly explain why you have selected this design.
2. Would it be beneficial to replicate the design? Why? How would replication help you determine which effects are significant? If you choose not to replicate, how would you decide which effects were significant?
CASE 5
TABLE A5.5
Response Rates for the Three Customer Segments
Segment      Available Names    Average Response
Segment 2    11,586             0.975%
Segment 1    13,508             0.940% (A+ B- half: 0.789%; A- B+ half: 1.090%)
Segment 3     8,966             0.803%
(Plot of the response rates for Segment 1, Segment 2, and Segment 3)
However, this effect could actually be the result of one or more of the seven confounded 2-factor interactions. The list of interactions is given in Minitab, or the interactions can be calculated using the design generators that have been listed earlier in the Test Design section. The seven interactions mixed together in the AJ column are AJ + BM + CD + EP + FG + KL + NO.
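The alias string can be derived from the design generators by treating each factor as a set of base letters: multiplying two columns cancels any letter that appears twice. The sketch below is our own illustration of that calculation; the generator words are those of the Test Design section, and the function names are hypothetical.

```python
from itertools import combinations

# Each design column written as a word in the base factors A-E.
WORDS = {"A": "A", "B": "B", "C": "C", "D": "D", "E": "E",
         "F": "ABC", "G": "ABD", "H": "ABE", "J": "ACD", "K": "ACE",
         "L": "ADE", "M": "BCD", "N": "BCE", "O": "BDE", "P": "CDE"}

def column(*factors):
    """Base-letter word of an interaction; repeated letters cancel."""
    letters = set()
    for f in factors:
        letters ^= set(WORDS[f])
    return frozenset(letters)

def alias_string(pair):
    """All 2-factor interactions that share a column with the given pair."""
    target = column(*pair)
    return ["".join(p) for p in combinations(sorted(WORDS), 2)
            if column(*p) == target]

aj_aliases = alias_string("AJ")
print(" + ".join(aj_aliases))

# Group all 105 two-factor interactions by the column they fall into.
groups = {}
for p in combinations(sorted(WORDS), 2):
    groups.setdefault(column(*p), []).append("".join(p))
print(len(groups), sorted(len(g) for g in groups.values()))
```

The grouping confirms the structure stated earlier: the 105 two-factor interactions fall into 15 strings of seven interactions each.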
The regression results in Table A5.4 show that this column has a significant effect, but do not identify which interaction(s) is most likely. Here is where marketing knowledge and statistical principles come together to help pinpoint the most likely interaction effects. The following two principles can help:
often they result from related factors: factors located close together (like two elements on a direct mail envelope) or conceptually related (like price and offer variables).
In this case, the choices are
1. AJ
The starburst (J) has a different impact depending on the customer segment (A).
2. FG
The e-mail design (F) and partner promotions (G) work together to impact the response.
3. KL
The 15% discount (K) and the free gift (L) have different impacts depending
15% discount and the free gift, increase the response less than what can be expected by adding the individual main effects.
The interaction can be understood by comparing both points on the left versus both points on the right. On the right (with no discount, K+), offering the free pen-and-pencil set gives a large jump in the response versus offering no free gift: the response more than doubles from 0.41% to 0.84%. In contrast, the points on the left show that, with the 15% discount (K-), the free gift increases response only slightly (from 1.18% to 1.27%).
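The main effects and the interaction can be recovered, up to rounding, from the four response rates quoted in this paragraph. The sketch below is our own illustration; it codes K = +1 as no discount and L = +1 as the free gift, and averages the four cell rates without weighting.

```python
# Response rates (%) for the four discount / free-gift combinations.
# Coding: K = -1 is 15% off, K = +1 is no discount;
#         L = -1 is no free gift, L = +1 is the free gift.
cells = {(-1, -1): 1.18, (-1, +1): 1.27, (+1, -1): 0.41, (+1, +1): 0.84}

def effect(sign):
    """Average response where the contrast is +1 minus average where it is -1."""
    plus = [r for kl, r in cells.items() if sign(*kl) > 0]
    minus = [r for kl, r in cells.items() if sign(*kl) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

k_effect = effect(lambda k, l: k)       # removing the discount
l_effect = effect(lambda k, l: l)       # adding the free gift
kl_effect = effect(lambda k, l: k * l)  # KL interaction

print(round(k_effect, 2), round(l_effect, 2), round(kl_effect, 2))
```

The results agree with the reported effects of -0.599, 0.264, and 0.171 to within the rounding of the cell rates.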
Ovnall, this interaction shows that the l 5% discount is great, the I rec gift i:, goud, but
buth together are overkill-the free gift adds little to the benefit of the di:,count offer. These
data helped the company more accurately quantify their rctun; on investment (ROIJ on
ever)' combination of offers. Also, this gave the marketing team deeper insight into cusKL 1ntcrJction
1.4%
1.2%
l.OlYo
"'
"
~ U.8%
"'
p::
0.6%
U.4%
-i---
K+ No discount
K-: 15%off
!Jiscuullt offer
- - L : No lree gilt
Figure AS.3
CASf
239
tomer behavior, showing tlrnt one strong incentive is valuable, but additional incentives arc
prob,1hly unnccess,iry. With these results, the Internet director decided to offer a discount
more often, hut some! imes switch lo a free gift, depending on the e-mail campaign ;rnd the
profit.1hility oft he llhtnmcr scgmrnt.
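The size of the KL interaction can be read directly from the four response rates quoted above. The following is a minimal sketch, not the authors' calculation: the dictionary keys and function name are hypothetical labels, and the four rates are the ones given in the text.

```python
# Response rates (%) quoted in the text for the four discount x gift combinations.
# The key labels are illustrative, not taken from the source.
response = {
    ("no discount", "no gift"): 0.41,
    ("no discount", "free gift"): 0.84,
    ("15% off", "no gift"): 1.18,
    ("15% off", "free gift"): 1.27,
}


def gift_lift(discount):
    """Increase in response (percentage points) from adding the free gift."""
    return response[(discount, "free gift")] - response[(discount, "no gift")]


lift_without_discount = gift_lift("no discount")
lift_with_discount = gift_lift("15% off")

# If there were no interaction, the two lifts would be equal.
difference_in_lifts = lift_without_discount - lift_with_discount

print(f"Gift lift without discount: {lift_without_discount:.2f} points")
print(f"Gift lift with discount:    {lift_with_discount:.2f} points")
print(f"Difference in lifts:        {difference_in_lifts:.2f} points")
```

Under the usual effect convention the interaction effect is half of this difference, with a sign that depends on how the plus and minus levels are coded.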
CONCLUSIONS
The Internet director was amazed by the depth and value of the results of this one test in one drop with just 34,060 names. He learned in one week what would take 6 months using standard techniques of testing one variable at a time. With these results he decided to do the following:
Consistently offer the 15% discount (testing different discounts in future campaigns).
Avoid the partner promotions that hurt response.
Use the special-offer starburst (J+), even though it was not quite significant.
Offer a free gift every few e-mail campaigns to keep the offer fresh and sometimes offer it along with the discount to the highest-value customer segments.
Improve his segmentation model, adding more variables and splitting apart recent high-value and low-value buyers.
He implemented this strategy in the next campaign. The response jumped to 1.54%, which was somewhat higher than the prediction and much better than the previous performance. The Internet director continued testing offers along with bolder creative changes, eventually achieving response rates consistently between 3% and 5% while adding more names in every drop.
After these results, the marketing team began testing changes in their catalog, retail stores, and regional advertising, continually squeezing greater profit from every marketing dollar. They found few major breakthroughs, but continually uncovered a number of small changes that added up to a big bottom-line impact.
QUESTIONS
CASE 6
TABLE A6.2
Estimated Effects
Effect
Factor
\
fl
/J
I
(,
All~
Cf:+FC;
!\C Rr +!\/)
-u '
n;
-t
nc;
Fl
0 ..>45
0.165
0.005
0.035
0.165
0.045
0.555
-0.145
0.255
0.035
0.085
(J.205
AF~
fl(' +- /)/
fl(;+ /JI
~(;"
('/) I
HI
o.nzs
/HJ ("/ +
A ll/J
f(,
0.075
0.105
experimental design, we found a few early papers all in the marketing literature, and we found no paper that employed a Plackett-Burman design, which was used in our study. Curhan (1974a) used a 2-level fractional factorial design to test the effects of price, advertising, display space, and display location on sales of fresh fruits and vegetables in supermarkets, while Barclay (1969) used a factorial design to evaluate the effect on profitability of raising the prices of two retail products manufactured by the Quaker Oats Company. Holland and Cravens (1973) presented the essential features of fractional factorial designs and illustrated them with a hypothetical example concerning the effect of advertising and other factors on the sales of candy bars. Wilkinson, Wason, and Paksoy (1982) described a factorial experiment for assessing the impact of price, promotion, and display on the sales of selected items at Piggly Wiggly grocery stores. In addition, marketing researchers have used small experimental designs in survey and conjoint analysis applications (e.g., see Ettenson and Wagner, 1986; Jaffe, Jamieson, and Berger, 1992; Srivastava and Lurie, 2004).
Our purpose in this case study is both to report on a successful retail marketing application of experimental design and to highlight the opportunities that exist for operations management researchers and practitioners to apply these methods to service problems. In general, experimental design in service operations can be used to test the effects on service quality and effectiveness of changes in staffing, training levels, procedures, and service system design. Particular examples in marketing include optimizing the design of Web sites, increasing the effectiveness of direct mail distribution channels for magazines, credit cards, and other products, and various in-store experiments to evaluate changes in factors such as package design, price, and point-of-sale displays.
The traditional approach to experimentation in manufacturing, as well as in retailing and other service areas, is to test one factor at a time while holding the remaining factors at fixed levels. In contrast, in multivariable experimental designs such as factorial, fractional factorial, and Plackett-Burman designs, all factors are tested simultaneously. Because of the orthogonality property of these designs, it is possible to obtain independent estimates of important effects (main effects and interactions), while greatly reducing the required sample size.
In the retail area, which is the focus of this case study, firms typically make very large investments in testing. However, few companies use sophisticated state-of-the-art experimental design techniques for in-store tests, choosing instead to test one variable at a time. Supermarkets offer especially attractive opportunities for experimentation, because of their low profit margins and highly competitive environments.
In designing an in-store experiment, there are many issues that need to be addressed. How many and which factors will be included? How many levels will be tested for each of the factors? What alternative designs should be considered, and which design should be selected? With respect to sample size, how many stores should be included in the experiment, and how should they be chosen? Over how many days or weeks should the test be run to obtain statistically valid and significant results? How should the results be analyzed?
This case study both provides insights into the design issues that are important to decision makers and presents the details and results of an actual application. The product tested was a popular magazine with a very large readership. For proprietary reasons, we do not identify the company or the magazine, and minor changes were made to the data presented. In spite of these modifications, the factors tested and results are essentially unchanged.
Retail testing is ideally suited for the use of experimental design techniques, offering decision makers the opportunity to test numerous variables at a relatively low cost. Dozens of elements can be tested simultaneously with the same sample size as a test of one variable alone.
This case study describes a magazine supermarket test of 10 in-store variables using a 24-run Plackett-Burman experimental design. All 10 factors were tested simultaneously over a 2-week period, with only a fraction of the sample size required for one-variable tests. Results quantified the main effect of each factor and allowed for the analysis of 2-factor interactions.
TEST FACTORS
The supermarket is the final stage of the grocery supply chain. Typically, firms give a great deal of attention to supply-chain management issues that include forecasting, inventory management in stores and warehouses, and transportation and logistics management. Within the supermarket, there is a range of management issues that affect quality and productivity. Some may be addressed with the help of mathematical models, for example, the use of queuing models to schedule the front-end checkout area or computer models for de-
sales. The magazine publisher instigated the study, but the variables tested were of general interest to the management of the supermarket chain as well.
Single-copy magazine purchases are often an impulse buy. The cover price paid at a newsstand is usually much higher than the per-copy subscription price; so, loyal customers have a strong incentive to purchase a subscription for its low price and in-home delivery. Publishers invest extensive time and effort on each magazine cover, using experience, focus groups, and one-variable tests to find the right pictures, words, colors, and layouts to attract those impulse buyers who spend just a few seconds selecting a magazine.
In this particular experiment, the magazine itself was not changed. Instead, the project focused on the location, number, and arrangement of magazine racks as well as in-store advertising. Copies of the magazine were primarily displayed near the checkout area. The operations team was particularly interested in the effect on sales of adding additional locations throughout the store. Management of the supermarket chain was also interested in evaluating the effectiveness of these additional sites. These added locations had been unused areas, and because the displays required relatively little space, the magazine was an especially attractive product to test in these additional locations.
The effect of in-store product location on sales has been studied by a number of authors. Drèze, Hoch, and Purk (1994) used a basic test-control experimental approach to assess the sales impact of in-store shelf space management. Changing the location of products among various shelf positions, they found that rearranging products in complementary groups and placing certain products at eye level could increase sales. Placing fabric softener between liquid and powder detergents and moving toothbrushes from a top shelf to a shelf at eye level both increased category sales. They also found that shelf position was more important than the amount of shelf space allocated for a particular product. In earlier research, several authors studied shelf space elasticities, including Brown and Tucker (1961) and Curhan (1974b), while Bultez and Naert (1988) and Bultez et al. (1989) studied space allocation using an attraction model to estimate brand interactions.

TABLE A8.1
Factor                                  (-) Low Level    (+) High Level
A  Rack on cooler in produce aisle      No               Yes
B  Rack location on checkout aisle      End cap          Over the belt
C  Number of pockets on main racks      Current          More
D                                       No               Yes
E                                       No               Yes
F                                       Random           Even
G                                       No               Yes, in 20% of copies
H                                       No               Yes
I                                       No               Yes
J  On-shelf advertisement               No               Yes
In this case, the team wanted to test as many factors as possible, which made sense because most of the cost of the experiment relates to the number of stores included in the test rather than the number of factors. After brainstorming a wide range of new ideas, the team identified the 10 factors in Table A8.1. For each factor, they selected two levels: the low or minus level and the high or plus level. A number of factors related to the number and location of pockets and racks. A pocket is one slot in a magazine rack, holding a few copies of the same magazine. A rack is the physical display with a few or many pockets. The main magazine rack in one aisle of the supermarket may take up all of the shelf space for 30 feet down the aisle and hold 150 different magazines. A small countertop rack may have just two pockets holding one magazine.
A: Rack on Cooler in Produce Aisle
The team wanted to attract customers as they entered the store. Most supermarkets are designed so that customers begin shopping in the section where produce and other fresh foods are displayed. The team had a new, small rack created with just two pockets. The rack was designed to fit on top of a refrigerated case located in the center of the produce aisle. The display was easy to install and took up little floor space. The team anticipated that a magazine display early in the shopping route would increase the likelihood of purchase.
B: Rack Location on Checkout Aisle
Two different magazine racks were available at the checkout aisles: the end-cap racks that customers see as they approach checkout and the over-the-belt racks above the moving grocery belt, usually with smaller-sized magazines. The team had in the past tried both locations, but never tested one against the other.
Cash registers were programmed to register the discount when two or more copies were purchased.
J: On-Shelf Advertisement
The final in-store advertising factor the team tested was an on-shelf "billboard." These small signs in plastic frames were attached to the edge of shelves so they stick out into the aisle. These on-shelf signs were placed in a few of the non-magazine supermarket aisles.
TEST DESIGN
With 10 factors, there are several alternative designs that can be considered. One possibility is a 32-run 2^(10-5) fractional factorial design of resolution IV. In this design, main effects are confounded with 3- and 4-factor interactions, whereas pairs of 2-factor interactions are confounded with each other, except for one contrast in which four 2-factor interactions are confounded. For a discussion of design resolution and confounding patterns, see Box, Hunter, and Hunter (2005). Assuming that the 3- and 4-factor interactions are negligible, this design provides clear estimates of all main effects. In addition, with proper labeling of the factors, it may be possible to anticipate which 2-factor interactions are likely to be negligible and thereby estimate 2-factor interactions as well. A second alternative is a 16-run 2^(10-6) fractional factorial design. But this design is resolution III, with main effects confounded with 2-factor interactions.
A third alternative is to choose a Plackett-Burman design (Plackett and Burman, 1946). Plackett-Burman designs are a class of orthogonal designs for factors with two levels, with the number of runs N a multiple of 4 (i.e., 4, 8, 12, 16, 20, and so on). If N is a power of 2 (i.e., 4, 8, 16, 32, 64, ...), these designs coincide with the fractional factorial designs. The orthogonal Plackett-Burman designs with N = 12, N = 20, and N = 24 runs are important in practice because they result in uncorrelated estimates of main effects of a large number of factors in very few runs. For 2-level (fractional) factorials the run size must be 4, 8, 16, 32, and so forth. This leaves large gaps in the run sizes. In our case, with 10 factors, a minimum of 16 runs would be needed, while the next highest run size would be 32 runs, as noted above.
Orthogonality of the design implies that the main effect of one factor can be calculated independently of the main effects of all others. The main effect of a factor is the difference between the response averages at the high (plus) and low (minus) levels of that factor. Plackett-Burman designs have fairly complex confounding schemes. In contrast to the fractional factorial designs where main effects and interactions are either not confounded or "fully aliased," Plackett-Burman designs leave main and interaction effects "partially aliased." This means that the absolute values of the alias coefficients are strictly less than one. The literature refers to designs that lead to partial aliasing as nonregular designs; see Wu and Hamada (2000).
The authors selected the 12-run reflected Plackett-Burman design in Table A8.2, consisting of a total of 24 runs. This design was chosen to increase resolution while minimizing the number of treatment combinations, or "test cells." The 12-run reflected test design can include up to 11 factors. With only 10 factors, the 11th column, K, was simply left empty. Note that although factor columns may be left empty, test cells (rows in the matrix) may not be eliminated. In an empty column, the resulting effect is simply a measure of experimental error and/or interactions. The removal of test cells, on the other hand, creates a nonorthogonal test design, destroying the independence of the main effects.
TABLE A8.2 The 12-run reflected Plackett-Burman design (24 test cells; columns A through K)
In a resolution III Plackett-Burman design, each main effect is confounded with all 2-factor interactions that do not include the main effect, but it is unconfounded with 2-factor interactions that include it. Plackett-Burman designs are nonregular designs with confounding (alias) coefficients strictly less than one in absolute value. In the 12-run Plackett-Burman design, for example, the alias coefficients are either +1/3 or -1/3.
A complete foldover (or "reflection") of a Plackett-Burman design, such as the one used in Table A8.2, leads to a resolution IV design where main effects are unconfounded with all 2-factor interactions. The term reflection is used because 12 additional test cells are run with every plus and minus switched, somewhat like holding a mirror up to the original design. For example, test cell 1 is: A+, B+, C-, D+, E+, F+, G-, H-, I-, J+, K-. For the first reflected test cell, 13, all signs are reversed to become: A-, B-, C+, D-, E-, F-, G+, H+, I+, J-, K+. Though main effects are independent of all 2-factor interactions, each 2-factor interaction is confounded with many other 2-factor interactions. A reflected Plackett-Burman design provides more accurate estimates of the main effects of a large number of factors, but it creates challenges in trying to identify significant 2-factor interactions.
Reflected designs can show the presence of 2-factor interactions, but it is difficult to quantify individual interactions. A significant difference between effects calculated from the 12 original Plackett-Burman runs and effects from the 12 reflected runs is due to one or more interactions, because the interactions switch signs from the original design to the reflection. However, the group of interactions confounded within each column cannot be separated mathematically. Experience and general statistical principles, like effect heredity, can lead to the subjective selection of likely interactions, but selective analyses can only offer clues to potential interactions. If important interactions seem to be present, the best course of action is to run a higher-resolution follow-up experiment where all interactions can be clearly quantified.
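The aliasing properties described in this section can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' Table A8.2: it builds a 12-run Plackett-Burman design by the standard cyclic construction from the generator row that matches test cell 1, so the row order may differ from the book's table. The function names are hypothetical.

```python
from itertools import combinations

# Generator row of the standard 12-run Plackett-Burman design; it matches
# test cell 1 quoted in the text (A+, B+, C-, D+, E+, F+, G-, H-, I-, J+, K-).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Eleven cyclic shifts of the generator row plus a row of all minus signs."""
    n = len(GENERATOR)
    rows = [GENERATOR[n - s:] + GENERATOR[:n - s] for s in range(n)]
    rows.append([-1] * n)
    return rows


def reflect(design):
    """Complete foldover: append every run again with all signs switched."""
    return design + [[-x for x in row] for row in design]


def columns_orthogonal(design):
    """True if every pair of factor columns has a zero inner product."""
    k = len(design[0])
    return all(
        sum(row[i] * row[j] for row in design) == 0
        for i, j in combinations(range(k), 2)
    )


def alias_sums(design):
    """Inner products of each main-effect column i with each 2-factor
    interaction column j*m that does not involve factor i."""
    k = len(design[0])
    sums = set()
    for i in range(k):
        for j, m in combinations(range(k), 2):
            if i not in (j, m):
                sums.add(sum(row[i] * row[j] * row[m] for row in design))
    return sums


base = plackett_burman_12()
full = reflect(base)

print(columns_orthogonal(base), columns_orthogonal(full))
# Dividing each sum by the number of runs gives the alias coefficient.
print(sorted(s / len(base) for s in alias_sums(base)))
print(sorted(s / len(full) for s in alias_sums(full)))
```

In the 12-run design every such inner product has absolute value 4, which is the alias coefficient of 1/3 in absolute value; after reflection every one is zero, which is what "main effects unconfounded with all 2-factor interactions" means.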
Because of the time and cost of producing many test cells, reflected designs are seldom used in direct mail, print advertising, or even Internet applications. But for retail testing, additional test cells add little, if any, further cost. Each store needs to be set up and monitored individually, so more stores require more effort, but the number of unique test cells does not make a difference. The statistical benefits far outweigh the cost of implementation. The only constraint is the number of test units available (i.e., the number of stores that can be used for the test).
In this case, a larger 32-run fractional factorial design with less confounding would have been preferable. But the company chose to limit the number of stores used in the test, so the larger design was not possible.
DEFINITION OF KEY METRICS AND SELECTION OF TEST UNITS
The key metric for this test was unit sales. The team wanted to uncover any factors that increased the number of magazine copies sold throughout the supermarkets. After analyzing sales data, the team could then calculate profitability based on sales and the cost of each new pocket, rack, and advertisement.
Unit sales were easily and reliably measured using scanner data from each store in the test. However, unit sales were not directly comparable among stores because each store had a different historical sales level. For example, a large supermarket may sell 100 copies per week, while a small store sells only 50. These store-to-store differences would likely overshadow any differences due to the test factors. Therefore, sales data during the test were standardized based on the historical sales volume of each store. The actual key metric was the percent change in sales relative to the historical baseline: 100(actual units sold - baseline units)/(baseline units).
Calculating the baseline sales level for each store can be complicated and potentially a large source of error. If stores vary widely in sales levels, then they should not be grouped together in the same test, because our confidence in a 10% change in sales is much different for a store that sells 10 magazines one week and 11 the next, as compared to a store selling 100 magazines one week and 110 the next.
Initially, the authors suggested a minimum of 96 stores for the test. With a resolution IV 32-run fractional factorial design, this would have given three stores (three replicates) per test cell. However, analyzing test costs, management set the limit at 50 stores, all from a single supermarket chain. At this point, not wanting to risk having just a single store in some test cells, the authors changed the test to the 12-run reflected design (described earlier) with just two replicates in each of the 24 test cells.
With 24 unique combinations in the 12-run reflected test design, at least 24 stores must be selected as test units for the experiment. Two or three times the minimum number of stores is often better for three reasons:
1. Larger sample size. More stores offer greater sales volume per week, so the test can be completed more quickly.
2. Variability analysis. Within-cell variation can be used as a measure of experimental error, or stores can be combined together to reduce total variability.
3. Identification of outliers. With three or more stores per test cell, a store with surprisingly high or low sales can be identified, scrutinized, and, if appropriate, eliminated from the analysis of test results.
Selection of a Retail Partner
The first step is the selection of a retail chain that is used for the test. In this case, the publisher selected a grocery chain known for its excellent cooperation with previous test campaigns. Many of the chain's supermarkets were located in close proximity, had strong magazine sales, and had a fairly standard store layout. There were strong reasons to expect that the test results would transfer to other chains.
After approving all test factors, the retail partner agreed to run the test and share scanner sales data for all stores during the course of the test. The chain's management team was supportive and agreed to arrange meetings with store managers so that the team could explain and manage the execution of the test.
Analysis of Available Stores and Selection/ Matching of Test Units
The grocery chain had nearly 100 stores from which the team could select the final 48. The first step in this selection was to analyze the past sales of all stores and eliminate outliers. New stores, highly seasonal locations, and stores with dramatic rates of growth (or decline) were eliminated first. Then stores with low sales volumes were removed.
Control charts (individuals, X, and moving range, MR, charts) of weekly sales data were created for all remaining stores, plotting unit sales per week and adding control limits to quantify variability and identify special causes (see Montgomery, 2001, for a discussion of control charts for individual measurements).
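As an illustration of the control charts mentioned above, the following sketch computes limits for an individuals (X) chart and a moving range (MR) chart using the conventional constants for moving ranges of two consecutive observations (2.66 and 3.267). The weekly sales figures and the function name are hypothetical; the source does not give the stores' data or state which constants the team used.

```python
def individuals_chart_limits(weekly_sales):
    """Center lines and control limits for X and MR charts of weekly unit sales."""
    # Moving range: absolute change between consecutive weeks.
    moving_ranges = [abs(b - a) for a, b in zip(weekly_sales, weekly_sales[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(weekly_sales) / len(weekly_sales)
    return {
        "x_center": center,
        "x_ucl": center + 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128
        "x_lcl": center - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,         # D4 for subgroups of size 2
    }


# Hypothetical weekly unit sales for one store.
sales = [100, 104, 98, 102, 96]
limits = individuals_chart_limits(sales)

# Weeks falling outside the X-chart limits would be candidates for special causes.
flagged = [s for s in sales if not limits["x_lcl"] <= s <= limits["x_ucl"]]
print(limits)
print(flagged)
```

A week outside the limits, such as a holiday spike, would be investigated and possibly excluded from the baseline rather than treated as ordinary variation.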
The authors selected 48 stores with high sales volumes, low variability, and stable sales over time. The next step was simply to match smaller stores with larger stores so that the average sales volume per test cell would be relatively constant.
The final baseline sales numbers were calculated after stores were matched and placed in each test cell. Each pair of stores was considered one test unit, and 24 new control charts were created. All pairs were similar in showing minimal sales growth over the previous few weeks, with average sales consistent with the long-term average over the last few months. A couple of special causes (identifiable sources of variation) affected previous weeks. A holiday 6 weeks before caused a large jump in sales, and a special issue of the magazine before that also caused a shift in sales. Therefore, average sales over the previous 5 weeks were selected as a baseline for the test.
The average sale over the 5 previous weeks for each two-store test unit was selected because it gave a valid, easily understood baseline for comparison. More complex options could have been used instead of the 5-week average. For example, a regression model based on past performance (including seasonality and growth rates) could have been used to predict future sales. Covariates could be added to the model based on information about competitor pricing, promotions, and special offers. With sufficient and accurate data, a regression model may work well. However, historical results do not always predict future performance, and numerous predictor variables can potentially create additional sources of error. Also, in this case competitive data were not available, and recent sales were fairly consistent among all test stores. Therefore, the 5-week average sales level gave a clear and simple method for standardizing all test units without undue complexity or potential error.
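The standardization described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical weekly sales for one two-store test unit; the function names are not from the source, and the formula is the percent-change metric defined earlier, 100(actual units sold - baseline units)/(baseline units).

```python
def baseline_units(pre_test_weeks):
    """Baseline for a test unit: average unit sales over the pre-test weeks."""
    return sum(pre_test_weeks) / len(pre_test_weeks)


def percent_change(actual_units, baseline):
    """Key metric: percent change in sales relative to the baseline."""
    return 100 * (actual_units - baseline) / baseline


# Hypothetical combined weekly sales for one pair of stores, 5 weeks before the test.
pre_test = [120, 130, 125, 128, 122]
base = baseline_units(pre_test)

# Hypothetical sales for the same pair during one test week.
change = percent_change(140, base)
print(base, change)

# The same 10% change can rest on very different volumes, which is why
# stores with widely different sales levels should not share a test.
small_store = percent_change(11, 10)
large_store = percent_change(110, 100)
print(small_store, large_store)
```

The last two lines mirror the 10-versus-11 and 100-versus-110 comparison in the text: the metric is identical, but one extra copy drives the small store's result.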
Minimizing and Measuring Experimental Error
Since two stores were combined into one test unit, store-to-store differences were not used as a measure of experimental error. To get greater consistency among test units, large stores were matched with small stores, potentially creating higher within-test cell variation. Therefore, week-to-week variation of each pair of stores over time was used to calculate experimental error for the test. With the same combination of factors run in the same stores over a number of weeks, the weekly difference in sales paralleled the natural market variation. Each additional week provided an additional replicate for each test cell.
Sample Size
The power of the test was determined by the overall sample size. This number depended on the number of the stores, plus how long the test was to run. Once the number of stores was set, the only way to obtain more data and increase power was to run the test for a longer period of time. Sample size is an important issue in planning the testing schedule. Company executives were concerned about the cost of testing, while the team wanted to run the test long enough to identify small effects.
Sample size calculations require a reliable estimate of the variance of the key metric, in this case the percent change in sales. This variance must come from the test units as defined for the test. In this case, the important number was the variance in weekly sales for each pair of stores used for each test cell. An estimate of the variance was obtained by pooling the information from the control charts of all 24 pairs of stores. An average of 125 copies of the magazine was sold in each pair of stores every week, with a standard deviation of about 12 copies, or 10%.
The authors recommended running the test for at least 5 weeks, assuming the standard deviation during the test would remain at 10%. With a total sample size of 120 test units (24 pairs of stores × 5 weeks) and standard deviation of 10%, the team would have an 80% chance of detecting any factors that impacted sales by 5% or more.
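The power figure quoted above can be approximated with a short calculation. The sketch below is not the authors' method, which the text does not spell out: it assumes a normal approximation, a two-sided 5% significance level, and an effect estimated as the difference between the averages of the two halves of the test units. Function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def effect_standard_error(sd, n_units):
    """Standard error of a main effect: difference of two averages,
    each based on half of the test units."""
    return 2 * sd / sqrt(n_units)


def power_two_sided(effect, sd, n_units, z_crit=1.96):
    """Approximate probability of declaring a true effect significant."""
    shift = effect / effect_standard_error(sd, n_units)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


# Planned test: 24 pairs of stores x 5 weeks, standard deviation 10%, effect 5%.
planned = power_two_sided(effect=5, sd=10, n_units=24 * 5)

# Same assumptions with only 2 weeks of data.
shortened = power_two_sided(effect=5, sd=10, n_units=24 * 2)

print(f"Power with 120 test units: {planned:.2f}")
print(f"Power with 48 test units:  {shortened:.2f}")
```

Under these assumptions the planned 5-week test gives a power a little below 0.8, in line with the figure in the text, while a 2-week run has noticeably less power to detect a 5% effect.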
The overall sample size in factorial-type experiments where factors are changed simultaneously has a different meaning from the sample size in the experiments that test one variable at a time. With one-variable tests, each individual comparison represents a separate statistical test requiring a certain sample size. A test of one factor alone would require the same 120 test units recommended for this 10-factor Plackett-Burman design, or a total of 50 weeks for a series of 10 one-variable tests within the same 48 stores.
As the launch date approached, company executives felt the need to speed up the project and reduce costs, so they limited the test to just 4 weeks. Then, just before the test began, further delays reduced the run length to only 2 weeks.
TEST RESULTS
The 12-run reflected Plackett-Burman test matrix and the resulting percent changes for weeks 1 and 2 are shown in Table A8.3.
The main effects are obtained by applying the plus and minus signs in the design columns to the averages in the last column of Table A8.3, and dividing the resulting sum by 12 (the number of plus signs). Alternatively, one can regress the response (the averages in the last column of Table A8.3) on the design vectors. The only difference with the regression is the definition of the effects, which are cut in half when using regression.
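The relationship between an effect and its regression coefficient can be illustrated on a small example. The sketch below uses a hypothetical 4-run column and hypothetical responses, not the data of Table A8.3; in the 24-run design the divisor for the effect would be 12, the number of plus signs.

```python
def main_effect(column, responses):
    """Signed sum of responses divided by the number of plus signs,
    i.e. average at the high level minus average at the low level."""
    signed_sum = sum(x * y for x, y in zip(column, responses))
    return signed_sum / column.count(+1)


def regression_coefficient(column, responses):
    """Least-squares slope for a balanced +1/-1 column (with an intercept
    in the model): the signed sum divided by the number of runs."""
    signed_sum = sum(x * y for x, y in zip(column, responses))
    return signed_sum / len(column)


# Hypothetical design column and run averages.
column = [-1, +1, -1, +1]
responses = [10, 14, 12, 20]

effect = main_effect(column, responses)
coefficient = regression_coefficient(column, responses)
print(effect, coefficient)
```

The coefficient is half the effect because moving from -1 to +1 is a change of two units in the coded variable.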
We treat the changes in weeks 1 and 2 as independent replications and calculate for each run an estimate of the variance of individual measurements. For example, for the first run, the variance estimate is [(12.5 - 17.9)^2 + (23.3 - 17.9)^2]/1 = 58.32. We pool the 24 variances to obtain an overall estimate s^2 = 93.75. The variance of each run average (average for weeks 1 and 2) that goes into the main effects calculation is given by s^2/2. The variance of an effect is var(effect) = 2(s^2/2)/12 = s^2/12 = 93.75/12 = 7.81, and the standard error is √7.81 = 2.79. Effects that are larger than 1.96 times the standard error (5.47) are considered significant. The effects are displayed graphically in Figure A8.1.
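The arithmetic in this paragraph can be retraced step by step. The sketch below uses only the numbers quoted in the text (the two weekly changes for the first run and the pooled variance of 93.75); variable names are illustrative.

```python
from math import sqrt

# Percent changes for run 1 in weeks 1 and 2, as quoted in the text.
week1, week2 = 12.5, 23.3
run_mean = (week1 + week2) / 2

# Sample variance of the two replicates (divisor n - 1 = 1).
run_variance = ((week1 - run_mean) ** 2 + (week2 - run_mean) ** 2) / (2 - 1)

# Pooled variance over all 24 runs, taken from the text.
pooled_variance = 93.75

# Each run average is based on 2 weeks; an effect is the difference
# between two averages of 12 run averages each.
variance_of_run_average = pooled_variance / 2
variance_of_effect = 2 * variance_of_run_average / 12
standard_error = sqrt(variance_of_effect)
significance_threshold = 1.96 * standard_error

print(f"Run 1 variance:      {run_variance:.2f}")
print(f"Variance of effect:  {variance_of_effect:.4f}")
print(f"Standard error:      {standard_error:.3f}")
print(f"Threshold (1.96 SE): {significance_threshold:.3f}")
```

The unrounded values agree with the 2.79 and 5.47 reported in the text up to rounding at intermediate steps.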
Three effects are statistically significant:
FINAL COMMENTS
In 48 stores over 2 weeks, the team learned more than they could have learned from months of testing one variable at a time. Two new rack locations were the significant winners among the four factors that related to the number of pockets and location of racks. The team avoided unnecessary operating costs after all five in-store advertising factors showed no effect. Finally, the common perception that redistributing copies was a worthwhile investment proved to be a significant misconception.
The focus of this case study is on increasing magazine sales in a retail setting, but the methods we have presented and discussed apply to retail products in general. In testing such products, decision makers are interested in a range of factors, including price, package design, location, and advertising. The experimental design methodology described here can be used to test specific options, for example, one package design versus another, or it can focus on providing more general insights about the effectiveness of factors such as advertising and product location. More generally, the experimental design approach has applicability to a wide range of problems involving service operations and marketing programs. These statistical tools offer an efficient methodology for future studies aimed at improving the quality and effectiveness of service systems.
QUESTIONS
Exercise I in Chapter 6
trated them with a hypothetical example concerning the effect of advertising and other factors on the sales of candy bars. Wilkinson, Wason, and Paksoy (1982) described a factorial experiment for assessing the impact of price, promotion, and display on the sales of selected items at Piggly Wiggly grocery stores.
Although the market testing literature is sparse on the use of experimental design models with many factors, 1- or 2-factor experiments have been common. For example, Lodish et al. (1995a) analyzed the results of 389 television advertising experiments to determine the effect of advertising on sales. Their data set included three types of tests: comparing two different versions of advertising copy, comparing two different levels of exposure, and testing copy and exposure simultaneously using a factorial design. In a related paper, Lodish et al. (1995b) examined the carryover effect of television-advertising exposure by tracking sales for an additional two years beyond the original one-year test period.
Factorial and fractional factorial designs are well known and have been widely used in behavioral marketing experiments in laboratory settings (see e.g. Jaffe, Jamieson, and Berger, 1992, Srivastava and Lurie, 2004, and Ettenson and Wagner, 1986) as well as in conjoint analysis applications. Green, Krieger, and Wind (2001) described a credit-card study that illustrates how fractional factorial designs may be used in conjoint analysis. Their design consisted of 12 attributes relating to potential credit card services, each having two to six levels. Examples include annual price (six alternatives), retail purchase insurance (no, yes), rental car insurance (no, yes), and airport club admission (no admission, $5 fee per visit, $2 fee per visit). Using a fractional factorial design, 64 profiles were created out of a total of 186,624 possible attribute-level combinations. The 64 profiles were partitioned into "blocks" of eight profiles each, with all profiles in a given block being presented to each respondent. For each profile of credit card services, the respondent was asked to indicate the likelihood of purchase on a 0-100 point scale. This blocked fractional design provided independent (uncorrelated) estimates of main effects.
Green, Carroll, and Carmone (1978) provided an excellent overview and discussion of the key elements in fractional factorial designs, while Green and Srinivasan (1978, 1990) and Green, Krieger, and Wind (2001) provided notable reviews of the extensive literature on conjoint analysis. Bradlow (2005) discussed current issues in conjoint analysis and the need for future research; Wittink and Cattin (1989) and Wittink, Vriens, and Burhenne (1994) documented the widespread commercial use of conjoint models. Although Green, Carroll, and Carmone (1978) briefly discussed Plackett-Burman designs, we found no papers that used these designs in conjoint and discrete choice models.
Our Plackett-Burman design is a main effects model that, as we will show, may provide evidence of likely 2-factor interactions under some circumstances. The fractional designs used in conjoint analysis are typically main effects models as well, confounding main effects and 2-factor interactions. Carmone and Green (1981) showed how selected 2-factor interactions can be included in fractional main effects designs. Plackett-Burman and fractional factorial models are orthogonal designs, which means that effects are estimated independently and with minimum variance. Orthogonal designs may be prohibitively large in situations with many factors, including some at more than two levels, and in cases where interactions are important. For these circumstances, nonorthogonal designs are available and may be generated using statistical software. Kuhfeld, Tobias, and Garratt (1994) discussed such nonorthogonal designs and their use in conjoint and discrete choice studies.
Our review of the literature shows that fractional designs and related orthogonal designs have been used extensively in conjoint and discrete choice studies. As we have noted, there have also been a few papers on market tests involving relatively few factors that use factorial or fractional factorial designs. However, it has been our experience that until recently the great majority of market testing practitioners relied on the traditional approach of testing one factor at a time. In this case we show the benefits of statistical methods that simultaneously test many factors and also demonstrate the usefulness of Plackett-Burman designs, an important class of experimental design models.
THE EXPERIMENT
The Factors
The firm's marketing group regularly mailed out credit-card offers and wanted to find new ways of increasing the effectiveness of its direct mail program. The 19 factors shown in Table A9.1 were thought to influence a customer's decision to sign up for the advertised product. Factors A-E were approaches aimed at getting more people to look inside the envelope, while the remaining factors related to the offer inside. Factor G (sticker) refers to the peel-off sticker at the top of the letter to be applied by the customer to the order form. The firm's marketing staff believed that a sticker increases involvement and is likely to increase the number of orders. Factor N (product selection) refers to the number of different credit card images that a customer could choose from, while the term "buckslip" (factors Q and R) describes a small separate sheet of paper that highlights product information.
A Plackett-Burman Design for 19 Factors
With so many factors, we chose a 2-level design. By doing so, we could keep the number of runs relatively low and avoid more complicated and possibly nonorthogonal designs. Two-level screening designs are common in the field of experimental design; see Box,

TABLE A9.1

Factor                               (-) Control        (+) New Idea
A                                    General offer      Product-specific offer
B  Return address                    Blind              Add company name
C                                    Yes                No
D  Postage                           Preprinted         Stamp
E  Additional graphic on envelope    Yes                No
F  Price graphic on letter           Small              Large
G  Sticker                           Yes                No
H  Personalization
I  Copy message                      Targeted           Generic
J  Letter headline                   Headline 1         Headline 2
K  List of benefits                  Standard layout    Creative layout
L  Postscript on letter              Control version    New postscript
M  Signature                         Manager            Senior executive
N  Product selection                 Many               Few
O  Value of free gift                High               Low
P  Reply envelope                    Control            New style
Q  Information on buckslip           Product info       Free gift info
R  Second buckslip                   No                 Yes
S  Interest rate                     Low                High

TABLE A9.2 (20-run Plackett-Burman design: test cell, settings of factors A-S, orders, and response rate for each of the 20 runs)

Hunter, and Hunter (2005). Our philosophy in testing many factors, each at two levels, was to identify which factors were active; that is, which factors had a significant effect on the response. Once these active factors were identified, it would be possible (if needed) to test each of them at more than two levels while still maintaining an orthogonal design.
With 19 factors, we created the 20-run Plackett-Burman main effects design shown in Table A9.2. Plackett-Burman designs are orthogonal designs for factors that have two levels each, with the number of runs N given by a multiple of 4 (see Plackett and Burman, 1946). For 2-level fractional factorials, the run size N must be a power of 2, leaving large gaps in the run sizes. For example, a minimum of 32 runs is required in a fractional factorial design involving 19 factors. The Plackett-Burman design, on the other hand, can study 19 factors in just 20 runs. This is why these designs are useful in situations where the number of runs is critical.

TABLE A9.3 A Fractional Factorial Design with Generators E = AB, F = AC, G = AD, H = BC, I = BD, J = CD, K = ABC, ... (16-run table of signs for factors A-O)

In a Plackett-Burman design each pair of factors (columns) is orthogonal, which by definition means that each of the four factor-level combinations [(- -), (- +), (+ -), (+ +)] appears in the same number of runs. In the 20-run design (Table A9.2), for every pair of columns, each of the four combinations appears five times. As a consequence of orthogonality, the main effect of one factor can be calculated independently of the main effect of all others. Plackett and Burman showed that the complete design can be generated from the first row of +'s and -'s. In Table A9.2, the last entry in row 1 (-) is placed in the first position of row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. The third row is generated from the second row using the same method, and the process continues until the next to the last row is filled in. A row of -'s is then added to complete the design.
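
The row-shifting construction can be sketched in a few lines of Python. The first row used here is an assumption: it is the generator commonly tabulated for N = 20, which ends in a minus sign as the text describes. The function and variable names are illustrative only.

```python
from itertools import combinations

# Assumed first row: the generator usually tabulated for the 20-run design.
FIRST_ROW = "++--++++-+-+----++-"


def plackett_burman_20(first_row=FIRST_ROW):
    """Build the 20-run design by repeated cyclic shifts of the first row."""
    row = [1 if sign == "+" else -1 for sign in first_row]
    design = [row]
    for _ in range(len(row) - 1):       # rows 2 through 19
        row = row[-1:] + row[:-1]       # last entry moves to the first position
        design.append(row)
    design.append([-1] * len(row))      # closing row of minus signs
    return design


def pair_counts(design, j, k):
    """Count how often each (level of column j, level of column k) pair occurs."""
    counts = {}
    for run in design:
        key = (run[j], run[k])
        counts[key] = counts.get(key, 0) + 1
    return counts


design = plackett_burman_20()

# Orthogonality: every pair of columns shows each of the four combinations 5 times.
balanced = all(
    sorted(pair_counts(design, j, k).values()) == [5, 5, 5, 5]
    for j, k in combinations(range(19), 2)
)
print(len(design), len(design[0]), balanced)
```

Checking every pair of columns, rather than a sample, is cheap here (171 pairs) and confirms the orthogonality property directly from its definition.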
In what follows, we will assume that 3-factor and higher-order interactions are negligible and can therefore be ignored. The main effect of a factor is the difference between the response averages at the high (plus) and low (minus) levels of that factor. Both fractional factorial designs and Plackett-Burman designs are orthogonal, but the natures of their confounding patterns differ. Consider a fractional factorial design in which main effects are confounded with 2-factor interactions; for example, the saturated design for 15 factors in 16 runs shown in Table A9.3. The design matrix is constructed by first writing columns of signs for a full factorial design in four factors (columns A-D). The signs for the remaining columns are determined from 11 generators that use all interaction columns in the full factorial design. For example, consider the generator K = ABC. Multiplying the signs in columns A, B, and C, row by row, results in the column of signs for factor K. There are 15 main effects and 105 2-factor interactions (15!/(2!13!)). Each interaction belongs to a single set of seven 2-factor interactions, and each main effect is confounded with one of these sets. For example, we find that A is confounded with BE, CF, DG, HK, IL, JM, and NO. The factor A does not appear as a letter in any of the seven interactions, and no two interactions include the same factor. The column of signs for factor A is identical to the column of signs for each of the 2-factor interactions that are confounded with the main effect of A. Hence there is perfect correlation (ρ = 1) between the column of signs of A and the column of signs for each of its confounded 2-factor interactions. For example, multiplying the signs in columns B and E row by row to obtain a column representing the BE interaction results in a column of signs that is identical to the column of signs for factor A. Because of this perfect correlation, estimating the main effect by taking the difference between the response averages at the high (plus) and low (minus) levels of a particular factor actually gives an estimate of the main effect of that factor plus the sum of the seven 2-factor interactions that are confounded with that main effect. If all of these interactions are negligible, then the result will be a clear estimate of the main effect. If one or more of the interactions are significantly different from zero, the estimate of the main effect will be biased. The books by Berger and Maurer (2002), Box, Hunter, and Hunter (1978), and Ledolter and Burrill (1999) discuss fractional factorial designs, confounding, and the analysis of experimental results.
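
The confounding pattern of the saturated 16-run design can be checked mechanically. The sketch below builds the columns from generators and lists the 2-factor interactions whose sign column equals that of factor A. Generators E through K follow the caption of Table A9.3 and the text; the assignments for L, M, N, and O are assumptions chosen to be consistent with the confounding set quoted above.

```python
from itertools import combinations, product

# E..K as stated in the source; L..O are assumed (consistent with A = IL = JM = NO).
GENERATORS = {
    "E": "AB", "F": "AC", "G": "AD", "H": "BC", "I": "BD", "J": "CD",
    "K": "ABC", "L": "ABD", "M": "ACD", "N": "BCD", "O": "ABCD",
}


def build_design():
    """Return a dict mapping each factor name to its column of 16 signs."""
    columns = {name: [] for name in "ABCD"}
    # Full factorial in A-D, with A changing fastest (standard order).
    for d, c, b, a in product((-1, 1), repeat=4):
        for name, value in zip("ABCD", (a, b, c, d)):
            columns[name].append(value)
    # Each remaining column is the row-by-row product of its generator's letters.
    for name, word in GENERATORS.items():
        col = [1] * 16
        for letter in word:
            col = [x * y for x, y in zip(col, columns[letter])]
        columns[name] = col
    return columns


def aliases(columns, factor):
    """List the 2-factor interactions whose sign column is identical to `factor`."""
    found = []
    for x, y in combinations(sorted(columns), 2):
        if factor in (x, y):
            continue
        interaction = [p * q for p, q in zip(columns[x], columns[y])]
        if interaction == columns[factor]:
            found.append(x + y)
    return found


columns = build_design()
aliases_of_A = aliases(columns, "A")
print(aliases_of_A)
```

Under these generator assumptions the list contains exactly seven interactions, and no factor letter appears in more than one of them.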
Plackett-Burman designs have more complex confounding patterns. Each main effect is confounded with all 2-factor interactions except those that involve that main effect. In our 19-factor design in Table A9.2, the main effect for each factor is confounded with all 2-factor interactions involving the other 18 factors, for a total of 153 interactions (18!/(2!16!)). But in contrast to the fractional factorial design shown in Table A9.3, the column of signs for each main effect is not identical to the column of signs for each of its confounded 2-factor interactions. Although not identical and thus not perfectly correlated, these columns of signs are correlated. That is, the correlation between the signs in a main effect column and the signs in each 2-factor interaction column that is confounded with that main effect is strictly less than 1 in absolute value (|ρ| < 1). As a consequence, it can be shown (see Chapter 6) that estimating the main effect of a particular factor by taking the difference between the response averages at the high (plus) and low (minus) levels for that factor actually provides an estimate of the main effect plus the weighted sum of the 2-factor interactions that are confounded with that main effect.
The weight associated with each 2-factor interaction is the correlation between that 2-factor interaction and the main effect; see Barrentine (1996) for a discussion of the structure of confounding patterns in Plackett-Burman designs. Enumerating all correlations among factor columns and interaction columns reveals that, for the 20-run Plackett-Burman design in Table A9.2, the weights (correlations) are either -0.2, +0.2, or -0.6. Of the 153 interactions confounded with each main effect, 144 have weights of -0.2 or +0.2 while 9 have weights of -0.6. A particular 2-factor interaction will appear in the confounding pattern of 17 main effects. For 16 of these main effects, the weight associated with this interaction will be -0.2 or +0.2, while for a single main effect, the weight associated with this interaction will be -0.6. For example, consider the main effect of factor R and the SG interaction. We use +1 and -1 to represent the column signs and multiply the entries in columns S and G to obtain the entries in column SG. Writing each column as a row to save space and listing the run numbers above the entries, we obtain

Run         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Column R   +1  +1  -1  -1  -1  -1  +1  -1  +1  -1  +1  +1  +1  +1  -1  -1  +1  +1  -1  -1
Column SG  -1  +1  +1  +1  +1  -1  -1  -1  -1  +1  +1  -1  -1  -1  +1  +1  -1  -1  +1  +1

Both columns have 10 plus signs and 10 minus signs, and the entries in each column add to zero. Furthermore, the sum of the squares of the entries in each column is 20, the number of runs N. The columns are correlated. In 4 of the 20 runs the signs match, while in 16 runs the signs are opposite. The correlation between these two mean zero columns (call them x and z) is given by

\[
\rho = \frac{\sum xz}{\sqrt{\sum x^{2} \sum z^{2}}} = \frac{-12}{20} = -0.6
\]

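
The weights quoted above can be reproduced by brute-force enumeration. This is a sketch under two assumptions: that the design is the cyclic 20-run design built from the commonly tabulated first row, and that its 19 columns are labeled A through S in order. Names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

FIRST_ROW = "++--++++-+-+----++-"      # assumed generator row for N = 20
FACTORS = "ABCDEFGHIJKLMNOPQRS"        # assumed column labels, in order


def plackett_burman_20(first_row=FIRST_ROW):
    """Cyclic construction: shift the first row 18 times, then add a row of -1."""
    row = [1 if sign == "+" else -1 for sign in first_row]
    design = [row]
    for _ in range(len(row) - 1):
        row = row[-1:] + row[:-1]
        design.append(row)
    design.append([-1] * len(row))
    return design


def correlation(x, z):
    """Correlation of two mean-zero columns: sum(xz) / sqrt(sum(x^2) * sum(z^2))."""
    numerator = sum(a * b for a, b in zip(x, z))
    return numerator / sqrt(sum(a * a for a in x) * sum(b * b for b in z))


design = plackett_burman_20()
col = {name: [run[i] for run in design] for i, name in enumerate(FACTORS)}


def weight(main, pair):
    """Correlation between a main-effect column and a 2-factor interaction column."""
    interaction = [a * b for a, b in zip(col[pair[0]], col[pair[1]])]
    return round(correlation(col[main], interaction), 1)


rho_R_SG = weight("R", "SG")

# Tally the weights of all 153 interactions confounded with the main effect of R.
others = [name for name in FACTORS if name != "R"]
tally = Counter(weight("R", x + y) for x, y in combinations(others, 2))
print(rho_R_SG, dict(tally))
```

The tally covers only interactions that do not involve R itself, which matches the count of 153 used in the text.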
For simplicity, suppose a single 2-factor interaction confounded with a particular main effect is important. A total of 17 main effects will be confounded with that interaction. For each of these main effects, taking the difference between the high (plus) and low (minus) levels for that factor provides an estimate of the main effect plus a times the magnitude of the confounded 2-factor interaction. As noted previously, for 16 main effects the fraction a will be -0.2 or +0.2, and the bias in our estimate of each main effect will be relatively small: plus or minus 0.2 times the magnitude of the interaction. For a single main effect, a will be -0.6 and the bias will be -0.6 times the magnitude of the interaction.
Given the complex confounding patterns of Plackett-Burman designs, it may seem at first glance that they would not provide any useful information about 2-factor interactions. In fact, traditionally they have been used as main effects designs. More recently, however, Plackett-Burman designs have received much greater attention from researchers because of what Box, Hunter, and Hunter (2005) call "their remarkable projective properties." In analyzing the results of our experiment in the remainder of this case, we will discuss these projective properties and show how they can be used in certain circumstances to estimate one or more 2-factor interactions from the results of a Plackett-Burman experiment.
The Results
The focus of the experiment was on increasing response rate: the fraction of people who respond to the offer. A large mailing list of potential customers was available for the test. The overall sample size (the number of people to receive test mailings) was determined according to statistical and marketing considerations. The chief marketing executive wanted to limit the number of names in order to minimize the cost of test mailings that performed worse than the control (especially when testing a higher interest rate) and also to reduce postage costs. Of the 500,000 packages that were mailed, 400,000 names received the control mailing (that was run in parallel to the test) while 100,000 were used for the test itself. Therefore, each of the 20 test cells in Table A9.2 was sent to 5,000 people, resulting in the response rates listed in the last column.
For each factor in the experiment, 50,000 people received a mailing with the factor at the plus level and 50,000 people received a mailing with the factor at the minus level. Each main effect is obtained by comparing average responses from these two independent samples of 50,000 each. Because the design is orthogonal, the same 100,000 people are used to obtain independent estimates of each main effect.

Figure A9.1 (chart of the estimated main effects of the 19 factors; horizontal axis: effect in percentage points)

The marketing team regularly used 25,000-50,000 names for each split-run test, so the sample size of 100,000 for the designed experiment was not much different from what had been done in the past. Power calculations using the Minitab software convinced the authors that this sample size was large enough to detect meaningful differences. Determining the statistical significance of each main effect is equivalent to the standard statistical test for comparing two independent sample proportions, in this case of size 50,000 each. The firm estimated an average response rate of 1% and wanted to be quite confident of detecting a change of 0.2 percentage points (either an increase from 1% to 1.2% or a decrease from 1% to 0.8%). At the 5% significance level with a sample of 100,000, the detection probability (statistical power) was found to be 0.86 for a change to 1.2% and 0.92 for a change to 0.8%. Thus, with a sample size of 100,000, the authors and marketing team were confident of being able to detect very small yet economically meaningful differences.
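
These power figures can be approximated without Minitab. The sketch below assumes a two-sided z-test for two independent proportions with an unpooled normal approximation; the exact procedure the authors ran in Minitab is not stated in the text, so this is an illustration rather than a reproduction of their calculation.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two independent proportions."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two sample proportions.
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p2 - p1) / se
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# 50,000 people at each level of a factor, baseline response rate of 1%.
power_up = power_two_proportions(0.010, 0.012, 50_000)
power_down = power_two_proportions(0.010, 0.008, 50_000)
print(f"{power_up:.2f} {power_down:.2f}")
```

Under these assumptions the two values land close to the reported 0.86 and 0.92. The decrease to 0.8% is easier to detect because the variance of a proportion shrinks as the proportion moves toward zero.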
Testing all factors simultaneously has large sample-size advantages compared to testing each of the 19 factors one at a time. Suppose we kept the total sample size at 100,000. Then a sample of 5,263 persons would be used for each of the 19 tests of one factor at a time. Because the control package was already being sent to 400,000 people, the group of 5,263 people would receive a mailing of the control with one factor changed. The two sample proportions would then be compared to determine the effect of changing that one factor. Using Minitab, we calculated the power of such a test using the assumptions just described: a 5% significance level and an average response rate of 1%. The statistical power (detection probability) is 0.32 for a change to 1.2% and 0.29 for a change to 0.8%, compared to 0.86 and 0.92 (respectively) for the Plackett-Burman design calculated previously. Again using Minitab, we found that to obtain the same statistical power as the Plackett-Burman design would require a sample size of about 25,000 for each of the 19 tests of one factor at a time; this would yield a total sample size of 475,000 people, an increase of 375,000 persons.
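
The one-factor-at-a-time comparison can be sketched the same way. The text does not say which Minitab routine was used; the code below assumes a two-sided one-sample z-test of the test group's proportion against the 1% control rate, treating that rate as known because of the very large control mailing. That assumption is an inference, adopted here because it gives values close to the reported ones.

```python
from math import sqrt
from statistics import NormalDist


def power_one_proportion(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test of H0: p = p0 when p = p1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)   # standard error under the null rate
    se_alt = sqrt(p1 * (1 - p1) / n)    # standard error under the true rate
    upper = 1 - std_normal.cdf((p0 + z_crit * se_null - p1) / se_alt)
    lower = std_normal.cdf((p0 - z_crit * se_null - p1) / se_alt)
    return upper + lower


n_small = round(100_000 / 19)           # 5,263 names per one-factor test
power_up_small = power_one_proportion(0.010, 0.012, n_small)
power_down_small = power_one_proportion(0.010, 0.008, n_small)

# With about 25,000 names per test the power approaches that of the designed experiment.
power_up_large = power_one_proportion(0.010, 0.012, 25_000)
power_down_large = power_one_proportion(0.010, 0.008, 25_000)
total_names = 19 * 25_000

print(n_small, f"{power_up_small:.2f} {power_down_small:.2f}")
print(total_names, f"{power_up_large:.2f} {power_down_large:.2f}")
```

The contrast between the two sample sizes illustrates the point of the paragraph: matching the power of the 20-run design with separate tests requires several times as many names.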
The sign of each effect shows which level is better: for positive effects, the "+" level increases response; for negative effects, the "+" level decreases response.
Significance of the effects was determined by comparing the estimated effects with their standard errors. The result of each experimental run is the proportion of customers who respond to the offer. Each proportion is an average of n = 5,000 individual binary responses; its standard deviation is given by σ = sqrt(π(1 - π)/n), where π is the underlying true proportion. Each estimated effect is the difference of two averages of N/2 = 10 such proportions. Hence its standard deviation is

\[
\text{StdDev(effect)} = \sqrt{\frac{2}{N}\,\frac{\pi(1-\pi)}{n} + \frac{2}{N}\,\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{4}{N}}\,\sqrt{\frac{\pi(1-\pi)}{n}}
\]

Replacing the unknown proportion π by the overall success proportion (averaged over all runs and samples), p = (#Purchases)/(nN) = 1,298/100,000 = 0.01298, leads to the standard error of an estimated effect,

\[
\text{StdError(effect)} = \sqrt{\frac{4}{20}}\,\sqrt{\frac{(0.01298)(0.98702)}{5{,}000}} = 0.00072
\]

The standard error is 0.072 if effects are expressed in percentage terms. Significance (at the 5% level) is determined by comparing the estimated effect with 1.96 times its standard error, ±1.96(0.072) = ±0.141. The dashed line in Figure A9.1 separates significant and insignificant effects.
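
The arithmetic behind the standard error and the significance cutoff is short enough to check directly. The values are taken from the text; the variable names are illustrative.

```python
from math import sqrt

N = 20            # number of runs (test cells)
n = 5_000         # mailings per run
purchases = 1_298 # total orders across all 100,000 test mailings

p = purchases / (n * N)                     # overall response proportion
std_error = sqrt(4 / N) * sqrt(p * (1 - p) / n)
std_error_pct = 100 * std_error             # in percentage points
cutoff_pct = 1.96 * std_error_pct           # 5% two-sided significance cutoff

print(f"p = {p:.5f}")
print(f"standard error = {std_error_pct:.3f} percentage points")
print(f"cutoff = {cutoff_pct:.3f} percentage points")
```

The cutoff computed from the unrounded standard error is slightly below the 0.141 in the text, which multiplies 1.96 by the rounded value 0.072.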
The following five factors had a significant effect on the response rate.
S- or Low Interest Rate
Increasing the credit-card interest rate reduces the response by 0.864 percentage points. In addition, it was very clear based on the firm's financial models that the gain from the higher rate would be much less than the loss due to the decrease in the number of customers.
G- or Sticker
The sticker (G-) increases the response by 0.556 percentage points, resulting in a gain much greater than the cost of the sticker.
R- or No Second Buckslip
A main effect interpretation shows that adding another buckslip reduces the number of
buyers by 0.304 percentage points. One explanation offered for this surprising result was
that the buckslip added unnecessary information and obscured the simple "buy now" offer.
A more compelling explanation, which we discuss in the next section, is that the significant
effect is due not to the main effect of factor R but rather to an interaction between two other factors.
I+ or Generic Copy Message
The targeted message (I-) emphasized that a person could choose a credit card design that reflected his or her interests, while the generic message (I+) focused on the value of the offer. The creative team was certain that appealing to a person's interests would increase the response, but they were wrong. The generic message increased the response by 0.296 percentage points.
J- or Letter Headline #1
The result showed that all "good" headlines were not equal. The best wording increased the response by 0.192 percentage points.
The response rate from the 400,000 control mailings was 2.1%, while the average response for the test was 1.298%. The predicted response rate for the implied best strategy, starting with the overall average and adding half of each significant effect, amounted to 2.40%. This represented a 15% predicted increase over the response rate of the "control."
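
The prediction for the best strategy follows from the five significant effects listed above. In this sketch the signs of the effects are inferred from the section headings (a minus level being better means a negative effect); the magnitudes are those quoted in the text.

```python
overall_mean = 1.298   # average test response, in percent
control_rate = 2.1     # response of the control mailing, in percent

# Estimated effects in percentage points (signs inferred from which level was better).
significant_effects = {"S": -0.864, "G": -0.556, "R": -0.304, "I": 0.296, "J": -0.192}

# Setting each factor at its better level moves the response by half the
# absolute value of its effect, relative to the overall average.
predicted = overall_mean + sum(abs(e) for e in significant_effects.values()) / 2
relative_gain = predicted / control_rate - 1

print(f"predicted response = {predicted:.2f}%")
print(f"gain over control = {100 * relative_gain:.1f}%")
```

Half of each effect is used because an effect measures the full swing from the minus to the plus level, while the overall average sits midway between the two.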
Further Analysis of the Results
The confounding of main effects and interactions introduces some uncertainty into our interpretation of the results. A straightforward approach for obtaining unconfounded main effects is a "foldover" of the original Plackett-Burman design. In such a foldover design, the 20-run Plackett-Burman design would be augmented by an additional 20 runs in which the signs of each of the 19 design columns are switched. The combination of a Plackett-Burman design and its complete foldover creates a design in which main effects are no longer confounded with 2-factor interactions. In our experiment, a foldover was not carried out (with 40 runs it would have greatly increased the operational complexity of the mailing), and we cannot be certain which combinations of main effects and interactions are responsible for the significant estimates in Figure A9.1.
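
The effect of a foldover on the confounding pattern can be verified numerically. The sketch assumes the commonly tabulated first row for the 20-run design; it compares the largest inner product between any main-effect column and any 2-factor interaction column before and after the 20 sign-switched runs are appended.

```python
from itertools import combinations

FIRST_ROW = "++--++++-+-+----++-"   # assumed generator row for N = 20


def plackett_burman_20(first_row=FIRST_ROW):
    """Cyclic construction: shift the first row 18 times, then add a row of -1."""
    row = [1 if sign == "+" else -1 for sign in first_row]
    design = [row]
    for _ in range(len(row) - 1):
        row = row[-1:] + row[:-1]
        design.append(row)
    design.append([-1] * len(row))
    return design


def foldover(design):
    """Append a copy of the design with every sign switched."""
    return design + [[-x for x in run] for run in design]


def max_alias_sum(design):
    """Largest |sum over runs of x_i * x_j * x_k| for a main effect i and interaction jk."""
    k = len(design[0])
    worst = 0
    for i in range(k):
        for a, b in combinations(range(k), 2):
            if i in (a, b):
                continue
            total = sum(run[i] * run[a] * run[b] for run in design)
            worst = max(worst, abs(total))
    return worst


original = plackett_burman_20()
folded = foldover(original)
worst_original = max_alias_sum(original)
worst_folded = max_alias_sum(folded)
print(worst_original, worst_folded)
```

A triple product changes sign when all three columns are switched, so each sum from the original 20 runs is cancelled exactly by the matching sum from the added runs.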
The use of our Plackett-Burman design is supported by empirical experimental design principles. Effect sparsity (Box and Meyer, 1986) means that the number of important effects is typically small; hierarchical ordering means that important interactions are usually fewer in number, and smaller in magnitude, than main effects (Wu and Hamada, 2000). In addition, on the basis of effect heredity (Hamada and Wu, 1992), the principle that significant interactions are likely to involve factors with significant main effects, it is possible in some circumstances to identify likely 2-factor interactions.
Factors S (interest rate) and G (presence of a sticker) are by far the largest effects in Figure A9.1. The correlation between the main effect of R (second buckslip) and the SG interaction is -0.6. Hence, a significant SG interaction would bias the estimate of the main effect of R by -0.6 times the value of the interaction. This suggests that it may not be the main effect of factor R that is important, but the 2-factor interaction between S and G. This interpretation is supported by the principle of effect heredity, since the main effects of S and G are the most important factors. As one might expect, at the high interest rate the effect of having a sticker is small (a change from 0.776% to 0.956%, as implied by the results in Table A9.2); at the low interest rate, however, the effect of having the sticker is much larger (a change from 1.264% to 2.024%). The sticker is most effective when the customer receives a more attractive offer.
Box and Tyssedal (1996) showed that the 20-run Plackett-Burman design produces, for any three factors, a complete factorial arrangement with some combinations replicated. The design is said to have "projectivity" 3. In contrast, fractional factorial designs that confound main effects with 2-factor interactions, such as the one shown in Table A9.3, fail to produce a complete factorial for some sets of three factors and hence only have projectivity 2. We use this projectivity idea to provide more evidence that the apparent main effect of R (second buckslip) is actually a consequence of the bias created by the SG interaction.
Consider the three factors S, G, and R. Of the 20 runs in Table A9.2, there is at least one run at each of the eight factor-level combinations of these three factors. In specifying each combination, we let the first sign indicate the level of S, the second sign the level of G, and the last sign the level of R. There are four runs at each of the four combinations (- - -), (+ + -), (+ - +), (- + +) and one run at each of the remaining four combinations. Because we have at least one response at each combination, we have a full factorial arrangement in factors S, G, and R (ignoring the other factors). Because the number of runs at each combination is not the same, we must use regression to estimate the effects. Doing so, we find that the three significant effects are S, G, and SG, confirming that it is the SG interaction and not the main effect of R that is significant.
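
The projection onto S, G, and R can be tabulated directly. As before, the design is rebuilt from the commonly tabulated first row and its columns are assumed to be labeled A through S in order; these are assumptions, since the sign table itself is not legible in this copy.

```python
from collections import Counter

FIRST_ROW = "++--++++-+-+----++-"      # assumed generator row for N = 20
FACTORS = "ABCDEFGHIJKLMNOPQRS"        # assumed column labels, in order


def plackett_burman_20(first_row=FIRST_ROW):
    """Cyclic construction: shift the first row 18 times, then add a row of -1."""
    row = [1 if sign == "+" else -1 for sign in first_row]
    design = [row]
    for _ in range(len(row) - 1):
        row = row[-1:] + row[:-1]
        design.append(row)
    design.append([-1] * len(row))
    return design


def projection_counts(design, factors):
    """Count the runs at each level combination of the chosen factors."""
    idx = [FACTORS.index(f) for f in factors]
    return Counter(tuple(run[i] for i in idx) for run in design)


design = plackett_burman_20()
counts = projection_counts(design, "SGR")   # order: S, G, R

for combo in sorted(counts):
    signs = " ".join("+" if level == 1 else "-" for level in combo)
    print(f"({signs}): {counts[combo]} run(s)")
```

All eight combinations appear, so the 20 runs contain a complete (unbalanced) 2x2x2 factorial in these three factors, which is what allows the regression with all interaction terms.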
Table A9.4(a) shows the results when regressing the response rate on the main and interaction effects of the three factors S, G, and R. The standard errors of the estimated regression coefficients use the pooled variance from the eight factor-level combinations, assuming that the other factors have no effect on the response. The t-ratios and the probability values of the regression coefficients listed in this table indicate that S, G, and SG are significant whereas all other effects (including the main effect of factor R) are insignificant. Table A9.4(b) lists the results of the regression on the significant effects S, G, and SG. The regression explains 87.2% of the variability in the response rate.

TABLE A9.4 Regression Results for Models Relating the Response Rate to Factors S (Interest Rate), G (Sticker), R (Second Buckslip), I (Copy Message), and J (Letter Headline)

(a) Regression of response rate on S, G, R, and their interactions
Rate = 1.325 - (0.386)S - (0.320)G - (0.061)R + (0.151)SG - (0.070)SR + (0.076)GR + (0.045)SGR

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.325        0.066     20.07     0.000
S             -0.386        0.066     -5.85     0.000
G             -0.320        0.066     -4.85     0.000
R             -0.061        0.066     -0.93     0.372
SG             0.151        0.066      2.29     0.041
SR            -0.070        0.066     -1.06     0.310
GR             0.076        0.066      1.16     0.271
SGR            0.045        0.066      0.68     0.508

(b) Regression of response rate on S, G, and SG
Rate = 1.298 - (0.432)S - (0.278)G + (0.188)SG; R^2 = 0.872

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.298        0.052     24.75     0.000
S             -0.432        0.052     -8.24     0.000
G             -0.278        0.052     -5.30     0.000
SG             0.188        0.052      3.58     0.002

(c) Regression of response rate on S, G, SG, I, and J
Rate = 1.298 - (0.432)S - (0.278)G + (0.151)SG + (0.118)I - (0.066)J; R^2 = 0.921

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.298        0.044     29.46     0.000
S             -0.432        0.044     -9.80     0.000
G             -0.278        0.044     -6.31     0.000
SG             0.151        0.046      3.29     0.005
I              0.118        0.045      2.62     0.020
J             -0.066        0.045     -1.46     0.166

Cheng (1995) showed that in the 20-run Plackett-Burman design, for any four factors, estimates of the four main effects and the six 2-factor interactions involving these four factors can be obtained when their higher-order (3- and 4-factor) interactions are assumed to be negligible. Having eliminated factor R, we apply Cheng's finding and consider a model that includes the four factors that were significant in our initial main effects analysis: S, G, I, and J, together with their six 2-factor interactions. The result of this regression shows that all 2-factor interactions except SG are insignificant, leading to a model with the four main effects and the SG interaction. The fitting results for the model with S, G, SG, and the two main effects of I and J are shown in Table A9.4(c). These five effects explain 92.1% of the variation, a rather modest improvement over the 87.2% that is explained by S, G, and SG. It is clear that factors S (interest rate) and G (sticker) and their interaction SG are the main drivers of the response rate.
A FOLLOW-UP EXPERIMENT
A-D of the test matrix in Table A9.6 to create the 16 mail packages. The +/- combinations in the 11 interaction (product) columns are used solely for the statistical analysis of the results. All pairs of columns in Table A9.6 are orthogonal. All 15 effects (4 main effects and 11 interactions) can be analyzed independently, and none of these effects are confounded.

TABLE A9.5

Factor                        (-) Control   (+) New Idea
A  Annual fee                 Current       Lower
B  Account-opening fee        No            Yes
C  Initial interest rate      Current       Lower
D  Long-term interest rate    Low           High

TABLE A9.6 Results of the Follow-up Experiment
(The 11 interaction columns AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, and ABCD are the row-by-row products of the corresponding factor columns.)

Test Cell   A   B   C   D   Orders   Response Rate
    1       -   -   -   -    184        2.45%
    2       +   -   -   -    252        3.36%
    3       -   +   -   -    162        2.16%
    4       +   +   -   -    172        2.29%
    5       -   -   +   -    187        2.49%
    6       +   -   +   -    254        3.39%
    7       -   +   +   -    174        2.32%
    8       +   +   +   -    183        2.44%
    9       -   -   -   +    138        1.84%
   10       +   -   -   +    168        2.24%
   11       -   +   -   +    127        1.69%
   12       +   +   -   +    140        1.87%
   13       -   -   +   +    172        2.29%
   14       +   -   +   +    219        2.92%
   15       -   +   +   +    153        2.04%
   16       +   +   +   +    152        2.03%

The Results
Each of the N = 16 test cells was mailed to n = 7,500 potential customers. A total of 2,837 customers, or 100(2,837)/[(16)(7,500)] = 2.364%, responded to the offer and placed an order. Main and interaction effects were calculated by applying the plus and minus signs to the response column and dividing the weighted sum by N/2 = 8. The results are shown in Figure A9.2. Standard errors of the effects (expressed in percentage changes) are obtained by substituting p = 0.02364 into

\[
100\,\sqrt{\frac{4}{N}}\,\sqrt{\frac{p(1-p)}{n}} = 0.0877
\]

Although one manager had thought that charging an initial fee would give the impression of exclusivity, this fee had the largest negative effect, reducing the response rate by 0.518 percentage points.
[Figure A9.2: bar chart of the estimated main and interaction effects. Horizontal axis: effect in percentage points.]
Another attempt to slightly increase the interest rate showed, once again, that the long-term interest rate had to stay low. Raising the interest rate reduced response on average by
0.498 percentage points.
The main effects are quite strong. However, the significant interactions (AB and CD) imply that one needs to look at the effects of A and B and of C and D jointly. The diagrams in
Figure A9.3 show the nature of the interactions. The AB interaction supports both main effects, but provides additional important insights. With an account-opening fee (B+), the
lower annual fee results in only a small increase in response from 2.05% to 2.16%, but with
no account-opening fee (B-), a lower annual fee results in a large increase in response from
2.27% to 2.98%. The estimated response of 2.98% is highest for the combination A+B-,
the lower annual fee and no account-opening fee. The AB interaction expresses that A+ and
B- together increase the response rate beyond what can be expected by either of the two
factors separately. This may result from positive synergies or may be due to the negative impact of the account-opening fee, which for some customers may cause an immediate rejection of the offer. The nature of this 2-factor interaction provides extremely valuable information. Using its financial models, the company found that the increase in response resulting from no account-opening fee and a lower annual fee (A+B-) was much greater than
the loss in revenue that would result from eliminating these fees.

[Figure A9.3: interaction diagrams. AB interaction: response rate versus annual fee (A-: current, A+: lower). CD interaction: response rate versus initial interest rate (C-: current, C+: lower).]
The CD interaction shows that when the long-term rate is low (D-), the effect of a lower
initial rate is small and not statistically significant (a change in response from 2.57% to
2.66%). It is clear that offering the lower initial rate would not be profitable if the lower
long-term rate were also offered. However, if the long-term rate is high (D+), then the
lower initial rate has a large impact, with the response changing from 1.91% to 2.32%. The
interaction shows that, for persons receiving both lower rates, the increase in response is
considerably less than the sum of the two main effects. This customer behavior is consistent
with the concave value function used by Thaler (1985) and based on the earlier work of
Kahneman and Tversky (1979). In contrast to the main effects that suggest both interest
rates should be low, these results followed by additional analysis using the company's financial models showed that a lower long-term rate coupled with the current (higher) initial rate
was the most profitable.
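The cell averages quoted in the discussion of the AB and CD interactions can be reproduced with a short Python sketch. It is an illustration only and rests on the same assumptions as any reading of the garbled table: the 16 response rates are taken to be in standard order (A changing fastest, D slowest), and the second response rate is read as 3.36%. The averages then match the quoted values to within rounding.

```python
from itertools import product
from statistics import mean

# Response rates (%) of the 16 test cells (assumed standard order).
rates = [2.45, 3.36, 2.16, 2.29, 2.49, 3.39, 2.32, 2.41,
         1.84, 2.24, 1.69, 1.87, 2.29, 2.92, 2.04, 2.03]

# product() varies its last element fastest, so label each tuple D, C, B, A.
runs = [dict(zip("DCBA", levels), y=y)
        for levels, y in zip(product((-1, 1), repeat=4), rates)]


def cell_means(f1, f2):
    """Average response at each of the four level combinations of two factors."""
    return {(a, b): mean(r["y"] for r in runs if r[f1] == a and r[f2] == b)
            for a, b in product((-1, 1), repeat=2)}


ab = cell_means("A", "B")   # keys are (level of A, level of B)
cd = cell_means("C", "D")   # keys are (level of C, level of D)

for name, table in (("AB", ab), ("CD", cd)):
    for (x, y), value in table.items():
        print(f"{name}: {name[0]}={x:+d}, {name[1]}={y:+d} -> {value:.2f}%")
```

Each average is taken over the four test cells that share the given pair of levels, which is why the diagrams can be read without reference to the other two factors.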
FINAL COMMENTS
After these two mailings (one with a 19-factor Plackett-Burman screening test and the other
with the 4-factor full-factorial follow-up test), the marketing team learned more than they
ever had before when using the simple technique of testing one variable at a time. The specific
findings of these experiments led to immediate and substantial improvements: increased response rates, lower costs, and higher profits. But the longer-term benefits have been even
more substantial. This study introduced the company to the use of formal experimental design methods. Since then the firm has continued to experiment, increasing the speed and
profitability of its testing programs and becoming a leader in the application of these tools to
direct marketing. Testing has given the company the ability to quickly prove what sells and to
greatly improve its performance in the highly competitive financial services marketplace.
Exercise 2 in Chapter 6
ment: a medium-size store with a loyal, varied, and stable clientele. Four products were
studied: Camay soap (bath size), White House apple juice (32 oz), Mahatma rice (1 lb), and
Piggly Wiggly frozen pie shells. Sales of these products were stable without trends, and they
exhibited limited seasonality.
A complete factorial experiment was carried out. With three price levels, three display
options, and two advertising options, the design called for 18 factor-level combinations
(treatments). Since the design was replicated once, 36 weeks were needed. Furthermore,
each week was preceded and followed by a base week (a week in which all four products are priced at regular price, displayed at normal shelf position, and not advertised). Because of
this time arrangement and because holiday weeks were not used, the experiment spanned
roughly 80 consecutive weeks. The response was the number of units sold between Wednesday noon and Sunday 9 p.m. of each experimental week.
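The size of this experiment can be checked with a small Python sketch. The level names below are assumptions read from the table labels of this case (they are not spelled out in the paragraph above), and the count of base weeks assumes a strict alternation of base and treatment weeks, which the text does not state explicitly.

```python
from itertools import product

# Assumed level names for the three factors.
prices = ["regular", "reduced", "cost"]
displays = ["regular", "expanded", "special"]
advertising = ["no", "yes"]

# A complete factorial uses every combination of factor levels.
treatments = list(product(prices, displays, advertising))

runs_per_treatment = 2  # the original run plus one replication
treatment_weeks = len(treatments) * runs_per_treatment

# Assumption: base and treatment weeks alternate, starting and ending with a base week.
base_weeks = treatment_weeks + 1
total_weeks = treatment_weeks + base_weeks

print(len(treatments), "treatments")
print(treatment_weeks, "treatment weeks")
print(total_weeks, "weeks before skipping holiday weeks")
```

Under these assumptions the schedule needs 73 weeks, which is consistent with the roughly 80 consecutive weeks reported once unused holiday weeks are added.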
Trend and seasonality were not considered serious factors because products were stable
with minimal seasonality. Furthermore, prior studies showed that the customer flow varies
little over time.
The precise schedule of the treatment weeks, and a detailed discussion of the necessary
preparations and logistic problems that are associated with running such an experiment are
given in Wilkinson et al. (1982).
THE DATA AND THE ANALYSIS
Here, we consider Mahatma rice and White House apple juice. The ANOVA tables (for the
model with the three main effects, three 2-factor interactions, and the 3-factor interaction)
are shown below. We also list the cell averages for the various factor-level combinations.
!\NU\ 1\ for \ \'h11e I luirn' Apple f u1Lc
~u111
SuurLt'
Ma111 ,llects
Prill'
llispla)'
Adwrtis1ng
2-JaL!llr interactions
l'riLc x Display
PricL' X Advert
14~
.\,98 \
2(>1
1,070
tS,395
428
x Advert
~
..j
Ad,en
208
J fouur 111ll'raLl1u11
p x /) '< ;\
l:rror
'fotal
1,440
9,861
U1>play
99
l:rro1
TotJI
2,X4_l
IJ1>pla'
AJvl'rtt-.111g
PriLL'
3-L.lLl11r intcrnction
Price
2 LiLtur 111tcr1.1\.t1u11..,
!'rice x Displc1v
1,068
107
Llisf>l.1y X Advert
I''< J)
ol Squar,s
624
Regular
Pricl'
Regular display
lxpa11ded display
S,u.11 d1slay
Reduced
Pricl'
A!JVl:HTISJNl1
Cost
Regular
Reduced
Cost
Pr1Le
l 1 r1Ll'
Pr11.1..
!'!ILL'
j/ .5
51.5
7 \.0
28.0
J'i.5
32.5
21.0
38.0
42.0
6ll.O
46.U
22.J
)2 ;
76.:J
>\.()
~.l.lJ
:;=i.,
IUl.ll
Rcg11l,ir
ADV FR rl'ilN(;
l'rrcc
Rcd11tcd
Price
c:nst
Price
Regular
Price
Rcd11ced
Price
Cnq
I' rice
11.'i
11.0
61.0
26.'i
22. 'i
38.0
41i.O
44.'i
'ii .5
.lB.O
.\l 'i
1Zcg11IM dl'J1l,1,
I 'Jn
2 '5
I 'l'"'Hkd dl'pi.11
'.fi.O
lh ()
~f1l'li,il
17.[J
6:i.O
,Jr,pl,lV
ADVl'RTJS!NG
7k.O
COMMENT
In Section 7.2, we discussed the analysis of a general 2-factor factorial experiment. Here we
face a 3-factor factorial experiment. However, extending the ANOVA table to this situation
is straightforward. Now we have a total of abcn responses,
y_ijkl
Exercise 2 in Chapter 7
CASE 11
TABLE A11.1
Test and Control Markets
Region       Test Market        Control Market
Northeast    Binghamton, NY     Utica-Rome, NY
Midwest      Rockford, IL       Fort Wayne, IN
Southwest    Albuquerque, NM    El Paso, TX
Southeast    Chattanooga, TN    Montgomery, AL
TARLE Al I .2
Test Drsign
r ! ..,-1 MAH!..: r1
R111g
rirrn:
h11m1on
l11h -:
Oct 72
Nn\' Ian 7'
leh Apr 71
LON 1H'11
)(ock
rord
Albuqucrque
Chat tanongd
\l.11
fl
("
1\11g
fl
/)
/J
/J
/l
l!ticaRome
rort
\Vavne
VIAHKFT
,,
A
A
A
A
1'vlo11tgomcry
I I Paso
A
A
A
A
A
A
A
A
A
TARI.FAIJ.1
Results
TFST MARKETS
Time l'erincl
Mny Jul
Aug- Oct
Nov Jan
I-ch-Apr
1972
I972
1971
1973
llingh<lrnlon
Rockrord
Alhuqucrq11e
ChattancH1ga
7,:1110 (A)
11,258 (B)
13,147 ([))
11,800 (C)
11,852 IA)
7,77(:, (I))
8,501 ((")
ll,450(f))
12,089 ( R)
7,557
7,JM (B)
8,0'19 (C)
9,010 (/))
r:ONTROI
Time Period
Mav Jul
1\ug Oct
Nov-Jan
I-ch-Apr
1972
1972
1973
1973
l r11c.1-Rome
7,900 (fl)
IA J
f-ort Wayne
El Paso
Montgomery
7,166
10,970
11,706
7.411
7,489
7,679
8,536
12,718
11,495
11.753
12,008
8,2-,0
12,902
13,826
--- --------
7,853
7,768
The 16 test-market responses in the Latin square allow for the estimation of the main
effects of the three factors: location, time, and advertising. We are most interested in the
effect of advertising, after adjusting the analysis for possible location and time effects. The
ANOVA shown below indicates that there is no strong evidence for an advertising effect.
The F-statistic for testing the significance of an advertising effect is 639,139/330,445 = 1.93;
its probability value P(F(3,6) > 1.93) = 0.225 is larger than the significance level 0.05.
Time is also not significant (F = 2.42, with probability value 0.164). The only significant
factor is location, with Rockford and Albuquerque having considerably higher cheese sales.
ANOVA
Source    DF          SS          MS       F       P
Advert     3     1917416      639139    1.93   0.225
Cities     3    79308210    26436070   80.00   0.000
Time       3     2398201      799400    2.42   0.164
Error      6     1982671      330445
Total     15    85606498

Level                  Mean    SE Mean
Advertising
1 (0 cents)            9981      287.4
2 (3 cents)            9653      287.4
3 (6 cents)           10558      287.4
4 (9 cents)           10346      287.4
Cities
1 (Binghamton)         7946      287.4
2 (Rockford)          12859      287.4
3 (Albuquerque)       11798      287.4
4 (Chattanooga)        7934      287.4
Time
1 (May-Jul 72)         9549      287.4
2 (Aug-Oct 72)        10216      287.4
3 (Nov-Jan 73)        10138      287.4
4 (Feb-Apr 73)        10634      287.4
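As a check on the arithmetic, the mean squares, F-ratios, and the standard error of a level mean can be recomputed from the sums of squares and degrees of freedom in the Part 1 ANOVA. The Python sketch below is illustrative only; it assumes that each level mean averages 4 observations (one Latin-square cell per level of the other factors), and it does not compute the probability values, which require the tail area of the F-distribution.

```python
from math import sqrt

# (sum of squares, degrees of freedom) from the Part 1 ANOVA table.
anova = {
    "Advert": (1917416, 3),
    "Cities": (79308210, 3),
    "Time": (2398201, 3),
}
ss_error, df_error = 1982671, 6

ms_error = ss_error / df_error  # error mean square

# F-ratio: mean square of the factor divided by the error mean square.
f_ratios = {source: (ss / df) / ms_error for source, (ss, df) in anova.items()}

# Assumption: each level mean is an average of 4 observations.
se_mean = sqrt(ms_error / 4)

for source, f in f_ratios.items():
    print(f"{source:<7} F = {f:.2f}")
print(f"SE of a level mean = {se_mean:.1f}")
```

The same calculation applies to the other ANOVA tables in this case, with the error degrees of freedom and the number of observations per mean changed accordingly.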
Part 2 (Control Markets)
The 16 control-market responses (all under zero-cent advertising) originate from a factorial experiment with two factors: location and time. The ANOVA table allows us to test
whether there are time and location effects. There is some indication for a time effect, but
the evidence is weak (probability value 0.075); the location effect is very significant, with
Fort Wayne and El Paso having considerably higher cheese sales. The weak time effect and
the significant location effect confirm the findings of Part 1.
Results for Control Markets (Factorial Design)
Source    DF          SS          MS       F       P
Cities     3    78938086    26312695   85.41   0.000
Time       3     2985501      995167    3.23   0.075
Error      9     2772578      308064
Total     15    84696166

Level                  Mean    SE Mean
Cities
1 (Utica-Rome)         7718      277.5
2 (Fort Wayne)        12604      277.5
3 (El Paso)           11741      277.5
4 (Montgomery)         7828      277.5
Time
1 (May-Jul 72)         9321      277.5
2 (Aug-Oct 72)         9988      277.5
3 (Nov-Jan 73)        10047      277.5
4 (Feb-Apr 73)        10535      277.5
We combine the observations from the test and control markets and fit a regression
model that includes variables for the four levels of advertising, the eight different locations,
and the four time periods. The ANOVA table for this model is shown below. This is no
longer an orthogonal design, because the different factor-level combinations do not have
the same number of runs; for example, there are no observations for control cities and advertising at levels B-D. A consequence of a nonorthogonal design is that sequential and adjusted sums of squares are no longer the same. We are interested in adjusted sums of squares
because they tell us about the regression contribution of each factor, on top of all other factors that are part of the analysis. We find that there is not a huge benefit to increased advertising. The test statistic F = 2.40 is not significant at the 0.05 level. There is evidence for a
time effect, with sales increasing linearly with time. The effect of location is quite strong,
with higher sales for Rockford, Albuquerque, Fort Wayne, and El Paso. The main effects
plots in Figure A11.1 illustrate these relationships graphically.
Results for Test and Control Markets (Combined Analysis)
ANOVA
Source    DF       Seq SS       Adj SS      Adj MS       F       P
Advert     3      2126193      1917416      639139    2.40   0.101
Cities     7    158246501    158246501    22606643   84.94   0.000
Time       3      5348522      5348522     1782841    6.70   0.003
Error     18      4790430      4790430      266135
Total     31    170511645

Level                  Mean    SE Mean
Advertising
1 (0 cents)            9977      144.2
2 (3 cents)            9649      295.5
3 (6 cents)           10554      295.5
4 (9 cents)           10342      295.5
Cities
1 (Binghamton)         7946      257.9
2 (Rockford)          12860      257.9
3 (Albuquerque)       11798      257.9
4 (Chattanooga)        7933      257.9
5 (Utica-Rome)         7871      341.2
6 (Fort Wayne)        12758      341.2
7 (El Paso)           11894      341.2
8 (Montgomery)         7982      341.2
Time
1 (May-Jul
2 (Aug-Oct
3 (Nov-Jan
4 (Feb-Apr
QUESTIONS
Exercise J in Chapter 7
TAllLl
Al2.l
Message
Promotion
RuJC Included
in Fraction?
Price
-I
Yes
-I
Yes
-J
-I
Yes
Ye..,
-!
-J
I
Yes
A ll L IC
A I 2.2
12
J\
123
2.l
-I
-[
-I
11.1 11
>1
-I
-I
k'.:
-1
-1
v.
-I
-I
y~ ~
l!.'llJ
I
-I
Y,
11.U I
-I
-I
I
f{l'-.pon_..,(.'
Price
-I
-I
y"
y,
-I
y,
U.lN
~
IJ.l \
O.IJ11
~
IJ.lll
11.U.
beled 13 and 23 could have been used for two additional 2-level factors, resulting in an orthogonal fraction of a 5-factor 2^4 4^1 factorial design.
The design for the factors Message, Promotion, and Price in Table A12.2 is balanced and
orthogonal. The design is balanced as the factor levels of each factor occur in the same number of runs. It is orthogonal as the factor-level combinations for each pair of factors appear
in the same number of runs. Because of orthogonality, the main effects of the three factors
can be estimated independently of each other. The effect of message can be obtained by averaging over the other two factors; the same is true for the effects of promotion and price.
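The definitions of balance and orthogonality given above can be checked mechanically. The Python sketch below builds a hypothetical 8-run design for two 2-level factors and one 4-level factor; the construction (Price level determined by the pair of columns 3 and 12 of a 2^3 factorial) and the numbering of the price levels are assumptions made for illustration and need not match the run order or coding of Table A12.2.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical coding: the pair (column 3, column 12) selects one of 4 price levels.
price_level = {(-1, -1): 1, (-1, 1): 2, (1, -1): 3, (1, 1): 4}

runs = []
for x1, x2, x3 in product((-1, 1), repeat=3):
    runs.append({"Message": x1,
                 "Promotion": x2,
                 "Price": price_level[(x3, x1 * x2)]})

factors = ["Message", "Promotion", "Price"]


def is_balanced(factor):
    """Every level of the factor occurs in the same number of runs."""
    counts = Counter(r[factor] for r in runs)
    return len(set(counts.values())) == 1


def is_orthogonal(f1, f2):
    """Every level combination of the two factors occurs equally often."""
    counts = Counter((r[f1], r[f2]) for r in runs)
    n_combinations = len({r[f1] for r in runs}) * len({r[f2] for r in runs})
    return len(counts) == n_combinations and len(set(counts.values())) == 1


print({f: is_balanced(f) for f in factors})
print({pair: is_orthogonal(*pair) for pair in combinations(factors, 2)})
```

In this construction each Message-Price and Promotion-Price combination appears in exactly one run, and each Message-Promotion combination in two runs, which is what allows each main effect to be estimated by simple averaging.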
The orthogonal half-fraction was carried out and the results (proportions of sampled individuals responding to the offer) are shown in the last column of Table A12.2. The estimated main effects are
Promotion: Ave(Promution at
~ 0.
175
U.075 - 0. l 0
+ I)
0.175;
l.ASF 12
I)~
283
/~l'pnmm
At2.3
Constant
Message
Prom0tion
Price(lin )
Price( qua)
- 3
- I
Price( cu hie )
Response
- 1
3
3
y, = 0.14
y, - 0.09
y, ~ 0. 13
y., .0.40
- I
I
3
3
I
I
-3
TARI.~
)';
O.Ot
y, = 0.06
y, O. to
y,
0.07
A12.4
Regression ( !utput nf tlic Ma111 lffccts Model with Orthogonal Trend Components
W<b
which decreased with increasing price. Of course, these results come from a very small
study, and they should be confirmed by additional experiments.
EXAMPLE 2
In a second example, Almquist and Wyner discussed the launch of a creative arts and activities Internet portal for Crayola, the maker of colored markers and crayons. The goal was to
design a letter marketing campaign that attracts target customers to the site and converts
browsers into buyers. In their letter to potential customers, Crayola varied several levels of
the following 5 factors: (1) two different subject lines; (2) three salutations; (3) two calls to
action; (4) three promotions; and (5) two different closings. A full 2^3 3^2 factorial design that
includes all possible factor-level combinations requires 72 different letters. Constructing
and sending each one of 72 letters to [...] monitoring their performance is [...]
number of different letters by considering suitably chosen fractions. The discussion of fractional experiments in Chapter 5 has shown that while fractions of factorial experiments
confound effects, they can provide much useful information about the importance of the
studied factors.
It is straightforward to construct an orthogonal half-fraction of 36 runs by combining
the 9 runs in the full 3^2 with a half-fraction 2^(3-1) [...] confounds
the main and interaction effects of the 2-level factors, but it does not confound the
main effects of the 3-level factors. However, 36 runs may still be too many, and one may
want to look for designs with fewer runs. Almquist and Wyner mention running a 16-run
design, but they do not specify how they selected these runs. A balanced and orthogonal design in 16 runs is not possible. The 16 runs cannot be divided evenly among the 3 levels;
hence, the design cannot be balanced. Furthermore, there is no arrangement that achieves
the same number of runs at all factor-level combinations of each pair of factors; hence the
design cannot be orthogonal.
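The run counts in this example follow from simple counting, which the Python sketch below illustrates. The factor names are shorthand for the five factors listed in the example. The divisibility checks are necessary conditions only: a run count that passes them does not by itself guarantee that a balanced or orthogonal design exists.

```python
from itertools import product

# Number of levels of each factor in the Crayola example.
levels = {"subject line": 2, "salutation": 3, "call to action": 2,
          "promotion": 3, "closing": 2}

# Full factorial: every combination of factor levels.
full_factorial = list(product(*(range(k) for k in levels.values())))

# Half-fraction: the 9 runs of the full 3^2 crossed with the 4 runs of a 2^(3-1).
half_fraction_runs = 3**2 * 2**(3 - 1)

runs = 16

# Balance requires the runs to divide evenly among the levels of each factor.
can_balance = {name: runs % k == 0 for name, k in levels.items()}

# Orthogonality of a pair requires the runs to divide evenly among all
# level combinations of that pair.
names = list(levels)
can_be_orthogonal = {(a, b): runs % (levels[a] * levels[b]) == 0
                     for i, a in enumerate(names) for b in names[i + 1:]}

print(len(full_factorial), "letters in the full factorial")
print(half_fraction_runs, "runs in the half-fraction")
print(can_balance)
```

With 16 runs the checks fail for every factor or pair that involves a 3-level factor, which is the reason a computer-generated design is used instead.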
The design software JMP was used to obtain the 16-run D-optimal design in Table A12.5.
A D-optimal design minimizes the determinant of the covariance matrix of the main-effects
estimates; it maximizes the precision of the parameter estimates. Note that this design is
quite close to being an orthogonal design.
QUESTIONS
REFERENCES
CHAPTER
Fisher, Ronald A.: The Design of Experiments. Edinburgh: Oliver & Boyd, 1935 (and various later editions).
Lenth, R. V.: "Quick and easy analysis of unreplicated factorials." Technometrics, Vol. 31 (1989), 469-473.
Montgomery, D. C.: Introduction to Statistical Quality Control (3rd ed.). New York: Wiley, 1996.
Yin, G. Z., and Jillie, D. W.: "Orthogonal design for process optimization and its application in plasma etching." Solid State Technology (May 1987), 127-132.
CHAPTER
Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Eibl, S., Kess, U., and Pukelsheim, F.: "Achieving a target value for a manufacturing process: A case study." Journal of Quality Technology, Vol. 24 (1992), 22-26.
Ledolter, J., and Swersey, A.: "Dorian Shainin's variables search procedure: A critical assessment." Journal of Quality Technology, Vol. 29 (1997), 237-247.
CHAPTER
Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Box, G. E. P., and Tyssedal, J.: "Projective properties of certain orthogonal arrays." Biometrika, Vol. 83 (1996), 950-955.
Cheng, C. S.: "Some projection properties of orthogonal arrays." Annals of Statistics, Vol. 23 (1995), 1223-1233.
Draper, N. R.: "Plackett and Burman designs." Encyclopedia of Statistical Sciences. New York: Wiley, 1985, 754-758.
Draper, N. R., and Smith, H.: Applied Regression Analysis (2nd ed.). New York: Wiley, 1981.
Margolin, B. H.: "Orthogonal main-effect 2^n 3^m designs and two-factor interaction aliasing." Technometrics, Vol. 10 (1968), 559-573.
Plackett, R. L., and Burman, J. P.: "The design of optimum multifactorial experiments." Biometrika, Vol. 33 (1946), 305-325.
CHAPTER
Box, G. E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters: Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
John, P. W. M.: Statistical Methods in Engineering and Quality Assurance. New York: Wiley, 1990.
Montgomery, D. C.: Design and Analysis of Experiments (6th ed.). New York: Wiley, 2005.
CHAPTER
Box, G. E. P., and Draper, N. R.: Empirical Model Building and Response Surfaces. New York: Wiley, 1987.
John, P. W. M.: Statistical Design and Analysis of Experiments. New York: Macmillan, 1971.
Kuhfeld, W. F., and Tobias, R.
APPENDIX
Case 4
Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Case 8
Barclay, W. D.: "Factorial design in a pricing experiment." Journal of Marketing Research, Vol. 6 (1969), 427-429.
Bisgaard, S.: "Industrial use of statistically designed experiments: Case study references and some historical anecdotes." Quality Engineering, Vol. 4 (1992), 547-562.
Box, G. E. P., Hunter, W. G., and Hunter, J. S.: Statistics for Experimenters. New York: Wiley, 1978 (2nd ed., 2005).
Brown, W., and Tucker, W. T.: "The marketing center: Vanishing shelf space." Atlanta
Holland, C. W., and Cravens, D. W.: "Fractional factorial designs in marketing research."
Milliman, R. E.: "Using background music to affect the behavior of supermarket shop-
Plackett,
Schaub, D. A., and Montgomery, D. C.: "Using experimental design to optimize the
Srivastava, J., and
585.
Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the impact of short-term supermarket strategy variables." Journal of Marketing Research, Vol. 19 (1982), 72-86.
Wu, C.
Case 9
Barclay, W. D.: "Factorial design in a pricing experiment." Journal of Marketing Research, Vol. 6 (1969), 427-429.
Barrrntine, l.. H.: "Illustration ofrnnfounding
Ill
111 Alanugemrnt,
mid the Sczenccs. Helmont, CA: I )uxburv Preo,s 2002.
Box, ( ;_ E. P., Hunter, W. C., and Hunter, I. S.: )tat 1st1cs Jin Lxpcrzmentcr.'. :\ew York:
f-11.~zneertng
J.:
Wittink, D. R., and Cattin, P.: "Commercial use of conjoint analysis: An update." Journal of Marketing, Vol. 53, No. 3 (1989), 91-96.
Wittink, D. R., Vriens, M., and Burhenne, W.: "Commercial use of conjoint in Europe: Results and critical reflections." International Journal of Research in Marketing, Vol. 11, No. 1 (1994), 41-52.
Wu, C. F. J., and Hamada, M.: Experiments: Planning, Analysis, and Parameter Design
PREFACE
analyzing real data sets, and designing and carrying out their own experiments. Each chapter includes many exercises ranging from straightforward "drill-type" problems to more
challenging ones that test tools and concepts. The 13 cases involving real-world applications
are a key and unique part of the book. Some of these cases describe how experiments were
conducted and give readers the opportunity to analyze and interpret the results, while
others are written so that students can develop their own designs and compare their approaches to what was actually done.
Each chapter ends with an important section of notes titled "Nobody Asked Us, But ... "
Including these notes allowed us to focus on the basic concepts in the main text and then to
elaborate on them at the end of the chapter. Doing so gives the reader the opportunity to
first learn the basics without being bogged down with too many details and then through
the notes to build upon these core concepts. The title of this section comes from a well-known column of the same title written by the late New York sportswriter Jimmy Cannon.
The development of statistical computer software has made it easier to design experiments and analyze and interpret the results. We have not tied the book to a specific computer program, but discuss computer output from several packages, in particular, Minitab
and JMP.
Instructors can use the book in a number of ways. The entire book can be covered in a
full semester course on experimental design that would include most of the cases in the case
study appendix, as well as an experimental design project that would combine methodology with real-world practice. The book can also be used for a section on experimental design in a course on quality management. To do so, the instructor would assign Chapters 1,
4, and 5, along with several of the cases. In addition, the instructor might assign selected sections of Chapter 2, which reviews basic statistical concepts, and Chapter 3.
Many people contributed to this book. We begin by acknowledging several people who
greatly influenced our thinking and learning. We gratefully acknowledge George Box, Norman Draper, and the late Bill Hunter who taught courses on design of experiments and statistical modeling when one of the authors (JL) was a graduate student at the University of
Wisconsin-Madison. The very lively "Monday Night Beer Seminars" in George Box's basement had a profound impact as these discussions showed the importance of well-designed
experiments for learning and also provided a strategy for implementing these methods in
real-world settings. We also pay tribute to the late Sebastian B. Littauer, a distinguished professor at Columbia University and a recipient of the American Society for Quality's Shewhart Medal, who was a mentor to one of us (AJS). He was an expert in statistical methods
who influenced many by his extraordinary teaching of statistical quality control not only as
a set of problem-solving tools but as the conceptual foundation of a quality management
philosophy.
At Stanford University Press, a number of people made important contributions. We are
especially grateful to Martha Cooley, our editor, for her encouragement, insights, and
suggestions, and to Jared Smith who carefully checked and organized the manuscript in
preparation for its production. We are grateful to the production services team at Newgen-Austin, including Andy Sieverman, who oversaw the production process from start to finish, and Teresa Berensfeld, whose excellent copy editing improved the presentation.
CASE
INTRODUCTION
Marketing know-how does not always translate well from one channel to another, as one
office-supplies retailer came to realize. With consistent growth and solid profit from their
retail stores, one industry leader decided to expand into direct marketing channels, mailing
out catalogs and sending e-mails to direct small-business customers to their Web site and
stores. Two of their biggest challenges were building solid mail and e-mail lists of prospective customers and translating the in-store experience onto a two-dimensional page. A year
after starting these new programs, the marketing vice president wanted to speed the learning curve with a more disciplined approach.
Talking with other executives, he decided to bring in an outside consultant to strengthen
their marketing testing efforts. Both the catalog and Internet programs had room for improvement, but the flexibility and low cost of e-mail (versus printing multiple catalogs) became the deciding issue on where they would first apply scientific testing techniques.
With fast response, low costs, and flexible production, e-mail was a great place to start
testing. In addition, what worked in e-mail could then be tested in the catalog and even in
retail stores. However, in this early stage of their business-to-business e-mail program, the
Internet marketing director had very few e-mail addresses he could use. Retail sales associates had begun asking for e-mail addresses, and online orders were growing, but at this
point the marketing team had only about 35,000 names. Moreover, these names included
three distinct customer segments, each with different buying behavior. With so few names,
the Internet director had tested one or two new ideas in each monthly drop, but had a difficult
time trying to get a statistically significant read on his results.
PLANNING THE TEST
The consultant agreed with the Internet director that sample size could be a problem. He
explained that there was no magic shortcut, no way to reduce the natural variation in the
marketplace, so it was necessary to overcome variability with bold factors and a sufficiently large sample size. He explained how simple rules of thumb, like "100 orders in each
To eliminate this variable, he designed his test panel with no hole within 3 inches of the
panel edge.
The expectation was that under the current conditions about 2% or 3% of the holes on
a given test panel would have broken tents. The cost per test panel including labor and inspection was estimated to be about $20.
We wish to thank several people who helped us develop the cases and examples in this
book. Mark Wachen, CEO of Optimost, provided the data for the Phone Hog case in Section 8.2 and shared with us his modeling insights. Optimost (www.optimost.com) is a technology and services company specializing in comprehensive real-time testing and conversion rate marketing. We also thank Phil Nadel, CEO of Gulfstream Internet (the parent
company of Phone Hog), for carefully reviewing the case and allowing us to use it. Jay Harris, publisher of Mother Jones, was instrumental in the development of the Mother Jones (A)
and (B) cases, providing access to his organization and contributing many helpful ideas as
the experiment at Mother Jones was designed and carried out. Alexander Dean, president of
David Brooks Company, was very generous with his time and expertise. The broken pots
example that we introduce in Chapter 4 and discuss further in Chapter 5 was written based
on many discussions with Alex and describes a simplified version of the production process
his company uses in the making of clay pots.
We thank Elsevier Publishing Company for allowing us to include the article [Bell, G. H.,
Ledolter, J., and Swersey, A. J.: "Experimental Design on the Front Lines of Marketing:
Testing New Ideas to Increase Direct Mail Sales," International Journal of Research in Marketing
example in Sections 4.5 and 6.3. One of the greatest benefits to us in writing this book has
been the interactions, both professional and personal, that we have had with Gordon.
CASE 1
EAGLE BRANDS
INTRODUCTION
Bill Evans, Director of Marketing at Eagle Brands, was worried. Eagle, a national producer
of packaged sandwich meats, was facing increased competition and declining market share.
Looking over the latest quarterly supermarket sales numbers, he observed that the situation was not improving. He realized that drastic action was needed to turn things around.
Evans had recently read an article in the Wall Street Journal about a statistical approach
to product testing and was intrigued by the idea of using it to try out some new marketing
initiatives. The article called the approach multivariable testing (MVT), and its proponents
claimed it could be used to devise an efficient in-store test of multiple variables that might
influence sales and profits. Evans had in the past led a project to test market a new package
design, and although the experiment provided very useful results, it had been a major undertaking. He was concerned that testing a number of variables in one experiment might be
prohibitively expensive to carry out.
The journal article mentioned QualTest, a management consulting firm specializing in
applications of MVT. Evans contacted the firm and arranged to have QualTest give a presentation at Eagle Brands. Evans assembled a group of 10 key people, including the heads of
sales, finance, and accounting. Steve Gardner, a senior QualTest consultant, made the presentation, explaining the approach and illustrating it with several case examples of successful experiments for QualTest clients.
DESIGNING THE EXPERIMENT
The response to the QualTest presentation was a positive one, and Eagle Brands hired the
firm to help them design and evaluate an in-store marketing experiment. QualTest consultants led by Steve Gardner began a series of meetings with Eagle managers.
One of Eagle's major customers was Zip Stores, a national supermarket chain, and the
plan was to select a group of the chain's stores to participate in the test. Bill Evans realized
that input from the chain was important to the success of the experiment, and a merchandising manager from Zip agreed to join the team.
INTRODUCTION
This book is about the power of statistical experiments. In the increasingly competitive
global economy, firms are constantly under pressure to reduce costs, increase productivity,
and improve quality. Testing or experimentation in the business world is commonplace,
and the usual approach is to change one factor at a time while holding other factors constant. To some, this approach seems logical, simple, and therefore appealing. But as we will
show, it is highly inefficient, and it may fail to identify important factors and lead to wrong
conclusions. The better method is to test all factors simultaneously. Doing so not only reduces the costs of experimenting but, as we will demonstrate, also provides the experimenter with more and better information.
Elementary courses in statistics that cover topics such as probability, hypothesis testing,
confidence intervals, and regression analysis often appear abstract; and although they are illustrated with numerous examples, they typically seem far removed from practical issues.
In this book we use and build on basic statistical concepts to explore approaches for solving
real-world problems. Although our focus is on practice, it is important to keep in mind that
statistics is a science, and science is based on theory. While computer software has made the
implementation of statistical methods much easier, there is a danger in relying on a cookbook approach in which the user fails to understand the underlying concepts. In contrast,
this book's presentation combines theory and practice, and focuses on strengthening the
reader's understanding of fundamental statistical ideas.
Our goal in writing this book is to share our passion for the subject and to provide students, practitioners, and managers with a set of highly relevant, interesting, and valuable
tools. In the past, in the area of experimental design, nearly all the attention was focused on
manufacturing rather than services. In contrast, most of the applications and examples in
this book will involve marketing and service operations. In the next section, we give a brief
introduction to some of the cases that are included.
DESIGN OPTIMALITY
Consider the linear regression model $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$ with $k + 1$ coefficients.
The factors may be the price in a marketing study, or the temperature and the concentration of an input factor in an engineering problem. A total of $N \ge k + 1$ experimental runs
are needed to estimate the $k + 1$ coefficients. The $N \times (k + 1)$ design (regression) matrix $X$
consists of a column of ones and $k$ columns of factor levels that need to be selected at the
design stage. At issue is the optimal selection of the elements in $X$.
Our interest is in the precise estimation of the regression coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$. Least squares theory (see the regression appendix) implies that the variance of the estimate $\hat{\beta}$ is given by
$V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
Several optimality criteria have been proposed in the design literature.
A-optimality. We look for a design that leads to the smallest average variance of the
resulting estimates. The design that allows us to estimate the parameters with the
smallest average error must minimize the trace of $(X'X)^{-1}$. This criterion is called
A-optimality.
D-optimality. Alternatively, we look for a design that minimizes the volume of the joint
confidence region of the parameters. It can be shown that the volume is proportional to the square root of the determinant of $(X'X)^{-1}$. Hence, we want to select
the levels of the factors in the design matrix $X$ such that the determinant of $(X'X)^{-1}$
is minimized. This criterion is called D-optimality.
The two design criteria are similar, involving two closely related functions of the reciprocals
of the eigenvalues of $X'X$. A-optimality minimizes their sum, whereas D-optimality minimizes the sum of their logarithms.
Before one can apply these criteria to determine an optimal design, one must specify the
permissible experimental region of the design factors. Also, one needs to remember that
the coefficients $\beta_i$ ($i = 1, 2, \ldots, k$) are affected by the choice of the scale for $x_i$. If $x_i$ denotes
the price measured in dollars, we can change $\beta_i$ by a factor of 100 by measuring the price in
cents. It is often desirable to scale the factors uniformly. We have done so in the 2-level designs
in Chapters 4-6 by adopting the scaling $-1$ and $+1$, implying uniformity across the $k$ factors.
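The two criteria can be compared numerically on small designs. The following is a minimal sketch, not a design-search algorithm: it evaluates the trace and the determinant of $(X'X)^{-1}$ for two hypothetical 4-run designs in two factors (both design matrices and all function names are assumptions made for illustration), using exact fractions and plain Gauss-Jordan elimination.

```python
from fractions import Fraction


def cross_product(X):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[Fraction(sum(a * b for a, b in zip(ci, cj))) for cj in cols]
            for ci in cols]


def inverse_and_det(M):
    """Gauss-Jordan elimination: return (inverse of M, determinant of M)."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    det = Fraction(1)
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det
        p = A[c][c]
        det *= p
        A[c] = [v / p for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A], det


def a_and_d_criteria(X):
    """Return (trace of (X'X)^-1, determinant of (X'X)^-1)."""
    inv, det = inverse_and_det(cross_product(X))
    a_value = sum(inv[i][i] for i in range(len(inv)))
    d_value = 1 / det
    return a_value, d_value


# Columns: ones, x1, x2. Both designs are hypothetical examples.
orthogonal = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
unbalanced = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, -1, -1]]

print(a_and_d_criteria(orthogonal))
print(a_and_d_criteria(unbalanced))
```

Smaller values are better under both criteria. In this example, replacing the (+, +) run with a repeat of the (-, -) run raises both the trace and the determinant of $(X'X)^{-1}$.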
An orthogonal $N \times (k + 1)$ design matrix implies A- and D-optimality of the main-effects design. For a proof, see John (1971, p. 194). It is in
situations where orthogonal designs cannot be found that A- and D-optimality become important.
Figure 8.2 Continued
TABLE 8.1
Area
A-bottom (main headline)
A-top (main headline)
B (sub headline)
C (main copy)
D (form)
E (privacy copy)
F (submit button)
G (how it works section)
H (main image on right side)
I (footer)
Nor t
of
I e1c/.1
ul Lcvl'b Jnd
Levels
IU
4
n 11,4,h,H,9, IUI
4 (I I
.\ (I, J, lJ I
4 (I, 2, 5, ti I
6 11 t1
ti
(J
(J
4
7
4
4 (I
4)
Ci I I, 2, 4, 5, b,7)
5 (I 5)
l (I 4
2 ( I, 2)
Figure 2.6 (axis and panel labels: No, Yes; 2004 donations; Class: 1957, 1967, 1977, 1987, 1997)
Y and Y
factorial designs
in Chapter 7.
Orthogonality of the design simplifies the analysis of the resulting data considerably.
Main effects and interactions can be estimated by averaging over all other factors. The estimates are independent, and the sum of squares that is explained jointly by the studied factors can be partitioned into individual, unconditioned sums of squares that ignore all other
factors. For example, the total sum of squares in the 2-factor factorial experiment in Section 7.2 can be partitioned into the individual, unconditioned sums of squares for factor A,
factor B, the interaction, and the error component.
Such an additive unconditioned decomposition is no longer possible if the design is not
orthogonal. Orthogonality, for example, is no longer present if observations are missing
from an orthogonal design. More importantly, an orthogonal design may simply not be
available in situations that involve many factors with different numbers of factor levels.
Consider, for example, 7 factors with 2 factors at 2 levels, 1 factor at 3 levels, 3 factors at
4 levels, and 1 factor at 5 levels. A full factorial with (2)(2)(3)(4)(4)(4)(5) = 3,840 runs is
certainly orthogonal. Also, a few special orthogonal fractions are
possible, but the number of runs of these orthogonal fractions is still quite large. It is simply not possible to find an orthogonal fraction with a moderate number of runs. Other design criteria need to be adopted if one wants to select a good design that is able to study the
main effects of these 7 factors in, say, N = 30 runs. In this chapter, we discuss useful guiding principles for constructing such nonorthogonal designs. Design concepts such as
D- and A-optimality become useful, and we discuss them in Appendix 8.1.
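The run counts in this example are easy to check. The sketch below multiplies the numbers of levels to get the size of the full factorial and, as an added observation not stated in the text, counts the parameters of a main-effects model (one intercept plus one less than the number of levels for each factor), which is the smallest number of runs that could estimate all main effects. Variable names are illustrative.

```python
from math import prod

# Levels of the 7 factors: two at 2 levels, one at 3, three at 4, one at 5.
levels = [2, 2, 3, 4, 4, 4, 5]

# Size of the full factorial design.
full_factorial_runs = prod(levels)

# A main-effects model needs an intercept plus (levels - 1) parameters per factor.
main_effect_parameters = 1 + sum(n_levels - 1 for n_levels in levels)

print(full_factorial_runs)
print(main_effect_parameters)
```

Under these assumptions the full factorial has 3,840 runs while a main-effects model has 18 parameters, so a 30-run design leaves some degrees of freedom for error, provided a suitable nonorthogonal design can be found.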
Figure 2.7 (dot plot of 2004 donations; dot plots of 2004 donations by class: 1957, 1967, 1977, 1987, 1997; panel variable: class)
Dot Diagrams, Histograms, Box Plots, and Scatter Plots of Continuous Variables
l,eve l I
Level 2
Level I
Level 2
Level J
Quadratic
u
I
Quadratic
Level I
Level 2
Level l
Levl'i ,I
Cubic
I
-[
Level
Level
Level
Level
Level
Quadratic
I
2
Lubic
-[
4
5
- 2
6
4
I
EXERCISES
Exercise 1
(a) Use an available computer program to obtain the ANOVA table in Table 7.3. Obtain the interaction plot in Figure 7.1. Obtain the main effects plots of factor A and
factor B, and comment on whether or not these plots are useful.
(b) For this rather small data set, calculate the nine cell averages, the three averages for
factor A and the three averages for factor B. Use the expressions in Table 7.1 to calculate the sums of squares and convince yourself that the results coincide with the
ones given in Table 7.3.
(c) Discuss in detail your experimental procedure. How would you carry out the baking experiment if you had to use your home oven, and the rating procedure if you
3.1
INTRODUCTION
In Section 2.5 we used sample information to test whether the means of two populations are
equal. In this chapter, we extend the discussion to the comparison of more than two means.
We discuss two designs for making this comparison: the completely randomized experiment
and the randomized complete block experiment.
Internet experiments that present one of several advertising messages at random to users
of search engines are examples of completely randomized experiments. There the k advertising messages, which may differ with respect to advertising text, background color, and
font size, are offered to distinct Internet users at random; each user responds to one and
only one advertising message. The response in such studies is the sales volume generated
from each advertising message, or the "hit ratio" (the proportion of those who access a particular Web site in response to the message).
Consider another example. A firm wants to test three different in-store promotions for
a major product, and identifies a group of 15 stores of similar size to participate in the experiment. Each store will test one and only one of the promotions for a certain period of
time (say three weeks). The promotions are randomly assigned to the stores, with five different stores per promotion. In the language of experimental design, the three promotions
are called treatments, and the 15 stores are called the experimental units. Since the treatments are assigned to the experimental units at random, we call this a completely randomized experiment.
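The random assignment in the store example can be sketched in a few lines. This is an illustration only: the store and promotion labels, the function name, and the fixed seed are assumptions, and any sound randomization device would serve equally well.

```python
import random


def completely_randomized(units, treatments, per_treatment, seed=None):
    """Assign each unit exactly one treatment, with equal replication, at random."""
    labels = [t for t in treatments for _ in range(per_treatment)]
    if len(labels) != len(units):
        raise ValueError("units must equal treatments times replications")
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical labels for the 15 stores and the three promotions.
stores = [f"store_{i:02d}" for i in range(1, 16)]
promotions = ["promo_A", "promo_B", "promo_C"]

assignment = completely_randomized(stores, promotions, per_treatment=5, seed=1)
for store, promo in assignment.items():
    print(store, promo)
```

Each store receives one and only one promotion, and each promotion is tested in five stores. The seed is fixed only so that the listing is reproducible.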
An alternative design for comparing the three promotions is the randomized complete
block experiment. Suppose the firm believes the 15 stores are not homogeneous and that
possible store effects could introduce additional noise that would make it difficult to recognize differences among the treatments. Hence it may be better to observe each store under
all three in-store promotions. We could divide the study period into three one-week periods and, for every store, assign each of the three promotions to a different week. In this
design, each of the 15 stores acts as a block. Within each block, treatments are assigned
to the three one-week periods at random. The design is called a complete block design
N = 12, 20, AND 24 RUNS
Table 6.1 lists the Plackett-Burman design for N = 12 runs. The design matrix was constructed as follows. Starting from the first row (which is listed below in the row under
N = 12), you cyclically rearrange the symbols. That is, the sequence of plus and minus signs
in row 1 gets pushed to the right by one space to form row 2, and the minus sign in the far-right
position of row 1 gets moved to the far-left position in row 2. The plus sign in the far-right
position in the second row gets moved to the far-left position in row 3, and so on. The
cyclical rearrangement of rows continues until row 11. The 12th row is a row of all minus
signs. Alternatively, you can cycle through the 11 rows by pushing the sequence of plus and
minus signs to the left and moving the entry in the far-left position of a row to the far-right
position of the subsequent row. This only changes the order of the runs.
Similarly, for N = 20, you create the first 19 runs by cyclically rearranging the symbols
in the row shown below; the 20th row is a row of all minus signs. The resulting design matrix is shown in Table 6.4. The same procedure is used for generating the design matrix for
N = 24 runs. The initial rows needed to construct Plackett-Burman designs with N ≥ 28
runs can be found in the original Plackett and Burman (1946) reference and in advanced
books on design of experiments.
This procedure results in the standard order of a Plackett-Burman design. As with all experiments, one should randomize the assignments of the experimental units to each run, or
if experiments are carried out in time order, one should randomize the order of the runs.
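The cyclic construction can be sketched as follows. The first row used here, + + - + + + - - - + -, is an assumption: it is the generating row commonly tabulated for N = 12, taken from general knowledge rather than from the table referenced above. The function names are illustrative.

```python
# Assumed first row for N = 12 (the commonly tabulated generating row).
FIRST_ROW_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Build the design by cyclic right shifts, then append a row of minus signs."""
    n_factors = len(first_row)
    rows = []
    row = list(first_row)
    for _ in range(n_factors):
        rows.append(row)
        # Push the signs one place to the right; the far-right entry wraps
        # around to the far-left position of the next row.
        row = [row[-1]] + row[:-1]
    rows.append([-1] * n_factors)
    return rows


def is_orthogonal(design):
    """Check that every column is balanced and every pair of columns is orthogonal."""
    cols = list(zip(*design))
    balanced = all(sum(col) == 0 for col in cols)
    pairwise = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    return balanced and pairwise


design_12 = plackett_burman(FIRST_ROW_12)
for run in design_12:
    print(" ".join("+" if s > 0 else "-" for s in run))
print(is_orthogonal(design_12))
```

The result is in standard order. Before running an experiment, the order of the runs (or the assignment of experimental units to runs) would still be randomized, as the text recommends.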
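16-run fractional factorial designs and their generators:
$2^{5-1}_{V}$: E = ABCD
$2^{6-2}_{IV}$: E = ABC, F = BCD
$2^{7-3}_{IV}$: E = ABC, F = BCD, G = ACD
$2^{8-4}_{IV}$: E = ABC, F = BCD, G = ACD, H = ABD
$2^{9-5}_{III}$: E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD
$2^{15-11}_{III}$: E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD, N = BC, O = BD, P = CD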
BE
3-f
AE
F
CH+DE
E+DJ
+ CJ
+ BJ
+ A!
J + AF
+.
BG+ CH+
DE
,.
E+
H+
]+
AO
G+
AP
F+
AN
AJ
BE .
BM
BJ
BP
CM
DL
EO
DN
AF
BG
CH
DE
CK
CJ
Pl
DK
FM __ <: BP
GOY -:,..:::,)FL
I{#, . \. . . GN
. FK
. HN
co
~l>f
GK
'FIL
.Kf
LO
. ;:;;i
MN
NOTE: The expressions below the columns of plus and minus signs from the previous page specify the confounding patterns of the estimated effects. I, which denotes
the column of plus signs, is not used as a factor. Interactions of order three or higher are assumed zero.
PLACKETT-BURMAN DESIGNS
In Chapter 5 we focused on 2-level fractional factorial designs. As we have seen, in those designs the number of runs N is a power of 2 (N = 4, 8, 16, 32, etc.). In this chapter, we discuss another important class of fractional designs called Plackett-Burman designs.
In a classic 1946 paper in the journal Biometrika, Plackett and Burman showed how to
construct 2-level orthogonal designs when the number of runs N is a multiple of 4 (N = 4,
8, 12, 16, 20, 24, and so on). If the run size is a power of 2 (for example, N = 8, 16, 32, ...),
these designs are identical to the fractional factorial designs that we studied in Chapter 5.
The 2-level fractional factorial designs leave large gaps in the run sizes of the available designs. For example, 7 factors can be studied in 8 runs with a $2^{7-4}_{III}$ design, but if the number
of factors is between 8 and 15, 16 runs are needed; and 32 runs are needed for 16 to 31 factors. The Plackett-Burman designs for N
Suppose that we wish to estimate the main effects of 8 factors and want to achieve this
through a design with as few runs as possible. We could use the $2^{8-4}_{IV}$ fractional factorial in
Table 5.7. This design does not confound main effects with 2-factor interactions, but it requires 16 runs. A Plackett-Burman design with the smaller run size N = 12 is an option if
economy of run size is important.
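The gaps in the available run sizes can be tabulated with a short sketch. It assumes that k main effects plus the mean require at least k + 1 runs, that 2-level fractional factorials come in run sizes that are powers of 2, and that a Plackett-Burman design is available for every multiple of 4 in the range shown. The function names are illustrative.

```python
def min_runs_fractional_factorial(k):
    """Smallest power of 2 (at least 4) that accommodates k main effects and the mean."""
    n = 4
    while n < k + 1:
        n *= 2
    return n


def min_runs_plackett_burman(k):
    """Smallest multiple of 4 that accommodates k main effects and the mean."""
    n = 4
    while n < k + 1:
        n += 4
    return n


for k in (7, 8, 11, 15, 16, 19, 23, 31):
    print(k, min_runs_fractional_factorial(k), min_runs_plackett_burman(k))
```

For 8 factors this gives 16 runs for the fractional factorial and 12 for the Plackett-Burman design, matching the comparison in the text. The saving comes at the price of resolution III.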
Plackett-Burman designs have resolution III, confounding main effects with 2-factor
interactions. Traditionally, they have been used to estimate main effects under the assumption that 2-factor interactions are largely negligible or small in magnitude. More recently,
researchers have begun to explore the so-called projective properties of Plackett-Burman
designs and have shown that in some circumstances they can be effectively used to identify
likely 2-factor interactions. In Section 6.3.3, in our discussion of the results of a case study,
we make use of this important property of Plackett-Burman designs.
TABLE 5.5
8-Run Fractional Factorial Designs: Generators, Confounding Patterns, and Resolution
4 factors: $2^{4-1}_{IV}$ (resolution IV design)
Generator: D = ABC
Confounding patterns: A; B; C; AB + CD; AC + BD; BC + AD; D
5 factors: $2^{5-2}_{III}$
Generators: D = AB, E = AC
Confounding patterns: A + BD + CE; B + AD; C + AE; D + AB; E + AC; BC + DE; BE + CD
6 factors: $2^{6-3}_{III}$
Generators: D = AB, E = AC, F = BC
Confounding patterns: A + BD + CE; B + AD + CF; C + AE + BF; D + AB + EF; E + AC + DF; F + BC + DE; AF + BE + CD
7 factors: $2^{7-4}_{III}$
Generators: D = AB, E = AC, F = BC, G = ABC
Confounding patterns: A + BD + CE + FG; B + AD + CF + EG; C + AE + BF + DG; D + AB + CG + EF; E + AC + BG + DF; F + AG + BC + DE; G + AF + BE + CD
NOTE: The darker-shaded area represents the runs of the $2^3$ factorial building block design. The lighter-shaded area represents the calculation columns that are available
for generating the levels of additional factors. The expressions below the columns of plus and minus signs specify the confounding patterns of the estimated effects. For example, in the $2^{6-3}_{III}$ design, the linear contrast in column B estimates B + AD + CF. Interactions of order 3 or higher are assumed to be zero throughout the table.
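The confounding pattern quoted in the note can be verified directly from the generators. The sketch below builds the 8-run design from the $2^3$ building block with D = AB, E = AC, and F = BC, then lists the 2-factor interactions whose sign columns coincide with a given main-effect column. The helper names are illustrative, not part of any library.

```python
from itertools import product


def column_product(*cols):
    """Elementwise product of sign columns."""
    out = [1] * len(cols[0])
    for col in cols:
        out = [o * c for o, c in zip(out, col)]
    return out


# The 2^3 building block: all 8 combinations of A, B, C.
runs = list(product([-1, 1], repeat=3))
A = [r[0] for r in runs]
B = [r[1] for r in runs]
C = [r[2] for r in runs]

# Generators of the 6-factor, 8-run design: D = AB, E = AC, F = BC.
factors = {
    "A": A, "B": B, "C": C,
    "D": column_product(A, B),
    "E": column_product(A, C),
    "F": column_product(B, C),
}


def two_factor_aliases(name):
    """2-factor interactions whose sign column equals the column of `name`."""
    names = sorted(factors)
    return [
        x + y
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if name not in (x, y)
        and column_product(factors[x], factors[y]) == factors[name]
    ]


for name in sorted(factors):
    print(name, two_factor_aliases(name))
```

For factor B the sketch reports AD and CF, which agrees with the contrast B + AD + CF given in the note.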
TABLE A7.1
Factors: A = Lamination Roll Thickness; B = Lamination Exit Temperature; C = Spray Pressure; D = Break Point; E = Hold Time
Response: Average; Standard Deviation
---
- l
-I
-1
II
7.5
4.Y'iO
J_:>
0.707
l l.O
lUl4
1 _.J
u.-07
44.5
6.)64
15
'32\
6
120)
22.0
9.899
23.0
24.042
26
30.0
5.657
0.0
0.000
l.O
1414
1.0
0.000
0.0
0.000
23.5
7.778
4.5
2.12 l
7.5
0.707
12.0
9.899
271
()
-I
-l
19\
-I
-1
-l
-l
-1
-]
-[
-1
I
l
l
l
-I
21
119\
2
(71
40
(3)
29
ill]
-l
-1
-1
-1
-1
-1
-1
l
l
-I
I
-1
-I
-1
40
( 17
34
(2)
0
(6)
0
(16)
1
(l)
0
( 26)
-1
-]
-l
-1
18
( 12)
6
(8)
8
(4)
19
'l.ll
------
-----
-- - - -
0
I
2.1)
0
12[ \
.l
I)())
49
128)
( 18)
0
( 14)
2
(25)
I
( 15)
0
3 l)
29
129)
l
(24)
7
( 10)
'221
---
TABLE A8.3
Test Results
Test Cell
6
7
8
g
12.5
29. I
5.8
16.8
+
+
+
+
4 9
+
+
17.8
-3.7
11.0
11.0
1..1
- 6.0
5.3
17.5
- 14.6
7.7
28.9
17. I
6.6
16.8
25.0
125
8.9
14.1
-3.2
I-
t-
+
+
JO
11
12
13
14
15
Iii
17
18
19
20
21
22
PERCENT CHANGE IN SALES (2 STORES/TEST CELL)
Week 1
0..
c
0
-t
j-
+
+
+
+
+
+
+
+
+
+
+
I-
_,.
+
+
21
+-
2~
Week 2
Average
23.3
26.0
4.2
1.8
3.6
12.8
0.0
-8.8
1.5
33.J
-J J.9
-8.7
8.4
3.9
-6 .4
11.4
6.7
-5 .J
23.2
35.0
8.4
18.3
17.1
2.4
17.90
27.55
0.80
19.30
4.25
15 ..10
- 1.85
1.10
6.25
15.90
-8 .95
-7.00
12.95
5.35
0.65
20. l 5
11.90
5.85
20.00
30.00
I 0.45
13.60
15.60
-2.RO
---
Figure A8.1 (chart of estimated effects; legible labels include A: Display in produce, Cross-promote with salsa, Ad, Discount, and packaging; significant effects are shown above the line)
CASE
This case is reprinted from the International Journal of Research in Marketing, Vol. 23
(2006), pp. 309-319, with permission of the Elsevier Publishing Company.
INTRODUCTION
"Test everything" has been a rallying cry in the marketing and advertising industry
throughout the 20th century. Industry experts like Hopkins (1923), Caples (1974), Ogilvy
(1983), and Stone and Jacobs (2001) have stressed the importance of testing new ideas in
the marketplace. But as statisticians developed and refined sophisticated experimental design techniques, most marketers held firm to the approach of changing one factor at a time,
often called "split-run testing" (also referred to as A/B splits, test-control, or champion-challenger testing). Only in the last few years have marketing leaders begun to embrace
advanced techniques for real-world testing.
The financial industry, including insurance, investment, credit card, and banking
firms, was among the first to use experimental design techniques for marketing testing.
The project described here is from a leading Fortune 500 financial products and services
firm. The company name and proprietary details have been removed, but the test strategy,
designs, results, and insights are accurate. Tests were run within two direct-mail campaigns
that focused on increasing the number and profitability of new customers. The initial experiment, a Plackett-Burman screening design of 19 factors in 20 runs, was followed by a
4-factor, 16-run full-factorial experiment.
Although factorial, fractional factorial, and related methods of experimental design have
been widely applied to manufacturing problems, there have been few applications to direct
mail, Internet, retail, and other market-testing programs, and we found no papers that apply Plackett-Burman designs to these problems. For in-market testing, in an early paper
Curhan (1974) used a fractional factorial design to examine the effects of price, advertising,
display space, and display location on the sales of fresh fruits and vegetables in a supermarket, while Barclay (1969) used a factorial design to evaluate the effect on profitability of raising the prices of two retail products manufactured by the Quaker Oats Company. Holland
and Cravens (1973) presented the essential features of fractional factorial designs and illus-
REGRESSION APPROACH APPLIED TO THE ANALYSIS OF 2-LEVEL FACTORIAL EXPERIMENTS, AND THE FORTUNATE
CONSEQUENCES OF ORTHOGONALITY
In Section 4.3 we defined and calculated main and interaction effects, and we showed that
they are linear combinations of the responses, with weights coming from the design vectors
and the calculation columns (obtained by multiplying elements of the design vectors). An
alternative way of obtaining main and interaction effects is to write down a regression model
for the response and to obtain the estimates of the regression coefficients. Denote the vector of responses as $y$, the $k$ design vectors consisting of $\pm 1$ values as $x_1, x_2, \ldots, x_k$, and the
calculation columns as $x_{12}, x_{13}, \ldots, x_{k-1,k}$ (each a product of two design columns), $x_{123}, \ldots$
(each a product of three design columns), ..., all the way to $x_{12 \cdots k}$ (the product of all $k$
design columns). Including the column of ones, $x_0$, there are $2^k$ columns, and each
column is of length $2^k$.
Examples of these vectors are given in Tables 4.4 and 4.9. There we list the vector of responses, as well as the design and calculation columns. The only difference is that the columns are denoted by factor labels (the factor labels in Table 4.4; and A, B, ..., ABCD in
Table 4.9), instead of $x_1, x_2, \ldots, x_{12 \cdots k}$.
Also, Tables 4.4 and 4.9 do not list the column of ones.
$\hat{\beta} = (X'X)^{-1} X'y$
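A small numerical sketch may help connect the regression coefficients to the effects of Section 4.3. It uses a $2^2$ factorial with hypothetical responses. Because the columns of $X$ are orthogonal, $X'X = N I$ and the least squares estimates reduce to $X'y / N$; the code relies on that simplification instead of a general matrix inverse. The doubling of each coefficient to obtain an effect follows from the $\pm 1$ coding and is an added remark, not a quotation from the text.

```python
# 2^2 factorial in standard order; columns are x0 (ones), x1, x2, x12.
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
x12 = [a * b for a, b in zip(x1, x2)]
x0 = [1, 1, 1, 1]
columns = {"x0": x0, "x1": x1, "x2": x2, "x12": x12}

# Hypothetical responses, one per run.
y = [10, 20, 30, 60]
N = len(y)

# Orthogonal columns give X'X = N * I, so each coefficient is (column . y) / N.
beta = {name: sum(c * r for c, r in zip(col, y)) / N
        for name, col in columns.items()}

# With -1/+1 coding, an effect (average at + minus average at -) is twice
# the corresponding regression coefficient.
effects = {name: 2 * b for name, b in beta.items() if name != "x0"}

print(beta)
print(effects)
```

With these responses the main effect of factor 1 is 20, which equals the average response at its high level (40) minus the average at its low level (20).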
Lenth, in his paper "Quick and Easy Analysis of Unreplicated Factorials," discusses another
useful strategy for assessing the significance of effects in unreplicated experiments. His procedure is based on the following simple formula for the standard error of an estimated effect. If none of the factors are active, the standard deviation of the m estimated effects
$f_1, f_2, \ldots, f_m$ serves as the standard error of the estimated effects. However, if some effects are
active, this estimate is too large, as it not only incorporates random variability but also the
effects of active factors. Hence, one needs to omit from the calculation of the standard deviation the estimates of all active factors. The normal probability plot discussed previously
does this informally when determining the best-fitting straight line from just the estimates
in the linear portion of the middle part of the graph, not from the estimates on the extreme
left and right side that do not appear to fit the line through the middle.
Lenth (1989) uses the fact that the median of the absolute values of the estimated nonactive effects, suitably normalized, provides an estimate of the standard deviation, and he
calculates
$s_0 = (1.5)\,\mathrm{Median}(|f_1|, |f_2|, \ldots, |f_m|)$
The factor 1.5 in the normalization arises from the relationship between the standard deviation and the median of the absolute value of a mean zero normal random variable. In the
next step, Lenth (1989) omits from this calculation all estimates with absolute values larger
than $2.5 s_0$, and he calculates a revised standard deviation
$\mathrm{PSE} = (1.5)\,\mathrm{Median}_{|f_j| < 2.5 s_0}(|f_1|, |f_2|, \ldots, |f_m|)$
He calls this the pseudo standard error (PSE) and uses it in the calculation of the confidence intervals for the effects. The 95% confidence interval for an effect, estimated
effect $\pm\, (t)(\mathrm{PSE})$, uses the 97.5th percentile of a t-distribution with $m/3$ degrees of freedom.
For a standard error that is estimated with reasonable confidence and that comes from
many observations, one would use the 97.5th percentile of the standard normal distribution, or simply t = 2. However, in the unreplicated situation the PSE comes from very few
observations, and Lenth (1989) found through simulations that the t-distribution with $m/3$
degrees of freedom works best. For m = 7 effects, t = 3.76; for m = 15, t = 2.57; and for
m = 31, t = 2.22. Lenth recommends displaying the estimated effects on a bar chart, and
adding to this chart the margin of error, $\pm (t)(\mathrm{PSE})$. If an estimate exceeds these limits, then
it is likely that this particular factor is active (i.e., significant). We should point out that
Lenth suggests even larger margins of errors by incorporating simultaneous adjustments for
the multiple comparisons (see Appendix 4.2 for a discussion of multiple comparisons).
Minitab displays Lenth's PSE in the context of its normal probability plot (see Figure 4.6).
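The two-step calculation can be sketched with the standard library alone. The effect estimates below are hypothetical, and the t multipliers are simply the three values quoted in the text for m = 7, 15, and 31, so the sketch does not compute percentiles for other values of m. The function names are illustrative.

```python
from statistics import median

# t multipliers quoted in the text, keyed by the number of effects m.
T_MULTIPLIER = {7: 3.76, 15: 2.57, 31: 2.22}


def lenth_pse(effects):
    """Pseudo standard error: trimmed, rescaled median of the absolute effects."""
    abs_effects = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_effects)
    # Drop estimates that look active before taking the median again.
    retained = [a for a in abs_effects if a < 2.5 * s0]
    return 1.5 * median(retained)


def margin_of_error(effects):
    """Margin of error (t)(PSE) for m = 7, 15, or 31 estimated effects."""
    return T_MULTIPLIER[len(effects)] * lenth_pse(effects)


# Hypothetical estimated effects from an unreplicated 8-run experiment (m = 7).
estimated_effects = [0.5, -1.0, 8.0, 1.5, -0.5, 1.0, -2.0]

pse = lenth_pse(estimated_effects)
me = margin_of_error(estimated_effects)
active = [e for e in estimated_effects if abs(e) > me]

print(pse, me, active)
```

In this made-up example only the effect of 8.0 exceeds the margin of error, so it would be flagged as likely active.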
INEFFICIENCY OF APPROACHES THAT CHANGE ONE FACTOR AT A TIME
All too often the effects of k factors are studied by carrying out successive experiments in
which the levels of each factor are changed one at a time. Such experiments start with the
standard settings of the k factors, then change the levels of the one factor that is considered
the most influential. The responses at the low and high settings of this one factor are compared while keeping all other factors fixed, and the level at which the response is best is
locked in for the next stage. The factor that is considered second most important is varied
next. Again, responses at the low and high levels of this factor are compared, and the best
level of this factor gets locked in for all subsequent runs. Then on to the third factor, and so
on, until the last factor is reached.
Compared to the factorial (multifactor) experiments where the levels of all factors are
changed together, the approach of changing one factor at a time is inefficient for several reasons:
It requires more runs to achieve the same precision for the effects estimates.
It may miss the optimum altogether.
It cannot estimate interactions.
It does not provide general conclusions about factor effects, given that the estimates
depend on specific levels of the remaining factors.
We illustrate these shortcomings in the context of the $2^2$ factorial design. In the $2^2$ factorial
without replications, we ... estimated by comparing two observations at the low and the high levels of each factor. Assume
that the approach of changing one factor at a time begins with ($x_1$ = ..., $x_2$ = +) and varies
the levels of factor 1 first. To obtain the same precision for the estimates, one must start with
... ($x_1$ = ..., $x_2$ = +). The 2 runs at ($x_1$ = ..., $x_2$ = +) have already been obtained in the first
step; but 2 more runs at ($x_1$ = ..., $x_2$ = -) are required. This leads to a total of 6 runs, as
compared to the 4 runs in the $2^2$ factorial design. This shows that the approach of changing
one factor at a time requires more runs to obtain estimates with the same precision.
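The count of 6 runs versus 4 can be generalized with a little arithmetic. The sketch below rests on an assumption that goes beyond the passage: in an unreplicated $2^k$ factorial each main effect compares two averages of $2^{k-1}$ observations, so a one-factor-at-a-time plan needs $2^{k-1}$ runs at each of its $k + 1$ settings to match that precision. The function names are illustrative.

```python
def factorial_runs(k):
    """Runs in an unreplicated 2^k factorial."""
    return 2 ** k


def one_factor_at_a_time_runs(k):
    """Runs needed to match the factorial's precision: 2^(k-1) runs at k + 1 settings."""
    return (k + 1) * 2 ** (k - 1)


for k in range(2, 6):
    print(k, factorial_runs(k), one_factor_at_a_time_runs(k))
```

For k = 2 this reproduces the 4 runs versus 6 runs discussed above, and the gap widens as the number of factors grows.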
Also note that the approach of changing one factor at a time may miss the optimum.
Consider the situation with the following four factor level combinations and their responses that are supposed to be maximized:
t- ):y = 80
- ):y - 90
):y
70 J
- ):y - 110
-t
CASE 10
PIGGLY WIGGLY
Wilkinson et al. [Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the Impact of
Short-Term Supermarket Strategy Variables," Journal of Marketing Research, Vol. 19 (1982),
72-86] described the results of an experiment that assesses the impact of price, promotion,
and display on the sales of several grocery items.
THE EXPERIMENT
TABLE 4.9
Columns: Test Cell, A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD
+
+
+
+
+
152
-t-
+
9
10
II
12
13
14
15
16
+
+
+-t-
+
I
t-
ABCD
~--
BCD
+
-;-
2.45%
3.36%
2.16%
2.29%
2.49%
3.39%
2.3Li:\o
2.44%
1.84%
2.24/o
1.69/r.
1.87')-b
2.29%
2.92%
2.04%
2.03 1Yo
CASE 12
Eric Almquist and Gordon Wyner ["Boost Your Marketing ROI with Experimental Design," Harvard Business Review (October 2001), 135-141] make a convincing argument
why experimental design can speed up the learning curve of marketing research. Direct
marketers have used simple techniques such as split mailings to compare consumer reactions to different prices or promotional offers. However, such traditional testing techniques
that change one factor at a time become prohibitively expensive if more than just a couple
of advertising techniques need to be evaluated. Changing factors simultaneously and
changing the factors according to a well-constructed experimental plan is the key to efficiently learning which of many factors have an influence. Almquist and Wyner discussed
two examples.
EXAMPLE 1
The lirst example describes how a company called Ri1Ware tests the sales response to .1 c<1mp.1ign I hat varic.s th rec factors: Price <it four levels ( $150, $160, $170, and $180), two differ-.
cnt messages (one cmphasi1ing speed, the other power), and two pro111ot1011 slratcgil's (one
involving a free trial period, the other a free gift). With 3 factors-two at 2 levels and one at
4 levels - the 2 "4 foctorial involves the 16 experiments listed in Table A 12. l. The la .st column in this table indicates the orthogonal half-fraction that is suggested by the design software JMP as <1!1 8 run screening design.
A fraction of a mixed -level design with one factor at 4 levels is easy to generate. One
writes down an 8-run dcsigr in seven 2-level factors (see Table A 12.2 given below) and uses
two columns and their interaction to assign the 4 levels of the 4-level factor. This procedure
generates an orthogonal design with one factor at 4 lcvels, and up to 4 factors at 2 levels each.
Herc, we let the first two columns represent the levels of the two 2-levcl factors. The columns 3, 12, and their product (3)(12) = 123 are used to determine the levels of the 4-level
factor. These are the columns in boldface. Level I of the 4-level factor is associated with
( - I, I, - I); level 2 with ( - 1, - 1, I); level 3 with (I, - I, - I); and level 4 with (I, l, I).
This leads to the 8 runs that a re indicated in Table A 12. I. Note that the unused columns la1
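The column-assignment procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the JMP algorithm: the function name `mixed_level_design` and the variable names are hypothetical, and the level mapping simply copies the four sign triples (column 3, column 12, column 123) given in the text.

```python
from collections import Counter
from itertools import product

# Sign triples (column 3, column 12, column 123) -> level of the 4-level factor,
# copied from the assignment described in the text.
LEVEL_MAP = {
    (-1, 1, -1): 1,
    (-1, -1, 1): 2,
    (1, -1, -1): 3,
    (1, 1, 1): 4,
}


def mixed_level_design():
    """Return 8 runs as (factor_1, factor_2, four_level_factor)."""
    runs = []
    for x1, x2, x3 in product((-1, 1), repeat=3):
        col_12 = x1 * x2        # interaction column 12
        col_123 = x3 * col_12   # product of columns 3 and 12
        level = LEVEL_MAP[(x3, col_12, col_123)]
        runs.append((x1, x2, level))
    return runs


design = mixed_level_design()

# Balance check: each pairing of a 2-level setting with a 4-level setting
# should occur equally often if the design is orthogonal.
pair_counts_f1 = Counter((run[0], run[2]) for run in design)
pair_counts_f2 = Counter((run[1], run[2]) for run in design)

for run in design:
    print(run)
```

The two counters give a quick way to confirm the balance that makes the main effects of the 2-level factors and the 4-level factor separately estimable.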
TABLE A12.5
The 16-Run D-Optimal Design for the Crayola Marketing Campaign
Columns: Subject, Action, Closing, Salutation, Promotion (coded levels -1, 0, and 1).
TABLE A13.1
Description of the 45 Creatives and the Resulting VISITS, CLICKS, ACTIONS, CTR = CLICKS/VISITS, and AR = ACTIONS/VISITS
Columns: Creative (runs 1 to 45); test areas A-bottom, A-top, B, C, D, E, F, G, H, I; Visitors; Clicks; Actions; CTR (%); AR (%).
NOTE: The numbers under areas A-bottom, A-top, B through I refer to the available levels in each of the 10 test areas. Level 1 represents the baseline level.
REFERENCES
CHAPTER 1
Box, George E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters: Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
Deming, W. Edwards: Out of the Crisis. Cambridge, MA: MIT Press, 1982.
Fisher, Ronald A.: The Design of Experiments. Edinburgh: Oliver & Boyd, 1935 (and various later editions).
Fisher Box, Joan: R. A. Fisher, The Life of a Scientist. New York: Wiley, 1978.
Pande, Peter S., Neuman, Robert P., and Cavanagh, Roland R.: The Six Sigma Way: How GE, Motorola, and Other Top Companies Are Honing Their Performance. New York: McGraw-Hill, 2000.
Salsburg, David: The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman, 2001.
CHAPTER 2
Clarke, D. G.: Marketing Analysis and Decision Making. Redwood City, CA: The Scientific Press, 1987.
Hald, Anders: A History of Probability and Statistics and Their Applications Before 1750. New York: Wiley, 1990.
Shewhart, W. A.: Economic Control of Quality of Manufactured Product. New York: Van Nostrand, 1931.
Stigler, Stephen M.: Statistics on the Table. Cambridge, MA: Harvard University Press, 1999.
Welch, B. L.: "The significance of the difference between two means when the population variances are unequal." Biometrika, Vol. 29 (1937), 350-362.
CHAPTER
Box, George E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters: Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
Clarke, D. G.: Marketing Analysis and Decision Making. Redwood City, CA: The Scientific Press, 1987.
Fisher, R. A.: Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1925.
4.1 INTRODUCTION
In this chapter, we begin focusing on the heart of this book. In Chapter 3, we were concerned with comparing the effectiveness of several treatments, that is, several levels of a single factor. In one example, the factor was a product display, and we tested three different displays based on a comparison of weekly store sales. Here, we extend that discussion, focusing on experiments with multiple factors.
Experimental design methods have roots in agriculture, and we use an example from that field to introduce the material we will cover. Suppose we are experimenting with ways to improve the yield of corn, and we consider three factors: type of fertilizer, variety of seed, and type of pesticide. We decide to test two fertilizer formulations, two kinds of seed, and two different pesticides. As we discussed in Chapter 1, the traditional method for testing multiple factors is to test one factor at a time. But Fisher (1935) showed that a factorial design that tests all factors simultaneously is a much better approach. Using Fisher's method, we test the 2 X 2 X 2 = 8 possible combinations of fertilizers, seeds, and pesticides. We divide our experimental field into 32 equal-sized plots and randomly assign each of the eight combinations to four plots. For each of the 32 plots, we measure the number of bushels of corn produced. This factorial arrangement would allow us to compare the two fertilizers, the two kinds of seeds, and the two pesticides. It would also allow us to uncover any interactions between factors. For example, it may turn out that seed variety 1 is better than variety 2 when fertilizer 1 is used for both, but that the opposite is true when fertilizer 2 is used with both seeds.
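The random assignment of the eight fertilizer, seed, and pesticide combinations to the 32 plots can be sketched as follows. This is a minimal illustration: the function name `randomized_factorial_layout`, the level labels, and the fixed seed are assumptions made for the example and are not part of the original discussion.

```python
import random
from collections import Counter
from itertools import product


def randomized_factorial_layout(n_plots=32, replicates=4, seed=0):
    """Assign each of the 2 x 2 x 2 = 8 combinations to `replicates` random plots."""
    treatments = list(product(
        ("fertilizer 1", "fertilizer 2"),
        ("seed 1", "seed 2"),
        ("pesticide 1", "pesticide 2"),
    ))
    if len(treatments) * replicates != n_plots:
        raise ValueError("number of plots must equal combinations x replicates")
    assignments = treatments * replicates
    rng = random.Random(seed)      # fixed seed only to make the sketch reproducible
    rng.shuffle(assignments)       # random allocation of combinations to plots
    return {plot: combo for plot, combo in enumerate(assignments, start=1)}


layout = randomized_factorial_layout()
replicate_counts = Counter(layout.values())
for plot in (1, 2, 3):
    print(plot, layout[plot])
```

In a real field trial the seed would not be fixed; it is fixed here only so that the same layout is printed on every run.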
Consider another example. An advertising agency is designing an online ad. It identifies three factors to test, with the response being the fraction of ad viewers who sign up for the advertised service. One factor is the ad copy: a traditional version or a more modern one. The second factor is the font, and the third is the background color, white or blue. In a factorial design, one of the eight possible ads would randomly be sent to each viewer. The question is what is important here. Is it the copy, the font, or the background color that matters? Do the factors interact in the sense that
CASE 13
PHONEHOG
This case continues our discussion in Section 8.2. PhoneHog recorded the number of distinct visitors to the PhoneHog site (VISITS), the number of times visitors click on the subsequent page to obtain additional information (CLICKS), and the number of actions of actually completing the subscription agreement (ACTIONS). The click-through rate, CTR = CLICKS/VISITS, and the action rate, AR = ACTIONS/VISITS, measure the success of the creatives. The results are shown in Table A13.1.
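The two success measures can be computed directly from the recorded counts. The short sketch below uses a hypothetical helper named `rates`; the input counts (2,150 visitors, 420 clicks, 212 actions) are taken from one row of Table A13.1 as printed, where they appear with CTR 19.53% and AR 9.86%.

```python
def rates(visits, clicks, actions):
    """Return (CTR, AR) in percent, rounded to two decimals."""
    ctr = 100.0 * clicks / visits    # click-through rate = CLICKS / VISITS
    ar = 100.0 * actions / visits    # action rate = ACTIONS / VISITS
    return round(ctr, 2), round(ar, 2)


ctr, ar = rates(visits=2150, clicks=420, actions=212)
print(ctr, ar)
```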
QUESTIONS
Exercise 2 in Chapter 8
Figure A11.1 Main-effects plots. Panels: Advertising, Cities, Time. Vertical axes run from about 6,000 to 14,000.
CASE 11
This case is adapted from D. G. Clarke [Marketing Analysis and Decision Making, The Scientific Press (1987)].
Researchers at the United Dairy Industry Association (UDIA) were evaluating the results of a recent field experiment that tested the impact of varying levels of advertising on the sales of cheese. The principal objective of the study was to measure the retail sales response (pounds of cheese sold) to varying levels of advertising. Eight markets were selected for the experiment, two from each of the four geographic regions: Northeast, Midwest, Southwest, and Southeast. Two markets with similar monthly sales patterns were selected from each geographic region in a way that minimized overlap of local television and newspaper coverage. Within each geographic region, the two markets were designated as test or control market on a random basis.
Executives determined the levels of advertising to be tested in the experiment. It was believed that the levels should be distinct enough to generate measurable differences in the results. They decided to test the impact of four levels of advertising: 0 cents (level A), 3 cents (B), 6 cents (C), and 9 cents (D), all expressed on a per-capita basis. The 6 cents per capita level represents a national campaign costing approximately 12 million dollars (in 1973). The principal medium for advertising was television, with point-of-purchase display materials in stores and newspaper ads playing a secondary role. Each of the four levels of advertising was implemented within each test market during one of four 3-month periods between May 1972 and April 1973. The sequence in which the advertising levels were tested was selected so that each advertising level was tested in only one test market during any one time period. Such an arrangement is referred to as a Latin square design. You can check that each letter in Table A11.2 (A, B, C, D) appears only once in each column and each row.
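The row-and-column check suggested above is easy to automate. The sketch below is an illustration only: `is_latin_square` is a hypothetical helper, and the 4 x 4 schedule is a made-up cyclic arrangement of the advertising levels A to D, not the actual layout of Table A11.2.

```python
def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    # Every row must have length n and contain each symbol once.
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    # Every column must contain each symbol once.
    return all({row[c] for row in square} == symbols for c in range(n))


# Hypothetical schedule: rows are time periods, columns are test markets.
example_schedule = [
    ["A", "B", "C", "D"],
    ["B", "C", "D", "A"],
    ["C", "D", "A", "B"],
    ["D", "A", "B", "C"],
]
print(is_latin_square(example_schedule))
```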
Within each market, UDIA executives obtained the cooperation of approximately 30 supermarkets in obtaining quarterly audits of cheese sales. Average cheese sales (in pounds) per store in each test market across the four 3-month periods between May 1972 and April 1973 are listed in Table A11.3.
The procedure for determining which effects are significant involves multiple comparisons of many effects. In the $2^4$ factorial experiment, for example, we assess the significance of 15 effects. In $m = 15$ comparisons, it would not be unreasonable to see one effect outside the critical value (margin of error), $\pm(1.96)\,\text{standard error(effect)}$, just by chance even though there are no active factors. To guard against the error of declaring a factor significant incorrectly, we can apply multiple comparison procedures that increase the critical value. A simultaneous 5% margin of error is obtained by replacing the 97.5th percentile (0.025 upper tail probability) with the percentile of order $(1 + 0.95^{1/m})/2$. This simultaneous margin of error uses the fact that estimates of the effects are independent. For example, for $m = 7$ comparisons, the percentile of order $(1 + 0.95^{1/7})/2 = 0.9963$ from the standard normal distribution is 2.68. This is larger than 1.96, the factor used in the critical value without a multiple comparison adjustment.
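The adjusted normal critical value can be reproduced with Python's standard library. This is a sketch of the percentile rule quoted above for independent effect estimates; the function name `simultaneous_z` and its parameter names are hypothetical.

```python
from statistics import NormalDist


def simultaneous_z(m, overall_confidence=0.95):
    """Normal critical value for m independent comparisons.

    Uses the percentile of order (1 + overall_confidence**(1/m)) / 2.
    """
    order = (1 + overall_confidence ** (1 / m)) / 2
    return NormalDist().inv_cdf(order)


z_single = simultaneous_z(1)    # no adjustment: the usual 97.5th percentile
z_seven = simultaneous_z(7)     # adjustment for m = 7 comparisons
z_fifteen = simultaneous_z(15)  # adjustment for the 15 effects of a 2^4 design
print(round(z_single, 2), round(z_seven, 2), round(z_fifteen, 2))
```

With `m = 1` the rule reduces to the familiar 1.96, and with `m = 7` it gives approximately the 2.68 quoted in the text. For small replicated experiments the same percentile order would be applied to a t distribution, which needs a library beyond the standard one.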
We can apply the multiple comparisons procedure to the results of the replicated 3-factor cracked pots example. In Table 4.6, the confidence intervals and significance tests for the effects are based on a t-value of 2.306, which corresponds to the 97.5th percentile of a t distribution with 8 degrees of freedom. Applying the multiple comparison method with $m = 7$, the appropriate t value is 3.56, the 99.63th percentile of the t distribution with 8 degrees of freedom. The confidence interval for each effect becomes wider: Estimated effect $\pm\, 3.56(1.44)$, or Estimated effect $\pm\, 5.13$. Any effect with absolute value greater than 5.13 is statistically significant. The main effects are still significant, but the significance of the RC interaction (estimated effect of 5.5) becomes borderline.
Adjustments for multiple comparisons guard against the error of judging too many factors as important. One could argue against the use of such adjustments on the ground that it is usually not a serious mistake to consider an insignificant effect as significant. Most experimentation is a sequential activity, and not a one-shot affair. Including borderline significant factors at a subsequent stage certainly involves more work as extra factors need to be carried along. However, one can learn at the next stage that such factors are not needed, and not much harm is done by not ruling them out immediately. On the other hand, disposing of
REGRESSION
Note that regression software is readily available and all that is needed in practice is an understanding of how to interpret the program output. While a detailed knowledge of regression is not necessary for applying the design approach put forward in this chapter, it will help you understand certain issues in Chapters 7 and 8. Also, if you have had prior exposure to regression, the material in the following two appendixes will give you a brief, concise summary of the main results.
Consider the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$. Ignoring the noise $\varepsilon$, this model represents a straight line if plotted on an x-y graph. The noise component introduces random scatter around the model line $\beta_0 + \beta_1 x$. Assume that the noise component has mean zero and variance $\sigma^2$.
Assume that there are $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, which are graphed on a scatter plot. Figure 2.7 in Section 2.3 is an example of such a plot. The objective is to determine the line that fits the data best. Least squares estimation selects the estimates for $\beta_0$ and $\beta_1$, which we denote by $\hat\beta_0$ and $\hat\beta_1$, by minimizing the sum of the squared vertical distances $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$. The estimates can be calculated quite easily. Expressions for the estimates can be written down in vector/matrix format,
$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = (X'X)^{-1} X'y \qquad (4A.1)$$
The $n \times 1$ column vector $y$ consists of the response (y) observations. The $n \times 2$ matrix $X$ consists of two $n \times 1$ columns: a vector of ones, denoted by $\mathbf{1}$, and the vector $x$ containing the values of the regressor (x) variable. That is,
$$X = [\mathbf{1}, x] = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$
Here $X'$ is the transpose of $X$, $X'X$ is the matrix product of $X'$ and $X$, and $(X'X)^{-1}$ is the inverse of $(X'X)$. The matrix expression for the estimates in equation (4A.1) is very convenient as it also works for more general models.
The fitted values from the regression fit, $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, are obtained by replacing the regression coefficients in the model equation by their estimates. The residuals are the differences between the observations and the fitted values, $y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$.
5.1 INTRODUCTION
In Chapter 4, we discussed 2-level factorial designs. These designs are very useful when there are relatively few factors. But as $k$, the number of factors, increases, the required number of runs in a $2^k$ factorial design grows rapidly, with each additional factor doubling the number of runs. With 4 factors, there are $2^4 = 16$ runs; with 5 factors, $2^5 = 32$ runs; with 6 factors, $2^6 = 64$ runs; and with 10 factors, $2^{10} = 1{,}024$ runs. Obviously, an experiment with that many runs would be out of the question. If full factorial designs were the only choice for the experimenter, experimental design tools would have limited value. But as we will see in this chapter, fractional designs in which the experimenter performs only a fraction of the number of runs required in a full factorial design offer an extremely powerful approach to experimentation.
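The growth in run counts is simple arithmetic, sketched below. The helper names are hypothetical; the fraction size uses the common $2^{k-p}$ notation, in which $p$ counts how many times the full factorial is halved.

```python
def full_factorial_runs(k):
    """Number of runs in a 2-level full factorial with k factors."""
    return 2 ** k


def fractional_factorial_runs(k, p):
    """Number of runs in a 2^(k-p) fractional factorial."""
    if not 0 <= p < k:
        raise ValueError("p must satisfy 0 <= p < k")
    return 2 ** (k - p)


for k in (4, 5, 6, 10):
    print(k, "factors:", full_factorial_runs(k), "runs in the full factorial")

# Seven factors studied in a one-eighth fraction need only 16 runs.
print(fractional_factorial_runs(7, 3))
```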
CASE 8
EXPERIMENTS IN RETAIL OPERATIONS:
DESIGN ISSUES AND APPLICATION
Gordon H. Bell, Johannes Ledolter, and Arthur J. Swersey
INTRODUCTION
Experimental design methods have long been recognized as an integral part of production and operations management in general and quality management in particular. With origins in the pioneering work of Sir Ronald Fisher, who published The Design of Experiments in 1935, experimental design methods have been widely applied to manufacturing problems, with numerous case studies and examples appearing in the literature.
In the early 1980s, largely in response to competition from Japan, U.S. firms took a renewed interest in these statistical methods, with the Big Three automobile makers at the forefront of these activities. Experimental design was emphasized throughout that decade as an important aspect of statistical process control (SPC) and total quality management (TQM) activities. More recently, Six Sigma programs have gained widespread attention, with experimental design being a prominent part of that methodology. Principles of lean production have been combined with Six Sigma, resulting in an approach that simultaneously focuses on both these methodologies. Over time, the focus of Six Sigma and other quality activities has shifted from its original focus on manufacturing to a broader scope that includes health care and other service areas. Similarly, concepts of lean production have more recently been applied to service operations. For example, patient-focused care in hospitals is designed to increase quality of care by decentralizing many ancillary services and bringing the caregivers to the patient. This approach is very similar to just-in-time production/cell manufacturing in a factory setting.
Bisgaard (1992) provides a notable historical review of experimental design case studies that includes what he calls "a partial and unsystematic list of articles ... showing engineering and manufacturing applications of experimental design." This list comprises more than 130 case studies. More recent case studies applying experimental design methods to manufacturing problems are discussed by Lin and Chanada (2003), Cherfi, Bechard, and Boudaoud (2002), Schaub and Montgomery (1997), and Young (1996).
In contrast to work on manufacturing problems, applications of experimental design to service problems, including marketing and retail operations, have been limited, with examples rarely appearing in the academic literature. In searching for service applications of
CASE
PEAK ELECTRONICS:
THE BROKEN TENT PROBLEM (PART B)
Peak ran a replicated 2-level fractional factorial design to solve the problem discussed in Case 4, with each run being a single test panel. The response variable is the number of broken tents. The order of the 32 runs was randomized, with the run sequence shown in parentheses. The results are shown in Table A7.1.
QUESTIONS
1. Analyze the results. Estimate the effects, and obtain their significance by comparing
CASE
A random sample of 40,000 persons participated in the test described in Case 3, with the letters mailed on March 15, 2000. The $2^{7-3}$ design shown below (Table A6.1) was used, resulting in 16 different experimental runs. Each run consisted of a particular combination of factor settings, with each combination sent to 2,500 persons. The response variable shown in Table A6.1 is the net response rate (in %), which is the percentage of people who subscribed and paid (either by cash or credit card). The estimated effects are shown in Table A6.2.
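A 16-run design for seven factors of the kind used here can be generated from a full factorial in four factors plus three generated columns. The sketch below assumes the generators E = ABC, F = BCD, and G = ACD, as they can be read from the title of Table A6.1; the function name `design_2_7_3` and the run order are illustrative choices, not the book's.

```python
from itertools import combinations, product


def design_2_7_3():
    """Return 16 runs (A, B, C, D, E, F, G) with E = ABC, F = BCD, G = ACD."""
    runs = []
    for d, c, b, a in product((-1, 1), repeat=4):   # A changes fastest
        e = a * b * c
        f = b * c * d
        g = a * c * d
        runs.append((a, b, c, d, e, f, g))
    return runs


design = design_2_7_3()

# Orthogonality check: every pair of columns has a zero dot product.
orthogonal = all(
    sum(run[i] * run[j] for run in design) == 0
    for i, j in combinations(range(7), 2)
)
mailing_size = len(design) * 2500   # 2,500 letters per run, as in the case
print(len(design), orthogonal, mailing_size)
```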
QUESTIONS
1. Analyze the results of the experiment. Which effects are statistically significant at the 5% level? At the 10% level?
TABLE A6.1
The $2^{7-3}$ Design with E = ABC, F = BCD, G = ACD, and Responses
Columns: Run (1 to 16); factors A, B, C, D, E, F, G; Response (%).
TABLE 5.7
Design columns: Run (1 to 16), A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD.
5 factors: E = ABCD
Resolution IV designs:
6 factors: E = ABC, F = BCD
7 factors: E = ABC, F = BCD, G = ACD
8 factors: E = ABC, F = BCD, G = ACD, H = ABD
9 to 15 factors: the generators above plus further generators (legible in this copy: N = BC, O = BD).
In many comparative studies we evaluate the success of a new strategy or method through the resulting change in a proportion. For example, we may have two different advertising strategies (1 and 2) and may be interested in whether or not strategy 2 increases the proportion of people who buy a certain product. Under the null hypothesis $H_0: \pi_1 = \pi_2 = \pi$, the distribution of the difference of the two sample proportions $p_2 - p_1$ is normal with mean $0$ and variance $2\pi(1-\pi)/n$, where $n$ is the size of the first (and second) sample. For a test with significance level $\alpha$, we reject the null hypothesis in favor of the one-sided alternative $H_1: \pi_2 - \pi_1 = \delta > 0$ whenever $p_2 - p_1 > z_{1-\alpha}\sqrt{2\pi(1-\pi)/n}$; $z_{1-\alpha}$ is the $100(1-\alpha)$ percentile of the standard normal distribution.
We are looking for a test with power $1 - \beta$, which implies probability $\beta$ of falsely accepting the null hypothesis if the alternative ($\pi_2 - \pi_1 = \delta$) is actually true. This requirement implies the equality
$$z_{1-\alpha}\sqrt{\frac{2\pi(1-\pi)}{n}} = \delta - z_{1-\beta}\sqrt{\frac{\pi(1-\pi) + (\pi+\delta)(1-\pi-\delta)}{n}}$$
The above equation can be solved for the sample size $n$, leading to
$$n = \frac{\left[z_{1-\alpha}\sqrt{2\pi(1-\pi)} + z_{1-\beta}\sqrt{\pi(1-\pi) + (\pi+\delta)(1-\pi-\delta)}\right]^2}{\delta^2} \approx \frac{2\pi(1-\pi)\left[z_{1-\alpha} + z_{1-\beta}\right]^2}{\delta^2}$$
Example. Consider the planning value $\pi = 0.03$ for the common success proportion, and assume that it is important to detect an increase of one-half of a percent ($\delta = 0.005$). For $\alpha = \beta = 0.05$ and $z_{0.95} = 1.645$, we must sample
$$n = \frac{2(0.03)(0.97)\left[1.645 + 1.645\right]^2}{(0.005)^2} \approx 25{,}200$$
in each group, for a total of 50,400 people for the two groups combined. Statistical computer software such as Minitab and JMP includes routines for such sample size calculations. Minitab, for example, returns this sample size when asking for the power/sample size in the two proportions case.
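The worked example can be checked numerically. The sketch below implements only the approximation the example uses, $n \approx 2\pi(1-\pi)(z_{1-\alpha} + z_{1-\beta})^2/\delta^2$; the function name `n_per_group` and its parameter names are hypothetical, and the percentiles come from Python's standard library rather than from a rounded table value, so the result differs slightly from a hand calculation with 1.645.

```python
import math
from statistics import NormalDist


def n_per_group(pi, delta, alpha=0.05, beta=0.05):
    """Approximate per-group sample size for detecting an increase `delta`
    in a success proportion `pi` with a one-sided test (level alpha, power 1 - beta)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = 2 * pi * (1 - pi) * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)


n_group = n_per_group(pi=0.03, delta=0.005)
n_total = 2 * n_group
print(n_group, n_total)
```

The result is close to the 25,200 per group quoted in the example. Halving the difference to be detected roughly quadruples the required sample size, since $\delta$ enters squared in the denominator.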
Comment. The result in Appendix 2.1 is pertinent to the design of comparative experiments that attempt to estimate the difference between two unknown success proportions. It shows how to select the two sample sizes such that a certain specified difference ($\delta$) in the
2.1 INTRODUCTION
This chapter reviews basic concepts that we use in the remainder of this book. Section 2.2 reviews discrete and continuous probability distributions including two important special cases, the binomial and normal distributions. In Section 2.3, we focus on the graphical display and numerical summary of information. Topics covered include bar and pie charts for categorical data; dot diagrams; histograms and scatter plots for continuous data; and summary measures including the mean, median, standard deviation, and correlation coefficient. In Section 2.4, we discuss sampling and random sampling, and in Section 2.5, we review the basics of statistical inference. We discuss confidence intervals and hypothesis tests for a single mean and a single proportion, and determine the sample size that is required for estimates to achieve a given level of precision. We also address the comparison of two populations, using data from the completely randomized as well as the randomized block experiment. A case study on the effectiveness of two advertising strategies completes the chapter.
2.2 PROBABILITY DISTRIBUTIONS
The world is uncertain, and measurements on products and processes vary. Probability distributions describe the variability among the measurements.
Random variables are variables whose outcomes are uncertain. For example, the purchasing response of a customer who receives a catalog or an e-mail offer can be "yes" or "no" or in coded form, 1 for "yes," and 0 for "no." Similarly, the soldering quality of a circuit board, expressed in terms of the number of flaws, is a random variable. The board may have zero flaws, exactly one error, two errors, and so on.
Random variables with a discrete number of possible outcomes (in the first example, 0 and 1; in the second example, 0, 1, 2, ...) are called discrete random variables. We use discrete probability distributions to describe the uncertainty. Later in this chapter, we discuss the binomial distribution, the most important discrete distribution.
Variables such as the length or the width of a product, the amount spent on purchases, the commuting time to work, the gas mileage of a car, or the yield of a process are continuous
PREFACE
We thank him for contributing so much to this book and look forward to continuing our collaborations with him in the future.
We also thank the many students who took our classes at the University of Iowa, the Vienna University of Economics and Business Administration, and Yale University. We treasure the interactions we have had with our students and value all we have gained from them. Finally, we could not have completed this book without the encouragement of our families and closest friends. Writing a book is inevitably more time consuming than anticipated, and we will always be thankful for the patience and support we received from those nearest to us.
We welcome comments from readers. Our e-mail addresses are johannes-ledolter@uiowa.edu and arthur.swersey@yale.edu. Throughout the book we have tried to convey our passion for the subject of experimental design and to share with readers our strongly felt beliefs in the power of these methods and their practical value. The success of this book will depend in large part on the experiments carried out in the future by those who read it.
Johannes Ledolter
Arthur J. Swersey
PREFACE
Our interest in writing this book began about 10 years ago when in our own work we started to explore the applications of experimental design methods to problems outside of manufacturing. We recognized as others had before that these powerful approaches were valuable tools for marketing problems. We also discovered that beyond marketing applications there were other important questions outside of manufacturing for which experimental design methods could be usefully applied. For example, in education much research has been directed at determining the relationship between student learning as measured by standardized tests and class size. Large-scale tests have been carried out, but researchers have missed the opportunity to use experimental design methods that would allow the experimenter to simultaneously and efficiently test other variables such as textbook, use of computers, level of parental involvement, and amount of homework. We also observed that existing books on experimental design focused almost exclusively on industrial applications. Recognizing this, we have written a book that aims to fill this large gap in the literature by emphasizing marketing, service operations, and general business problems.
We have written this book for both academic and practitioner audiences. It can be used effectively in MBA courses in quality management and marketing research and in undergraduate and graduate engineering courses in design of experiments. It is also well suited for self-study by quality professionals, management consultants, and other practitioners.
We assume that readers have had a basic undergraduate course in statistics or an introductory statistics course at the MBA level. Chapter 2 provides a review of the basic statistical concepts that we use throughout the book. In subsequent chapters, material that is more mathematically advanced (review of regression using basic matrix algebra) is included in appendixes. We have included this material for the sake of mathematical rigor and completeness and to give those with more mathematical backgrounds the opportunity to delve deeper into the basic methodology.
In teaching statistical methods we have found that students learn best if they see the relevance of the material, learn clearly how to apply the tools, and understand the underlying statistical concepts. People learn about design of experiments best by solving exercises,
TABLE 6.5
Estimated effects by factor (average response 1.298).
Figure 6.1 Plot of the estimated effects, horizontal axis from 0 to 1.0. Legible factor labels: Sticker; I: Price graphic; Letter postscript; Additional graphic; K: List of benefits; Q: Info on buckslip; Return address; M: Signature; Envelope teaser; Product selection; Postage; Official stamp.
Sections 6.2 and 6.3 describe the rather complicated confounding patterns of Plackett-Burman designs. In this appendix we discuss these patterns in more detail, and we show how they are derived. We ignore interactions of order 3 and higher, and focus on the confounding of main effects and 2-factor interactions. Furthermore, we discuss the projectivity properties of Plackett-Burman designs, which make these designs useful for factor screening.
Result
Consider an orthogonal design with $k$ factors at 2 levels each, such as the fractional factorial or the Plackett-Burman design. The confounding coefficient between the main effect of factor $i$ and the 2-factor interaction among factors $j$ and $r$ is given by the correlation coefficient between the design vector $x_i$ and the interaction (calculation) column $x_{jr}$. Let us denote this correlation coefficient as
Proof
General regression results about the bias of regression estimates when fitting an incorrect model are used to show this result. This approach was employed by Margolin (1968) in his analysis of the confounding patterns in Plackett-Burman designs.
We are fitting the main-effects model:
$$y = X\beta + \varepsilon \qquad (A6.1)$$
where $X = [x_1, x_2, \ldots, x_k]$ is the orthogonal design matrix and $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is the vector of main effects. Table 6.1 lists the design matrix of the Plackett-Burman design in $N = 12$ runs. The design matrix for $N = 20$ is shown in Table 6.4.
Assume that 2-factor interactions are present and that the true model is given by
$$y = X\beta + X_*\beta_* + \varepsilon \qquad (A6.2)$$
where $X_* = [x_{12}, x_{13}, \ldots, x_{23}, \ldots, x_{k-1,k}]$ is the design matrix consisting of 2-factor interaction (calculation) columns, and $\beta_* = (\beta_{12}, \beta_{13}, \ldots, \beta_{23}, \ldots, \beta_{k-1,k})'$ is the vector of 2-factor interactions. General regression results (see Draper and Smith, 1981, p. 117; Abraham and Ledolter, 2006, p. 208) imply that the calculated main effects, obtained by taking (one-half of) the difference of the response averages at the plus and minus levels of the factors, are estimates of
$$\beta + (X'X)^{-1}X'X_*\beta_* \qquad (A6.3)$$
For orthogonal designs (such as the fractional factorial and Plackett-Burman designs) the matrix $X'X$ is diagonal with diagonal elements given by $N$, the number of runs. Furthermore, the columns of $X$ and $X_*$ sum to zero, and their squares add up to $N$. Hence, the matrix $(X'X)^{-1}X'X_*$ is
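The confounding coefficients described in this result can be checked numerically. The following is a minimal sketch, not the authors' own computation: it assumes the commonly tabulated generator row for the 12-run Plackett-Burman design (which may be arranged differently from the design matrix printed in the book), builds the interaction (calculation) columns, and evaluates $(X'X)^{-1}X'X_*$, which reduces to $X'X_*/N$ for an orthogonal design. The names `generator`, `alias`, and `confounding` are illustrative.

```python
import numpy as np
from itertools import combinations

# Commonly tabulated generator row of the 12-run Plackett-Burman design
# (an assumption; the book's table may order rows or columns differently).
generator = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])

# Eleven cyclic shifts of the generator plus a final run with all factors at minus.
X = np.vstack([np.roll(generator, s) for s in range(11)]
              + [-np.ones(11, dtype=int)])
N, k = X.shape

# Interaction (calculation) columns: elementwise products of two design columns.
pairs = list(combinations(range(k), 2))
X_star = np.column_stack([X[:, j] * X[:, r] for j, r in pairs])

# Orthogonality: X'X = N * I, so (X'X)^{-1} X' X_* simplifies to X' X_* / N.
assert np.array_equal(X.T @ X, N * np.eye(k, dtype=int))
alias = X.T @ X_star / N


def confounding(i, j, r):
    """Confounding coefficient between main effect i and interaction (j, r), with j < r."""
    return alias[i, pairs.index((j, r))]


# A main effect is not confounded with interactions that involve the same factor.
print(confounding(0, 0, 1))
# Absolute size of the confounding with interactions among two other factors.
print(abs(confounding(0, 1, 2)))
```

Under these assumptions every nonzero entry of `alias` has absolute value 1/3, which illustrates why main effects in this design are only partially confounded with 2-factor interactions.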
7.1 INTRODUCTION
Until now we have discussed experiments with factors at just two different levels. We coded
the two levels as "low" and "high," or "-1" and "+1," or simply "-" and "+." A factor may
describe two catalysts in a chemical reaction, two ways of displaying information in an ad
copy, two cover prices of a magazine, or two different budgets for an advertising campaign.
With just two levels, the assignment of the "-" and "+" levels is arbitrary.
In some applications a factor may have three (or more) levels. There may be
three catalysts, three methods, and three prices. It is common to code the three levels as
-1, 0, and +1. The factor may be categorical, with no particular order among the categories. In this case, the assignment of the categories to the coded levels is arbitrary as
any one of the three categories can be associated with a certain level. This will not be the
case if the factor is continuous. The cover price of a magazine may have been set at one,
two, or three dollars. Or, the temperature may be set
at 1,500, 2,000, and 3,000 degrees. For continuous factors, the assignment of the actual levels to the coded ones, -1, 0, and +1, carries additional meaning. In addition to studying
whether or not the mean responses at the three levels are the same, one can explore the
functional relationship between the mean response and the continuous factor. We probably would not expect that the cover price has a linear effect on sales. Linearity of the sales
response to changes in price may actually be the hypothesis that needs to be confirmed or
refuted from the data. Whereas a linear function of price can be fitted (perfectly) to sales responses at two different price levels, we need at least three price levels to fit a quadratic function. With just two levels for price, it is impossible to check whether a linear relationship is
appropriate.
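The point about two versus three price levels can be illustrated with a short numerical sketch. The sales figures below are hypothetical and chosen only for illustration; the contrast coefficients (-1, 0, +1) and (1, -2, 1) are the usual orthogonal linear and quadratic contrasts for three equally spaced levels, assumed here rather than taken from the text.

```python
import numpy as np

prices = np.array([1.0, 2.0, 3.0])     # cover prices in dollars
sales = np.array([100.0, 80.0, 70.0])  # hypothetical mean sales at each price

# With only two price levels, a straight line always fits perfectly,
# so the data carry no information about curvature.
two = [0, 2]
line_two = np.polyfit(prices[two], sales[two], deg=1)
resid_two = sales[two] - np.polyval(line_two, prices[two])

# With three levels, a straight line generally leaves residuals ...
line_three = np.polyfit(prices, sales, deg=1)
resid_line = sales - np.polyval(line_three, prices)

# ... while a quadratic passes through all three means.
quad = np.polyfit(prices, sales, deg=2)
resid_quad = sales - np.polyval(quad, prices)

# Orthogonal contrasts for three equally spaced levels (assumed coding).
lin_contrast = np.array([-1, 0, 1]) @ sales
qua_contrast = np.array([1, -2, 1]) @ sales

print("two-level line residuals:  ", resid_two)
print("three-level line residuals:", resid_line)
print("quadratic residuals:       ", resid_quad)
print("linear contrast:", lin_contrast, " quadratic contrast:", qua_contrast)
```

A nonzero quadratic contrast is what signals departure from linearity; with two levels that contrast cannot even be formed.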
Section 7.2 discusses the general factorial experiment with two factors; the first factor A
is studied at a different levels while the second factor B is studied at b levels. A complete factorial experiment requires runs at all ab factor-level combinations. We show how to estimate and test the main effects of the two factors, and we discuss how to assess the interaction effect. An example is given in Section 7.3. Section 7.4 discusses additional useful
Figure 7.2 [plot not recoverable; panels labeled Price and Display]
TABLE 7.6
ANOVA Table: Sales of Apple Juice
Source            DF        SS        MS        F        p
Display            2    4636.1    2318.0    19.32    0.001
  D(lin)           1    4385.4    4385.4    36.55    0.000
  D(qua)           1     250.7     250.7     2.09    0.182
Price              2    2624.8    1312.4    10.94    0.004
  P(lin)           1    2411.2    2411.2    20.10    0.002
  P(qua)           1     213.6     213.6     1.78    0.215
Interaction        4     130.1      32.5     0.27    0.889
  DP(lin x lin)    1      85.8      85.8     0.72    0.420
  DP(lin x qua)    1       9.9       9.9     0.08    0.781
  DP(qua x lin)    1      29.0      29.0     0.24    0.634
  DP(qua x qua)    1       5.3       5.3     0.04    0.838
Error              9    1079.7    119.97
Total             17    8470.7
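The arithmetic behind Table 7.6 can be retraced with a few lines of code. This is a sketch that uses only the degrees of freedom and sums of squares printed in the table; the variable names are illustrative. Each mean square is SS/DF, and each F ratio is the mean square divided by the error mean square. Because the printed sums of squares are rounded, recomputed values can differ from the table in the last digit.

```python
# Degrees of freedom and sums of squares as printed in Table 7.6.
effects = {
    "Display":       (2, 4636.1),
    "D(lin)":        (1, 4385.4),
    "D(qua)":        (1, 250.7),
    "Price":         (2, 2624.8),
    "P(lin)":        (1, 2411.2),
    "P(qua)":        (1, 213.6),
    "Interaction":   (4, 130.1),
    "DP(lin x lin)": (1, 85.8),
    "DP(lin x qua)": (1, 9.9),
    "DP(qua x lin)": (1, 29.0),
    "DP(qua x qua)": (1, 5.3),
}
error_df, error_ss = 9, 1079.7
mse = error_ss / error_df  # error mean square

# Mean square and F ratio for every row of the table.
anova = {}
for name, (df, ss) in effects.items():
    ms = ss / df
    anova[name] = {"DF": df, "SS": ss, "MS": ms, "F": ms / mse}

for name, row in anova.items():
    print(f"{name:<14} DF={row['DF']:>2} MS={row['MS']:>8.1f} F={row['F']:>6.2f}")

# The single-degree-of-freedom components add up to their parent sums of squares.
display_parts = effects["D(lin)"][1] + effects["D(qua)"][1]
price_parts = effects["P(lin)"][1] + effects["P(qua)"][1]
total_ss = (effects["Display"][1] + effects["Price"][1]
            + effects["Interaction"][1] + error_ss)
print(display_parts, price_parts, total_ss)
```

The decomposition shows that most of the Display and Price sums of squares come from their linear components, which matches the small F ratios of the quadratic and interaction rows.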
TABLE 7.10
Regression Formulation of the Mixed $2^2 3^1$ Factorial Experiment, with Linear and Quadratic Main and Interaction Effects of the 3-Level Factor
[Table body not recoverable. Regressor columns: main effects of the design factors A, B, C(lin), C(qua); interactions AB, AC(lin), AC(qua), BC(lin), BC(qua), ABC(lin), ABC(qua); Response. The final row gives the sum of squares for each regressor column.]
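The regression formulation named in the caption of Table 7.10 can be sketched in code. The sketch below assumes the usual coding for a 3-level factor, linear (-1, 0, +1) and quadratic (1, -2, 1); the run order and the response values are hypothetical (the responses are random numbers used only to show the arithmetic) and are not the data of the table.

```python
import numpy as np
from itertools import product

# Assumed orthogonal coding of the three levels of factor C.
LIN = {1: -1, 2: 0, 3: 1}
QUA = {1: 1, 2: -2, 3: 1}

names = ["A", "B", "C(lin)", "C(qua)", "AB", "AC(lin)", "AC(qua)",
         "BC(lin)", "BC(qua)", "ABC(lin)", "ABC(qua)"]

# One row per run of the complete 2 x 2 x 3 factorial (12 runs).
rows = []
for a, b, c in product([-1, 1], [-1, 1], [1, 2, 3]):
    cl, cq = LIN[c], QUA[c]
    rows.append([a, b, cl, cq, a * b, a * cl, a * cq,
                 b * cl, b * cq, a * b * cl, a * b * cq])
Z = np.array(rows)

# All regressor columns are mutually orthogonal and sum to zero.
gram = Z.T @ Z
assert np.array_equal(gram, np.diag(np.diag(gram)))
assert np.all(Z.sum(axis=0) == 0)

# Hypothetical responses, for illustration only.
rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=12)

# Sum of squares of each single-degree-of-freedom effect: (z'y)^2 / (z'z).
ss = {name: (Z[:, i] @ y) ** 2 / gram[i, i] for i, name in enumerate(names)}
total_ss = np.sum((y - y.mean()) ** 2)

for name, value in ss.items():
    print(f"{name:<9} {value:8.3f}")
print("sum of effect SS:", sum(ss.values()), " total SS:", total_ss)
```

With 12 runs and 11 orthogonal regressor columns, the effect sums of squares add up to the total (corrected) sum of squares, so an unreplicated design of this kind leaves no degrees of freedom for error.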
TABLE 8.2
Click-Through Rates
[Table body not recoverable. Column headings: run; levels of the test areas A-bottom, A-top, and B through I; Visitors; Clicks; click-through rate.]
The numbers under areas A-bottom, A-top, B through I refer to the available levels in
each of the 10 test areas listed in Table 8.1. Level 1 represents the baseline level.
APPENDIX
CASE STUDIES
This appendix contains thirteen case studies. The following table shows for each case the
sections of the book that contain relevant material.
Case No.
2
3
4
9
IO
II
12
Title
Relevant Chapters
Eagle Brands
Magazine Price Test
Mother Jones (Part A)
Peak Electronics (Part A)
Office Supplies Email Test
Mother Jones (Part B)
Peak Electronics (Part B)
4 and 5
4
4 and 5
Piggly Wiggly
United Dairy Industries
Almquist & Wyner
Phnncl Ing
4 and 5
5
5
6
7
6 and 7
8
8
ACKNOWLEDGMENT
Gordon H. Bell (President, LucidView, 80 Rolling Links Blvd., Oak Ridge, TN 37830, USA,
(865) 693-1222, (865) 220-8410 (fax), gbell@lucidview.com) contributed to Cases 2, 5, 8,
and 9. We are very grateful to him for allowing us to include these case studies in our text.
CASE
INTRODUCTION
The publishing industry has seen a continual decline in magazine sales over the last few years.
More magazine titles, free content on the Internet, and lower readership have all led to industry difficulties. Publishers push subscriptions through direct mail and online sales, and
they try to advance single-copy newsstand sales in supermarkets and other retail outlets.
One leading publisher wanted to find new ways to increase profitability by testing new
price points. However, profit depends not only on the magazine cover price but also on the
number of new subscribers and the cost of unsold copies. So, the publisher focused on three
factors to test:
Cover price. This is the price paid for a single newsstand copy; it is considerably higher
than the per-copy subscription price.
Subscription price. This is the price shown on the subscription card inside each magazine. While publishers like the higher per-copy profit from single-copy sales, they
want to get as many long-term subscribers as possible since a larger subscriber base
increases their advertising revenues.
Number of copies on the newsstand. The publisher looks at the balance between having
enough copies available for every customer, yet minimizing the number of leftover
magazines. The marketing team wanted to test if a larger (or smaller) excess of copies might increase sales. With a few magazine racks in each store, they wondered
if more copies in every rack would lead to higher sales, or if fewer copies in each
rack, even an empty rack or two, might encourage customers to "buy now while
supplies last."
PRICE TEST
With the high cost of printing different covers and subscription cards, the publisher wanted
to minimize the number of test cells. But the marketing director also expected to see some
interactions among studied factors and "curvature" in the relationship between sales and
price.
CASE
PEAK ELECTRONICS:
THE BROKEN TENT PROBLEM (PART A)
In the late fall of 1991, Peak Management started to get concerned with "broken tents," the
number one cause of rework. A tent is a piece of photoresist (or film) that covers a hole that
is not to be plated with copper. If the tent breaks before copper plating, then the hole is
plated and the panel needs to be reworked by scraping the copper from the hole.
THE PROCESS
The production process consists of many steps. Steps 4 and 5 are relevant for this discussion. In step 4 (laminate photoresist), a thin photosensitive film or resist is laminated
(bonded) onto a copper panel. The film is applied by rolling a sheet of film onto the panel
and putting the panel between two rollers with the heat and pressure causing the film to adhere to the panel. In step 5, a film negative showing the circuitry is placed over the photoresist-covered panel, and the panel is exposed to ultraviolet light. The circuitry on the negative is opaque and blocks the UV light. The rest of the resist on the panel is polymerized
(hardened) by the UV light. In the developer, the panel moves on a conveyer through an
8-foot-long chamber. A developing solution is sprayed onto the panel and the resist, which
is not polymerized (hardened), is washed away, exposing the circuitry. At the end of this
stage, each hole that is not to be plated should be tented (covered by film). However, a tent
may be broken at this point.
HERCULES, INC.
A quality improvement team at Peak, using a fishbone diagram, identified the photoresist
as a likely contributor to the broken tent problem. Peak had been using Dupont 4215 resist.
Scientists at Hercules, Inc., a competitor of Dupont, suggested that their film could significantly reduce broken tents. Peak ran a test with the new resist, using 40 lots of 36 panels
each, interspersing 10 lots that used the current Dupont 4215 resist. They found no statistically significant differences in the number of broken tents per panel between the Dupont
and Hercules resists.
Hercules was obviously unhappy with these results and suggested that Peak run a designed experiment that might improve the process. The Hercules representative and Lou