
To Lea Vandervelde, my wife and inspiration (JL)

To my brother, Burt Swersey (AJS)


INTRODUCTION

1.2 A PREVIEW OF CASES

Throughout the book we illustrate concepts with practical examples. In addition, we include a group of real cases based on the actual implementation of experimental design methods. In this section, we discuss the highlights of a number of these cases.
In the marketing area, consumer testing is an important and widely used tool. But most marketing professionals hold firmly to the approach of changing only one variable at a time, which is often called "split-run testing" (also referred to as A/B splits, test-control, or champion-challenger testing). Only recently have marketing managers begun to embrace multifactor techniques that simultaneously test marketing variables. These experimental design methods are particularly well suited to product testing in supermarkets.
In one significant marketing application, and one of the cases in the book, a major magazine publisher sought to increase sales of its popular magazine in a chain of supermarkets. The firm identified 10 factors to test, including a discount on multiple copies (no or yes), an additional display rack in the snack food area (no or yes), and an on-shelf advertisement (no or yes). After considering a number of alternatives, the publisher implemented a 24-run Plackett-Burman experimental design (see Chapter 6). Each run consisted of a particular combination of settings of each of the 10 factors.
A key part of the experiment was to decide how many stores to include and how long to test, in order to achieve statistically significant results. A total of 48 stores were included, and the experiment ran for two weeks. As a result, the firm identified several changes that increased sales by 20%, and equally important, it gained insights into which changes would have a negative effect or no effect.
Direct mail is a common marketing channel, and firms use it for a wide range of products including credit cards, clothing, and magazines. Typically, response rates are very low, and a small increase in response can mean large financial benefits. Mother Jones magazine had extensive experience in direct mail testing aimed at increasing their subscription rates. Their protocol was to test only one change, such as the color of the envelope, in each mailing to potential subscribers. Using a fractional factorial design (Chapter 5) the firm was able to test seven factors simultaneously in a single mailing, gaining valuable and immediate insights that led to large increases in response. Moreover, the results were attained with a sample size (the number of people receiving the mailing) that was much smaller than would have been needed if the seven factors had been tested one factor at a time.
A leading office supplies retailer designed and implemented an e-mail test targeted at small business customers, a group the retailer wanted to attract to their stores and Web site. The retailer identified 13 factors that it wanted to include in this experiment, with each factor having two possible values. The factors included the background color of the e-mail (white or blue), a discount offer (normal price or 15% discount), a free gift (no gift or a pen and pencil set), and products pictured (few or many). Testing all possible combinations of the 13 factors would have required $2^{13} = 8,192$ different e-mail designs! But using a fractional factorial design, a methodology that we discuss in Chapter 5, the firm was able to successfully test all 13 factors with just 32 different designs.


Peak Electronics, a manufacturer of printed circuit boards, was faced with a recurring problem. In the circuit board production process, most of the holes on each board are plated with a thin layer of copper so that current can flow from one side to the other. Some holes, however, are not meant to be plated and instead are tented, meaning that they are protected by a thin layer of photographic film. During the manufacturing process, a significant number of these tents were breaking, and their holes were being plated. The result was the number-one cause of rework at the firm, because the copper in these holes had to be scraped out.
At the time, Peak was using film supplied by Dupont. The sales representative of Hercules, a competing filmmaker, suggested that Peak perform an experiment using the Hercules film to test the effect on broken tents of a number of key manufacturing variables. The sales representative designed the test and helped Peak analyze the results.
With the explosive growth of the Internet, Web site design has become an important issue, as firms attempt to attract a greater number of people to visit their sites and order their products or services. PhoneHog is a subscription-based service through which consumers get free long-distance phone calls. Participants sign up for the program and earn phone minutes by visiting Internet sites, entering sweepstakes, or trying new products and services. The PhoneHog case in Chapter 8 describes how experimental design can be used to improve a Web site to obtain more customers. In this case there were 10 factors to be tested, with the number of variations, or levels, for each factor ranging between 2 and 10. For example, the top image on the Web page had four possible designs: (1) photos of five people talking on the phone with the PhoneHog logo on the right, (2) a cartoon image of a pig peeking through the O in the PhoneHog logo on a blue background, (3) the same image of the pig on a white background, and (4) the photos of the five people talking on the phone with a different PhoneHog logo on the right. If every possible combination of factor levels were included in the experiment, a total of 1,658,880 test Web pages would have been required. In fact, the experiment consisted of just 45 different Web pages (each page a combination of factor levels), with each person arriving at the site randomly assigned to one of them. The number of visitors to the site and the number of visitors who clicked on an icon to request additional information were recorded. As a result of this experiment, the click-through rate, which is the number of clicks divided by the number of visitors, increased by 35%.

1.3 A BRIEF HISTORY OF EXPERIMENTAL DESIGN

The field of experimental design began with the pioneering work of Sir Ronald Fisher, whose classic book, The Design of Experiments, was published in 1935. Fisher was responsible for statistical analysis at an agricultural experiments station in England, and his early work on experimental design was applied to improving crop yields and solving other agricultural problems. Over the years, applications of experimental design to industrial problems have been widespread, with particular attention given to problems in the chemical industry, such as maximizing chemical yields and assays. In 1978, George E. P. Box, William G. Hunter, and J. Stuart Hunter published Statistics for Experimenters (second edition, 2005), a book that became, and still is, a standard text in the field.


Beginning about 1980, U.S. manufacturing firms, faced with competitive challenges, especially from Japanese companies, took a renewed interest in quality management and design of experiments. During the 1980s the American Society for Quality (ASQ) and many other organizations started to offer numerous seminars on experimental design. However, little or no attention was given to the application of experimental design to service organizations.
More recently that has slowly begun to change, and several articles have appeared showing that multivariable experimental design techniques provide powerful approaches to service problems. "The New Mantra: MVT" (Forbes, March 11, 1996) discussed the experimental design applications to services by a quality consulting firm, while "Tests Lead Lowe's to Revamp Strategy" (Wall Street Journal, March 11, 1999) explained how that firm helped Lowe's improve its advertising policy. The article "Boost Your Marketing ROI with Experimental Design" (Almquist and Wyner, Harvard Business Review, October 1, 2001) told how another consulting firm used experimental design to improve marketing decisions. In short, business leaders are beginning to realize that experimental design has widespread applications to management decision making, particularly in service organizations.

1.4 OUTLINE OF THE BOOK


Chapter 2 presents important basic concepts of probability and statistics. It is meant to be a concise review and provides the common language and notation that we use throughout the book. While writing it, we assumed that readers have previously had an exposure to most of the material covered in the chapter. We discuss a number of important distributions such as the binomial, normal, t-, and F-distributions. In subsequent chapters, they are used extensively. We also discuss useful tools for displaying data such as dot plots, histograms, and scatter diagrams. Chapter 2 also shows how confidence intervals and tests of hypotheses are constructed based on sample information and are used to make inferences about a population mean or the difference in two population means. These important statistical tools are applied in later chapters to identify statistically significant factors.
In Chapter 3, we extend the discussion in Chapter 2 and focus on comparing more than two population means. For example, we might want to compare the effectiveness of three different advertising strategies by testing them in a number of stores. We present two statistical models: the completely randomized design and the randomized block design. In Chapter 3, we emphasize two important ideas, randomization and blocking, that are used throughout the book.
The heart of the book begins with Chapter 4, where we focus on so-called 2-level factorial designs. In these designs, there are k factors to be tested, and each factor is studied at two different values (levels). For example, in a Web site test, one factor might be the banner headline (version 1 vs. version 2), while another might be the image under the headline (product photo vs. happy user). In the full factorial design, the experimenter tests all combinations of factors and levels, with each combination called a run. With k factors, there are $2^k$ runs. Thus, testing two factors requires $2^2 = 4$ runs, testing three factors requires $2^3 = 8$ runs, and so forth. The main effect of a factor is the difference in response at one level of the factor versus the other. For example, for the image under the banner headline, the main effect of that factor is the difference in response if the happy user image is employed rather than the product photo. In some instances, there may be an interaction between factors. For example, the difference in response between version 1 and version 2 of the banner headline may depend on which image under the banner is used. In Chapter 4, we show how main and interaction effects are estimated and discuss the various approaches for determining which effects are statistically significant.
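To make the run count and the idea of a main effect concrete, here is a minimal Python sketch. The function names, the -1/+1 coding of the two levels, and above all the four response values are our own illustrative assumptions (the responses are made up, not data from any case in the book). The interaction is computed under one common convention: the average response where the two coded factors agree minus the average where they differ.

```python
from itertools import product

def full_factorial(k):
    """Return all 2**k runs of a 2-level design, coded -1 (low) and +1 (high)."""
    return list(product([-1, 1], repeat=k))

def _mean(values):
    return sum(values) / len(values)

def main_effect(runs, responses, factor):
    """Average response at the high level minus average response at the low level."""
    high = [r for run, r in zip(runs, responses) if run[factor] == 1]
    low = [r for run, r in zip(runs, responses) if run[factor] == -1]
    return _mean(high) - _mean(low)

def interaction_effect(runs, responses, f1, f2):
    """Average response where the coded levels agree minus where they differ."""
    agree = [r for run, r in zip(runs, responses) if run[f1] * run[f2] == 1]
    differ = [r for run, r in zip(runs, responses) if run[f1] * run[f2] == -1]
    return _mean(agree) - _mean(differ)

# Number of runs doubles with every added factor.
run_counts = [len(full_factorial(k)) for k in (2, 3, 4, 5, 7)]

# Two factors: factor 0 = banner headline, factor 1 = image under the headline.
runs = full_factorial(2)            # (-1,-1), (-1,+1), (+1,-1), (+1,+1)
responses = [2.0, 3.0, 2.5, 4.5]    # HYPOTHETICAL response rates in percent

headline_effect = main_effect(runs, responses, 0)
image_effect = main_effect(runs, responses, 1)
headline_by_image = interaction_effect(runs, responses, 0, 1)

print(run_counts)
print(headline_effect, image_effect, headline_by_image)
```

With these made-up numbers the headline effect is larger when the second image is used, which is exactly the situation the interaction term is meant to capture.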
The focus of Chapter 5 is on 2-level fractional factorial designs. Full factorial designs are useful for experimenting with relatively few factors. As the number of factors increases, the number of runs required in a full factorial design increases dramatically. In fact, the inclusion of each additional factor in a full factorial design doubles the number of runs required, with 4 factors requiring $2^4 = 16$ runs, 5 factors requiring $2^5 = 32$ runs, and so forth. If full factorials were the only option, the experimental design approach would have limited value. In a fractional factorial design, the experiment requires only a fraction of the number of runs needed for a full factorial design. For example, a full factorial design with seven factors requires $2^7 = 128$ runs, or separate experiments. But as we shall see, it is possible to construct a fractional design requiring only 16 runs that provides nearly as much information as in a full factorial design. In some instances, a fractional experiment may produce results that are difficult to interpret. We show how a follow-up experiment can be designed and executed to resolve these ambiguities.
In Chapter 6, we discuss Plackett-Burman designs. The number of runs required in a fractional factorial design is a power of 2. Thus, the number of runs would be 8, 16, 32, 64, and so forth. In a Plackett-Burman design, the number of runs required is a multiple of 4, so the number of runs would be 4, 8, 12, 16, and so forth. For example, in a particular situation, if the experimenter were limited to fractional factorial designs, she might have to choose between a design of 16 runs and a design of 32 runs. There is a rather large gap between allowable run sizes. The Plackett-Burman designs give the experimenter additional options that may be advantageous. We discuss the characteristics of Plackett-Burman designs and illustrate their use with several case examples.
The designs in Chapters 4 through 6 are all 2-level designs, with each factor being set at one value or another. In Chapter 7, we extend the analysis to include designs in which factors may be at more than two levels. We show how regression analysis can be used to estimate effects, and we discuss the construction and analysis of simple fractional designs that include factors at more than two levels.
The last chapter of the book, Chapter 8, is devoted mainly to the most advanced topic. The designs discussed in earlier chapters have an important property called orthogonality. In an orthogonal design, effects are estimated independently of one another. That means that the particular estimate of one effect is not influenced by the estimated value of another. In Chapter 8, we consider nonorthogonal designs involving many factors and several levels. We show how regression analysis can be used to analyze these designs, and we illustrate the approach with the PhoneHog case, which was described earlier in this chapter. Chapter 8 ends with a discussion of experimental design software focusing on two software products, Minitab and JMP.

1.5 NOBODY ASKED US, BUT . . .

R. A. Fisher, The Life of a Scientist, is an interesting biography of Sir Ronald Fisher written by his daughter, Joan Fisher Box (1978). Fisher is one of the statisticians included in The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, by David Salsburg (2001). The title of the book comes from a paper that Fisher wrote, which is included in Fisher's The Design of Experiments. As the story goes, a lady claimed that by tasting it she could tell whether milk or tea was put into the cup first. Fisher designed an experiment to test her claim. Salsburg's book has stories of other great statisticians including William Gosset, famous for the t-distribution, which we discuss in Chapter 2. The online (and free) encyclopedia Wikipedia (www.wikipedia.org) has interesting biographical information on Fisher, Gosset, and many other important figures in the world of statistics.
The NBC television white paper "If Japan Can, Why Can't We?" which was broadcast in 1980, was a milestone that marked the beginning of a quality revolution in manufacturing in the United States. W. Edwards Deming was featured on the program, and he castigated American firms for shoddy quality. Deming, a statistician with a Ph.D. in physics, gave a series of lectures in Japan in 1950 that greatly influenced that country's quality efforts. The Deming prize, the highest award for quality in Japan, is named in his honor. Deming's (1982) book, Out of the Crisis, is a good source for learning about his quality management ideas.

Not long after that NBC program, the work of Genichi Taguchi, a Japanese consultant and former professor, began receiving widespread attention from manufacturers in the United States, particularly in the automobile industry. Taguchi methods became a familiar buzzword for his approaches to experimental design. Statisticians have often criticized Taguchi's statistical methods, but there is general agreement that his engineering ideas are very useful. He is probably best known for two concepts: robust design and the Taguchi loss function. Robust design means designing a product or process that is insensitive to environmental factors. For example, a robust cake recipe would produce a good cake even with considerable variation in baking time and oven temperature. The Taguchi loss function is an appealing alternative to the traditional approach to determine whether a product or process meets customer specifications. For example, to illustrate the traditional approach, suppose the plating thickness in millimeters of a printed circuit board is acceptable if it falls within certain upper and lower specification limits. So, a board having thickness just below the upper specification would be judged acceptable, whereas a board whose thickness was just above that limit would be classified as defective. In reality, there is a target that is ideal, and the closer each board comes to that target, the better. In contrast, under Taguchi's loss function, the loss associated with an individual board would be equal to a constant times the squared deviation between the board's thickness and the target. With this function, doubling the distance from the target would quadruple the loss. In the traditional approach, where each board is either in or out of specifications, two processes might have the same fraction of boards meeting specifications but, in reality, very different quality levels. One process might have most of its acceptable boards with thicknesses close to the target, whereas the other might have a more uniform distribution with board thicknesses evenly spread within the window defined by the specification limits. The second process would have much lower quality than the first, but under the traditional approach, the quality of products produced under the two systems would be judged as equal.
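The contrast between the two views of quality can be sketched in a few lines of Python. Everything numeric here is an assumption chosen only for illustration: the target of 10.0, the specification limits of 9.0 and 11.0, the loss constant k = 1, and the two small lists of board thicknesses are hypothetical and do not come from any real process.

```python
TARGET = 10.0             # HYPOTHETICAL ideal thickness
LOWER, UPPER = 9.0, 11.0  # HYPOTHETICAL specification limits

def in_spec(thickness, lower, upper):
    """Traditional view: a board is simply acceptable or defective."""
    return lower <= thickness <= upper

def taguchi_loss(thickness, target, k=1.0):
    """Quadratic loss: a constant times the squared deviation from target."""
    return k * (thickness - target) ** 2

def average_loss(thicknesses, target, k=1.0):
    return sum(taguchi_loss(t, target, k) for t in thicknesses) / len(thicknesses)

# Process A clusters near the target; process B spreads across the whole window.
process_a = [9.9, 10.0, 10.1, 10.0, 9.9, 10.1]
process_b = [9.0, 9.4, 9.8, 10.2, 10.6, 11.0]

for name, boards in (("A", process_a), ("B", process_b)):
    fraction_ok = sum(in_spec(t, LOWER, UPPER) for t in boards) / len(boards)
    print(name, fraction_ok, average_loss(boards, TARGET))
```

Both hypothetical processes have every board within specification, so the traditional approach rates them equal, while the average quadratic loss separates them clearly. The quadrupling property can be checked directly: a board 1.0 from target incurs four times the loss of a board 0.5 from target.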
In recent years, the Six Sigma approach to quality has been embraced by numerous organizations. Six Sigma was originally developed at Motorola in the mid-1980s and refined first by Allied Signal and more recently by General Electric. Six Sigma has many similarities to total quality management (TQM) and other programs in the past, but it also has some distinctive characteristics. One is its focus on defining and responding to customer needs. In doing so, it takes a broader view of quality management compared to some more narrowly focused programs of the past, better integrating quality activities into all areas of the organization and aligning these activities with the strategic goals of the firm. In addition, Six Sigma programs have been more widely applied to service processes, including many implementations in hospitals and other health care organizations. The design of efficient experiments is an important component of the Six Sigma approach. One of the many books on Six Sigma is The Six Sigma Way: How GE, Motorola, and Other Top Companies are Honing Their Performance, by Peter S. Pande, Robert P. Neuman, and Roland R. Cavanagh (2000).
EXERCISES

Exercise 1 Search the Web for the work of Sir Ronald Fisher on experimental design, including his earliest efforts performing agricultural experiments at the Rothamsted Experimental Station in the United Kingdom.

Exercise 2 Read Mother Jones (Case 3 in the case study appendix) and Peak Electronics: The Broken Tent Problem (Case 4). Both of these cases describe a company's first exposure to experimental design methods.
(a) Mother Jones: Suppose the organization wanted to test each of the seven factors in a separate mailing. What specific shortcomings would this approach have compared to the approach in the case?
(b) Peak Electronics: Suppose the company did not use experimental design to examine and solve the broken tent problem. Imagine how they would have approached the problem instead. What difficulties would they have likely encountered? Would it have been possible to identify interactions between factors? If so, how?

Exercise 3 Pick a Web site on the Internet. Suppose you were designing an experiment for increasing visitors' response to a product or service offered on the site. What seven factors do you think would be most important to test? In each case, if possible, specify two levels (values) for each of the factors.

A REVIEW OF BASIC STATISTICAL CONCEPTS

in nature. Here, any number (obviously within a certain interval) is a feasible outcome. We call such random variables continuous random variables, and we use continuous distributions to characterize the variability. The normal distribution, the t-distribution, and the F-distribution are important examples.

2.2.1 Discrete Random Variables


The distribution of a discrete random variable is described by the collection of possible distinct outcomes and their associated probabilities. Probabilities are numbers between 0 and 1, and the sum of the probabilities over all possible outcomes must be 1. The probabilities may represent prior beliefs, come from previous studies, or be implied by a theoretical model.
It is standard and useful notation to use capital letters to denote the random variable (X, Y, Z, ...), and lowercase letters (x, y, z, ...) to denote the possible outcomes. The notation $P[Y = y]$ stands for the probability that the random variable Y takes on the value y.

Example 1 The random variable Y describes a customer's purchasing decision. Possible outcomes are 1 (purchase) and 0 (no purchase). Based on historical data, it is estimated that 5% of customers will place an order. Thus, $P[Y = 1] = 0.05$, and hence $P[Y = 0] = 1 - 0.05 = 0.95$.

Example 2 The random variable Y is the number of flaws on a circuit board produced on an assembly line. The possible outcomes are

y = 0 (no flaw)
y = 1 (exactly one flaw)
y = 2 (exactly two flaws), and so on

The following probabilities are given: $P[Y = 0] = 0.90$, $P[Y = 1] = 0.08$, and $P[Y = 2] = 0.02$. This probability distribution implies that producing an item having three or more flaws is impossible.

Example 3 Let the random variable Y be the number showing on a thrown die. The possible outcomes are y = 1, 2, 3, 4, 5, 6. Assuming the die is fair, the outcomes are equally likely. Hence $P[Y = 1] = P[Y = 2] = \cdots = P[Y = 6] = 1/6$.

Example 4 Let the random variable Y be the number of times a customer orders from a catalog during a specified time period. The possible outcomes are y = 0, 1, 2, 3, with $P[Y = 0] = 0.2$, $P[Y = 1] = 0.5$, $P[Y = 2] = 0.2$, $P[Y = 3] = 0.1$. Note that the probabilities sum to 1. Notice also that ordering four or more times has zero probability; it cannot occur.
We can easily calculate the probabilities of various events. For example, the probability of at most two orders is given by

$$P[\text{at most } 2] = P[Y \le 2] = P[Y = 0 \text{ or } Y = 1 \text{ or } Y = 2] = P[Y = 0] + P[Y = 1] + P[Y = 2] = 0.2 + 0.5 + 0.2 = 0.9$$

Similarly, the probability of at least one order is

$$P[Y \ge 1] = P[Y = 1 \text{ or } Y = 2 \text{ or } Y = 3] = P[Y = 1] + P[Y = 2] + P[Y = 3] = 0.5 + 0.2 + 0.1 = 0.8$$

Alternatively,

$$P[Y \ge 1] = 1 - P[Y < 1] = 1 - P[Y = 0] = 1 - 0.2 = 0.8$$
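The event calculations for Example 4 can be mirrored in a short Python sketch. Storing the distribution as a dictionary and the helper name `prob` are our own choices for illustration, not notation from the text.

```python
# Distribution of Example 4: number of catalog orders and their probabilities.
pmf = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}

def prob(event, pmf):
    """Sum the probabilities of all outcomes y for which event(y) is true."""
    return sum(p for y, p in pmf.items() if event(y))

at_most_two = prob(lambda y: y <= 2, pmf)         # P[Y <= 2]
at_least_one = prob(lambda y: y >= 1, pmf)        # P[Y >= 1], summed directly
via_complement = 1 - prob(lambda y: y < 1, pmf)   # 1 - P[Y < 1]

print(round(at_most_two, 4), round(at_least_one, 4), round(via_complement, 4))
```

The results are rounded before printing because sums of decimal fractions are not always represented exactly in floating-point arithmetic.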

Mean of a Discrete Distribution


The mean of a discrete distribution (also called its expected value) with outcomes y and probabilities $P[Y = y]$ is given by

$$\mu = \sum_{y} y \, P[Y = y]$$

We use the Greek letter $\mu$ to denote the mean. It is the weighted sum of the possible outcomes, with each outcome weighted by its probability. The mean or expected value is the long-run average.

Example 1 $\mu = (0)(0.95) + (1)(0.05) = 0.05$. The mean is 0.05.

Example 2 $\mu = (0)(0.90) + (1)(0.08) + (2)(0.02) = 0.12$. The expected number of flaws is 0.12.

Example 3 $\mu = (1)(1/6) + (2)(1/6) + \cdots + (6)(1/6) = 3.5$. The mean is 3.5.

Example 4 $\mu = (0)(0.2) + (1)(0.5) + (2)(0.2) + (3)(0.1) = 1.2$. The company expects, on average, 1.2 orders per customer. Of course, the number of orders can only be an integer; however, in the long run (i.e., over many customers) the number of orders averages to 1.2.

Variance of a Discrete Distribution


$$\sigma^2 = \sum_{y} (y - \mu)^2 \, P[Y = y]$$

The variance, denoted by the Greek letter sigma squared ($\sigma^2$), is a measure of spread. It is the weighted sum of squared deviations from the mean, with squared deviations weighted by their probability of occurrence.

Standard Deviation of a Discrete Distribution


$$\sigma = \sqrt{\sum_{y} (y - \mu)^2 \, P[Y = y]}$$

The standard deviation is equal to the square root of the variance. If the units of the random variable are, say, dollars, the variance will be in units of dollars squared. That makes the variance difficult to interpret. Taking the square root of the variance to obtain the standard deviation expresses the spread of the distribution in the same units as the random variable: in this case, dollars.

Example 1 $\sigma = [(0 - 0.05)^2(0.95) + (1 - 0.05)^2(0.05)]^{0.5} = [0.0475]^{0.5} = 0.218$.

Example 2 $\sigma = [(0 - 0.12)^2(0.90) + (1 - 0.12)^2(0.08) + (2 - 0.12)^2(0.02)]^{0.5} = [0.1456]^{0.5} = 0.382$ (flaws).

Example 3 $\sigma = [(1 - 3.5)^2(1/6) + (2 - 3.5)^2(1/6) + \cdots + (6 - 3.5)^2(1/6)]^{0.5} = [2.9167]^{0.5} = 1.71$.

Example 4 $\sigma = [(0 - 1.2)^2(0.2) + (1 - 1.2)^2(0.5) + (2 - 1.2)^2(0.2) + (3 - 1.2)^2(0.1)]^{0.5} = [0.76]^{0.5} = 0.87$ (orders).
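The mean, variance, and standard deviation calculations for the four examples can be reproduced with a small Python sketch. The dictionary representation and the function names are assumptions made for illustration; the probabilities are those given in Examples 1 through 4.

```python
from math import sqrt

def mean(pmf):
    """Weighted sum of outcomes, each weighted by its probability."""
    return sum(y * p for y, p in pmf.items())

def variance(pmf):
    """Weighted sum of squared deviations from the mean."""
    mu = mean(pmf)
    return sum((y - mu) ** 2 * p for y, p in pmf.items())

def std_dev(pmf):
    """Square root of the variance, in the units of the random variable."""
    return sqrt(variance(pmf))

examples = {
    "Example 1 (purchase)": {0: 0.95, 1: 0.05},
    "Example 2 (flaws)": {0: 0.90, 1: 0.08, 2: 0.02},
    "Example 3 (fair die)": {y: 1 / 6 for y in range(1, 7)},
    "Example 4 (orders)": {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1},
}

for name, pmf in examples.items():
    print(name, round(mean(pmf), 4), round(variance(pmf), 4), round(std_dev(pmf), 4))
```

Rounded to the precision used in the text, the printed values should agree with the hand calculations above.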

The Binomial Distribution


The binomial is the most important discrete probability distribution. A binomial situation is one that is analogous to repeatedly tossing a coin (not necessarily a fair one) and counting the number of heads. The purchasing decision of buying or not buying, or the outcomes of a pass/fail inspection, can be viewed as the outcomes of such coin tosses.
The assumptions are as follows:
Each individual experiment (also called a trial) can result in only one of two outcomes. We refer to the outcomes as success (S) and failure (F). We assume that the probability of a success is $P(S) = \pi$ for each trial, and hence the probability of a failure is $P(F) = 1 - \pi$.
There are n such independent trials. Independence means that the outcome of any trial does not affect the outcome of any other trial.
The random variable Y represents the number of successes in n independent trials.
The random variable Y has outcomes y = 0, 1, 2, ..., n. The probabilities associated with these n + 1 outcomes are given by the binomial formula

$$P[Y = y] = \frac{n!}{y!\,(n - y)!}\,\pi^{y}(1 - \pi)^{n - y} \quad \text{for } y = 0, 1, 2, \ldots, n$$
Here y factorial is defined as $y! = (1)(2) \cdots (y - 1)(y)$. For example, $3! = (1)(2)(3) = 6$, and $5! = (1)(2)(3)(4)(5) = 120$. By definition, $0! = 1$. The number of trials n and the probability of success in a single trial $\pi$ are called the parameters of the binomial distribution.
It can be shown that the mean of the binomial distribution is given by

$$\mu = n\pi$$

The standard deviation is given by

$$\sigma = \sqrt{n\pi(1 - \pi)}$$

The binomial distribution is tabulated in statistics textbooks. Also, its probabilities can easily be determined using functions in computer packages such as Excel or Minitab.


Example Assume that a production process is characterized by a 10% defect rate. That is, the probability of producing a defective item is $P[\text{defective}] = 0.10$; the probability of producing a good item is $P[\text{good}] = 0.9$. Assume also that the quality of each item (defective or not) is independent of the quality of every other item.

Assume that as part of a sampling inspection program, n = 10 items are selected at random. The distribution of the number of defectives Y in a sample of size n = 10 items is binomial with parameters n = 10 and $\pi = 0.1$. The mean number of defectives is $\mu = (10)(0.1) = 1$; there will be one defective, on average. The standard deviation is $\sigma = \sqrt{10(0.1)(0.9)} = 0.9487$. Individual probabilities such as

$$P[Y = 2] = \frac{10!}{2!\,8!}(0.1)^2(0.9)^8 = 0.1937$$

can be calculated either from the expression above, or from the binomial function of readily available computer programs. Note that the probabilities, summed over the possible outcomes, add to 1. Computer programs also calculate the cumulative probabilities such as

$$P[Y \le 1] = P[Y = 0] + P[Y = 1] = 0.3487 + 0.3874 = 0.7361$$

or in general

$$P[Y \le y] = \sum_{i=0}^{y} P[Y = i] \quad \text{for } y = 0, 1, 2, \ldots, n$$

The Excel function BINOMDIST(y, n, $\pi$, FALSE) returns $P[Y = y]$, if n is the number of trials and $\pi$ is the probability of success. Replacing FALSE with TRUE returns the cumulative probability, $P[Y \le y]$. In Minitab, the calculations are carried out by using the convenient pull-down menu "Calc > Probability Distributions > Binomial."
Probabilities such as

$$P[1 \le Y \le 3] = P[Y = 1] + P[Y = 2] + P[Y = 3] = P[Y \le 3] - P[Y \le 0] = 0.9872 - 0.3487 = 0.6385$$

can be calculated by summing the individual probabilities, or as the difference of two cumulative probabilities.
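For readers who prefer a scripting language to Excel or Minitab, the same binomial quantities can be computed with Python's standard library. This is only a sketch: the function names `binom_pmf` and `binom_cdf` are our own, and we use `math.comb` for the binomial coefficient $n!/(y!(n - y)!)$.

```python
from math import comb, sqrt

def binom_pmf(y, n, pi):
    """P[Y = y] for a binomial with n trials and success probability pi."""
    return comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def binom_cdf(y, n, pi):
    """Cumulative probability P[Y <= y], summing P[Y = i] for i = 0..y."""
    return sum(binom_pmf(i, n, pi) for i in range(y + 1))

n, pi = 10, 0.1                      # sampling inspection example
mu = n * pi                          # mean number of defectives
sigma = sqrt(n * pi * (1 - pi))      # standard deviation

p_two = binom_pmf(2, n, pi)                               # P[Y = 2]
p_at_most_one = binom_cdf(1, n, pi)                       # P[Y <= 1]
p_one_to_three = binom_cdf(3, n, pi) - binom_cdf(0, n, pi)  # P[1 <= Y <= 3]

print(round(mu, 4), round(sigma, 4))
print(round(p_two, 4), round(p_at_most_one, 4), round(p_one_to_three, 4))
```

The last line illustrates the difference-of-cumulative-probabilities approach; summing `binom_pmf` over y = 1, 2, 3 gives the same value.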

2.2.2 Continuous Random Variables


A continuous random variable Y is described by its probability density function f(y), which is nonnegative. For any density function, the area under the curve described by the density function is equal to 1. The probability that the random variable falls between two constants, a and b, is the area under the density curve between a and b. That is,

$$P[a \le Y \le b] = \int_a^b f(y)\,dy = P[Y \le b] - P[Y \le a]$$

Percentiles of the distribution are defined by cumulative probabilities. The (100p)th percentile is given by $y_p$, the value of the random variable for which the area under the curve from $-\infty$ to $y_p$ equals p; that is, $p = P[Y \le y_p]$.

[Figure 2.1 Densities of Two Normal Distributions. Horizontal axis: value y. Curves: the standard normal distribution, and the normal distribution with mean 3 and standard deviation 2.]

The probability that a continuous random variable is exactly equal to a particular value is zero; that is, $P[Y = a] = 0$, for any a. Hence,

$$P[a \le Y \le b] = P[a \le Y < b] = P[a < Y \le b] = P[a < Y < b]$$

The Normal Distribution


The normal distribution is the most important distribution in statistics. It is characterized by two parameters: its mean $\mu$ and standard deviation $\sigma$. The distribution is symmetric around the mean and bell-shaped. The standard deviation $\sigma$ determines the spread of the distribution.
Densities of two normal distributions are shown in Figure 2.1: the so-called standard normal distribution with mean 0 and standard deviation 1, and the normal distribution with mean 3 and standard deviation 2. For any normal distribution about 68% of the values will fall within 1 standard deviation of the mean, about 95% of the values will fall within 2 standard deviations of the mean, and 99.7% of the values will fall within 3 standard deviations of the mean.
A random variable that follows a standard normal distribution (mean 0 and standard deviation 1) is denoted by the capital letter Z. The probability density of the standard normal distribution, f(z), is shown in Figure 2.2. Cumulative probabilities can be looked up in the table of the standard normal distribution (the "z-table"), or they can be obtained from advanced calculators or from the appropriate functions of statistical computer software. The Excel function NORMSDIST(z) returns the cumulative probability, the area under the standard normal curve below the value z. For example:
$P[Z \le 0] = 0.5$
$P[Z \le -1] = 0.1587$
$P[Z \le -0.6] = 0.2743$

and

$P[Z > -0.6] = 1 - P[Z \le -0.6] = 1 - 0.2743 = 0.7257$


Figure 2.2 Density of the Standard Normal Distribution
[Plot of density against value z, with the 97.5th percentile marked]

$P[Z \le 0.7] = 0.7580$

and

$P[Z > 0.7] = 1 - P[Z \le 0.7] = 1 - 0.7580 = 0.2420$

$P[-0.5 \le Z \le 1.0] = P[Z \le 1.0] - P[Z \le -0.5] = 0.8413 - 0.3085 = 0.5328$

$P[0.4 \le Z \le 1.2] = P[Z \le 1.2] - P[Z \le 0.4] = 0.8849 - 0.6554 = 0.2295$
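The table lookups above can also be reproduced in software. The following is a minimal sketch using Python's standard library rather than Excel; the helper name `prob_between` is an assumption introduced here for illustration and is not part of the original text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_between(lo, hi, dist=Z):
    """P[lo <= Y <= hi], computed as a difference of two cumulative probabilities."""
    return dist.cdf(hi) - dist.cdf(lo)


p_below = Z.cdf(0.7)                 # P[Z <= 0.7]
p_above = 1 - p_below                # P[Z > 0.7]
p_mid_1 = prob_between(-0.5, 1.0)    # P[-0.5 <= Z <= 1.0]
p_mid_2 = prob_between(0.4, 1.2)     # P[0.4 <= Z <= 1.2]

for label, value in [
    ("P[Z <= 0.7]", p_below),
    ("P[Z > 0.7]", p_above),
    ("P[-0.5 <= Z <= 1.0]", p_mid_1),
    ("P[0.4 <= Z <= 1.2]", p_mid_2),
]:
    print(f"{label} = {value:.4f}")
```

Because the probability of any single value is zero for a continuous random variable, the same function serves for strict and non-strict inequalities.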

Important percentiles of the standard normal distribution are

2.5th percentile $z_{0.025} = -1.96$ and 97.5th percentile $z_{0.975} = 1.96$

5th percentile $z_{0.05} = -1.645$ and 95th percentile $z_{0.95} = 1.645$

Suppose Y has a normal distribution with mean $\mu$ and standard deviation $\sigma$, and that for any value a, we want the probability that Y is less than or equal to a. We convert the probability statement about Y into an equivalent statement about Z. We have

$$P[Y \le a] = P\left[\frac{Y - \mu}{\sigma} \le \frac{a - \mu}{\sigma}\right]$$

Note that on either side of the inequality we subtract the mean $\mu$ and divide by the standard deviation $\sigma$. The random variable $Z = (Y - \mu)/\sigma$ follows a standard normal distribution, and the probability in the above equation can be looked up in the z-table. Similarly,

$$P[a \le Y \le b] = P\left[\frac{a - \mu}{\sigma} \le \frac{Y - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right]$$

Example The weight of toothpaste Y in a 2.7 ounce tube follows a normal distribution with mean $\mu = 2.8$ and standard deviation $\sigma = 0.05$. The fraction of underfilled tubes is

$$P[Y < 2.7] = P\left[\frac{Y - 2.8}{0.05} < \frac{2.7 - 2.8}{0.05}\right] = P[Z < -2.00] = 0.0228$$
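As a check on the toothpaste example, the sketch below computes the same probability twice: once by standardizing and using the standard normal, and once directly from the normal distribution of Y. It uses Python's standard library; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 2.8, 0.05    # mean and standard deviation of the fill weight (ounces)
a = 2.7                  # labeled weight; a tube below this value is underfilled

z = (a - mu) / sigma                      # standardized value, about -2.00
via_z = NormalDist().cdf(z)               # equivalent of a z-table lookup
direct = NormalDist(mu, sigma).cdf(a)     # same probability without the conversion

print(f"z = {z:.2f}")
print(f"P[Z < z] = {via_z:.4f}")
print(f"P[Y < 2.7] = {direct:.4f}")
```

Both routes give the same fraction of underfilled tubes, which is the point of the standardization argument.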


Statistical software allows us to obtain cumulative probabilities for any normal random variable directly, without the conversion to the standard normal. The Excel function NORMDIST(y, $\mu$, $\sigma$) returns the cumulative probability, that is, the area under the density curve below y, for a normal random variable with mean $\mu$ and standard deviation $\sigma$.
Some percentiles of the normal distribution with mean $\mu$ and standard deviation $\sigma$ are the following:

50th percentile $y_{0.50} = \mu$
5th percentile $y_{0.05} = \mu - (1.645)\sigma$ and 95th percentile $y_{0.95} = \mu + (1.645)\sigma$
2.5th percentile $y_{0.025} = \mu - (1.96)\sigma$ and 97.5th percentile $y_{0.975} = \mu + (1.96)\sigma$
99th percentile $y_{0.99} = \mu + (2.326)\sigma$

The Excel function NORMINV(p, $\mu$, $\sigma$) returns the (100p)th percentile of a normal distribution with mean $\mu$ and standard deviation $\sigma$. For example, suppose a random variable Y has a normal distribution with mean 100 and standard deviation 20. Then NORMINV(0.99, 100, 20) = 146.53, and $P[Y \le 146.53] = 0.99$. In Minitab, cumulative probabilities and percentiles (referred to as "inverse" cumulative probabilities) of a normal distribution are obtained with the pull-down menu "Calc > Probability Distributions > Normal."
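For readers without Excel or Minitab, the percentile calculation can be sketched with Python's standard library. This is an illustration only; `inv_cdf` plays the role that NORMINV plays in the text, and the variable names are assumptions.

```python
from statistics import NormalDist

Y = NormalDist(mu=100, sigma=20)
p99 = Y.inv_cdf(0.99)            # 99th percentile of Y
print(f"99th percentile of Y: {p99:.2f}")
print(f"P[Y <= {p99:.2f}] = {Y.cdf(p99):.2f}")

# Percentiles of a general normal follow from those of the standard normal:
# y_p = mu + z_p * sigma.
Z = NormalDist()
for p in (0.05, 0.95, 0.025, 0.975, 0.99):
    z_p = Z.inv_cdf(p)
    print(f"p = {p}: z_p = {z_p:.3f}, y_p = {Y.mean + z_p * Y.stdev:.2f}")
```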

The t-Distribution
The t-distribution has one parameter $\nu > 0$, called its degrees of freedom. In most statistical applications, the parameter $\nu$ is a positive integer.


The t-distributions are very similar to the standard normal distribution. They are symmetric around mean 0, and their densities resemble the bell-shaped curve of the normal.

The only difference is that the tails of t-distributions are slightly heavier than those of the standard normal (as explained later). The standard deviation of the t-distribution with $\nu > 2$ degrees of freedom is given by $\sigma = \sqrt{\nu/(\nu - 2)}$.


Figure 2.3 compares the densities of the t-distributions with 3 and 10 degrees of freedom to the density of the standard normal. Notice that for very large or very small values y of the random variables, the densities and hence the tail areas are larger for the t-distributions than for the normal. This gives t-distributions a somewhat larger chance to generate large deviations from the mean. The t-distribution converges to the standard normal as the degrees of freedom approach infinity.
Percentiles and cumulative probabilities of the t-distribution can be calculated using Excel or any other statistical software. With Excel, percentiles are found using the TINV function. The user specifies $\alpha$, the area in both tails of the distribution, and the number of degrees of freedom $\nu$. Thus $\alpha/2$ is the upper tail probability and $1 - (\alpha/2)$ is the corresponding cumulative probability. For example, for a t-distribution with 3 degrees of freedom, TINV(0.10, 3) returns the value 2.3534, which is the 95th percentile of the distribution. Other examples are

95th percentile of t(10): $t_{0.95}(10) = 1.8125$

95th percentile of t($\infty$), the standard normal: $t_{0.95}(\infty) = 1.645$


Figure 2.3 Densities of the Standard Normal and Two t-Distributions
[Plot of density against value: standard normal (solid), t with df = 10 (short dash), t with df = 3 (long dash)]

97.5th percentile of t(3): $t_{0.975}(3) = 3.1824$

97.5th percentile of t(10): $t_{0.975}(10) = 2.2281$

97.5th percentile of t($\infty$), the standard normal: $t_{0.975}(\infty) = 1.96$
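The t percentiles quoted in this section can be checked numerically. Python's standard library has no t-distribution, so the sketch below integrates the well-known t density with Simpson's rule and inverts the cumulative probability by bisection. The function names (`t_pdf`, `t_cdf`, `t_percentile`, `tinv`) are assumptions made for this illustration; `tinv` only mimics the two-tailed convention described for Excel's TINV and is not Excel's implementation.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P[T <= x], using symmetry around 0 and Simpson's rule on [0, |x|]."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / steps                       # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the density between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_percentile(p, df, lo=-50.0, hi=50.0):
    """(100p)th percentile, found by bisection on the cumulative probability."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tinv(alpha, df):
    """Two-tailed convention: alpha is the total area in both tails."""
    return t_percentile(1 - alpha / 2, df)


print(f"tinv(0.10, 3)  = {tinv(0.10, 3):.4f}")    # 95th percentile of t(3)
print(f"tinv(0.10, 10) = {tinv(0.10, 10):.4f}")   # 95th percentile of t(10)
print(f"tinv(0.05, 3)  = {tinv(0.05, 3):.4f}")    # 97.5th percentile of t(3)
print(f"tinv(0.05, 10) = {tinv(0.05, 10):.4f}")   # 97.5th percentile of t(10)
```

In practice a statistics package would be used instead; the numerical route is shown only because it relies on nothing beyond the density itself.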

The Chi-Square Distribution


The chi-square distribution is a skewed distribution with values from 0 to $\infty$. It has one parameter $\nu$, its degrees of freedom, which is a positive integer. We write the distribution as $\chi^2(\nu)$, with the symbol $\chi$ denoting the Greek lowercase letter chi.
Figure 2.4 shows the densities of three chi-square distributions, with 3, 6, and 10 degrees of freedom. The mean of the chi-square distribution is the same as the degrees of freedom:

$$\mu = \nu$$

The standard deviation is $\sigma = \sqrt{2\nu}$.
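The mean and standard deviation formulas can be illustrated by simulation. The sketch below relies on a standard fact that is not stated in the text above: a chi-square random variable with $\nu$ degrees of freedom can be generated as the sum of $\nu$ squared independent standard normal values. The seed, the number of draws, and the function name are arbitrary assumptions.

```python
import math
import random
import statistics


def chi_square_draw(df, rng):
    """One chi-square(df) value as a sum of df squared standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))


rng = random.Random(1)      # fixed seed so the run is repeatable
df = 6
draws = [chi_square_draw(df, rng) for _ in range(20000)]

mean_sim = statistics.fmean(draws)
sd_sim = statistics.stdev(draws)

print(f"simulated mean {mean_sim:.2f} vs. nu = {df}")
print(f"simulated sd   {sd_sim:.2f} vs. sqrt(2 nu) = {math.sqrt(2 * df):.2f}")
```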

The F-Distribution
The F-distribution takes on values from 0 to $\infty$, and it is skewed to the right. It has two parameters, its degrees of freedom $\nu_1 > 0$ and $\nu_2 > 0$, which in most statistical applications are positive integers. We use the notation $F(\nu_1, \nu_2)$ to describe an F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
The mean of the $F(\nu_1, \nu_2)$ distribution, $\mu = \nu_2/(\nu_2 - 2)$, depends only on $\nu_2$, and it is always slightly larger than 1. The standard deviation depends on both parameters, $\nu_1$ and $\nu_2$.
Figure 2.5 shows densities of four F-distributions: F(4, 10) and F(4, 20), and F(8, 10) and F(8, 20). Percentiles and cumulative probabilities can be calculated with standard statistics software. For example, the 95th percentiles of these four F-distributions are

$F_{0.95}(4, 10) = 3.4780$

$F_{0.95}(4, 20) = 2.8661$

$F_{0.95}(8, 10) = 3.0717$

$F_{0.95}(8, 20) = 2.4471$


Figure 2.4 Densities of Three Chi-Square Distributions
[Plot of density against value y: df = 3, 95th percentile = 7.81; df = 6, 95th percentile = 12.59; df = 10, 95th percentile = 18.31]

Figure 2.5 Densities of Four F-Distributions
[Plot of density against value y: F(8, 10), 95th percentile = 3.072; F(4, 10), 95th percentile = 3.478; F(8, 20); F(4, 20)]

2.3 DESCRIBING DATA

In this book, we focus on methods for designing experiments and analyzing the resulting
data. As part of this process, simple graphical displays such as data plots and histograms,
and summary measures such as the mean, median, and standard deviation, provide extremely useful complements to the more formal statistical methodology. In this section, we
discuss these simple tools for displaying, summarizing, and analyzing data. In most cases
the data are a sample from a larger underlying population. Occasionally, if the population
is small, the data will consist of all of its elements.

Categorical data are observations that are grouped into qualitative categories. Examples
are marital status (single, married, divorced, widowed), advertising media (radio, television,


print), and type of real estate (residential, commercial). For categorical data, we can calculate relative frequencies of observed outcomes. For example, it may be that among 500 clients who received an advertising message, 20 made a purchase and 480 did not. Then the (sample) proportion of clients who purchased is p = 20/500 = 0.04 (4%), and the proportion of clients who did not is 1 - p = 480/500 = 0.96 (96%). We can display these proportions in a bar chart or a pie chart. For more than two outcomes, there are more proportions (adding up to 1), more bars in the bar chart, and more pie slices in the pie chart.

Continuous data, on the other hand, reflect measurements that can be any (possibly rounded) value within a certain interval. We display continuous measurement data using dot diagrams and histograms. In dot diagrams, each measurement is displayed as a dot on a line graph (the x-axis). In a histogram, the observations are binned into nonoverlapping equal-width intervals on the x-axis and the frequencies (either absolute or relative) are displayed on the y-axis.
Statistical software makes it easy to construct bar and pie charts for categorical data and
dot diagrams and histograms for continuous measurement data. Illustrative examples are
shown at the end of this section.
Summary statistics are useful for describing data sets. The center (or location) of a data set is measured by the mean or median, while its variability is described best by the standard deviation or the interquartile range. Assume that we have a sample of n observations $y_1, y_2, \ldots, y_n$, such as the dollar purchases of n customers or the annual donations to a college made by n alumni. The arithmetic mean (average) is given by

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The median is the "middle" observation in rank. First, order the observations according to their size, $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$; the numbers in parentheses are the ranks. The median is the observation with rank (n + 1)/2. If this "middle" rank is not an integer, then the median is the average of the two observations with ranks adjacent to (n + 1)/2.

The percentile of order p, where p is a number between 0 and 1, is the observation with rank (n + 1)p. If this is not an integer, we take the average of the two observations with adjacent ranks. 100p% of the observations are smaller than the percentile, while 100(1 - p)% of the observations are larger.


The range is defined as the difference between the largest and the smallest observation:

$$\text{Range} = y_{(n)} - y_{(1)}$$

The interquartile range is the difference between the 75th percentile (the third quartile )
and the 25th percentile (the first quartile):

$$\text{IQR} = y_{((n+1)0.75)} - y_{((n+1)0.25)}$$

The range is very sensitive to extreme observations. The interquartile range covers the
middle 50% of the observations and is less sensitive to extreme values.
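The rank-based definitions above translate directly into code. The sketch below follows the rule stated in the text (rank (n + 1)p, averaging the two adjacent observations when the rank is not an integer). The data values are hypothetical, and clamping ranks that fall outside 1..n is an added assumption for very small or very large p, which the text does not address.

```python
import math


def percentile(data, p):
    """Observation with rank (n + 1)p; average the two adjacent ranks if it is not an integer."""
    ys = sorted(data)
    n = len(ys)
    rank = round((n + 1) * p, 9)      # rounding guards against floating-point noise
    rank = min(max(rank, 1), n)       # assumption: clamp ranks that fall outside 1..n
    lower, upper = math.floor(rank), math.ceil(rank)
    return (ys[lower - 1] + ys[upper - 1]) / 2


purchases = [12, 45, 20, 33, 27, 51, 18, 90, 24]   # hypothetical dollar purchases, n = 9

median = percentile(purchases, 0.50)   # rank 5: the middle observation
q1 = percentile(purchases, 0.25)       # rank 2.5: average of ranks 2 and 3
q3 = percentile(purchases, 0.75)       # rank 7.5: average of ranks 7 and 8
data_range = max(purchases) - min(purchases)
iqr = q3 - q1

print("median:", median)
print("quartiles:", q1, q3)
print("range:", data_range, "IQR:", iqr)
```

Other software packages use slightly different interpolation rules for percentiles, so small discrepancies from this rule are to be expected.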


The sample standard deviation is the most commonly used measure of variability. For a sample of n observations, it is defined as

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

The sample standard deviation is nonnegative; it is zero only if there is no variability and all observations are the same. The standard deviation approximates the "average" distance of the observations from their mean. In many data sets (reasonably symmetric and bell-shaped), the cumulative probabilities of the normal distribution will apply approximately and about 95% of the observations will fall within two standard deviations from the mean, while about 2/3 of the observations will fall within one standard deviation.
The square of the standard deviation results in the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

The numerator, the sum of the squared deviations from the sample average, is referred to as the sum of squares, corrected for the mean. The denominator, n - 1, reflects the degrees of freedom of the sum of squares. The degrees of freedom of a sum of squares are the number of "independent" components that are needed for its calculation. The sum of the deviations from a sample average, $\sum_{i=1}^{n} (y_i - \bar{y})$, is always zero, and consequently specifying any n - 1 deviations determines the final deviation; the value of the last deviation must equal the negative of the sum of the others. The division of the sum of squares by its degrees of freedom n - 1, instead of the number of observations n, results in a better estimate of the population variance $\sigma^2$. This issue is discussed further in our end-of-chapter notes. Of course, the division by n - 1 instead of n usually won't make a difference, provided that n is reasonably large.
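A short sketch of the calculation, using hypothetical data, shows both points made above: the deviations from the sample average sum to zero, and dividing by n - 1 gives a slightly larger value than dividing by n. The comparison with `statistics.stdev` is included only as a cross-check.

```python
import math
import statistics

y = [12, 45, 20, 33, 27, 51, 18, 90, 24]   # hypothetical observations
n = len(y)
y_bar = sum(y) / n

deviations = [yi - y_bar for yi in y]
sum_of_squares = sum(d * d for d in deviations)   # corrected for the mean

s2 = sum_of_squares / (n - 1)      # sample variance with n - 1 degrees of freedom
s = math.sqrt(s2)                  # sample standard deviation
s2_div_n = sum_of_squares / n      # division by n, shown for comparison only

print(f"sum of deviations: {sum(deviations):.10f}")
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
print(f"dividing by n instead: {s2_div_n:.2f}")
print(f"library value of s: {statistics.stdev(y):.2f}")
```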
Scatter diagrams display relationships between two measurement variables, and correlation coefficients measure the degree of their linear association. Assume that the data set contains n pairs of observations; for example, the family income ($x_i$) and the amount that is being donated to the college that was attended ($y_i$), for i = 1, 2, ..., n. The correlation coefficient

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$$

is always between -1 and +1. Its sign indicates the direction of the linear association. For positive values of r, above-average values on y tend to occur with above-average values on x. The absolute value of r indicates the strength of the linear association. A correlation of +1 occurs if the observations plotted on a scatter diagram lie on a straight line with positive slope. A correlation of -1 occurs if the observations plotted on a scatter diagram lie on a straight line with negative slope. A high correlation does not necessarily imply causality; this


is especially relevant if one analyzes data from observational studies (as compared to data from designed experiments). Statistical software such as Excel and Minitab can be used to carry out the calculations. The Excel function CORREL(array1, array2) returns the correlation coefficient. The user enters the n pairs of observations into two columns of the spreadsheet, with array1 being the cell range for one variable and array2 being the cell range for the other.
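The correlation coefficient can also be computed directly from its definition. The following sketch uses hypothetical income and donation values; the function names are assumptions introduced for illustration.

```python
import math


def sample_sd(values):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of (x_i - x_bar)(y_i - y_bar) / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_sd(x) * sample_sd(y))


income = [40, 55, 62, 80, 95, 120]         # hypothetical family incomes (thousands of dollars)
donation = [50, 40, 100, 150, 120, 300]    # hypothetical donations (dollars)

r = correlation(income, donation)
print(f"r = {r:.3f}")

# Points on a straight line give a correlation of exactly +1 or -1.
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))
print(correlation([1, 2, 3, 4], [9, 7, 5, 3]))
```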

2.3.1 Example: Alumni Donations


The file contribution (available on our Web site) summarizes the 2004 contributions received by a selective private liberal arts college in the Midwest. The college has a very large endowment and, like all private colleges, keeps detailed records on alumni donations. Here, we analyze the 2004 contributions of five graduating classes (the cohorts who graduated in 1957, 1967, 1977, 1987, and 1997). The data set consists of n = 1,635 individuals. In addition to donations in 2004 and class, the data set includes several other variables such as donations made in previous years, gender, marital status, college major, subsequent graduate work, and whether alumni have attended a fund-raising event. Not all variables are used in this example.
The summary statistics are shown next. The variable "donation" is categorical, with two outcomes: no donation and donation. The overall proportion of alumni donating to the college is given by 570/1,635 = 0.349, or 34.9%. Bar and pie charts are shown in Figure 2.6. The proportions of donors for the five cohorts are given in Table 2.1, and a bar chart of this information is shown in Figure 2.6.
We show a dot plot of the 570 donation amounts in Figure 2.7, where each dot represents up to seven individual measurements. It is useful to aggregate the information in the form of a histogram. The dot diagram shows that the distribution is skewed to the right with a very long right tail. The largest contribution is $14,655. To display the majority of the donations more clearly, we redraw the histogram for donations that are $2,000 or less. Furthermore, we stratify the histogram and show separate histograms for each of the five graduating classes. To bring out differences, we have drawn these five histograms on the same scale. The y-axes on these histograms represent relative frequencies.
Summary statistics are given in Table 2.1. For each year separately, we show the summary
statistics for all donations, but also for donations that are less than $2,000, and donations
that are $2,000 or more. The means are influenced by occasional large contributions. The comparisons of the five groups may be more meaningful after omitting these large donations, which are difficult to predict, and focusing on donations that are less than $2,000. Alternatively, we can compare the cohorts in terms of their medians, which are not affected by
rare large donations.
First and third quartiles of the donation amounts are calculated for each graduating year,
and the information is displayed in Figure 2.7 through comparative box plots. Box plots
have a box around the middle 50% of the observations (i.e., the observations between the
first and third quartiles) and lines added that point to the extremes. These plots, as well as
the information in Table 2.1, show that both the proportion of donors and the magnitude
of the donations increase with the time since graduation.


TABLE 2.1 Summary Statistics of 2004 College Donations

PROPORTION OF ALUMNI DONATING

               1957    1967    1977    1987    1997      All
No donation     157     159     215     234     300    1,065
Donation         95     120     119     108     128      570
Percentage     37.7    43.0    35.6    31.6    29.9     34.9

PROPORTION OF ALUMNI DONATING AND THE MAGNITUDE OF ALUMNI DONATIONS

Year
1957
1967
1977
1987
1997

Nu

NuDo

')bl)o

2s2

95
120
119
108
128

37.7

600

4.l.11
35.6
l l.6
29.CJ

559
35()
246

2,-9

llI
l'l2
128

JIRUPURTIUN OJ

StDev

Max

'Ju

158
158
120

1,480
1,804
879

11,50<1
l 'l,ll:'"iS

90

470

IH

I Ill

89
113
113
!Ob
128

Mean Median

73

-.... $2,0UO

< $2,UOO

ALL DONATIONS

(l,.JlllJ
2,71 b
I ,IJIJIJ

'.>tlJev

Nu
{l

IOU

I lb
218
2li8

8()
IX

_l4 l
110

.l,MO
2,(1118

11

(IHI dc!Ll)

Mean Median
1;8
l :12

.2LJK
211
181
201
'l

Mean
5,IMl
h, I

~)Ci

PROPORTION OF ALUMNI DONATING AND EVENT ATTENDANCE

                             Donations
                          No       Yes       All    Percentage donating
No prior attendance      647       182       829    21.95
Prior attendance         418       388       806    48.14
All                    1,065       570     1,635    34.86

MAGNITUDE OF ALUMNI DONATIONS AND EVENT ATTENDANCE

               All donations           Donations < $2,000       Donations >= $2,000
Attendance     No    Mean   Median     No    Mean   Median      No    Mean    Median
No            182     134       50    182     134       50      No data
Yes           388     460      100    367     210      100      21   4,820     3,000

We investigate whether attendance at alumni fund-raising events affects donations. It is reasonable to suppose that people who attend college functions are more likely to give. The information in Table 2.1 shows that almost 50% of those who attend fund-raising events are donating, while the proportion of donors among nonattending alumni is only 22%. Also, the magnitude of the donation increases if alumni attend such events.
A scatter plot of 2004 donations against 2003 donations is shown in Figure 2.7. We consider the n = 410 alumni who have given donations of $1,000 or less in both years. The scatter plot and the correlation coefficient r = 0.812 indicate that the magnitudes of donations in different years are strongly related.
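The donor proportions quoted for attendees and nonattendees follow from the counts in the event attendance part of Table 2.1. The sketch below redoes that arithmetic; the dictionary layout and names are assumptions for illustration, while the counts are those reported in the table.

```python
# (no donation, donation) counts by prior event attendance, from Table 2.1
counts = {
    "No prior attendance": (647, 182),
    "Prior attendance": (418, 388),
}

percent_donating = {}
for group, (no_donation, donation) in counts.items():
    total = no_donation + donation
    percent_donating[group] = 100 * donation / total
    print(f"{group}: {donation} of {total} donate ({percent_donating[group]:.2f}%)")

all_donors = sum(donation for _, donation in counts.values())
all_alumni = sum(no + yes for no, yes in counts.values())
overall = 100 * all_donors / all_alumni
print(f"All: {all_donors} of {all_alumni} donate ({overall:.2f}%)")
```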

2.4 SAMPLING ISSUES


An important objective of statistical analysis is to generalize findings that are based on a sample to the population from which the sample was drawn. The population consists of all elements, whereas the sample consists of a subset of the population. It is important that the sample be representative of the population; otherwise reliable inferences about characteristics of the population would not be possible. Characteristics of the population are usually referred to as parameters, and summaries that are calculated from the sample are referred to as sample statistics.

Figure 2.7 Continued
[Panels: box plot of 2004 donations; box plots of 2004 donations by class; scatterplot of 2004 donation vs. 2003 donation]

Consider the population of all graduating seniors at State University. There are about one thousand each year; we denote the population size by N = 1,000. We may be interested in population characteristics such as the average grade point average (GPA), the average number of weekly study hours, and the proportion of smokers (a percentage) among graduating seniors at State University. These characteristics can be determined without any uncertainty if we are willing and able to collect information on all graduating seniors. We call this a census. Of course, asking students about this information may be subject to error; some respondents may not tell the truth.

If the population is large, a census is not feasible, and sampling becomes an alternative. The sample size is denoted by n; usually it is much smaller than the population size N. A random sampling method guarantees that the sample results are "representative." In this case, one can use statistical tools to assess the likely size of the resulting sampling error. Random sampling guarantees that each possible sample has the same likelihood of being selected. For a large population size N and a small sample size n, many samples are possible; in fact, there are

$$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$$

different samples. Under random sampling, each of these samples is equally likely to be selected.


How is a random sample drawn from a population of elements? A simple approach is to prepare slips of paper, one for each element in the population (slips with numbers 1 through N), put them into a box, mix them thoroughly, and draw n items one after the other without replacement. Obviously, this method would only be practical for sampling from very small populations. In all other cases, a numbered list of all elements in the population (the sampling frame) and computer-generated random numbers would be used. Minitab's function "Calc > Random Data > Sample From Columns" makes it very easy to select n items at random and without replacement from a column containing a list of N distinct items.
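As an alternative to the Minitab menu, drawing a sample without replacement from a numbered sampling frame can be sketched in a few lines of Python. The population size N = 1,000 matches the State University example; the sample size n = 50 and the seed are hypothetical choices made for this illustration.

```python
import math
import random

N = 1000     # population size (State University seniors)
n = 50       # hypothetical sample size

# Number of different samples of size n: N! / (n! (N - n)!)
num_samples = math.comb(N, n)
print(f"number of possible samples has {len(str(num_samples))} digits")

frame = list(range(1, N + 1))      # sampling frame: elements numbered 1 through N
rng = random.Random(2024)          # fixed seed so the draw can be reproduced
sample = rng.sample(frame, n)      # n items at random, without replacement

print(sorted(sample)[:10], "...")
```

The count of possible samples is astronomically large even for this modest n, which is why the slips-of-paper approach does not scale.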
In experiments in which two or more groups are compared, several independent random samples may have to be drawn. Assume that we want to study the effectiveness of two different online experimental design tutorials, which we identify as A and B. A group of State University seniors will complete the tutorial and then take an exam. Suppose we want 30 subjects in each group and want the two groups to be different (i.e., no overlap). As already noted, there are 1,000 graduating seniors at the school. Assume that at State University all seniors graduate, so that 1,000 students are available to take the tutorial during the school year. We enter the names of the students into a column of length 1,000 and select 60 of them at random and without replacement by executing the Minitab command "Calc > Random Data > Sample From Columns." The first 30 students in the sample become the students for tutorial A, and the second group of 30 students use tutorial B.
Assume that gender plays a role. A sampling strategy such as the one just discussed may not be optimal because it could lead to an unbalanced gender composition in the sample. The student body at State includes about the same number of men and women. However, it could be, by bad luck of the draw, that the first sample for A includes only 40% women, while the second for B includes 65%. It is better to take stratified random samples. From the 500 women, select at random 30 and randomly divide the 30 into two groups of 15 to receive A or B. The same is done with the 500 men.
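The stratified scheme just described can be sketched as follows. The student identifiers, the seed, and the function name `stratified_assignment` are hypothetical; the counts (500 women, 500 men, 30 drawn from each, 15 per tutorial) are those in the text.

```python
import random


def stratified_assignment(strata, per_stratum, rng):
    """Draw per_stratum members from each stratum and split each draw evenly between A and B."""
    groups = {"A": [], "B": []}
    for members in strata.values():
        # sample() draws without replacement and returns the items in random order,
        # so the first half and the second half are themselves random subsets.
        chosen = rng.sample(members, per_stratum)
        half = per_stratum // 2
        groups["A"].extend(chosen[:half])
        groups["B"].extend(chosen[half:])
    return groups


# Hypothetical identifiers standing in for the 500 women and 500 men.
strata = {
    "women": [f"W{i:03d}" for i in range(1, 501)],
    "men": [f"M{i:03d}" for i in range(1, 501)],
}
groups = stratified_assignment(strata, per_stratum=30, rng=random.Random(42))

for name, members in groups.items():
    women = sum(1 for m in members if m.startswith("W"))
    print(f"tutorial {name}: {len(members)} students, {women} women")
```

By construction each tutorial group contains exactly 15 women and 15 men, so the gender imbalance described above cannot occur.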

2.5 STATISTICAL INFERENCE

2.5.1 Central Limit Effect for Averages


Suppose we have a very large population, and the random variable of interest Y is continuous in nature and varies around a certain unknown mean $\mu$, with standard deviation $\sigma$. Our objective is to estimate the unknown mean $\mu$ from the results of a random sample of size n.
Many different samples of size n from N elements are possible, and each one results in a particular sample mean $\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$. Random sampling, which gives each of the samples the same probability of being selected, induces a sampling distribution for the sample average $\bar{Y}$. This sampling distribution has a certain mean $\mu_{\bar{Y}}$ and standard deviation $\sigma_{\bar{Y}}$.

The sampling distribution has the following characteristics:

• The mean of the sampling distribution of $\bar{Y}$ is given by $\mu$. That is, $\mu_{\bar{Y}} = \mu$. The sampling distribution of $\bar{Y}$ is centered at the population mean $\mu$. Repeated sample averages fluctuate around $\mu$. Averages of some samples are smaller, and averages of others are larger; however, the mean of sample averages from repeated samples will be $\mu$.

• The standard deviation of the sampling distribution of $\bar{Y}$ is given by $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$, and its variance is $\sigma_{\bar{Y}}^2 = \sigma^2/n$. Averaging reduces the variability, with averages varying less than individual population values. Sample results from a single observation (n = 1) fluctuate around the population mean with standard deviation $\sigma$. Averages of n observations fluctuate around the same mean with standard deviation $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$. If n becomes large, the sampling variability approaches zero, and $\mu$ can be estimated perfectly. But, of course, taking a very large sample would in most cases be prohibitively expensive.

• For reasonably large sample sizes, the distribution of $\bar{Y}$ is approximately normal, regardless of the distribution of Y.

The bulleted paragraphs are consequences of the central limit theorem, one of the most important results in statistics.
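The three characteristics listed above can be illustrated with a small simulation. The sketch below assumes, purely for illustration, a strongly skewed population (exponential with mean 1 and standard deviation 1), samples of size n = 40, and 5,000 repeated samples; none of these choices comes from the text.

```python
import math
import random
import statistics

rng = random.Random(7)        # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0          # exponential population with rate 1: mean 1, sd 1
n = 40                        # size of each sample
repeats = 5000                # number of repeated samples

averages = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_averages = statistics.fmean(averages)
sd_of_averages = statistics.stdev(averages)
theory_sd = sigma / math.sqrt(n)

# Share of sample averages within 1.96 standard deviations of mu;
# approximate normality suggests a value near 0.95.
within = sum(abs(a - mu) <= 1.96 * theory_sd for a in averages) / repeats

print(f"mean of sample averages: {mean_of_averages:.3f} (mu = {mu})")
print(f"sd of sample averages:   {sd_of_averages:.3f} (sigma / sqrt(n) = {theory_sd:.3f})")
print(f"share within 1.96 sd:    {within:.3f}")
```

Even though individual observations are far from normal, the averages cluster symmetrically around the population mean with the reduced spread predicted by the formula.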

2.5.2 Confidence Intervals for a Population Mean


A random sample of size n is taken from a process with mean $\mu$ and standard deviation $\sigma$. The sample average $\bar{y}$ provides a point estimate of the population mean $\mu$. Suppose the process (population) standard deviation $\sigma$ is known. The standard deviation of the sample average, $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$, quantifies the estimation error; it tells us how far the estimate could be from the true population mean. Then a 95% confidence interval for the population mean is given by the interval

$$\bar{y} \pm 1.96\,\sigma_{\bar{Y}} \quad \text{or} \quad \bar{y} \pm 1.96\,\sigma/\sqrt{n}$$

Suppose $\sigma$ is unknown. The sample standard deviation $s = \sqrt{\sum (y_i - \bar{y})^2/(n - 1)}$ provides an estimate of $\sigma$. Replacing $\sigma$ in $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ by its estimate s gives us an estimated standard deviation of a sample average. We refer to it as the standard error of the sample average, and we write it as $se_{\bar{Y}} = s/\sqrt{n}$. For a reasonably large sample size, an approximate 95% confidence interval for the population mean is given by the interval

$$\bar{y} \pm 1.96\,se_{\bar{Y}} \quad \text{or} \quad \bar{y} \pm 1.96\,s/\sqrt{n}$$

The factor 1.96 follows from the central limit effect and the approximating normal distribution; it is the 97.5th percentile of the standard normal distribution. Using the factor 2 rather than 1.96 results in a close approximation.
For small sample sizes, and under the additional assumption that the distribution of Y in the population is normal, we replace the factor 1.96 with the 97.5th percentile of the t-distribution with n - 1 degrees of freedom. Then the 95% confidence interval is given by

$$\bar{y} \pm \left[t_{0.975}(n - 1)\right] s/\sqrt{n}$$

where $t_{0.975}(n - 1)$ is the 97.5th percentile of the t-distribution with n - 1 degrees of freedom. For sample sizes larger than 30, the difference between percentiles of the t- and normal distributions is small, and it does not matter which distribution is used.
Thus, 95% confidence intervals cover the true population mean in 95% of repeated samples. Intervals with other coverages, such as 90% or 99% confidence intervals, can be obtained by using different percentiles, such as $t_{0.95}(n - 1)$ for a 90% or $t_{0.995}(n - 1)$ for a 99% confidence interval.

Example A random sample of 60 customers selected from among all customers who have ordered from a catalog in 2005 showed an average purchase amount of $\bar{y} = 125$ dollars, with a sample standard deviation of s = 24 dollars. The standard error of the average is $se_{\bar{Y}} = 24/\sqrt{60} = 3.098$, and a 95% confidence interval for the mean purchase amount in the population is given by

$$125 \pm (1.96)(3.098) \quad \text{or} \quad (118.9 \text{ to } 131.1)$$
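The interval in this example can be reproduced with a few lines of code. The sketch below uses the large-sample normal factor, as in the example; the function name `mean_confidence_interval` and its `level` parameter are assumptions introduced for illustration.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(y_bar, s, n, level=0.95):
    """Large-sample confidence interval for a population mean: y_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 for a 95% interval
    se = s / math.sqrt(n)                           # standard error of the average
    return y_bar - z * se, y_bar + z * se


se = 24 / math.sqrt(60)
low, high = mean_confidence_interval(y_bar=125, s=24, n=60)

print(f"standard error: {se:.3f}")
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

For small samples from a normal population, the normal factor would be replaced by the corresponding t percentile with n - 1 degrees of freedom.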

2.5.3 Central Limit Effect for Proportions


Assume that we are interested in estimating an unknown proportion $\pi$, such as the proportion of smokers among State University graduating seniors. The sample proportion

$$p = (\text{number of successes})/n = \bar{y} = (y_1 + y_2 + \cdots + y_n)/n$$

is an average of n sample responses. Each response is the outcome of a discrete random variable Y with possible values 0 or 1 (smoker), and associated probabilities $1 - \pi$ and $\pi$. The variable Y follows a binomial distribution from a single trial and with success probability $\pi$. Section 2.2.1 shows that its mean is $\pi$, and its standard deviation is $\sqrt{\pi(1 - \pi)}$. Applying the central limit effect (Section 2.5.1) to the sample proportion $P = \bar{Y}$, we find that for reasonably large samples, the sampling distribution of a proportion can be approximated by a normal distribution with mean $\pi$ and standard deviation $\sigma_P = \sqrt{\pi(1 - \pi)/n}$. Sample proportions fluctuate around the population proportion $\pi$, and their standard deviation decreases with the square root of the sample size.
The sample size needs to be large for the central limit theorem to take effect, certainly much larger than when averages of continuous measurement data are considered. Sample sizes of 100 or more will be sufficient as long as the population proportion is not too close to 0 or 1. If $\pi$ is close to the boundary (0 or 1), the distribution of the sample proportion will be skewed (and not normal) even for large values of n.

2.5.4 Confidence Intervals for a Population Proportion


A random sample of size n is taken from a population. The resulting sample proportion p provides an estimate of the population proportion $\pi$. The substitution of this estimate into the standard deviation of the sample proportion, $\sqrt{\pi(1 - \pi)/n}$, provides the standard error $se_P = \sqrt{p(1 - p)/n}$. The standard error quantifies the estimation error, telling us how far the estimate can be from the true population proportion. An approximate 95% confidence interval for the population proportion $\pi$ is given by the interval

$$p \pm 1.96\,se_P \quad \text{or} \quad p \pm 1.96\sqrt{p(1 - p)/n}$$

Example A sample of 400 customers selected at random from all our catalog customers found that 108, or 27%, are repeat customers; in other words, 27% is our best estimate for the proportion of repeat buyers in the population of all our customers. A 95% confidence interval for the population proportion is

$$0.27 \pm (1.96)\sqrt{(0.27)(0.73)/400}$$

The interval extends from 0.226 to 0.314.
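The same calculation can be sketched in code. The function name `proportion_confidence_interval` is an assumption for illustration; the counts are those of the example.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, level=0.95):
    """Approximate confidence interval for a proportion: p +/- z * sqrt(p (1 - p) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 for a 95% interval
    se = math.sqrt(p * (1 - p) / n)                 # standard error of the sample proportion
    return p - z * se, p + z * se


low, high = proportion_confidence_interval(successes=108, n=400)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

The normal approximation behind this interval is only reasonable for large n and for proportions that are not too close to 0 or 1.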


Comment. The term "margin of error" is often used in reporting the results of political and other polls. For example, a report might say that 43% favored candidate A with a margin of error of 3 percentage points. The margin of error is half the width of a 95% confidence interval, which for the population proportion is approximately $2\sqrt{p(1-p)/n}$.

2.5.5 Statistical Tests of Hypotheses


Prior to collecting data, the decision maker often has a certain hypothesis about the population characteristic of interest. For example, she may be interested in the mean purchasing amount of catalog customers and hypothesize that it is larger than 115 dollars. Or, she may be interested in the proportion of repeat buyers and hypothesize that it is less than 30%. Or, she may be interested in whether or not two advertising strategies affect the mean purchasing amount. Suppose experiments are conducted to learn about the validity of these hypotheses. A sample from the customer base is taken, and the average purchasing amount and the sample proportion of repeat customers are calculated. An experiment with two different advertising strategies is also conducted, and the average sales response for each group is calculated.
Hypotheses address unknown population characteristics. The research hypothesis (i.e., the hypothesis we put forward as the hypothesis to be tested) is called the alternative hypothesis, $H_1$. The opposite of the research hypothesis becomes the null hypothesis, $H_0$. It is the status quo or the fallback hypothesis in case we cannot show that the research hypothesis is more appropriate. In our first example, $H_0\colon \mu \le 115$ and $H_1\colon \mu > 115$. In the second example, $H_0\colon \pi \ge 0.30$ and $H_1\colon \pi < 0.30$. In the third example, $H_0\colon \mu_1 - \mu_2 = 0$ and $H_1\colon \mu_1 - \mu_2 \ne 0$.
The burden of proof always lies on the research (i.e., the alternative) hypothesis. If our sample or experiment does not provide enough evidence against the null hypothesis, we will not embrace the research hypothesis and will retain the status quo. We are aware that sample information may not always give an accurate picture of the population, as sample statistics are subject to sampling error. We want to be reasonably confident that we do not reject the null hypothesis (the status quo) in error. That is, if in fact the null hypothesis is correct, we want to fix the probability of rejecting it at a certain low value; say, 5%. This value is referred to as the significance level of the test.

The test of two hypotheses such as $H_0\colon \mu \le 115$ and $H_1\colon \mu > 115$ proceeds as follows. A random sample is taken, and from that sample we calculate the sample statistics $\bar{y}$ and $s$. The test statistic is the difference between the sample average and the hypothesized value, that is, $\bar{y} - 115$. If the difference is positive and large, we reject $H_0\colon \mu \le 115$ and conclude $H_1\colon \mu > 115$; otherwise, we retain $H_0$. But the sample mean is subject to sampling variability, and its standard error, $\mathrm{se}_{\bar{y}} = s/\sqrt{n}$, must be taken into account and used to standardize the difference. This results in the standardized test statistic

$$TS = \frac{\bar{y} - 115}{\mathrm{se}_{\bar{y}}} = \frac{\bar{y} - 115}{s/\sqrt{n}}$$

If this test statistic is large, larger than what could be expected under the null hypothesis, we reject the null hypothesis. Under the null hypothesis that the population mean $\mu$ is 115, the standardized test statistic follows a t-distribution with n - 1 degrees of freedom (or a standard normal distribution, if n is large). The probability that the t-distributed random variable exceeds the computed test statistic can be found in t-tables or by using certain functions in statistical software packages. For example, one can use the Excel function TDIST(t, n - 1, 1), where t is the value of the standardized test statistic, n - 1 is the number of degrees of freedom, and 1 indicates that the user wants the upper tail probability (replacing the 1 with a 2 would return the probability in both tails of the distribution). We call this the probability value,

$$\text{probability value} = P\left[\, t(n-1) \ge \frac{\bar{y} - 115}{\mathrm{se}_{\bar{y}}} \right]$$

A small probability value indicates that under the null hypothesis it would be unlikely to observe such a large sample test statistic. In this case, we reject $H_0$ in favor of the alternative $H_1$. The significance level 0.05 is taken as the cutoff value. On the other hand, a large probability value (larger than the significance level 0.05) makes it plausible that the sample test statistic resulted from the null hypothesis, and therefore we would retain $H_0$.

Example 1 The sample average from purchases of 60 customers is $\bar{y} = 125$, with sample standard deviation $s = 25$. We wish to test a research hypothesis about the mean purchasing amount of our catalog customers, and we hypothesize that it is larger than 115 dollars. That is, $H_1\colon \mu > 115$ and $H_0\colon \mu \le 115$. The standardized test statistic is

$$TS = \frac{125 - 115}{25/\sqrt{60}} = 3.10$$

This statistic is quite large; certainly larger than 2, which is a reasonable cutoff, because it is close to the 97.5th percentile (1.96) of the standard normal distribution. The probability value

$$\text{probability value} = P[\, t(59) \ge 3.10 \,] = 0.0015$$

is very small, which makes the null hypothesis highly implausible. We reject the null hypothesis in favor of the alternative that the population mean is in fact larger than 115.
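
For readers without access to t-tables or Excel, the calculation in Example 1 can be sketched in Python using only the standard library. The upper-tail probability of the t-distribution is obtained here by numerically integrating its density with Simpson's rule; this is our own illustrative shortcut (the helper names `t_pdf` and `t_upper_tail` are hypothetical), not the method used by any particular statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P[T >= t] for t >= 0, using Simpson's rule on the density over [0, t].

    steps must be even. By symmetry, P[T >= t] = 0.5 - P[0 <= T <= t].
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Example 1: n = 60 customers, sample mean 125, sample sd 25, H0: mu <= 115.
n, ybar, s, mu0 = 60, 125, 25, 115
se = s / math.sqrt(n)                 # standard error of the sample mean
ts = (ybar - mu0) / se                # standardized test statistic
p_value = t_upper_tail(ts, n - 1)     # one-sided (upper tail) probability value
print(f"TS = {ts:.2f}, probability value = {p_value:.4f}")
```

The test statistic prints as 3.10, and the probability value should come out close to the 0.0015 reported above; in either case it is far below the 0.05 significance level.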

Example 2 We are interested in the proportion of repeat buyers, and we want to test the research hypothesis that it is less than 30%. Here we test $H_0\colon \pi \ge 0.30$ against $H_1\colon \pi < 0.30$. We reject the null hypothesis if the sample proportion p is much smaller than the hypothesized value of 0.30. Under the null hypothesis, the standard deviation of p in repeated samples of size n is $\sqrt{0.3(1 - 0.3)/n}$; see Section 2.5.3. The standardized test statistic becomes

$$TS = \frac{p - 0.30}{\sqrt{0.3(1 - 0.3)/n}}$$

Suppose a random sample of 400 catalog customers found that 108, or 27%, were repeat customers. The value of the standardized test statistic, $TS = -1.31$, is not extreme and lies within the range $\pm 2$ that we associate with a normal distribution. The

$$\text{probability value} = P[\, Z \le -1.31 \,] = 0.0951$$

is larger than the standard significance level, and therefore we retain the null hypothesis. There is not enough evidence to say that the proportion of repeat buyers is less than 30%.
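
The test in Example 2 can be reproduced with Python's standard library, which provides the normal distribution through `statistics.NormalDist`. This is a minimal sketch; the variable names are our own.

```python
import math
from statistics import NormalDist

# Example 2: 108 repeat customers in a sample of 400; H0: pi >= 0.30, H1: pi < 0.30.
n, successes, pi0 = 400, 108, 0.30

p = successes / n
sd0 = math.sqrt(pi0 * (1 - pi0) / n)   # standard deviation of p under the null hypothesis
ts = (p - pi0) / sd0                   # standardized test statistic
p_value = NormalDist().cdf(ts)         # lower tail, because H1 states pi < 0.30
print(f"TS = {ts:.2f}, probability value = {p_value:.4f}")
```

The computed probability value may differ from 0.0951 in the last digit, because the text rounds the test statistic to -1.31 before looking up the tail probability.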

2.5.6 Determination of the Sample Size


Sample statistics vary around the true population characteristics that they estimate. The standard deviation of the sampling distribution (i.e., the standard error) indicates the margin of error, and we learned that it decreases with the sample size. How large must the sample size be if we want to be reasonably confident that our estimate is within a certain distance from the true value? Determining the required sample size is very important because we need to know whether a certain sample size is sufficient for estimating a population characteristic to the desired accuracy.


Estimating a Mean
Assume that we want to estimate an unknown population mean $\mu$, and suppose that we want to be 95% confident that the estimate is within $\pm B$ units of the true value. How large a sample size is needed? The standard deviation of the sample average is $\sigma/\sqrt{n}$, and a 95% confidence interval is given by $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$. For simplicity, replacing 1.96 with 2, the quantity $2\sigma/\sqrt{n}$ must equal B. Solving the equation $B = 2\sigma/\sqrt{n}$ leads to the required sample size

$$n = \left(\frac{2\sigma}{B}\right)^2$$

A prior estimate of the standard deviation of individual measurements is needed. One could argue that it would be unreasonable to know $\sigma$ if $\mu$ were unknown. But often one has access to prior data and experiments that looked at similar issues. In this case, an estimate of $\sigma$ from these prior studies would be used. Alternatively, if no previous estimates were available, we could first take a small preliminary sample of (say) 50 observations and use it to estimate $\sigma$.
Example Assume that we want to estimate the mean GPA for undergraduate students at the Central University. Similar studies on GPA may have been conducted at other comparable schools, and we may even have access to estimates of the variability in GPA at Central University for previous years. Suppose these studies indicate that a good planning value for the standard deviation among individual GPAs is $\sigma = 0.8$.
Suppose that we want to be 95% confident that our estimate is within $\pm 0.15$ of the true population mean. How large must the sample be? Using the equation just given, we find that

$$n = \left(\frac{2(0.8)}{0.15}\right)^2 = 113.8 \approx 114$$
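
The sample size calculation for a mean is easy to script. The sketch below is in Python; the function name `sample_size_mean` is our own, and rounding up to the next whole observation is our assumption about how a fractional result should be handled.

```python
import math

def sample_size_mean(sigma, B, z=2.0):
    """Smallest n for which z * sigma / sqrt(n) is at most B.

    sigma: planning value for the standard deviation of individual measurements
    B:     desired margin of error
    z:     normal percentile (2 is the rounded value used in the text; 1.96 is exact for 95%)
    """
    return math.ceil((z * sigma / B) ** 2)

n_text = sample_size_mean(0.8, 0.15)            # GPA example with z = 2
n_exact = sample_size_mean(0.8, 0.15, z=1.96)   # same example with z = 1.96
print(n_text, n_exact)
```

Using 2 instead of 1.96 is slightly conservative: it asks for a few more observations than the exact percentile would.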
Estimating a Proportion
Assume that we want to estimate an unknown proportion $\pi$, and suppose that we want to be 95% confident that our estimate is within $\pm B$ units of the true population proportion. How large a sample do we need?
The standard deviation of the sample proportion is $\sqrt{\pi(1-\pi)/n}$, and an approximate 95% confidence interval for the population proportion is $p \pm 2\sqrt{\pi(1-\pi)/n}$. Solving the equation $B = 2\sqrt{\pi(1-\pi)/n}$ leads to the required sample size

$$n = \frac{4\pi(1-\pi)}{B^2} \le \frac{1}{B^2}$$

The value $1/B^2$ is an upper bound on n, the required sample size. It results from setting $\pi = 1/2$. The function $\pi(1-\pi)$ resembles a half-dome shape, with a maximum value of 1/4 when $\pi = 1/2$. If we had prior knowledge about the proportion $\pi$, we could substitute this value into the equation just given. Previous studies with similar objectives would help with this selection. On the other hand, we could substitute $\pi = 1/2$ and use the safe upper bound $n = 1/B^2$ if no prior guess on $\pi$ is available. Setting $\pi = 0.5$ is frequently used in determining the sample sizes in political polls.

Example We know from past studies that two-party elections are close, with the probability of the candidate of the incumbent party winning, $\pi$, at around 0.5. Usually there is much interest in "calling" an upcoming close election, and we want to estimate the proportion of votes for the candidate of the incumbent party from a random sample of likely voters. We want to be 95% confident that our estimate is within $\pm 0.02$ (i.e., 2 percentage points) of the true value. How large a sample is needed? The above equation implies that we should take a sample of size

$$n = \left(\frac{1}{0.02}\right)^2 = 2{,}500$$

Although this is not an overly large number, the challenge of sampling is in making sure that a true random sample is taken, and that each possible sample from the population of interest is given the same chance of being selected. We need to be certain that our sample does not exclude voters that are difficult to reach, nor do we want to include in our sample people who will not be eligible or willing to vote at election time.
How does the sample size change if we want to be 99% or 90% confident? For that we need to replace the factor 2 (which is roughly the 97.5th percentile of the standard normal distribution) with the 99.5th percentile (which is 2.576), or the 95th percentile (which is 1.645), and solve for n.
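
To see how the required sample size for a proportion changes with the confidence level, the percentile can be computed rather than looked up. The following Python sketch uses `statistics.NormalDist.inv_cdf` from the standard library; the function name `sample_size_proportion` and the decision to round up are our own assumptions.

```python
import math
from statistics import NormalDist

def sample_size_proportion(B, confidence=0.95, pi=0.5):
    """Required n so that z * sqrt(pi * (1 - pi) / n) is at most B.

    pi = 0.5 gives the safe upper bound when no prior guess is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. 1.96 for 95%
    return math.ceil(z ** 2 * pi * (1 - pi) / B ** 2)

# Election example: margin of error of 2 percentage points.
sizes = {level: sample_size_proportion(0.02, level) for level in (0.90, 0.95, 0.99)}
for level, n in sizes.items():
    print(f"{level:.0%} confidence: n = {n}")
```

With the exact 97.5th percentile (1.96) the 95% answer is somewhat below the 2,500 obtained in the text, which uses the rounded factor 2.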

2.5.7 Confidence Intervals and Tests of Hypotheses: Comparing Means of Two Independent Samples

We may be interested in whether or not two advertising strategies (A and B) affect the mean purchasing amounts of catalog customers. Suppose we have no prior opinion on whether one strategy is better than the other. We merely want to test our research hypothesis that they are different. In this case $H_0\colon \mu_A - \mu_B = 0$ and $H_1\colon \mu_A - \mu_B \ne 0$, where $\mu_A$ and $\mu_B$ are the average purchase amounts of customers exposed to advertising strategies A and B, respectively.
Assume we conduct the following experiment. One subset of $n_1$ customers is drawn randomly from our regular customer base (the population) and sent advertisement A. A second, and different, randomly selected subset of size $n_2$ is sent advertisement B. Purchases over the next 6 months are monitored. Suppose in this particular experiment we selected $n_1 = n_2 = 30$ customers in each group, and found that $\bar{y}_A = 132$ and $s_A = 20$, and $\bar{y}_B = 141$ and $s_B = 25$. Is this enough evidence to conclude that the effects of the two strategies differ?
Here we base our decision on the difference between the two sample means, $\bar{y}_A - \bar{y}_B$. However, one must realize that many different independent pairs of samples could have been drawn, and that the difference of the resulting means would have changed with each pair of samples. What is the sampling variability of the difference of sample averages from two independent random samples? Another version of the central limit effect implies the following:

The mean of the sampling distribution of $\bar{y}_A - \bar{y}_B$ is given by $\mu_A - \mu_B$, which says that the sampling distribution is centered at the difference of the population means.

The standard deviation of the sampling distribution of $\bar{y}_A - \bar{y}_B$ is given by $\sqrt{\sigma_A^2/n_1 + \sigma_B^2/n_2}$. The sample standard deviations $s_A$ and $s_B$ can be substituted for the unknown population standard deviations. $\mathrm{se}_{\bar{y}_A - \bar{y}_B} = \sqrt{s_A^2/n_1 + s_B^2/n_2}$ is referred to as the standard error of the difference of two sample averages.

For reasonably large sample sizes, the distribution of $\bar{y}_A - \bar{y}_B$ is approximately normal.

Consequently, an approximate 95% confidence interval for $\mu_A - \mu_B$ is given by

$$(\bar{y}_A - \bar{y}_B) \pm 1.96\,\mathrm{se}_{\bar{y}_A - \bar{y}_B} \quad\text{or}\quad (\bar{y}_A - \bar{y}_B) \pm 1.96\sqrt{\frac{s_A^2}{n_1} + \frac{s_B^2}{n_2}}$$

If the sample sizes are small (smaller than 20 to 30), the percentile of the normal distribution should be replaced by the percentile of a t-distribution. Available computer programs calculate the appropriate degrees of freedom automatically, using an approximation due to Welch (1937).
A test of $H_0\colon \mu_A - \mu_B = 0$ and $H_1\colon \mu_A - \mu_B \ne 0$ is based on the standardized test statistic

$$TS = \frac{\bar{y}_A - \bar{y}_B}{\mathrm{se}_{\bar{y}_A - \bar{y}_B}} = \frac{\bar{y}_A - \bar{y}_B}{\sqrt{s_A^2/n_1 + s_B^2/n_2}}$$

We reject $H_0\colon \mu_A - \mu_B = 0$ in favor of the two-sided alternative $H_1\colon \mu_A - \mu_B \ne 0$ if the test statistic is a large positive or large negative value, with $\pm 2$ being a good cutoff value. In addition, we can calculate the probability value

$$\text{probability value} = P[\, Z \ge |TS| \,] + P[\, Z \le -|TS| \,] = 2P[\, Z \ge |TS| \,]$$

Z follows the standard normal distribution, and the probability can be looked up in the z-table. Because of the two-sided nature of the alternative hypothesis we must double the tail probability. This was not needed in the one-sided alternative of the two previous examples. We reject the null hypothesis in favor of the alternative hypothesis if the probability value is smaller than the significance level 0.05. Equivalently, for a 5% significance level, we reject the null hypothesis if a 95% confidence interval fails to include the value zero.

Example In our experiment we considered $n_1 = n_2 = 30$, and found that $\bar{y}_A = 132$ and $s_A = 20$, and $\bar{y}_B = 141$ and $s_B = 25$. The standard error of the difference of the two averages is

$$\mathrm{se}_{\bar{y}_A - \bar{y}_B} = \sqrt{\frac{20^2}{30} + \frac{25^2}{30}} = 5.85$$

and the 95% confidence interval for $\mu_A - \mu_B$ is $(132 - 141) \pm (1.96)(5.85)$. The confidence interval extends from -20.46 to 2.46. The value zero is within this interval, which indicates that the "no difference" hypothesis cannot be rejected with this data.
The identical conclusion is reached with the probability value. The test statistic for $H_0\colon \mu_A - \mu_B = 0$ and $H_1\colon \mu_A - \mu_B \ne 0$ is $(132 - 141)/5.85 = -1.54$, with probability value $2P[\, Z \ge 1.54 \,] = 2(0.0618) = 0.1236$. Since it is larger than the significance level 0.05, we find no reason to reject the null hypothesis. The data do not provide evidence that the effects of the two advertising strategies differ.
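
The two-sample calculation in this example can be sketched in Python from the summary statistics alone. The function name `two_sample_summary` is hypothetical, and the normal approximation is used throughout, as in the text.

```python
import math
from statistics import NormalDist

def two_sample_summary(mean_a, s_a, n_a, mean_b, s_b, n_b, z=1.96):
    """Large-sample comparison of two independent means from summary statistics."""
    diff = mean_a - mean_b
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)    # standard error of the difference
    ts = diff / se                                     # standardized test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(ts)))      # two-sided probability value
    ci = (diff - z * se, diff + z * se)                # approximate 95% confidence interval
    return se, ts, p_value, ci

# Advertising example: strategy A (mean 132, sd 20) versus B (mean 141, sd 25), 30 customers each.
se, ts, p_value, ci = two_sample_summary(132, 20, 30, 141, 25, 30)
print(f"se = {se:.2f}, TS = {ts:.2f}, probability value = {p_value:.4f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Small differences in the last digit relative to the text are expected, because the text rounds the standard error to 5.85 before computing the interval and the test statistic.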

2.5.8 Inference in the Blocked Experiment: Comparing Means of Two Dependent Samples

In comparative experiments, treatments need to be assigned to experimental units. For example, an experiment comparing the yield of two corn hybrids must assign hybrids A and B to each of several small test fields within a larger experimental plot. In many industrial studies, experiments are conducted sequentially in time, and the assignment of the treatments A and B to the available time slots needs to be addressed. The same issue arises in medical trials for evaluating the effectiveness of new drugs, where the treatments are the drugs that are tested and the subjects are the experimental units.
One design approach is to randomize the assignment of treatments to the experimental units. The experimenter would list the experimental units (the test fields, the available times, or the available subjects) and randomly assign treatments to units. Randomization is important and certainly better than a nonrandom arrangement, as it spreads the existing variability among the experimental units fairly across all treatments. However, the experimenter can do considerably better if the experimental units can be grouped into groups or blocks, such that the units are homogeneous within the same block but differ across blocks. For example, test fields close together are more similar than fields far apart. Or, experiments run on the same day benefit from more homogeneous conditions than experiments that are conducted on different days. Or, a within-subject comparison of the effectiveness of a drug is exposed to fewer interfering variables than a comparison across subjects. In randomized block experiments, one randomizes the assignment within each block. For example, if 20 experiments need to be carried out over 5 days, the experimenter would randomize the order of two A and two B experiments on each day. Or, instead of assigning a certain blood pressure medication to 50 patients and "no treatment" to 50 others and comparing the blood pressure readings of these two groups after a period of 3 months, a better approach would be to establish the initial blood pressure (the no-treatment group) on all 100 patients, then put all patients on the new medicine, and analyze changes after 3 months.
Example In Table 2.2, we report results of a blood pressure experiment on 10 patients. Initial blood pressures (x) and blood pressures after 3 months on the new drug (y) are listed. The table also lists the summary statistics (mean and standard deviation) of the initial blood pressure and the blood pressure after 3 months. Furthermore, it lists the changes for each patient, $d_i = x_i - y_i$, the average change $\bar{d} = \sum_{i=1}^{n} d_i / n$, and the standard deviation of the changes $s_d = \sqrt{\sum_{i=1}^{n} (d_i - \bar{d})^2 / (n - 1)}$.

TABLE 2.2 Initial Blood Pressure and Blood Pressure After 3 Months: 10 Subjects

                      Initial Blood    Blood Pressure        Reduction
Patient               Pressure (x)     after 3 Months (y)    (x - y)
-----------------------------------------------------------------------
1                     190              181                    9
2                     221              211                   10
3                     212              200                   12
4                     232              218                   14
5                     200              185                   15
6                     178              175                    3
7                     186              169                   17
8                     220              212                    8
9                     204              191                   13
10                    196              187                    9
-----------------------------------------------------------------------
Mean                  203.9            192.9                 11.0
Standard deviation    17.22            16.69                  4.06


Comparative dot plots of the initial blood pressures and blood pressures after 3 months, shown on the same scale, are given in Figure 2.8. We notice considerable variability among the blood pressures, initially as well as after 3 months. In other words, there is considerable variation in blood pressure levels across patients. If we treated the two groups (initial, after 3 months) as independent, it would be difficult to conclude that the medication has made a difference. A two-sample test treating the two samples as independent fails to show any improvement due to the medication. The test statistic

$$TS = \frac{203.9 - 192.9}{\sqrt{\dfrac{17.22^2}{10} + \dfrac{16.69^2}{10}}} = 1.45$$

and its probability value $P[\, Z \ge 1.45 \,] = 0.0735$ are inconclusive and do not allow us to reject the null hypothesis $H_0\colon \mu_{\text{Initial}} - \mu_{\text{After}} = 0$. Note that we used the one-tail probability, because our research hypothesis specifies an improvement $H_1\colon \mu_{\text{Initial}} - \mu_{\text{After}} > 0$. Also, observe that we used the normal distribution; we could have used the t-distribution with the appropriate degrees of freedom, but the results would have been very similar, and our conclusions would not have changed.
The assumption that the two samples in this experiment are independent is incorrect. Two blood pressure readings (x and y) are taken on the same person. If the initial reading on one subject is high compared to all other subjects, we would expect that his or her reading after 3 months would also be high compared to the other patients. Each subject acts as his or her own block. The variability between the two readings from the same subject is small, certainly much smaller than the variability across patients. It is the differences in the blood pressure readings that need to be analyzed. Taking differences eliminates the subject variability, which constitutes a large part of the variability that we see in Figure 2.8.

[Figure 2.8 Dot Plots for the Blood Pressure Experiment. Two dot plots, "Initial" and "After 3 months," drawn on a common blood pressure scale from 170 to 230.]

[Figure 2.9 Blood Pressure Experiment. Measurements for the Same Subject Are Connected. Horizontal axis: Blood Pressure, 160 to 240.]

We have redrawn the information in Figure 2.9, but we have connected the observations that come from the same subject. It is obvious from this graph that the type of treatment makes a big difference. In all subjects, blood pressure is reduced by the medication.
The correct test procedure in this blocked (paired) experiment is to consider the differences and test $H_0\colon \mu_{\text{Initial}} - \mu_{\text{After}} = 0$ against $H_1\colon \mu_{\text{Initial}} - \mu_{\text{After}} > 0$. The appropriate test statistic is

$$\frac{\bar{d}}{s_d/\sqrt{n}} = \frac{11}{4.06/\sqrt{10}} = 8.57$$

Its probability value $P[\, t(9) \ge 8.57 \,] = 0.00001$ is essentially zero. Hence, there is very strong evidence that the medication has lowered the blood pressure. The average reduction is 11 units; the 95% confidence interval for the reduction is given by $\bar{d} \pm t_{0.975}(9)\, s_d/\sqrt{n}$, or $11 \pm (2.2622)(4.06)/\sqrt{10}$. The interval extends from 8.10 to 13.9.
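
The contrast between the (incorrect) independent-samples analysis and the paired analysis can be reproduced directly from the data in Table 2.2. The Python sketch below uses only the standard library; the percentile 2.2622 for t(9) is taken from the text rather than computed, and the variable names are our own.

```python
import math
from statistics import mean, stdev

# Data from Table 2.2: blood pressure for 10 patients, before and after 3 months.
initial = [190, 221, 212, 232, 200, 178, 186, 220, 204, 196]
after = [181, 211, 200, 218, 185, 175, 169, 212, 191, 187]
n = len(initial)

# Incorrect analysis: treat the two columns as independent samples.
se_indep = math.sqrt(stdev(initial) ** 2 / n + stdev(after) ** 2 / n)
ts_indep = (mean(initial) - mean(after)) / se_indep

# Paired analysis: work with the within-patient differences.
d = [x - y for x, y in zip(initial, after)]
d_bar, s_d = mean(d), stdev(d)
se_paired = s_d / math.sqrt(n)
ts_paired = d_bar / se_paired

t_crit = 2.2622   # 97.5th percentile of t(9), as quoted in the text
ci = (d_bar - t_crit * se_paired, d_bar + t_crit * se_paired)

print(f"independent-samples TS = {ts_indep:.2f}")
print(f"paired TS = {ts_paired:.2f}")
print(f"95% CI for mean reduction: {ci[0]:.2f} to {ci[1]:.2f}")
```

The same average reduction of 11 units gives a much larger test statistic in the paired analysis, because the standard error is based on the small within-patient variability rather than the large variability across patients.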

Comment. Here we assess whether a particular drug "works." Of course, one should be concerned that the observed effect is a combination of two effects: the real effectiveness of the drug and a placebo effect due to the person's belief of being given something useful. Apart from much higher sample sizes, FDA-approved drug studies usually compare a new experimental drug to the currently available "best-practice" drug. The best-practice drug could be a placebo. In such a study, one would divide patients into two groups (preferably, at random) and conduct the experiment discussed in this example with both groups. This would result in two sets of blood pressure differences (final readings minus initial readings), one set for each group. The procedure in Section 2.5.7 for comparing the means of two independent samples can be applied to test whether the mean effectiveness of these two drugs is different.

2.6 CASE STUDY: ADTEL


The following discussion is adapted from a Harvard Business School case reported in Chapter 5 of Clarke (1987). In the past, the Barrett Foods Company had enjoyed a market leadership position for its peanut butter, but recently was faced with a declining market share for this product. The company commissioned AdTel, a marketing research company, to assess the impact of a dramatically increased advertising budget and determine the potential payoff of a $6 million television advertising campaign versus the current $2 million strategy. Management had estimated that a 15% sales increase (established with 90% confidence or higher) would be required to justify the added expense.
AdTel maintained a 2,000-family panel. It also employed a dual-cable television system to determine the sales effect of television advertising alternatives. AdTel had two separate cable circuits. Television sets owned by half of the test families were wired to cable A, while those of the other half were wired to cable B. The panels were carefully balanced according to demographic characteristics and shopping preferences. By the push of a button, AdTel was able to block the commercial broadcast on one side of the cable and simultaneously cut in the desired test commercial, while the other side carried the regular program. The panel families recorded their purchases in weekly diaries.
The basic study covered a period of 18 months. The first 6 months represented a control period, where both circuits received the same advertising at the level of the $2 million campaign. The next 12 months represented the test period where advertising for panel A tripled. To avoid distortions by families joining and dropping the panel during the test, a static sample was created that only included those families returning at least 80% of their diaries. Panel A contained 829 families, while panel B comprised 922. The average monthly volumes per family and the monthly market shares of Barrett's peanut butter for the 18 months (6 pretest and 12 test periods) are shown in Table 2.3.

Time sequence graphs of average sales volumes for Panels A and B are shown in Figure 2.10. Time series graphs of market shares for Barrett's peanut butter are given in Figure 2.11.
The pretest data (periods 1-6) show that there is no appreciable difference between the two panels. The graphs also show convincingly that sales and market shares, for both panels A and B, change with the reporting period. Hence period is an important blocking variable, and the analysis needs to be conducted with the monthly differences between A and B.

TABLE 2.3 Volume and Market Share for Barrett's Peanut Butter

                                                         Market    Market    Market
Period     Pretest     Volume     Volume     Volume      Share     Share     Share
(month)    and Test    Panel A    Panel B    A - B       Panel A   Panel B   A - B
------------------------------------------------------------------------------------
 1         Pretest     43         41           2         50.0      50.0       0.0
 2         Pretest     22         23          -1         30.0      30.0       0.0
 3         Pretest     31         31           0         40.0      39.5       0.5
 4         Pretest     17         18          -1         23.0      24.0      -1.0
 5         Pretest     29         25           4         45.0      44.0       1.0
 6         Pretest     31         25           6         39.0      35.0       4.0
 7         Test        22         22           0         20.0      20.0       0.0
 8         Test        21         23          -2         23.0      26.0      -3.0
 9         Test        29         25           4         33.0      30.0       3.0
10         Test        29         32          -3         27.0      33.0      -6.0
11         Test        46         42           4         44.0      43.5       0.5
12         Test        40         35           5         32.0      30.0       2.0
13         Test        38         29           9         29.0      27.0       2.0
14         Test        53         38          15         38.0      33.0       5.0
15         Test        47         34          13         41.0      38.0       3.0
16         Test        45         26          19         43.0      34.0       9.0
17         Test        65         38          27         55.0      54.0       1.0
18         Test        40         47          -7         51.0      57.0      -6.0

[Figure 2.10 Volume per Family: Panels A and B. Time sequence plot of volume by period (1 to 18) for Panel A and Panel B, with the pretest period marked.]

[Figure 2.11 Market Shares: Panels A and B. Time sequence plot of market share by period (1 to 18) for Panel A and Panel B.]

Dot plots of monthly differences of A and B for volume and market share are shown in Figure 2.12.
We consider the test period (months 7-18) and test $H_0\colon \mu_A - \mu_B = 0$ against $H_1\colon \mu_A - \mu_B > 0$. The test statistics are

$$\frac{\bar{d}}{s_d/\sqrt{n}} = \frac{7}{9.98/\sqrt{12}} = 2.43 \text{ for volume, and } \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{0.88}{4.32/\sqrt{12}} = 0.71 \text{ for market share.}$$

The probability values are $P[\, t(11) \ge 2.43 \,] = 0.0167$ and $P[\, t(11) \ge 0.71 \,] \approx 0.25$, respectively.

[Figure 2.12 Dot Plots of Monthly Differences in Volume and Market Share. Two panels: dot plot for A - B volume during the test period, and dot plot for A - B market shares during the test period.]

There is evidence that the increased advertising has increased the volume. The average increase of seven units is statistically significant; a 90% confidence interval for the mean increase extends from $7 - (1.7959)(9.98)/\sqrt{12} = 1.83$ to $7 + (1.7959)(9.98)/\sqrt{12} = 12.17$. An increase of seven units over the average for panel B with standard marketing (which is 32.58 units) represents a 21.5% increase in sales. However, the lower limit of a 90% confidence interval for the percent increase in sales amounts to only $100(1.83/32.58) = 5.6\%$.
It appears from the graph in Figure 2.10 that the extra advertising has done very little during the first 6 months of the test period. It is only during periods 13 through 17 that we notice appreciable differences. The last period is also quite remarkable, in that the benefit of the extra advertising has disappeared completely. In summary, while we see some increase in volume due to the increased advertising, it is doubtful that this strategy meets management's goal of a 15% sales increase that can be established with minimum 90% confidence.
A conclusion that increased advertising has affected market share is even less convincing; the small average increase of 0.88 percentage points is not statistically significant.
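
The AdTel calculations can be retraced from the summary statistics quoted in this section. The Python sketch below is illustrative only: the function name `paired_summary` is our own, and the t percentile 1.7959 and the panel B average of 32.58 are taken from the text rather than recomputed.

```python
import math

n = 12                  # test periods (months 7-18)
t_90 = 1.7959           # 95th percentile of t(11), used for a 90% interval in the text
panel_b_mean = 32.58    # average test-period volume for panel B, from the text

def paired_summary(d_bar, s_d):
    """Test statistic and 90% confidence interval for a mean monthly difference."""
    se = s_d / math.sqrt(n)
    return d_bar / se, (d_bar - t_90 * se, d_bar + t_90 * se)

ts_vol, ci_vol = paired_summary(7.0, 9.98)        # volume, A - B
ts_share, ci_share = paired_summary(0.88, 4.32)   # market share, A - B

# Express the volume interval as a percent increase over panel B.
pct_low, pct_mid, pct_high = (100 * v / panel_b_mean for v in (ci_vol[0], 7.0, ci_vol[1]))

print(f"volume: TS = {ts_vol:.2f}, 90% CI = {ci_vol[0]:.2f} to {ci_vol[1]:.2f}")
print(f"percent increase: {pct_mid:.1f}% (90% CI {pct_low:.1f}% to {pct_high:.1f}%)")
print(f"market share: TS = {ts_share:.2f}, 90% CI = {ci_share[0]:.2f} to {ci_share[1]:.2f}")
```

The lower confidence limit for the percent increase falls well short of the 15% target, and the interval for market share includes zero, which matches the conclusions drawn above.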

2.7 NOBODY ASKED US, BUT . . .

What is now called the normal distribution first appeared in 1733 in a paper by the French mathematician Abraham de Moivre. (For a discussion of the paper, see Anders Hald (1986), History of Probability and Statistics and Their Applications Before 1750.) At the time, games of chance such as tossing coins or rolling dice were very popular, and both gamblers and mathematicians were interested in knowing the probabilities of various outcomes. The binomial distribution was well known, but calculating binomial probabilities was extremely difficult computationally if the number of trials n was fairly large, and impossible if n was very large. In his paper, de Moivre derived the equation that would later be called the normal density function as an approximation to the binomial when the number of trials is very large. Later, Laplace in 1783 and Gauss in 1809 made important contributions by developing theoretical arguments to support the normal distribution as a model of errors of measurement, in particular for errors in the observations of heavenly bodies. Over time, interest in this probability distribution continued to grow with noteworthy contributions made by the Belgian social statistician Adolphe Quetelet (1796-1874), who used the normal distribution to describe data on measurements of physical characteristics. Quetelet used the normal distribution to measure variations about the "average man." The name normal was first applied to the distribution in the 1870s by Galton and several others, reflecting the fact that the distribution described the normal or natural variation in many observed phenomena. (See Chapter 22 of Stigler, 1999, Statistics on the Table, for an interesting discussion of how the normal got its name.)

In 1908, in the paper "The Probable Error of a Mean," William Gosset derived the probability distribution that became known as the t-distribution. Gosset, a young chemist and statistician, was studying quality problems at the Guinness brewery in Dublin. He was interested in calculating the probability that a population mean was within a specified distance of a sample average. The approach had been to use $s/\sqrt{n}$ as an estimate of $\sigma/\sqrt{n}$, the standard deviation of the distribution of sample averages, and to calculate the probabilities using the normal distribution. Gosset knew this worked well for large samples where s would be close to $\sigma$. But he realized that when n was small, calculated values of s would vary greatly and therefore so would the estimate $s/\sqrt{n}$. As a consequence, errors in the calculated probabilities would be large. This led him to derive a theoretical density function for the random variable $t = \dfrac{\bar{y} - \mu}{s/\sqrt{n}}$ and to compare it to the density function for $Z = \dfrac{\bar{y} - \mu}{\sigma/\sqrt{n}}$, the density function for the standard normal distribution. He showed that because of the variability in the estimate $s/\sqrt{n}$, his new distribution was more likely than the normal to take on values in the tails. Gosset's employers at Guinness viewed their quality efforts as proprietary, and as a result, he published his papers under the name Student. His distribution became known as Student's t-distribution.
Students in introductory statistics courses are often puzzled by the fact that the sample
variance $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/(n - 1)$ divides the sum of squares by $n - 1$, and not by $n$.
The division by $n - 1$ is a consequence of having to calculate the sum of squares around the
sample mean $\bar{y}$ instead of the unknown population mean $\mu$. If the population mean $\mu$ were
known, the quantity $\sum_{i=1}^{n} (y_i - \mu)^2/n$ would be an unbiased estimate of the population
variance $\sigma^2$. It is easy to show that the sum of squares $\sum_{i=1}^{n} (y_i - \mu)^2$, viewed as a function of $\mu$, is smallest if $\mu = \bar{y}$.
Hence the sum of squares $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is always smaller than the sum of squares around
the unknown $\mu$, and the estimate of $\sigma^2$ that divides $\sum_{i=1}^{n} (y_i - \bar{y})^2$ by $n$ would be too small.
Dividing the sum of squares by $n - 1$ compensates for this and eliminates the bias.
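The bias argument above can be checked with a small simulation. The following is a minimal sketch, not part of the original text: it assumes normally distributed data with true variance 1, a sample size of 4, and 20,000 repetitions, and the function names are hypothetical. It averages the two candidate estimates (sum of squares divided by n and by n - 1) over many samples.

```python
import random


def variance_estimates(sample):
    """Return (SS / n, SS / (n - 1)), where SS is taken about the sample mean."""
    n = len(sample)
    y_bar = sum(sample) / n
    ss = sum((y - y_bar) ** 2 for y in sample)
    return ss / n, ss / (n - 1)


def average_estimates(n=4, reps=20000, sigma=1.0, seed=1):
    """Average both estimates over many simulated normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_by_n = 0.0
    total_by_n_minus_1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n, by_n_minus_1 = variance_estimates(sample)
        total_by_n += by_n
        total_by_n_minus_1 += by_n_minus_1
    return total_by_n / reps, total_by_n_minus_1 / reps


avg_by_n, avg_by_n_minus_1 = average_estimates()
print("true variance:         1.000")
print(f"average of SS/n:       {avg_by_n:.3f}")
print(f"average of SS/(n - 1): {avg_by_n_minus_1:.3f}")
```

With n = 4, the divisor n has expected value (n - 1)/n times the true variance, so its average should settle near 0.75, while the divisor n - 1 should average close to 1.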

Based on the central limit theorem, we stated that for reasonably large sample sizes, the
distribution of the sample average will be approximately normal regardless of the population distribution of the individual values. How large does the sample size n have to be?
Many introductory textbooks specify or at least suggest that n should be at least 30. But in
most cases of practical interest that figure is too high. For example, in statistical process control, sample averages are plotted on X-bar control charts, sometimes called Shewhart charts
for their originator, Walter Shewhart. Typically, samples of size 4 or 5 are used, and the
control limits are based on sample averages following a normal distribution. In his classic
book, Economic Control of Quality of Manufactured Product, originally published in 1931,
Shewhart presented the results of his experiments taking 1,000 averages of samples of size 4
from populations that were rectangular (uniform) and triangular. In both cases, the sample
averages were well approximated by a normal distribution. As Shewhart said, "The closeness of fit is striking and illustrates the rapid approach of the distribution to normality as
the sample size is increased. Such evidence ... leads us to believe that in almost all cases in
practice we may establish sampling limits for averages of four or more upon the basis of
normal law theory." In some instances, sample sizes greater than 30 may be needed if the
distributions of population values are highly skewed with long tails (sampling from an exponential distribution would be one example). But in the experiments we consider in this
book, these situations would be highly unlikely.
As we discussed in this chapter, for confidence intervals and hypothesis tests on the population proportion $\pi$, large samples are typically needed before the central limit theorem
takes effect. In these cases the normal distribution is approximating the binomial distribution, which for $\pi = 0.5$ is symmetric. As the value of $\pi$ moves away from 0.5, the binomial
distribution becomes increasingly skewed, and larger samples are needed before the normal
approximates it well. A useful rule of thumb is that the normal distribution is a good approximation if $n\pi > 10$ for $\pi \le 0.5$, and $n(1 - \pi) > 10$ for $\pi \ge 0.5$. With this rule, a
sample size of 100 will be large enough (for the normal to be a good approximation) as long
as $\pi$ is no smaller than 0.1 and no larger than 0.9.
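The rule of thumb is easy to turn into a quick check. The sketch below is illustrative only; the function names `normal_approx_ok` and `smallest_n` are hypothetical, and the strict inequality follows the wording of the rule as stated above.

```python
import math


def normal_approx_ok(n, pi):
    """Rule of thumb: the expected count of the rarer outcome must exceed 10."""
    return n * min(pi, 1.0 - pi) > 10


def smallest_n(pi):
    """Smallest integer n for which n * min(pi, 1 - pi) is strictly above 10."""
    return math.floor(10 / min(pi, 1.0 - pi)) + 1


for pi in (0.5, 0.2, 0.05, 0.03):
    print(f"pi = {pi:4.2f}: n = 100 adequate? {normal_approx_ok(100, pi)}, "
          f"smallest n = {smallest_n(pi)}")
```

The required sample size grows quickly as the proportion moves toward 0 or 1, which is why tests on small success rates call for large samples.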


success proportions is detected with reasonably large power. A planning value for the common success proportion ($\pi$) and a meaningful detectable difference ($\delta$) of the two success
proportions need to be specified. Information on the success rate is usually available from
prior experiments, and worthwhile changes are determined with economic considerations
in mind. In our illustration, this has led us to the values $\pi = 0.03$ and $\delta = 0.005$.

The discussion of the sample size in Section 2.5.6 is different in two respects. First, it focuses on the one-sample situation, not on comparative experiments. Second, it determines
the sample size that is needed to achieve a certain precision of the estimate (either a single
mean or a single proportion), but does not address the power of detecting a certain meaningful difference.
EXERCISES

Exercise 1 The file contribution summarizes the 2004 contributions to a selective private
liberal arts college. Refer to Section 2.3 for a description of the data set.
(a) Confirm the information in Table 2.1 and Figures 2.6 and 2.7. Use available computer software such as Excel or Minitab.
(b) Consider some of the other factors that were not used in Section 2.3. In particular,
assess the effect of gender, marital status, graduation status (graduated/not graduated), and major on the likelihood of donating and the donation amount.
Exercise 2 Consider the data in Section 2.6 on AdTel. Recreate the information in Figures 2.10 through 2.12 and the results of the hypothesis test discussed in this section.
Exercise 3 Search the Web for useful statistics applets. You can do this by searching for expressions such as "applets for central limit effect," "applets for confidence intervals," "applets for hypothesis testing," "applets for visualization of statistical concepts," or "applets
for sample size." Experiment with these applets. These applets will reinforce the concepts
discussed in Section 2.5. They will demonstrate the central limit effect by drawing repeated
samples of a certain specified size. They show through simulations how the variability of
sample statistics such as the sample mean decreases with increasing sample size. They show
through simulations that 95% confidence intervals for a mean or a proportion cover the
population (process) mean and proportion in 95% of the cases. Applets for the correlation
coefficient illustrate the connection between scatter plots and correlation coefficients, and
they show how the correlation changes if observations are changed.
Exercise 4 John, in charge of custodial services at the business school, installed brand new
lightbulbs into the offices of the marketing faculty. He kept track of burned-out bulbs and
the times when he had to replace them. After 12 months, he had to replace 25 of the 30
bulbs. The length of life (in weeks) for the 25 bulbs is given below:
33  19  11  22  22  15  37  10  38  19  20  23
50  30  22  10  15  37  15  22  40  22  46


to reject the null hypothesis that $\mu = 1.00$ lb, in favor of the alternative that
$\mu > 1.00$ lb?
(c) Assume that the distribution of weights is normal; furthermore assume that the
sample average and standard deviation are good estimates of the corresponding
population characteristics, $\mu$ and $\sigma$. Calculate the proportion of loaves that are underweight (i.e., weigh less than 1.0 pounds).
(d) Predict the weight of a single loaf from this morning's production. Obtain an approximate 95% prediction interval.
Exercise 8 Prior studies showed that the standard deviation among individual measurements of a certain air pollutant is 0.6 parts per million (ppm). You are planning on using
the information from a random sample (i.e., the sample average) to estimate the unknown
process (population) mean $\mu$. How large a sample do you have to select if you want
to be 95% certain that your sample average is within plus or minus 0.2 ppm of the unknown
process mean?
Exercise 9 Thirty lightbulbs were selected randomly from among a very large production
batch, and they were put on test to determine the time until they burn out. The average failure time for these 30 bulbs was 1,080 hours; the sample standard deviation was 210 hours.
The lightbulbs are advertised as having a mean life length of 1,200 hours. Test this hypothesis against the alternative that the mean life length of this batch is actually smaller than
1,200 hours.
Exercise 10 A new emergency procedure was developed to reduce the time that is required to fix a certain manufacturing problem. Past data under the old system were available (n = 25). The staff was trained under the new procedure, and the response times for
the next 15 occurrences of this manufacturing problem were recorded.
Old Procedure
4.3  6.5  4.6  4.3  6.4  4.8  5.1  6.8  4.9  4.5  5.1  7.3  5.0
4.6  7.0  5.1  3.8  5.2  4.1  5.7  4.6  5.9  3.1  6.2  6.0  3.3

New Procedure
6.2  4.0  3.3  4.5  2.3  3.0  3.2  3.7  4.5  5.3  4.0  5.4  4.3  3.8

Compare the response times under the old and the new procedure. Are there differences in
the mean responses? Discuss using appropriate graphs, summary statistics, and statistical
tests. Would you switch to the new procedure?
Exercise 11 Two different fabrics are tested on a wear tester. A wear tester is a mechanical
device that rubs the attached fabric against a fixed object. This particular machine has two
separate attachments that allow us to compare two pieces of fabrics in the same run. The


(a) Obtain a dot diagram and calculate the mean, median, and standard deviation of
the 25 observations. Calculate the 90th percentile.
(b) We would like to obtain the mean and the median life length of all 30 lightbulbs.
However, by the end of the 12 months, five bulbs had not yet burned out. Can you
calculate the median of the 30 observations without waiting until the five remaining bulbs fail?
Can you calculate the mean of the 30 observations without waiting until the five
remaining bulbs fail? If not, what can you say about this mean?

Exercise 5 The data set given below lists the annual 2005 salary (in $1,000) and the educational background for a sample of 25 employees at a large Midwestern manufacturing
company. Educational background is measured by the number of years of formal schooling
(12 refers to a high school graduate; 16 refers to a college graduate; 17 through 20 refer to
college degree plus the number of years of graduate work).
Emplorcc

3
4
6
7

9
10
II
12
13

Education

Salary

1'.mployee

Education

Sabi y

16
12
12
I6
18
15
11
12
II

52.J
43.7
39.5
47.8
53.0
49.0
33.7
J2. I

lI
15
16
17
18
19
20
21

4<J.4
45.4
4U
37.(,

9.8

22

20

37.7
26.3
22.0
27.0

23
24

17
16
13
12
12
19
16
17
16
12
]6
16

I')
16
16

25

.lU
M.8
'ilJ."
54.5
27.3
14.8
2 l .7
33.~

Construct a scatter diagram of salary against educational achievement. Calculate the correlation coefficient.

Exercise 6 In an NYT/CBS poll, 56% of 2,000 randomly selected voters in New York City
said that they would vote for the incumbent in a certain two-person race. Calculate a 95%
confidence interval for the population proportion. Discuss its implication. Carefully discuss
what is meant by the population, how you would carry out the random sampling, and what
other factors could lead to differences between the responses in the survey and the actual
votes on the day of the election.

Exercise 7 A sample of n = 50 bread loaves is taken from the sizable production that left
our bakery this morning. We find that the average weight of the 50 loaves is 1.05 pounds;
the standard deviation is s = 0.06 pound.
(a) Obtain a 95% confidence interval for the mean weight of this morning's production.
(b) One of the employees claims that the current process produces loaves that are
heavier than one pound, on average. Is there enough information in our sample


weight losses (in milligrams) from 8 runs are as follows:


RUN

Fahnl
)\

36

l:l

3\1

26
2>

31
35

38

28

17

22

2\1

42

_JI

39

21

_\ 2

Analyze the data and determine whether the mean wear of fabric A is different from that of
fabric B. If it is different, how does it differ?
Discuss why the design of assigning both fabrics to each run is preferable to a design that
assigns fabric A to both positions of runs 1-4, and fabric B to both positions of runs 5-8.
Exercise 12 In the past, the sign-up rate for your credit card has been around 6%. Your
marketing team wants to decide between two different sets of promotional materials that it
plans to send to potential customers: a traditional set that is very similar to the one that has
been used in the past, and a new, bolder set that is expected to increase the sign-up rate. Before switching to the new materials, your company wants to run a comparative experiment
that evaluates the two sign-up rates. Assuming a significance level of 0.05, determine the
common sample size for the two groups that can detect a 1% increase in the sign-up rate
with power of 0.95. How does the sample size change if you require less power (say, 0.90 or
0.80)? How does the sample size change if you want to detect a difference of one-half of a
percent? You may want to use computer software to carry out the calculations.
Exercise 13 Using a computer software of your choice, perform Shewhart's experiment of
drawing random samples of size 4 from a continuous uniform distribution between 0 and 1.
You may use the Minitab function "Calc > Random Data > Uniform" or the Excel command RAND(). Generate four columns of 1,000 random numbers, and calculate 1,000
sample averages from samples of size 4. Construct a histogram of the 4,000 individual observations, and calculate their mean and standard deviation. Construct a histogram of the
1,000 sample averages, and calculate their mean and standard deviation. Compare the two
histograms and the two mean and standard deviation estimates. Are these results what you
expected to see? Explain.


because each of the three promotions is assigned to every block. The design is called a randomized complete block design as, within a given block, promotions are randomly assigned
to weeks. In some situations, the number of treatments is greater than the size of the block
and a complete block experiment is not possible. In those situations, which we will not
discuss, it is possible to construct incomplete block experiments (see Box, Hunter, and
Hunter, 2005).
The blocking approach has potential advantages. Suppose one particular store has some
characteristic that would make its sales volume particularly high under all three promotions
A, B, and C. In the earlier completely randomized design, a store is assigned to a single promotion, and it is due to chance whether the well-performing store becomes part of promotion A, B, or C. If the store were assigned to A, then A would benefit. However, this benefit
is not due to the treatment, but due to the store effect. If the store happened to be assigned
to group C, then C would benefit. Through "blocking," we control for the "luck of the
draw" by assigning all three treatments to every store (block). In the randomized block design, we can focus on the relative changes within the block, thus canceling out possible block
effects. Consequently, any differences in results will be due to the treatments, not the stores.
If there is an actual block effect, the randomized complete block design will increase our
ability to detect differences in the treatments.
By blocking on stores, we have eliminated a possible store effect. But sales might also be
affected by a time (week) effect. The three 1-week periods might not be homogeneous; expected sales might vary from week to week. Randomizing the assignment of the promotions
across the three 1-week periods of each block is important because it spreads a possible
week effect across the three treatments. But it is possible to do better, by blocking the experiment with respect to weeks as well as stores. Experiments that block on two factors
(here, stores and weeks) are called Latin square designs. We discuss this type of design in
Chapter 7 and in Case 11 of the case study appendix.
Our discussion of the randomized complete block experiment is a natural extension of
the material covered in Chapter 2. There, in comparing two means, we discussed the difference between the completely randomized experiment in which two treatments are assigned
to two groups of different experimental units, and the paired comparison (blocked experiment) in which each unit receives both treatments. We illustrated the paired comparison
approach in a test of a blood pressure medication. We showed that measuring blood pressure on the same patient before and after treatment eliminated the variation in pressure
among patients, increasing the precision of the test and hence its ability to detect differences
in the two treatments. In this chapter, in the randomized complete block experiment, we
will apply this idea to the comparison of more than two treatments.
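The randomization step of a randomized complete block design can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the book: the store labels, the number of stores, the seed, and the function name `randomized_complete_block` are all hypothetical. Each store is a block, and the three promotions A, B, and C are placed in an independent random order across that store's three 1-week periods.

```python
import random


def randomized_complete_block(blocks, treatments, seed=None):
    """Give every block all treatments, in an independent random order."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # fresh random order within each block
        plan[block] = order
    return plan


# Hypothetical store labels; each list position is one 1-week period.
stores = ["store_1", "store_2", "store_3", "store_4", "store_5"]
plan = randomized_complete_block(stores, ["A", "B", "C"], seed=2024)

for store, order in plan.items():
    print(store, "-> week 1:", order[0], " week 2:", order[1], " week 3:", order[2])
```

Because every block receives every treatment, comparisons among A, B, and C can be made within stores, which is what removes the store effect from the treatment comparison.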
3.2 THE COMPLETELY RANDOMIZED EXPERIMENT

A firm planned and executed a test of the effectiveness of three different product displays.
Fifteen stores were available for the test, and each display was used in five different stores.
To make the results comparable and to minimize bias, displays and stores were randomly
assigned. Sales volume for the week during which the display was present was measured and
TESTING DIFFERENCES AMONG SEVERAL MEANS

TABLE 3.1
Marketing Study: Results of a Test of Three Product Displays

                         DISPLAY
                      1        2        3
                     9.5      8.5      7.7
                     3.2      9.0     11.3
                     4.7      7.9      9.7
                     7.5      5.0     11.5
                     8.3      3.2     12.4
Sample size            5        5        5
Sample mean         6.64     6.72    10.52
Sample variance     6.82     6.28     3.43

TABLE 3.2
Observations and Summary Statistics for k Treatment Groups (the General Case)

                               TREATMENT GROUPS
                      1              2              ...    k
                   $y_{11}$       $y_{21}$          ...    $y_{k1}$
                   $y_{12}$       $y_{22}$          ...    $y_{k2}$
                   ...            ...               ...    ...
                   $y_{1n_1}$     $y_{2n_2}$        ...    $y_{kn_k}$
Sample size        $n_1$          $n_2$             ...    $n_k$
Sample mean        $\bar{y}_1$    $\bar{y}_2$       ...    $\bar{y}_k$
Sample variance    $s_1^2$        $s_2^2$           ...    $s_k^2$

compared to the base sales of that store. Percentage changes were calculated, and they are
given in Table 3.1.
In this particular example, the observations come from three treatment groups. In general, there are k treatment groups with observations $y_{ij}$. The first subscript i denotes the treatment group (i = 1, 2, ..., k), while the second subscript j denotes the replication. A listing
of the observations for the case of k treatment groups is shown in Table 3.2.
Note that the number of observations in the k treatment groups need not be the same.
Let us denote the number of observations by $n_1, n_2, \ldots, n_k$, and the total number of observations by $N = \sum_{i=1}^{k} n_i$. We call the study balanced if the sample sizes in the k groups are the
same. There are advantages to having equal (or nearly equal) sample sizes. Balanced designs
allow us to estimate the treatment means with uniform precision, and they maximize the
power of the test procedure (the F-test) that is discussed in this section.
Table 3.2 also lists summary statistics for the k treatment groups. The sample mean and
the sample variance for treatment group i are given by

$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} \quad \text{and} \quad s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}.$$

See Section 2.3. There are k = 3 groups in Table 3.1, with equal sample sizes $n_1 = n_2 = n_3 = 5$ and total sample size N = 15. You may want to check the sample means and the
sample variances that are given in Table 3.1, using a calculator or a statistical software
package.
We assume that the samples were randomly drawn from normal populations with possibly different means $\mu_1, \mu_2, \ldots, \mu_k$, but each with the same variance $\sigma^2$. Given the sample
results, we wish to test the null hypothesis that the k population means are equal against the
alternative that at least one of the means is different. Formally, we have

$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: Not all population means are the same.

In the following sections we will discuss a statistical test for this null hypothesis. Initially, we
assume that the sample sizes in the k treatment groups are the same ($n_1 = n_2 = \cdots = n_k = n$), because this will make it easier for us to motivate the procedure. Later we will relax this assumption, considering the more general case when sample sizes are different.

3.2.1 Variation Within Samples


The sample variance of each treatment group is an estimate of the common population
variance $\sigma^2$. The sample variance in group i is given by

$$s_i^2 = \frac{\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{n - 1} = \frac{SS_i}{n - 1}.$$

The numerator of the sample variance is the sum of the squared deviations of the observations from their mean, and we denote it by $SS_i$. The denominator $n - 1$ is the number of
degrees of freedom that is associated with this sum of squares.
In our example of three groups (k = 3) of five observations each (n = 5), we have

$$s_1^2 = \frac{SS_1}{4} = \frac{(9.5 - 6.64)^2 + (3.2 - 6.64)^2 + (4.7 - 6.64)^2 + (7.5 - 6.64)^2 + (8.3 - 6.64)^2}{4} = \frac{27.27}{4} = 6.82$$

$$s_2^2 = \frac{SS_2}{4} = \frac{(8.5 - 6.72)^2 + (9.0 - 6.72)^2 + (7.9 - 6.72)^2 + (5.0 - 6.72)^2 + (3.2 - 6.72)^2}{4} = \frac{25.11}{4} = 6.28$$

$$s_3^2 = \frac{SS_3}{4} = \frac{(7.7 - 10.52)^2 + (11.3 - 10.52)^2 + (9.7 - 10.52)^2 + (11.5 - 10.52)^2 + (12.4 - 10.52)^2}{4} = \frac{13.73}{4} = 3.43$$

Each sample variance is an estimate of the common population variance $\sigma^2$, and this is true
whether or not the population means are the same. The average of these three variances,
$(s_1^2 + s_2^2 + s_3^2)/3 = (6.82 + 6.28 + 3.43)/3 = 5.51$, provides an even better pooled estimate of $\sigma^2$. It estimates the variation within samples.

In the general case with k treatments (groups) and varying sample sizes, the pooled estimate of the population variance $\sigma^2$ is given by

$$s_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{n_1 - 1}{N - k}\, s_1^2 + \frac{n_2 - 1}{N - k}\, s_2^2 + \cdots + \frac{n_k - 1}{N - k}\, s_k^2.$$

It is a weighted average of the k individual within-sample variances; hence the subscript W
(W for "within"). The weights are proportional to the degrees of freedom that are associated
with each variance. In the case of equal sample sizes, the pooled estimate simplifies to the
unweighted average of the individual estimates.
The pooled estimate is called the within-sample (or within-treatment) estimate of $\sigma^2$.
Since $(n_i - 1)s_i^2 = SS_i$, we can write it as

$$s_W^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{N - k}.$$

The numerator in this equation is called the sum of squares within groups (SSW). Its degrees
of freedom are given by the sum of the degrees of freedom of the individual sums of squares,
$(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = \left(\sum_{i=1}^{k} n_i\right) - k = N - k$. The pooled estimate of the population variance, $s_W^2$, is the ratio of the sum of squares SSW and its degrees of
freedom $N - k$; it is called the mean square error within groups.

In our example, $N - k = 15 - 3 = 12$, and

$$s_W^2 = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \frac{SS_1 + SS_2 + SS_3}{12} = \frac{27.27 + 25.11 + 13.73}{12} = 5.51.$$
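The within-sample calculation can be reproduced directly from the observations in Table 3.1. The sketch below uses only the Python standard library; the variable and function names are hypothetical choices for this illustration.

```python
from statistics import mean

# Percentage changes in sales from Table 3.1, one list per display.
displays = {
    1: [9.5, 3.2, 4.7, 7.5, 8.3],
    2: [8.5, 9.0, 7.9, 5.0, 3.2],
    3: [7.7, 11.3, 9.7, 11.5, 12.4],
}


def sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    center = mean(values)
    return sum((y - center) ** 2 for y in values)


# Within-group sums of squares SS_i and sample variances s_i^2.
ss = {g: sum_of_squares(ys) for g, ys in displays.items()}
s2 = {g: ss[g] / (len(ys) - 1) for g, ys in displays.items()}

# Pooled (within-sample) estimate: SSW divided by N - k degrees of freedom.
ssw = sum(ss.values())
df_within = sum(len(ys) - 1 for ys in displays.values())
s2_w = ssw / df_within

for g in displays:
    print(f"display {g}: SS = {ss[g]:.2f}, s^2 = {s2[g]:.2f}")
print(f"SSW = {ssw:.2f}, df = {df_within}, pooled variance = {s2_w:.2f}")
```

Because the code sums the degrees of freedom group by group, the same sketch also covers unequal sample sizes, where the pooled estimate becomes a weighted rather than a simple average.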

3.2.2 Variation Between Samples


We discussed the central limit effect for a sample average in Chapter 2. Suppose we
take random samples of n observations from a population with mean $\mu$ and standard
deviation $\sigma$. We learned in Section 2.5.1 that the sampling distribution of sample averages
$\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$ is approximately normal with mean $\mu$ and standard
deviation $\sigma_{\bar{y}} = \sigma/\sqrt{n}$. The variance of the sampling distribution is given by the square of
the standard deviation, $\sigma_{\bar{y}}^2 = \sigma^2/n$.
Assume that the sample sizes of the k treatment groups are the same and suppose that the
null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$ is true. Then the group averages $\bar{y}_i$ (for i = 1, 2, ..., k)
are realizations from the same distribution with a common mean and variance $\sigma^2/n$. The
sample variance of the k group averages $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k$,

$$s_{\bar{y}}^2 = \frac{(\bar{y}_1 - \bar{y})^2 + (\bar{y}_2 - \bar{y})^2 + \cdots + (\bar{y}_k - \bar{y})^2}{k - 1} = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{k - 1},$$

is an estimate of $\sigma^2/n$. Hence, a second estimate of the variance $\sigma^2$ is given by

$$s_B^2 = n s_{\bar{y}}^2 = \frac{\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2}{k - 1}.$$

The numerator $\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2$ measures the variation of the k group means from the grand
mean $\bar{y} = \sum_{i} \sum_{j} y_{ij}/N$; it represents the variability between the groups and is called the sum
of squares between groups (SSB). The denominator $k - 1$ is the number of degrees
of freedom that is associated with this sum of squares; note that there are k means and one
restriction. The estimate $s_B^2$ is called the mean square between groups; hence the subscript B.
In the example in Table 3.1 with n = 5, $\bar{y}_1 = 6.64$, $\bar{y}_2 = 6.72$, $\bar{y}_3 = 10.52$, $\bar{y} = (6.64 + 6.72 + 10.52)/3 = 7.96$,

$$s_{\bar{y}}^2 = \frac{(6.64 - 7.96)^2 + (6.72 - 7.96)^2 + (10.52 - 7.96)^2}{2} = 4.92$$

and

$$s_B^2 = \frac{\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2}{k - 1} = (5)(4.92) = 24.6.$$
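The between-sample estimate can be checked in the same way from the three group means. This is a minimal sketch for the balanced case only (equal sample sizes, so the grand mean equals the simple average of the group means); the variable names are hypothetical.

```python
from statistics import mean

group_means = {1: 6.64, 2: 6.72, 3: 10.52}  # sample means from Table 3.1
n = 5                                       # common sample size per display
k = len(group_means)

# With equal sample sizes the grand mean is the average of the group means.
grand_mean = mean(list(group_means.values()))

# Sample variance of the k group averages, an estimate of sigma^2 / n.
var_of_means = sum((m - grand_mean) ** 2 for m in group_means.values()) / (k - 1)

# Scaling by n gives the between-sample estimate of sigma^2.
s2_b = n * var_of_means
ssb = s2_b * (k - 1)  # sum of squares between groups

print(f"grand mean = {grand_mean:.2f}")
print(f"variance of group means = {var_of_means:.2f}")
print(f"mean square between groups = {s2_b:.1f}, SSB = {ssb:.2f}")
```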

3.2.3 Comparing the Within-Sample and the Between-Sample Estimates of $\sigma^2$

In the example, the between-sample estimate $s_B^2 = 24.6$ is much larger than the within-sample estimate $s_W^2 = 5.51$. What inferences can we draw from this discrepancy?
The within-sample estimate is an estimate of the population variance $\sigma^2$, and this is
true whether or not the population means are equal. The between-sample estimate is also
an estimate of $\sigma^2$, but only if the population means are the same. If they are not, the
between-sample estimate is inflated, in that it also reflects the differences between the population means.

A test of the null hypothesis that the population means are equal examines the ratio of
the between-sample and the within-sample variance estimates, $s_B^2/s_W^2 = 24.6/5.51 = 4.46$.
Under the null hypothesis of equal population means, the two estimates of $\sigma^2$ will be similar in magnitude, and the ratio will be close to 1. If the null hypothesis is false, the numerator in this ratio will tend to be larger than the denominator, and the ratio will tend to be greater than 1.
How large does the ratio have to be before one can reject the null hypothesis that the population means are equal? The answer is given by the F-distribution, which was introduced
in Section 2.2.2. The F-distribution is used to test the equality of two variances and arises in
the following way. Suppose we take two independent samples from a normal distribution
with variance $\sigma^2$: one sample of size $n_1$ and the other of size $n_2$. Then the ratio of the two
sample variances

$$\frac{s_1^2}{s_2^2} = \frac{\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2/(n_1 - 1)}{\sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2/(n_2 - 1)}$$

follows an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

Applying this result to our problem, a test of the null hypothesis that the k population
means are equal is given by the ratio of the between-sample and the within-sample variance
estimates,

$$F = \frac{s_B^2}{s_W^2} = \frac{SSB/(k - 1)}{SSW/(N - k)} = \frac{\sum_{i=1}^{k} n(\bar{y}_i - \bar{y})^2/(k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2/(N - k)}.$$

Under the null hypothesis of equal population means, this F-ratio follows an F-distribution
with $k - 1$ and $N - k$ degrees of freedom. In our example, $F = s_B^2/s_W^2 = 24.6/5.51 = 4.46$.
The numerator has $k - 1 = 3 - 1 = 2$ degrees of freedom, while the denominator has
$N - k = 15 - 3 = 12$ degrees of freedom. The probability value is the probability of obtaining the value 4.46 or larger from this F-distribution. It is given by $P[F(2, 12) \ge 4.46] = 0.036$. This is smaller than the usually adopted 5% significance level and gives us reason to
reject the null hypothesis. We conclude that there are differences among the three population means.
Excel or any other statistical software package can be used to find the probability value.
For example, the Excel command FDIST(4.46, 2, 12) returns the probability value 0.036.
Alternatively, we obtain the 95th percentile of the F(2, 12) distribution and use it as the critical value for the test. The Excel command FINV(0.05, 2, 12) returns 3.89. Our test statistic
exceeds this critical value.
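For readers without Excel, the probability value and the critical value for this example can be reproduced with the Python standard library. The sketch below relies on a special case that is an assumption of this illustration, not something stated in the text: when the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F-distribution has the closed form $(1 + 2f/d)^{-d/2}$, where d is the denominator degrees of freedom. It does not apply to other numerator degrees of freedom, for which a statistical package is needed. The function names are hypothetical, and the sums of squares are the rounded values SSB = 49.17 and SSW = 66.11.

```python
def f_upper_tail_df2(f_value, df_den):
    """P[F(2, df_den) >= f_value]; closed form valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)


def f_critical_df2(alpha, df_den):
    """Value exceeded with probability alpha under F(2, df_den)."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)


ssb, df_between = 49.17, 2    # between groups: k - 1 = 2
ssw, df_within = 66.11, 12    # within groups:  N - k = 12

f_ratio = (ssb / df_between) / (ssw / df_within)
p_value = f_upper_tail_df2(f_ratio, df_within)
f_crit = f_critical_df2(0.05, df_within)

print(f"F = {f_ratio:.2f}")
print(f"probability value = {p_value:.3f}")
print(f"5% critical value = {f_crit:.2f}")
```

The printed values can be compared with the Excel results FDIST(4.46, 2, 12) and FINV(0.05, 2, 12) quoted above.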

3.2.4 The Analysis of Variance Table and the Output of Standard Computer Software

The earlier equation for the F-statistic assumes that the sample sizes are the same for all
treatment groups. For the more general case with different sample sizes, the equation must
be modified slightly. The only change is in the expression for the sum of squares between
groups, $SSB = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$. The common sample size n in the earlier expression is replaced by $n_i$. The F-statistic is given by

$$F = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2/(k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2/(N - k)}.$$

The calculations are summarized in a convenient format in the analysis of variance
(ANOVA) table; see Table 3.3. It can be shown that the total sum of the squared deviations
of the observations from their common mean, $SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$, can be partitioned as

$$\underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2}_{SSB\ (\text{SS Between Groups})} + \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{SSW\ (\text{SS Within Groups})}.$$

The total sum of squares (SST) is shown in the last row of Table 3.3. It measures the variability of the observations around their common mean $\bar{y} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}/N$, and its degrees of freedom are $N - 1$.

TABLE 3.3
ANOVA Table for the Completely Randomized Experiment

Source of Variation   Sum of Squares (SS)                                                   Degrees of Freedom (df)   Mean Squares (MS)   F-Ratio
Between groups        $SSB = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$                     $k - 1$                   $SSB/(k - 1)$       $\frac{SSB/(k - 1)}{SSW/(N - k)}$
Within groups         $SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$        $N - k$                   $SSW/(N - k)$
Total                 $SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$          $N - 1$

Programs for calculating the ANOVA table and for testing the hypothesis that the population means are the same are part of most statistical software packages. To illustrate, we use
Minitab, a popular and useful statistical software package. Commands in Minitab carry out
statistical analyses of data that are entered into columns and rows of a spreadsheet. In this
example, one enters the response (percent changes) and the treatment identifier (1, 2, 3)
into two columns, say, columns 1 and 2. There are 15 rows in each column because there
are 15 stores. The first row has 9.5 in column 1 and 1 in column 2; the second row has 3.2
in column 1 and 1 in column 2; ... ; the last row has 12.4 in column 1 and 3 in column 2.
The consecutive arrangement of the 15 stores is arbitrary given that there is no particular
order to the stores. The Minitab command "ANOVA > One-Way" provides the ANOVA
table and the confidence intervals that you see at the bottom of Table 3.4. Other statistical
software packages such as JMP and SPSS work in pretty much the same way.

TABLE 3.4
Minitab Output: Test of Three Product Displays

Source   DF      SS     MS     F      P
Display   2   49.17  24.58  4.46  0.036
Error    12   66.11   5.51
Total    14  115.28

                           Individual 95% CIs For Mean Based on Pooled StDev
Display  N    Mean  StDev  ---+---------+---------+---------+----
1        5   6.640  2.611  (---------*--------)
2        5   6.720  2.505   (--------*--------)
3        5  10.520  1.853                  (--------*--------)
                           ---+---------+---------+---------+----
                             5.0       7.5       10.0      12.5

Pooled StDev = 2.347
Table 3.4 shows the F-statistic and the probability value that we had calculated earlier.
The test result provides fairly strong evidence that the means are different. Once we have decided that there are differences among the population means, we look at how they differ.
This can be done by displaying the data graphically. Dot diagrams, separate for each group but shown on the same scale, are very informative because they show differences in the levels as well as differences in the variability (which, for our test to work, should be about the same). The sample means and their 95% confidence intervals shown in Table 3.4 are also very informative. The mean square error within the groups, $s_w^2 = 5.51$, estimates the variance of individual observations by pooling the variability across the k groups. Its square root gives the pooled standard deviation $s_w = \sqrt{5.51} = 2.347$, which is also listed in the Minitab output. We use it to calculate confidence intervals of the population means. The 95% confidence interval for $\mu_i$ is given by $\bar{y}_i \pm t \, s_w / \sqrt{n_i}$, where $t$ is the 97.5th percentile of the t-distribution with N - k degrees of freedom, which are the degrees of freedom that are associated with the pooled estimate.
The confidence intervals show how the means differ. The third display is more effective than the first two. Also, there is not much difference between displays 1 and 2.
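The entries of Table 3.4 can be reproduced from the group summary statistics alone. The following is a minimal sketch in Python, not the Minitab procedure itself; the variable names are illustrative, and the t percentile (2.179 for 12 degrees of freedom) is an assumption taken from a standard t-table rather than computed.

```python
import math

# Summary statistics from Table 3.4: five stores per display.
n = [5, 5, 5]
means = [6.640, 6.720, 10.520]
stdevs = [2.611, 2.505, 1.853]

N, k = sum(n), len(n)
grand_mean = sum(ni * m for ni, m in zip(n, means)) / N

# Between-group and within-group sums of squares.
ss_display = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))
ss_error = sum((ni - 1) * s ** 2 for ni, s in zip(n, stdevs))

ms_display = ss_display / (k - 1)
ms_error = ss_error / (N - k)      # pooled variance s_w^2
f_stat = ms_display / ms_error
s_w = math.sqrt(ms_error)          # pooled standard deviation

# Assumption: 97.5th percentile of the t-distribution with N - k = 12
# degrees of freedom, read from a t-table.
T_975_DF12 = 2.179

intervals = []
for ni, m in zip(n, means):
    half_width = T_975_DF12 * s_w / math.sqrt(ni)
    intervals.append((m - half_width, m + half_width))

print(f"F = {f_stat:.2f}, pooled StDev = {s_w:.3f}")
for i, (lo, hi) in enumerate(intervals, start=1):
    print(f"Display {i}: ({lo:.2f}, {hi:.2f})")
```

Because all three groups have five stores, every interval has the same half-width; the intervals for displays 1 and 2 nearly coincide, while the interval for display 3 sits higher on the scale.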

3.3 THE RANDOMIZED COMPLETE BLOCK EXPERIMENT

A large supermarket chain plans to test three different versions of an in-store promotion.
The firm identifies 15 stores in one region to participate in the experiment. A test of a particular version of the promotion in a store will run for one week. The company wants to run the entire experiment over a consecutive 5-week period and plans to run three tests per week. Initially, the marketing director decided to use the following completely randomized design. Each promotion strategy would be randomly assigned to five stores. Then three of the 15 stores would be randomly chosen for the first week, three other stores would be randomly chosen for the second week, and so forth.
A young analyst in the marketing department, fresh from a course in experimental design, suggests an alternative. She realizes that in a completely randomized design there is a chance that all three stores selected for week 1 could have been assigned promotion (treatment) A, while in week 2, none of the stores would be testing A. She points out that the effectiveness of a promotion might depend on the week in which it is run, due to differences in the weather and other conditions that might vary from week to week. She explains, "In this case there are two sources of variation, the promotions themselves and the week. To eliminate the week as a source of variation, we should run the experiment in 'blocks' with each of the three versions of the promotion tested in every week." The marketing director is impressed and decides to follow the analyst's advice. The design and results are shown in Table 3.5. Based on the percentage increase in sales, the firm measures the effectiveness of a promotion on a scale from 0 to 100.

TABLE 3.5
Results of a Blocked Experiment with Three Different In-Store Promotions

                                     BLOCK (WEEK)
Treatment               Week 1   Week 2   Week 3   Week 4   Week 5   Treatment Average
Version 1                 52       47       44       51       42          47.2
Version 2                 60       55       49       52       43          51.8
Version 3                 56       48       45       44       38          46.2
Block (week) Average      56       50       46       49       41          48.4

TABLE 3.6
Responses from the Randomized Complete Block Experiment (the General Case)

                                     BLOCK
Treatment        1                     2                     ...   b                     Treatment Average
1                $y_{11}$              $y_{12}$              ...   $y_{1b}$              $\bar{y}_{1\cdot}$
2                $y_{21}$              $y_{22}$              ...   $y_{2b}$              $\bar{y}_{2\cdot}$
...              ...                   ...                   ...   ...                   ...
k                $y_{k1}$              $y_{k2}$              ...   $y_{kb}$              $\bar{y}_{k\cdot}$
Block Average    $\bar{y}_{\cdot 1}$   $\bar{y}_{\cdot 2}$   ...   $\bar{y}_{\cdot b}$   $\bar{y}_{\cdot\cdot}$
In the general randomized complete block design, there are k treatments and b blocks, and n = bk observations. The observations $y_{ij}$, for i = 1, 2, ..., k (treatments) and j = 1, 2, ..., b (blocks), are arranged in Table 3.6. The observations in the second column represent the k treatment responses on the first block; the observations in the third column are the responses on the second block, and so forth; and the penultimate column contains the responses on block b. The treatment means (averaged over the blocks) are denoted by $\bar{y}_{i\cdot}$, for i = 1, 2, ..., k. The block means (averaged over the k treatments) are given by $\bar{y}_{\cdot j}$, for j = 1, 2, ..., b. The symbol $\bar{y}_{\cdot\cdot}$ denotes the overall average.
Observe our dot notation for averages. A dot in place of an index expresses the fact that we have averaged the observations over that index. For example, $\bar{y}_{\cdot j}$ is the average for block j, averaging over all treatments in that block. The overall average $\bar{y}_{\cdot\cdot}$ is the result of averaging over both indexes i and j.
We model the observation $y_{ij}$ in treatment group i and block j through an overall level and additive effects of treatment i and block j. Using the summary data in Table 3.6, we estimate the observation $y_{ij}$ by

$$\bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}).$$

The first term is an estimate of the grand mean; the second term is an estimate of the incremental effect of treatment i; and the last term is an estimate of the incremental effect of block j. The difference between the observation and this estimate is the error component,

$$y_{ij} - \left[\bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\right] = y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}.$$

This model allows us to express the deviation of each observation from the grand mean as

$$y_{ij} - \bar{y}_{\cdot\cdot} = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}).$$

The first component on the right-hand side compares the treatment mean to the overall mean; the second component compares the block mean to the overall mean; and the last component measures the error after correcting the observation for its treatment average and block average.
Squaring the left- and right-hand sides of the equation, summing over both indexes i and j, and using the fact that all sums of cross-product terms are zero lead to the sum of squares decomposition:

$$\sum_{i=1}^{k}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = b\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + k\sum_{j=1}^{b} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$$

$$SST = SS(TR) + SS(BL) + SS(\text{error})$$

Similarly, the degrees of freedom can be partitioned as

$$(kb - 1) = (k - 1) + (b - 1) + (k - 1)(b - 1).$$

The sums of squares and their degrees of freedom become part of the ANOVA table in Table 3.7.

TABLE 3.7
ANOVA Table for the Randomized Complete Block Experiment

Source of     Sum of Squares, SS                                                                                  Degrees of       Mean Squares,   F
Variation                                                                                                         Freedom          MS
Treatment     $SS(TR) = b\sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$                                      k - 1            MS(TR)          MS(TR)/MS(error)
Block         $SS(BL) = k\sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$                                     b - 1            MS(BL)          MS(BL)/MS(error)
Error         $SS(\text{error}) = \sum_i\sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$   (k - 1)(b - 1)   MS(error)
Total         $SST = \sum_i\sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2$                                              kb - 1

Here, we are testing whether the treatment means differ and whether the block means differ; thus, there are two F-ratios. The relevant statistic for testing whether there are statistically significant treatment effects is given in the last column. The F-statistic MS(TR)/MS(error) needs to be compared to the percentiles of the F(k - 1, (k - 1)(b - 1)) distribution. A test at significance level 0.05 concludes that there are significant treatment effects if this F-statistic is larger than the 95th percentile of that distribution. The second F-statistic, MS(BL)/MS(error), tests for block effects and is compared to the 95th percentile of the F(b - 1, (k - 1)(b - 1)) distribution.

TABLE 3.8
Minitab Output: Blocked Experiment with Three Different In-Store Promotions

Source          DF      SS      MS       F      P
Treatment        2    89.2   44.60    7.62  0.014
Block (Week)     4   363.6   90.90   15.54  0.001
Error            8    46.8    5.85
Total           14   499.6

Our example considers k = 3 promotions and b = 5 blocks (weeks). The observations in Table 3.5, together with the treatment and block means, can be used to calculate the sums of squares in Table 3.7. We have done this for illustration, even though statistical computer software will be used in practice.

$$SST = (52 - 48.4)^2 + (47 - 48.4)^2 + \cdots + (60 - 48.4)^2 + \cdots + (38 - 48.4)^2 = 499.6$$

$$SS(TR) = 5\left[(47.2 - 48.4)^2 + (51.8 - 48.4)^2 + (46.2 - 48.4)^2\right] = 89.2$$

$$SS(BL) = 3\left[(56 - 48.4)^2 + (50 - 48.4)^2 + (46 - 48.4)^2 + (49 - 48.4)^2 + (41 - 48.4)^2\right] = 363.6$$

and

$$SS(\text{error}) = SST - SS(TR) - SS(BL) = 499.6 - 89.2 - 363.6 = 46.8$$
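The hand calculation above can be checked with a few lines of code. This is a minimal sketch in plain Python (no statistics package is assumed); the ratings are the Table 3.5 values with promotions as rows and weeks as columns, and the variable names are illustrative. Here the error sum of squares is computed directly from the residuals rather than by subtraction, so the decomposition SST = SS(TR) + SS(BL) + SS(error) can be verified numerically.

```python
# Effectiveness ratings from Table 3.5: rows are promotion versions 1-3,
# columns are weeks 1-5 (the blocks).
ratings = [
    [52, 47, 44, 51, 42],
    [60, 55, 49, 52, 43],
    [56, 48, 45, 44, 38],
]

k = len(ratings)       # number of treatments
b = len(ratings[0])    # number of blocks

grand_mean = sum(sum(row) for row in ratings) / (k * b)
treatment_means = [sum(row) / b for row in ratings]
block_means = [sum(ratings[i][j] for i in range(k)) / k for j in range(b)]

sst = sum((y - grand_mean) ** 2 for row in ratings for y in row)
ss_tr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_bl = k * sum((m - grand_mean) ** 2 for m in block_means)

# Error sum of squares from the residuals y_ij - ybar_i. - ybar_.j + ybar_..
ss_error = sum(
    (ratings[i][j] - treatment_means[i] - block_means[j] + grand_mean) ** 2
    for i in range(k)
    for j in range(b)
)

print(f"SST = {sst:.1f}, SS(TR) = {ss_tr:.1f}, "
      f"SS(BL) = {ss_bl:.1f}, SS(error) = {ss_error:.1f}")
```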

We use Minitab for the calculations. The ANOVA table in Table 3.8 was obtained with the Minitab command "ANOVA > Two-Way." The spreadsheet containing the data consists of three columns: column 1 contains the response (effectiveness rating); column 2, the treatment (promotion) indicators (1, 2, 3); and column 3, the blocking groups (weeks 1 through 5). There are 15 rows in each column. Row 1 contains 52 in column 1, 1 in column 2, and 1 in column 3. Row 2 contains 47 in column 1, 1 in column 2, and 2 in column 3; and so forth; the 15th row contains 38 in column 1, 3 in column 2, and 5 in column 3. The order in which the rows are entered is arbitrary.

The ANOVA table shows that the F-ratio for testing the null hypothesis of no treatment effect is MS(TR)/MS(error) = 44.60/5.85 = 7.62. The 95th percentile of the F(2, 8) distribution is 4.46. We conclude that there are treatment differences because the F-ratio is larger than this percentile. We come to the same conclusion when looking at the small probability value P[F(2, 8) > 7.62] = 0.014. There are differences among the three promotions, with promotion 2 scoring significantly higher than promotions 1 and 3, which are about the same.
Table 3.8 shows that there is considerable variability among the weeks (F-statistic 15.54, with probability value 0.001), and that it is important to incorporate this variability into the analysis when comparing the three treatments. What would happen if the blocking effect were ignored? Combining SS(block) + SS(error) = 363.6 + 46.8 = 410.4 with 4 + 8 = 12 degrees of freedom, and calculating the test statistic that is appropriate for the completely randomized design in Section 3.2, would have led to the F-statistic

$$\frac{MS(TR)}{MS(\text{error})} = \frac{89.2/2}{(363.6 + 46.8)/12} = 1.30$$

with probability value P[F(2, 12) > 1.30] = 0.31. An analysis that ignores the week effects would have made the error of accepting the null hypothesis and concluding that there are no differences among the treatments.
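The comparison of the blocked and unblocked analyses can be sketched in a few lines. The code below is an illustration only, with hypothetical function and variable names. It relies on one assumption worth stating: when the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F-distribution has the closed form P[F(2, d) > f] = (1 + 2f/d)^(-d/2), so no statistical library is needed. This shortcut does not apply to the block test, which has 4 numerator degrees of freedom.

```python
def f_tail_prob_num_df2(f_value, den_df):
    """P[F(2, den_df) > f_value]; valid only for 2 numerator df."""
    return (1 + 2 * f_value / den_df) ** (-den_df / 2)

# Sums of squares and degrees of freedom from Table 3.8.
ss_tr, df_tr = 89.2, 2
ss_bl, df_bl = 363.6, 4
ss_error, df_error = 46.8, 8

ms_tr = ss_tr / df_tr

# Blocked analysis: weeks are removed from the error term.
f_blocked = ms_tr / (ss_error / df_error)
p_blocked = f_tail_prob_num_df2(f_blocked, df_error)

# Unblocked analysis: the block sum of squares and its degrees of
# freedom are folded back into the error term.
f_unblocked = ms_tr / ((ss_bl + ss_error) / (df_bl + df_error))
p_unblocked = f_tail_prob_num_df2(f_unblocked, df_bl + df_error)

print(f"blocked:   F = {f_blocked:.2f}, p = {p_blocked:.3f}")
print(f"unblocked: F = {f_unblocked:.2f}, p = {p_unblocked:.2f}")
```

The treatment mean square is the same in both analyses; only the error mean square changes, from 5.85 with blocking to 34.2 without it, which is what turns a clear treatment effect into an apparently insignificant one.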

Note. The analysis in this section assumes that we have exactly one observation at each
factor-block combination. The Minitab command "ANOVA > Two-Way" fails if there are
any missing observations. In this situation, one needs to use the general linear (regression)
model approach (Minitab command "ANOVA > General Linear Model") to analyze the
data; this will be discussed in Chapter 8.

3.4 CASE STUDY

This case is adapted from Clarke (1987). Additional relevant details and further analyses are discussed in Case 11 of the case study appendix.
Researchers at the United Dairy Industry Association (UDIA) evaluated the results of a recent field experiment to test the impact of varying levels of advertising on the sales of cheese. The principal objective of the study was to measure the retail sales response (pounds of cheese sold) to varying levels of advertising. Four test markets, selected from different geographic regions, were used in this study. Executives determined the levels of advertising to be tested in the experiment. It was believed that the levels should be distinct enough to generate measurable differences in the results. They decided to test the impact of four levels of advertising: 0 cents (level A), 3 cents (B), 6 cents (C), and 9 cents (D), all expressed on a per-capita basis. The 6-cents per-capita level represents a national campaign costing approximately $12 million (in 1973). The principal medium for advertising was television, with point-of-purchase display materials in stores and newspaper ads playing a secondary role. Each of the four levels of advertising was implemented within each test market during a 3-month period between May 1972 and April 1973; see Table 3.9. The sequence in which the advertising levels were tested was selected so that each advertising level was used in only one test market during any one time period. You can check that each letter in Table 3.9 (A, B, C, D) appears only once in each column and each row. Such an arrangement is referred to as a Latin square design. In Case 11 of the case study appendix, we will discuss the analysis of observations that originate from a Latin square design, and we will illustrate that this design can be used to further isolate a possible time effect. However, for the purpose of this illustration, we ignore the time effect and assume that the observations are from a blocked experiment that studies four treatments (A-D) on each of four blocks (test markets).
Within each market, UDIA executives obtained the cooperation of approximately 30 supermarkets in obtaining quarterly audits of cheese sales. The average cheese sales (in pounds per store) during 3-month periods between May 1972 and April 1973 are listed in Table 3.9.
The ANOVA table and the treatment and block means are given in Table 3.10. The results show that there are large differences between test markets (with highest sales in Rockford and Albuquerque), and that sales increase with the amount of advertising. However, the advertising effects are not statistically significant (probability value 0.329). We will revisit this in Case 11 of the case study appendix and investigate whether the use of time as an additional blocking variable changes our conclusions on the significance of the treatment effects.

TABLE 3.9
UDIA Study Test Markets, Treatments, Test Periods, and Results: Average Sales Per Store

Test periods: May-Jul 72, Aug-Oct 72, Nov-Jan 73, Feb-Apr 73
Markets: Binghamton, Rockford, Albuquerque, Chattanooga
[Latin square assignment of treatments A-D to test periods within each market: entries not legible in this copy]

                        BLOCK (TEST MARKET): Sales/Store
Treatment     Binghamton    Rockford    Albuquerque    Chattanooga
A                7,360       13,153       11,852          7,557
B                7,364       11,258       12,089          7,900
C                8,049       13,880       11,800          8,501
D                9,010       13,147       11,450          7,776

TABLE 3.10
ANOVA Table: UDIA Study

Source            DF          SS          MS       F      P
Treatment          3     1917416      639139    1.31  0.329
Block (Market)     3    79308210    26436070   54.31  0.000
Error              9     4380871      486763
Total             15    85606498

Treatment      Mean          Block            Mean
A            9980.5          Binghamton     7945.8
B            9652.8          Rockford      12859.5
C           10557.5          Albuquerque   11797.8
D           10345.8          Chattanooga    7933.5
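The F-ratios and means in Table 3.10 can be recomputed from the sales figures in Table 3.9. The sketch below is one way to do so in plain Python; `rcb_anova` is a hypothetical helper written for this illustration, not a library function, and it treats the data as a randomized complete block layout with markets as blocks, ignoring the time periods as the text does here.

```python
def rcb_anova(table):
    """ANOVA quantities for a randomized complete block layout.

    table[i][j] is the response for treatment i in block j.
    """
    k, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (k * b)
    tr_means = [sum(row) / b for row in table]
    bl_means = [sum(row[j] for row in table) / k for j in range(b)]
    ss_tr = b * sum((m - grand) ** 2 for m in tr_means)
    ss_bl = k * sum((m - grand) ** 2 for m in bl_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_error = ss_total - ss_tr - ss_bl
    ms_error = ss_error / ((k - 1) * (b - 1))
    return {
        "treatment_means": tr_means,
        "block_means": bl_means,
        "F_treatment": (ss_tr / (k - 1)) / ms_error,
        "F_block": (ss_bl / (b - 1)) / ms_error,
    }

# Average cheese sales per store (pounds) as read from Table 3.9.
# Rows: advertising levels A-D; columns: Binghamton, Rockford,
# Albuquerque, Chattanooga.
sales = [
    [7360, 13153, 11852, 7557],
    [7364, 11258, 12089, 7900],
    [8049, 13880, 11800, 8501],
    [9010, 13147, 11450, 7776],
]

result = rcb_anova(sales)
print(f"F (treatment) = {result['F_treatment']:.2f}")
print(f"F (market)    = {result['F_block']:.2f}")
```

The market F-ratio is roughly forty times the treatment F-ratio, which reflects how strongly the market-to-market differences dominate the variation in these data.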

3.5 NOBODY ASKED US, BUT . . .

The term analysis of variance (ANOVA) gives no indication that the procedure is about
comparing means. But as we have seen in this chapter, we test whether several means differ
by comparing variances. The F-distribution plays the key role in the analysis, and perhaps
not surprisingly, F stands for Fisher, the most important statistician of the 20th century. But
Fisher did not invent this distribution; it was derived by George Snedecor, who named it F
to honor Fisher.
In the randomized complete block experiment, the term block comes from the origins of
this design in agricultural studies. Blocks were created by aggregating contiguous parcels
that were homogeneous in terms of soil composition and hence fertility. Fisher described
these kinds of experiments in his book Statistical Methods for Research Workers, which was
published in 1925.


EXERCISES

Exercise 1 Consider the data from a completely randomized experiment (Table 3.2). Express the deviation of the observation from the overall mean as

$$y_{ij} - \bar{y} = (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i).$$

The first component on the right-hand side compares the treatment mean to the overall
mean; the second component expresses the within-sample variation. Take the square of the
expression, sum the squares over both indexes i (i = 1, 2, ..., k) and j (j = 1, 2, ..., $n_i$),
and prove the sum of squares decomposition in Section 3.2.4:

$$\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{\text{Total Sum of Squares}} = \underbrace{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}_{\text{SS Between Groups}} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{\text{SS Within Groups}}$$

Show that the sum of the cross products is zero; that is,

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})(y_{ij} - \bar{y}_i) = 0.$$

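Exercise 2 Consider the data from a randomized complete block experiment (Table 3.6).
Express the deviation of the observation from the overall mean as

$$y_{ij} - \bar{y}_{\cdot\cdot} = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}).$$

The first component on the right-hand side compares the treatment mean to the overall
mean; the second component compares the block mean to the overall mean; and the last
component measures the error after correcting the observation for its treatment average
and block average. Prove the sum of squares decomposition in Section 3.3:

$$\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{\cdot\cdot})^2}_{\text{Total Sum of Squares}} = \underbrace{b\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}_{\text{Treatment}} + \underbrace{k\sum_{j=1}^{b} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2}_{\text{Block}} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2}_{\text{Error}}$$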

Exercise 3 You study the monthly amounts seventh- and eighth-grade boys and girls
spend on entertainment such as movies, music CDs, and candy. Representative samples of
children within the Iowa City school district were selected, and children were asked about
their spending habits. The following results were obtained.

                          7th Grade Boys   8th Grade Boys   7th Grade Girls   8th Grade Girls
Sample size                     30               25               30                25
Mean ($)                      20.1             23.2             19.6              25.0
Standard deviation ($)        11.0              5.6              5.3               7.0

(a) Test whether or not the four groups differ with respect to their mean spending
amounts.
(b) Follow up on your analysis in (a) if you find differences. In particular, assess
whether there are differences in the mean spending amounts of seventh- and
eighth-grade boys and in the mean spending amounts of seventh- and eighth-grade girls. Test whether the yearly changes in the mean spending amounts differ
between boys and girls.
Exercise 4 Your goal is to determine the effectiveness of four different TV spots.

(a) To avoid program overlap, you select four different market regions, and you assign
one of the four TV spots to each region. The programs are aired for one month,
and sales of the advertised product are recorded in 16 stores in each of the four
markets. Store-specific sales for the previous month are also available. Discuss how
you would analyze the data to learn which of the four TV spots is preferable. What
additional assumption do you need to make to infer that the winning ad would
also work best in future months?
(b) Assume that all your stores are in a single market, and that all four TV spots must
be aired in this single market. You decide to run the spots in four consecutive
months. You collect sales data on 16 stores, with each store being observed under
all four TV spots. Discuss how you would analyze the data to learn which of the
four TV spots is preferable. Discuss the differences to your earlier strategy in (a).
(c) Discuss the advantage and the danger of your design in (b). For example, would
your analysis be affected if sales were seasonal? Discuss ways of incorporating
known seasonality into your analysis. Discuss ways of blocking the experiment
with respect to stores as well as months.
Exercise 5 See Exercise 11 in Chapter 2. Three different fabrics (A, B, C) are tested on a
wear tester that can compare three materials in a single run. A wear tester is a mechanical
device that rubs a material against an object. Our particular machine has three attachments
that allow the comparison of three materials. Variability from one run to the next is expected. However, within the same run, the conditions for the three fabrics are fairly homogeneous. The assignment of the fabric to the attachment is randomized in each run. The
weight losses (in milligrams) from 10 runs are given in the following table.
Test whether or not the average weight losses of the three fabrics are the same. Which
fabric(s) has the lowest weight loss? What would happen to your conclusions if you ignored
the run effect?
RUN

----

Fabric

A
B

JO

--

.\ 6

26

31

J8

28

J7

n.

_1,1

25

JU

.l9
40

27
28

35
34

42
43

_\J

39

21

37

34

30

39

22

J6

28
27

33

Exercise 6 The female cuckoo lays her eggs into the nests of foster parents. The foster parents are usually deceived, probably because of the similarity in the sizes of the eggs. Lengths
of cuckoo eggs (in millimeters) found in the nests of hedge sparrows, robins, and wrens are
shown below.

Hedge sparrow: 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0
Robin: 21.8, 23.0, 23.3, 22.4, 23.0, 23.0, 23.0, 22.4, 23.9, 22.3, 22.0, 22.6, 22.0, 22.1, 21.1, 23.0
Wren: 19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0, 20.3, 20.9, 22.0, 20.0, 20.8, 21.2, 21.0

It is believed that the size of the egg influences the female cuckoo in her selection of the foster parent. Do the data support this hypothesis? Test whether or not the mean lengths of
cuckoo eggs found in nests of the three foster-parent species are the same.
Exercise 7 The plant manager wants to investigate the productivity of three groups of
workers: those with little, average, and considerable work experience. Since the productivity
depends to some degree on the day-to-day variability of the available raw materials, which
affects all groups in a similar fashion, the manager suspects that the comparison should be
blocked with respect to day. The results (productivity, in percent) from five production days
are given in the following table:
[l,\Y
J .XptT\l'llU..'

57

I\

60

(.1 J

62

04

f>O
M
69

(a) Are there differences in the mean productivity among the three groups?
(b) Has the blocking made a difference?


Exercise 8 A feedlot operator wants to compare the effectiveness of three different cattle
feed supplements. He selects a random sample of 15 one-year-old heifers from his lot of
more than 1,000 and divides them into three groups at random. Each group gets a different
feed supplement. The weight gains over a 6-month period are shown below. One heifer in
group A was lost due to an accident.

C rroup

1\
II

soo, 11:>0. >.>o. 680


700, 020, 780, 8.\o, 8110
'iOO, 020, -!DO, 'ihll, 110

(a) Are there differences in the mean weight gains of the three feed supplements?
(b) If you could start the experiment over, would you suggest improvements that
would help make the comparisons more precise? What about a blocking arrangement on initial weight?

TWO-LEVEL FACTORIAL EXPERIMENTS

the fancier font works better with the blue background? These are the kinds of issues we will
address in this chapter.
In this and the next two chapters, we discuss experiments where each factor is studied at
just two levels. In Chapters 7 and 8, we consider the more general case when a factor may
have more than two levels, for example, four different levels for ad copy, three background
colors, and two different fonts.
4.1.1 Basic Terms
We start by defining some important terms.
The factors are the variables whose effects are being studied. In the advertising experiment discussed earlier, the factors are the ad copy, the font, and the background color. In
an industrial experiment, the factors might be temperature, pressure, and the type of chemical catalyst. In an agricultural experiment, the factors might be type of seed, type of fertilizer, and the amount of water, whereas in a marketing experiment, the factors might be the
color of the box, the price, and the dollars spent on advertising.
The levels are the specified values of each factor. As noted earlier, initially we will focus
on 2-level designs; that is, each factor is set at one of two possible levels. For example, in the
marketing experiment, the box is either red or blue.
The response variable is the performance measure, the dollar sales in the marketing experiment, or the number of bushels of corn in the agriculture experiment.
A run is a particular experiment with each factor at a specified level.
Each factor may be continuous or categorical, and as we will show in this chapter, the distinction is important. Factors such as temperature, pressure, amount of fertilizer, price, and
dollars spent on advertising are continuous. Factors such as the type of ad copy, font, background color, color of the box, and catalyst are categorical.

4.2 AN EXAMPLE: REDUCING THE NUMBER OF CRACKED POTS

The following example illustrates some basic, important concepts. A company manufactures clay pots that are used to hold plants. For one of their newest products, the company
has been experiencing an unacceptably high percentage of pots that crack during the manufacturing process. Company production engineers have identified three key factors they
believe will affect cracking, and they decide to run an experiment to learn about the most
important factor(s). The factors studied are the peak temperature in the kiln, the rate at
which pots are cooled after being heated to the peak temperature, and a coefficient that describes the expansion of the clay pot. A higher peak temperature can be expected to reduce
the percentage of cracks, but the higher temperature also increases operating costs. Cooling
the pots at a faster rate would mean an increase in the number of pots produced per hour,
but it could also increase the percentage of cracked pots. The coefficient of expansion depends on the composition of the clay. A supplier has offered the company a new clay mix
that it asserts has a lower coefficient of expansion. The mix is being offered at the same price
as the raw material that is currently used. A lower coefficient of expansion should decrease
the incidence of cracks.

TABLE 4.1
The Three Factors and Their Levels

                                       Levels
Factors                             -           +
Rate of cooling (R)               Slow        Fast
Temperature (T)                   2000°F      2060°F
Coefficient of expansion (C)      Low         High

The firm wants to determine how changes in these three factors would affect the percentage of cracked pots, and the production engineers decide to experiment with each factor at two levels. The current settings are lower peak temperature, slower cooling rate, and
higher coefficient of expansion. In experimental design, we use a standard notation with the
low level of each factor denoted by a minus (-) and the high level of each factor denoted by a
plus (+). Later in this chapter, we will discuss how these low and high levels would be determined in particular situations. Table 4.1 lists the three factors and their levels.

4.2.1 A Common Approach to Experimentation: Varying One Factor at a Time

A very common but inefficient approach to studying the effects of k (here k = 3) factors
is to carry out successive experiments in which the levels of each factor are changed one at
a time. Such experiments typically start with the current settings of the factors and begin by
changing the level of the one factor that is considered the most important. The responses at
the low and high settings of this one factor are compared while keeping all other factors
fixed, and if there is a difference, the level at which the response is best is locked in for the
next stage. The factor that is considered second most important is varied next. Again, responses at the low and high levels of this factor are compared, and the best level of this factor, if there is a difference, is locked in for all subsequent runs. This process continues until the last factor is reached.
Factor C (coefficient of expansion of the clay) was considered most critical. Applying this
approach, we first set R = slow and T = 2,000, then carry out 4 runs with C low (-) and
4 runs with C high (+). Each run is the usual production batch of 100 pots. Suppose that
the proportions of cracked pots were 5, 8, 3, and 8% for the 4 runs at the low (-) level
and 15, 12, 11, and 16% for the 4 runs at the high (+) level of C, resulting in averages of 6
and 13.5%, respectively. The (positive) difference of 7.5% indicates that it is better to set
factor C at its low level. But it would be premature to conclude that the new clay mix with a
low coefficient of expansion (factor C) decreases cracking in general. At this point, all we
can say is that we observed a difference of 7.5% at a particular temperature, T = 2,000 (-),
and a particular cooling rate, R = slow (-). We don't know if the same result would hold
for temperature T = 2,060 or cooling rate R = fast.
Temperature was considered the second most critical factor. Next, we set the coefficient
of expansion C at its low (best) level, and fix the cooling rate R at slow (-). We need to compare 4 runs with T = 2,000 and 4 runs with T = 2,060. We already have 4 runs with T =
2,000. So, we do 4 runs with T = 2,060. Suppose we found T = 2,060 to result in the better
response (fewer cracked pots).

TABLE 4.2
Design Matrix for the Cracked Pots Problem

Run                  Rate of        Temperature    Coefficient of
(standard order)     Cooling R      T              Expansion C
1                    -              -              -
2                    +              -              -
3                    -              +              -
4                    +              +              -
5                    -              -              +
6                    +              -              +
7                    -              +              +
8                    +              +              +

Finally, we set the temperature (T = 2,060) and the coefficient of expansion (C = low) at
their better settings, then compare 4 runs with R slow to 4 runs with R fast. We already
have 4 runs with R slow, so we add four runs with R fast.
This approach of changing one factor at a time requires 16 runs. As a result of these
16 runs, we would only know the effect of each factor at one particular combination of settings of the other two. We would not know anything about interactions among the factors;
for example, we would not know whether the effect of changing temperature from low to
high depends on the level of the cooling rate. If such interactions are present, the experiment of changing one factor at a time could lead to the wrong conclusions, because it might
not identify the best settings for the factors. We discuss the shortcomings of this approach
in more detail in Appendix 4.1.

4.2.2 A Better Approach: Changing Factor Levels Simultaneously


'>11 l{on,ild I 1<>hn I '!\'ii ,howed th.ll a hcttn approalh is to vat) the factors sim11/1111u'011,;/y and to studv the respome at each possible factor level combination. With three fac-

tor'> at t\\'O lei-cl., each (as 111 our example), his lactorial design requires just 8 run., fewer
than the l61n the earlier appro,1ch of changing one factor at a time. In addition to the economy of fewer runs, the factorial design provides estimates of possible interactions and thus
produces more information.
Table 4.2 shows the 8 runs of the factorial design with 3 factors at 2 levels each. \\'e use
minus ( - ) and plus ( +-) signs to represent the low and high levels of each factor.
I he 8 run' arc Ji,tcd in the rn callecl stancl.ird order. For example, in run I, all three l.1ctor" arc at their low ll'vcl,, while in run 8, all three foctors are at their high levels. The design
rnatnx in 'itandard order is easy to construct. We start with the first fa( tor (in this c.1'-L'. R)
with a minus sign and altern<lte the signs until we complete the column for the second factor('/'), we . . tart with two mi, us signs and alternate the signs in groups <lftwo. For the third
lac tor (C), we start with four minus signs and alternate the signs in groups of four. This pro8 factor-level combinations. These 8 factor-level combinations c.rn
LCdurc gives u" all 2\
he rcprT'>l'Iltl'd a-. the \'lTtiLe.s of,1 cube; sec rigure 1.1.
A factorial design with only two factors, A and B, has two columns, one for each factor, and 4 runs. The sequence − + − + makes up the first column, and the second column contains − − + +. The four factor-level combinations in this factorial design, arranged in standard order, are (A −, B −), (A +, B −), (A −, B +), and (A +, B +).

Figure 4.1 Graphical Representation of the Runs in a Factorial Design with 3 Factors, Each at 2 Levels
[Cube diagram: axes are Factor 1 (R), Factor 2 (T), and Factor 3 (C); runs 1 through 8 sit at the vertices.]

This method of generating the runs is easily extended to any number of factors. With four factors, the design consists of $2^4 = 16$ runs. For the first factor (A) we start with a minus sign and alternate signs until the levels of all 16 runs have been specified. Factor B starts with two minus signs and alternates signs in groups of two. Factor C starts with four minus signs and alternates signs in groups of four, and factor D has eight minus signs followed by eight plus signs.
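The construction rule for standard order can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `standard_order` and the coding of the low and high levels as -1 and +1 are assumptions made here, not notation from the text.

```python
from itertools import product

def standard_order(k):
    """Return the 2**k runs of a two-level factorial in standard order.

    Each run is a tuple of -1 (low) and +1 (high). The first factor
    alternates sign every run, the second every two runs, the third
    every four runs, and so on.
    """
    runs = []
    for levels in product((-1, +1), repeat=k):
        # product() varies the LAST position fastest; reversing each tuple
        # makes the FIRST factor the fastest-changing one instead.
        runs.append(tuple(reversed(levels)))
    return runs

design = standard_order(3)
for run_number, levels in enumerate(design, start=1):
    print(run_number, " ".join("+" if x > 0 else "-" for x in levels))
```

Calling `standard_order(4)` gives the 16 runs of the four-factor design, with the last column holding eight minus signs followed by eight plus signs.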
This method of generating the runs in standard order is useful as it helps ensure that no combinations are missed. However, in carrying out the experiment, it is essential to perform the runs in random order. This randomization is important because there may be additional factors not included in the experiment that could influence the results. For example, there may be a day-of-week effect or other unknown factors that change with time. By randomizing, we ensure that the effects of these lurking or noise variables are distributed randomly across the factors. A simple way to do so is to put slips of paper into a box (numbered from 1 to 8 in the case of three factors) and draw them randomly, carrying out the runs in the order in which the slips were drawn.
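Drawing slips from a box amounts to a random permutation of the run numbers. A minimal sketch in Python follows; the function name `randomized_run_order` and the optional `seed` argument (used only to make a run order reproducible) are assumptions for illustration.

```python
import random

def randomized_run_order(n_runs, seed=None):
    """Return the run numbers 1..n_runs in a random execution order."""
    rng = random.Random(seed)          # separate generator; seed is optional
    order = list(range(1, n_runs + 1))
    rng.shuffle(order)                 # equivalent to drawing slips without replacement
    return order

# Execution order for the 8 runs of a three-factor design.
order = randomized_run_order(8, seed=2024)
print(order)
```

Every run number appears exactly once, so no factor-level combination is skipped; only the sequence in which the runs are carried out changes.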

4.3 THE TWO-LEVEL FACTORIAL DESIGN

In this and the next two chapters, we focus on 2-level designs. In Chapters 7 and 8, we will extend the methods developed here to cases where factors have more than two levels.
We use the notation $2^k$ to designate a factorial design with k factors, each having two levels. Such an experiment requires a total of $2^k$ distinct runs: $2^2 = 4$ runs for k = 2 factors, $2^3 = 8$ runs for k = 3 factors, $2^4 = 16$ runs for k = 4 factors, and so on.

TABLE 4.3
A $2^3$ Factorial Design Matrix and Results for the Cracked Pots Problem

Run                                     Percentage of
(standard order)     R    T    C      Pots with Cracks
       1             −    −    −              6
       2             +    −    −             12
       3             −    +    −              6
       4             +    +    −             16
       5             −    −    +             16
       6             +    −    +             34
       7             −    +    +             14
       8             +    +    +             34

Let's return to our discussion of the ceramic pot example. Assume that a $2^3$ factorial experiment has been run, resulting in the data shown in Table 4.3.

4.3.1 Calculating Main Effects


The main effect of a factor is defined as the change in the response variable when the level of the factor is changed from − (low) to + (high). Thus the main effect of the cooling rate R is the change in the percentage of pots that crack when the cooling rate is changed from slow to fast. It is the average percent of cracked pots at the fast cooling rate minus the average percent of cracked pots at the slow cooling rate. That is,

cooling rate effect: $R = \frac{12 + 16 + 34 + 34}{4} - \frac{6 + 6 + 16 + 14}{4} = 24 - 10.5 = 13.5$

In Table 4.3, notice that 12, 16, 34, and 34 are the responses when R is + (runs 2, 4, 6, and 8, respectively), while 6, 6, 16, and 14 are the responses when R is − (runs 1, 3, 5, and 7, respectively).
The main effect of temperature T is the average percent of cracked pots at the high (+) level of factor T minus the average percent at the low (−) level of T,

temperature effect: $T = \frac{6 + 16 + 14 + 34}{4} - \frac{6 + 12 + 16 + 34}{4} = 17.5 - 17 = 0.5$

The main effect of the coefficient of expansion C is the average percent of cracked pots at the high (+) level minus the average percent at the low (−) level,

expansion effect: $C = \frac{16 + 34 + 14 + 34}{4} - \frac{6 + 12 + 6 + 16}{4} = 24.5 - 10 = 14.5$

Note that we are using the same notation for the factors (R, T, C) and their estimated main effects. In most cases, it will be clear whether we are referring to the factor or to its estimated effect. In cases where there is the possibility of confusion, we will introduce separate notation for the estimated effects by putting parentheses around the factors, such as (R), (T), and (C).
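The three main-effect calculations follow one pattern: average the responses at the + level and subtract the average at the − level. The sketch below applies that definition to the Table 4.3 data. The -1/+1 coding and the helper names `mean` and `main_effect` are assumptions introduced for illustration.

```python
# Design in standard order (columns R, T, C) and responses from Table 4.3.
design = [
    (-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
    (-1, -1, +1), (+1, -1, +1), (-1, +1, +1), (+1, +1, +1),
]
cracked = [6, 12, 6, 16, 16, 34, 14, 34]  # percentage of pots with cracks

def mean(values):
    return sum(values) / len(values)

def main_effect(column):
    """Average response at the + level minus average response at the - level."""
    high = [y for run, y in zip(design, cracked) if run[column] > 0]
    low = [y for run, y in zip(design, cracked) if run[column] < 0]
    return mean(high) - mean(low)

main_effects = {name: main_effect(i) for i, name in enumerate("RTC")}
print(main_effects)
```

The resulting values agree with the hand calculations: 13.5 for R, 0.5 for T, and 14.5 for C.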

4.3.2 Calculating 2-Factor Interactions


RC Interaction
As we will show, the effect of cooling rate is not independent of the coefficient of expansion. There is an interaction between these two factors, and we denote it by RC. We estimate the interaction between the cooling rate and the coefficient of expansion by comparing the effect of cooling rate (factor R) at the two levels of the coefficient of expansion (factor C).
With coefficient of expansion at +, the change in the percent of cracked pots when the cooling rate is changed from its low (−) to high (+) setting is

$\frac{34 + 34}{2} - \frac{16 + 14}{2} = 34 - 15 = 19$

In Table 4.3, notice that 34 and 34 are the responses when C = + and R = + (runs 6 and 8, respectively), and 16 and 14 are the responses when C = + and R = − (runs 5 and 7, respectively).
With coefficient of expansion at −, the change in the percent of cracked pots when the cooling rate is changed from its low (−) to high (+) setting is

$\frac{12 + 16}{2} - \frac{6 + 6}{2} = 14 - 6 = 8$

The effect of the cooling rate is much greater when the coefficient of expansion is at the + level (19%) than when it is at the − level (8%). If the coefficient of expansion is high, we expect more cracks when the cooling rate is increased from the slow rate to the fast rate.

By convention, the interaction between the two factors is defined as one-half of the difference between the average cooling rate effect with coefficient of expansion at + and the average cooling rate effect with coefficient of expansion at −. Thus the interaction between factors R and C, denoted by RC, is given by (19 − 8)/2 = 5.5. Later in this chapter, we will show how to determine whether an effect is statistically significant. For now, let us assume that this interaction is statistically significant and not the result of random variation.
The square diagram in Figure 4.2 and the interaction diagram in Figure 4.3 represent convenient ways to compute and display an interaction. The square diagram lists the average response at each of the four possible combinations of settings of factors R and C. Each of the four numbers is the average of two responses. For example, when both the coefficient of expansion (C) and cooling rate (R) are at their + levels (runs 6 and 8), the average response is (34 + 34)/2 = 34. When the coefficient of expansion C is at the low (−) level, the effect of the cooling rate R is 14 − 6 = 8, but when C is at the high (+) level, the effect of the cooling rate R is 34 − 15 = 19.
The interaction is shown graphically in the interaction diagram in Figure 4.3. Here, we connect the average response at the low and high levels of the cooling rate, and we do this separately for each level of the coefficient of expansion. Notice that the two lines have different slopes, reflecting the fact that there is an interaction. If there were no interaction, the two lines would be parallel or nearly so.
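The four corner averages of the square diagram, and the interaction derived from them, can be reproduced with a short Python sketch. The -1/+1 coding, the dictionary `square`, and the helper `cell_mean` are illustrative assumptions rather than notation from the text.

```python
# Standard-order design (columns R, T, C) and responses from Table 4.3.
design = [
    (-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
    (-1, -1, +1), (+1, -1, +1), (-1, +1, +1), (+1, +1, +1),
]
cracked = [6, 12, 6, 16, 16, 34, 14, 34]
R, C = 0, 2  # column positions of the two factors of interest

def cell_mean(r_level, c_level):
    """Average response over the two runs with the given R and C levels."""
    ys = [y for run, y in zip(design, cracked)
          if run[R] == r_level and run[C] == c_level]
    return sum(ys) / len(ys)

# Corners of the square diagram, keyed by (R level, C level).
square = {(r, c): cell_mean(r, c) for r in (-1, +1) for c in (-1, +1)}

effect_of_R_at_C_high = square[(+1, +1)] - square[(-1, +1)]  # 34 - 15
effect_of_R_at_C_low = square[(+1, -1)] - square[(-1, -1)]   # 14 - 6
rc_interaction = (effect_of_R_at_C_high - effect_of_R_at_C_low) / 2
print(square, rc_interaction)
```

The two conditional effects are the slopes of the two lines in an interaction diagram; unequal slopes are what produce a nonzero interaction.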

Figure 4.2 Square Diagram Showing the CR Interaction
[Square diagram: cooling rate (R) on the horizontal axis, coefficient of expansion (C) on the vertical axis.]

Figure 4.3 Interaction Diagram Between Coefficient of Expansion and Cooling Rate
[Line plot: cooling rate (R) on the horizontal axis; one line for expansion at the high (+) level and one for expansion at the low (−) level.]

RT Interaction
The square diagram for this interaction is shown in Figure 4.4. With temperature T at +, the effect of changing the cooling rate R from − to + is

$\frac{16 + 34}{2} - \frac{6 + 14}{2} = 25 - 10 = 15$

With temperature T at −, the effect of changing the cooling rate from − to + is

$\frac{12 + 34}{2} - \frac{6 + 16}{2} = 23 - 11 = 12$

The RT interaction is one-half of the difference, which is $\frac{15 - 12}{2} = 1.5$.

Figure 4.4 Square Diagram Showing the RT Interaction
[Square diagram of average responses; one axis is temperature (T).]

Figure 4.5 Square Diagram Showing the TC Interaction
[Square diagram of average responses; axes are temperature (T) and coefficient of expansion (C).]

TC Interaction
The square diagram for this interaction is shown in Figure 4.5. With temperature T at +, the effect of changing the coefficient of expansion from − to + is

$\frac{14 + 34}{2} - \frac{6 + 16}{2} = 24 - 11 = 13$

With temperature T at −, the effect of changing the coefficient of expansion from − to + is

$\frac{16 + 34}{2} - \frac{6 + 12}{2} = 25 - 9 = 16$

The TC interaction is one-half the difference, which is $\frac{13 - 16}{2} = -1.5$.

4.3.3 Calculating a 3-Factor Interaction


In this 3-factor example, there are three 2-factor interactions (RC, RT, TC) and one 3-factor interaction, denoted by RTC. A significant 3-factor interaction means that the 2-factor interaction between any two of the factors depends on the level of the third factor. Equivalently, it means that the effect of changing a particular factor from − to + depends on the levels of the other two factors. The 3-factor interaction is calculated as follows:

Find the RT interaction with the coefficient of expansion C at +:
With T = + (and C = +), the effect of R is 34 − 14 = 20.
With T = − (and C = +), the effect of R is 34 − 16 = 18.
Therefore the RT interaction with C = + is (20 − 18)/2 = 1.

Find the RT interaction with the coefficient of expansion C at −:
With T = + (and C = −), the effect of R is 16 − 6 = 10.
With T = − (and C = −), the effect of R is 12 − 6 = 6.
Therefore the RT interaction with C = − is (10 − 6)/2 = 2.

The 3-factor interaction is defined as one-half the difference of these two 2-factor interactions, RTC = (1 − 2)/2 = −0.5. As we show later, the 3-factor interaction is not statistically significant.
To summarize, we calculated the 2-factor interaction RT with C at + and the 2-factor interaction RT with C at −, then took one-half the difference. Note that the choice of which 2-factor interaction to use in the calculation is arbitrary. We would have obtained the same result by taking half the difference of the 2-factor interaction RC with T at + and the 2-factor interaction RC with T at −, or by taking half the difference of the 2-factor interaction TC with R at + and the 2-factor interaction TC with R at −.
If there were a fourth factor (say, factor D), we could also calculate a 4-factor interaction. The 4-factor interaction RTCD is one-half of the difference between the 3-factor interaction RTC calculated when the fourth factor D is at its high (+) level and the 3-factor interaction RTC when D is at its low (−) level.
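The recursive definition (an interaction is half the difference of a lower-order effect evaluated at the two levels of one more factor) lends itself to a compact sketch. The Python below uses the Table 4.3 data. The helper names `mean_response`, `effect`, `interaction`, and `three_factor` are hypothetical, chosen only for this illustration.

```python
# Standard-order design (columns R, T, C) and responses from Table 4.3.
design = [
    (-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
    (-1, -1, +1), (+1, -1, +1), (-1, +1, +1), (+1, +1, +1),
]
cracked = [6, 12, 6, 16, 16, 34, 14, 34]
R, T, C = 0, 1, 2

def mean_response(fixed):
    """Average response over runs whose levels match `fixed` (column -> level)."""
    ys = [y for run, y in zip(design, cracked)
          if all(run[col] == lvl for col, lvl in fixed.items())]
    return sum(ys) / len(ys)

def effect(factor, fixed):
    """Effect of moving `factor` from - to + with other factors held as in `fixed`."""
    return mean_response({**fixed, factor: +1}) - mean_response({**fixed, factor: -1})

def interaction(a, b, fixed=None):
    """Half the difference between the effect of a at b = + and at b = -."""
    fixed = fixed or {}
    return (effect(a, {**fixed, b: +1}) - effect(a, {**fixed, b: -1})) / 2

def three_factor(a, b, c):
    """Half the difference between the a-b interaction at c = + and at c = -."""
    return (interaction(a, b, {c: +1}) - interaction(a, b, {c: -1})) / 2

print(interaction(R, T, {C: +1}), interaction(R, T, {C: -1}))
print(three_factor(R, T, C), three_factor(R, C, T), three_factor(T, C, R))
```

All three ways of slicing the data give the same 3-factor interaction, which illustrates that the choice of conditioning factor is arbitrary.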

4.3.4 A Simple Method for Calculating Effects: Using Calculation Columns Obtained by Multiplying Signs
We can always estimate the effects from their definitions as we have just done, but doing so would be cumbersome. Fortunately, there is a much simpler way. Consider the expanded design matrix shown in Table 4.4. We have added four so-called calculation columns that allow us to estimate the interactions simply and directly.
The signs in the added columns (RT, RC, TC, RTC) were found by multiplying the signs in the design columns (R, T, C) row by row. For example, in the RT column, the sign in the first row (+) is the product of the first row of the R column, which is −, and the first row of the T column, which is also −. Similarly, observe that the sign in the seventh row of column RTC is −, as it is the product of − (for R), + (for T), and + (for C).

TABLE 4.4
Table of Signs for Calculating Effects in the $2^3$ Factorial Design: Cracked Pots Example

                                                    Percentage of
Run     R    T    C    RT    RC    TC    RTC      Pots with Cracks
 1      −    −    −    +     +     +     −                6
 2      +    −    −    −     −     +     +               12
 3      −    +    −    −     +     −     +                6
 4      +    +    −    +     −     −     −               16
 5      −    −    +    +     −     −     +               16
 6      +    −    +    −     +     −     −               34
 7      −    +    +    −     −     +     −               14
 8      +    +    +    +     +     +     +               34

We obtain the main effect of R by applying the signs in column R to the responses in the last column. We have

main effect of R = $\frac{-6 + 12 - 6 + 16 - 16 + 34 - 14 + 34}{4} = \frac{54}{4} = 13.5$

The minus and plus signs in the numerator of this expression correspond to the signs in the first column. We divide the linear combination of the responses in the numerator by the number of plus signs, which in this case is 4. This expression is equivalent to taking the average of the four responses with R at + and subtracting from it the average of the four responses with R at −, which is how we defined and calculated the main effect previously.
Similarly, for the two other main effects we have

main effect of T = $\frac{-6 - 12 + 6 + 16 - 16 - 34 + 14 + 34}{4} = \frac{2}{4} = 0.5$

main effect of C = $\frac{-6 - 12 - 6 - 16 + 16 + 34 + 14 + 34}{4} = \frac{58}{4} = 14.5$

The RT interaction is obtained by applying the signs in the RT column to the responses in the last column and dividing the result by 4, the number of plus signs in that column. We have

RT interaction = $\frac{6 - 12 - 6 + 16 + 16 - 34 - 14 + 34}{4} = \frac{6}{4} = 1.5$

Similarly,

RC interaction = $\frac{6 - 12 + 6 - 16 - 16 + 34 - 14 + 34}{4} = \frac{22}{4} = 5.5$

TC interaction = $\frac{6 + 12 - 6 - 16 - 16 - 34 + 14 + 34}{4} = \frac{-6}{4} = -1.5$

RTC interaction = $\frac{-6 + 12 + 6 - 16 + 16 - 34 - 14 + 34}{4} = \frac{-2}{4} = -0.5$
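The sign-multiplication method is easy to automate. The sketch below builds every calculation column as a row-by-row product of design columns and applies it to the responses from Table 4.3. The -1/+1 coding and the variable names are illustrative assumptions.

```python
from itertools import combinations
from math import prod

factors = "RTC"
# Standard-order design (columns R, T, C) and responses from Table 4.3.
design = [
    (-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
    (-1, -1, +1), (+1, -1, +1), (-1, +1, +1), (+1, +1, +1),
]
cracked = [6, 12, 6, 16, 16, 34, 14, 34]

effects = {}
for size in range(1, len(factors) + 1):
    for cols in combinations(range(len(factors)), size):
        name = "".join(factors[c] for c in cols)
        # Calculation column: product of the chosen design columns, row by row.
        signs = [prod(run[c] for c in cols) for run in design]
        n_plus = signs.count(+1)  # 4 in every column of the 2^3 design
        # Apply the signs to the responses and divide by the number of plus signs.
        effects[name] = sum(s * y for s, y in zip(signs, cracked)) / n_plus

for name, value in effects.items():
    print(f"{name:>3}: {value:5.1f}")
```

The same loop works for any number of factors once `factors`, `design`, and the response list are extended; for a single design column the "product" is simply that column, so main effects and interactions are handled uniformly.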

4.4 DETERMINING WHICH EFFECTS ARE SIGNIFICANT

A 2-level factorial design with k factors leads to many estimated main effects and interactions. The number of i-factor interactions is given by $\binom{k}{i} = \frac{k!}{i!\,(k - i)!}$, the number of combinations of k items taken i at a time. For example, consider the $2^7$ factorial experiment consisting of 128 runs. There are seven main effects, $\binom{7}{2} = \frac{7!}{2!\,(7 - 2)!} = 21$ two-factor interactions, $\binom{7}{3} = \frac{7!}{3!\,(7 - 3)!} = 35$ three-factor interactions, $\binom{7}{4} = \frac{7!}{4!\,(7 - 4)!} = 35$ four-factor interactions, and so on.
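As a quick check of these counts, the following sketch tabulates the number of effects of each order for k = 7 using Python's `math.comb`. The variable names are assumptions for illustration.

```python
from math import comb

k = 7  # number of two-level factors

# Number of i-factor effects (i = 1 gives the main effects).
counts = {i: comb(k, i) for i in range(1, k + 1)}
total_effects = sum(counts.values())

for order, count in counts.items():
    print(f"{order}-factor effects: {count}")
print("total:", total_effects)
```

The counts add up to 127, that is, $2^7 - 1$: the 128 runs yield the overall average plus 127 estimated effects.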

Fortunately, we can expect the great majority of these effects to be negligible. Experience has shown that the Pareto principle is generally at work here, with a small number of effects constituting what the Pareto principle calls "the vital few" and the remainder comprising the "trivial many." The phrase "effects sparsity" is also used to convey the same idea. In addition, there tends to be a hierarchical ordering of effects with main effects larger in magnitude and hence more important than 2-factor interactions, 2-factor interactions larger than 3-factor interactions, and so forth. In experiments for which 4-factor and higher-order interactions can be estimated, they are almost certain to be negligible. In most cases, the 3-factor interactions will be negligible as well.
There is substantial empirical evidence of this hierarchical ordering principle based on the accumulation of experimental results in numerous settings over many years. In the case of continuous factors, there is also theoretical support. Smooth response functions can be approximated by their Taylor series expansions, with first-order terms (main effects), second-order terms (2-factor interactions), and so on, with higher-order terms corresponding to higher-order effects diminishing in magnitude.
The calculated effects are estimates that are subject to uncertainty. Repeating a particular experimental run would invariably result in a response that is somewhat different from the original one, due to experimental error. For example, in the cracked pots experiment there are numerous sources of experimental error: differences in clay composition from batch to batch, variability in actual peak kiln temperature around each of the two target settings, differences in how pots are handled by workers, and so forth. As a consequence, a second independent execution of the entire experiment would result in calculated effects that would differ from the estimates obtained before. In the light of this variability, the experimenter needs to determine which estimates are statistically significant. In assessing the statistical significance of an estimated effect, the question is whether the evidence is strong enough for the experimenter to conclude, beyond a reasonable doubt, that the true (or mean) effect is not equal to zero.
There are four approaches to determining the statistical significance of effects in a factorial experiment, which are discussed in the next sections:
1. Replicating all or part of the design (i.e., multiple runs under the same experimental conditions),
2. Using prior information about the experimental error,
3. Assuming higher-order interactions are negligible so that their estimates represent noise (experimental error), and
4. Normal probability plots

4.4.1 Replicated Runs


Let us return to the cracked pot example. Suppose that each of the eight test conditions was run twice and that the response (percentage of cracked pots) used in calculating the effects is the average response from two runs. In this case, there are two "replications" for each experimental condition. The 16 runs were performed in random order, which is essential, and the results are shown in Table 4.5. For a particular combination of factor settings (e.g., + + +), the difference in the two percentages is due to experimental error. With eight distinct combinations of factor settings (the 8 runs in standard order), we can calculate eight separate estimates of the variance of the experimental error, with each estimate having one degree of freedom. These estimates are shown in Table 4.5.
We average the eight estimates to obtain the pooled estimate

$s_p^2 = \frac{8 + 2 + 18 + 2 + 18 + 8 + 8 + 2}{8} = 8.25$

The pooled standard deviation $s_p = \sqrt{8.25} = 2.87$ measures the variability in the response of an individual run; it has 8 degrees of freedom (the sum of the degrees of freedom of the eight separate estimates). Imagine repeatedly carrying out a particular run, (say) run 3 (R = −, T = +, C = −). Because of experimental error, the outcomes would vary from run to run, with $s_p = 2.87$ estimating the variability in these responses.
Each estimated effect is the difference of two averages: the average of eight responses at the + level and the average of eight responses at the − level. The responses are uncertain (random variables), and hence each estimated effect is a random variable as well, with a certain unknown mean. Once the experiment is carried out, we obtain a particular value for each estimated effect. For example, we calculated that the estimated main effect of cooling rate is 13.5. If we repeated the entire experiment and recalculated the main effect of R from the 16 new runs, we would obtain another estimate for the effect of the cooling rate. Because of the variability in the 16 individual responses, this second calculated estimate would almost certainly be different from the first. If we repeated the experiment many times and averaged the estimates of R calculated each time, we would obtain something close to the long-run average or mean effect of R.
To test whether the estimate R = 13.5 is statistically significant, we ask the following question: If the mean effect were actually zero, how likely is it that the estimate R = 13.5 would occur by chance? To answer this question, we will construct a 95% confidence interval on the mean effect of cooling rate. If the confidence interval does not include 0, we will reject the hypothesis that the true mean effect of the cooling rate is 0 and conclude that the estimate 13.5 is statistically significant. For each of the estimated effects, we will follow the same procedure to determine which effects are significant.

TABLE 4.5
Results of the Cracked Pots Example with Replicated Runs

Run                              Percentage of Cracked Pots    Average Percentage
(standard order)    R   T   C    (individual experiments)      of Cracked Pots       Estimated variance
       1            −   −   −          8       4                      6              (8 − 6)^2 + (4 − 6)^2 = 8
       2            +   −   −         11      13                     12              (11 − 12)^2 + (13 − 12)^2 = 2
       3            −   +   −          9       3                      6              (9 − 6)^2 + (3 − 6)^2 = 18
       4            +   +   −         17      15                     16              (17 − 16)^2 + (15 − 16)^2 = 2
       5            −   −   +         19      13                     16              (19 − 16)^2 + (13 − 16)^2 = 18
       6            +   −   +         36      32                     34              (36 − 34)^2 + (32 − 34)^2 = 8
       7            −   +   +         12      16                     14              (12 − 14)^2 + (16 − 14)^2 = 8
       8            +   +   +         33      35                     34              (33 − 34)^2 + (35 − 34)^2 = 2
Let $\sigma_{\text{effect}}$ be the standard deviation (standard error) of an effect, and let $\sigma^2_{\text{run}}$ be the variance of the response of a single run. Each estimated effect (main effect as well as interaction) is the difference of two averages, $\bar{y}_+ - \bar{y}_-$. In the current example with 2 runs at each of the eight distinct factor-level combinations and 16 runs in total, each average is an average of the results from 8 runs. Let N be the total number of runs in the experiment, and let $n = N/2$. Recall that the variance of a sample average $\bar{y}$ from a sample of size n is given by $\sigma^2 / n$, where $\sigma$ is the population standard deviation. Also, we know that the variance of the difference of two independent random variables is the sum of the variances. Hence we find

$\text{var(effect)} = \text{var}(\bar{y}_+ - \bar{y}_-) = \text{var}(\bar{y}_+) + \text{var}(\bar{y}_-) = \frac{\sigma^2_{\text{run}}}{n} + \frac{\sigma^2_{\text{run}}}{n} = \frac{4}{N}\,\sigma^2_{\text{run}}$

Replacing $\sigma^2_{\text{run}}$ by its estimate $s_p^2$ leads to

estimated var(effect) = $s^2_{\text{effect}} = \frac{4}{N}\, s_p^2$

TABLE 4.6
Confidence Intervals for Main Effects and Interactions: Cracked Pots Example

                                 Estimate     95% Confidence Interval
MAIN EFFECTS
Cooling rate R                     13.5       13.5 ± 2.306 × 1.44      13.5 ± 3.3
Temperature T                       0.5        0.5 ± 2.306 × 1.44       0.5 ± 3.3
Coefficient of expansion C         14.5       14.5 ± 2.306 × 1.44      14.5 ± 3.3
TWO-FACTOR INTERACTIONS
RT                                  1.5        1.5 ± 2.306 × 1.44       1.5 ± 3.3
RC                                  5.5        5.5 ± 2.306 × 1.44       5.5 ± 3.3
TC                                 −1.5       −1.5 ± 2.306 × 1.44      −1.5 ± 3.3
THREE-FACTOR INTERACTION
RTC                                −0.5       −0.5 ± 2.306 × 1.44      −0.5 ± 3.3

Significant effects are shown in boldface.

In our example, the total number of runs N = 16 and $s_p^2 = 8.25$. Therefore, the estimated variance of an effect is $s^2_{\text{effect}} = \frac{4}{16}(8.25) = 2.0625$, and the standard error of an effect is $s_{\text{effect}} = \sqrt{2.0625} = 1.44$.

We use the standard error of an effect to construct a 95% confidence interval for the mean effect. In our example we use the 97.5th percentile (0.025 tail probability) of the t-distribution with 8 degrees of freedom (2.306), because the variance $s_p^2$ was pooled from eight separate variances. Suppose in this example, there were three replications at each combination of settings for a total of 24 runs. Each of the eight separate variance estimates would have 2 degrees of freedom, and the pooled estimate would have 16 degrees of freedom. The confidence intervals for the seven effects are given in Table 4.6. Significant effects are shown in boldface. They are the main effects of cooling rate R and coefficient of expansion C, and the RC interaction.
Instead of constructing confidence intervals, equivalently, we can compare the t-ratios (estimated effects divided by their standard error) with ±2.306. Effects whose t-ratios are larger than 2.306 in absolute value are considered significant.
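The confidence-interval calculation can be checked with a short script. This sketch takes the pooled variance (8.25), the number of runs (16), and the t percentile (2.306) directly from the text instead of computing them; the variable names are assumptions made for illustration.

```python
from math import sqrt

pooled_variance = 8.25   # s_p^2, pooled from the eight replicate pairs
n_runs = 16              # N, total number of runs
t_critical = 2.306       # 97.5th percentile of t with 8 d.f., as quoted in the text

# Estimated effects for the cracked pots example.
effects = {"R": 13.5, "T": 0.5, "C": 14.5,
           "RT": 1.5, "RC": 5.5, "TC": -1.5, "RTC": -0.5}

standard_error = sqrt(4 / n_runs * pooled_variance)  # s_effect
half_width = t_critical * standard_error             # margin of the 95% interval

intervals = {name: (est - half_width, est + half_width)
             for name, est in effects.items()}
t_ratios = {name: est / standard_error for name, est in effects.items()}

# An effect is significant when its interval excludes 0,
# or equivalently when |t-ratio| exceeds the t percentile.
significant = [name for name, est in effects.items() if abs(est) > half_width]
print(round(standard_error, 2), round(half_width, 1), significant)
```

Both criteria flag the same effects: R, C, and RC.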

Interpretation of Results
The estimated main effect of temperature T is very small and not statistically significant. Also, the factor T does not interact with either of the other factors. Since increasing the temperature of the kiln would be more costly, it is clear from these results that it would be best to keep the current kiln temperature. The main effects of cooling rate R and coefficient of expansion C, and the interaction between these two factors are statistically significant. The main effect of a factor should be looked at by itself only if the factor does not have a statistically significant interaction with another factor. Because of the RC interaction, we should and will examine the two factors jointly.
From the square diagram in Figure 4.2 we observe the nature of the interaction. With coefficient of expansion C at + (the current material), increasing the cooling rate from − to + increases the percentage of cracked pots from 15% to 34%. With coefficient of expansion at − (the new material), increasing the cooling rate from − to + increases the percentage of cracked pots from 6% to 14%. The lowest percentage of cracked pots, 6%, occurs with both coefficient of expansion C and cooling rate R at their low (−) levels. Since the cost of the new material with the lower coefficient of expansion is the same as the cost of the current material, it is clear from the experiment that the new material is preferred. The question of the best cooling rate would require additional analysis. The issue is whether the cost savings from lowering the proportion of broken pots from 14% to 6% is greater than the cost of the decrease in productivity that would result from operating at the current (slower) cooling rate compared to the faster rate.

4.4.2 Prior Information about $\sigma^2_{\text{run}}$, the Variance of the Experimental Error
Sometimes, based on previous extensive experiments, the experimenter can assume that the variance of the response from a single run, $\sigma^2_{\text{run}}$, is known. In this case, we substitute this value into the equation for the standard error shown in the previous subsection. Also, we replace the percentiles of the t-distribution (t-values) by the percentiles of the normal distribution (z-values).

4.4.3 Assuming Higher-Order Interactions Are Negligible


We continue with the cracked pots example. Suppose after analyzing the results of the $2^3$ factorial experiment, management decides to add an additional factor to the test. One of the production workers suggests that cracking may be due to movement of the pots as they travel on the conveyor and are handled by workers. She suggests using a rubberized carrier instead of the current metal carrier. We add that factor, labeled D, to the design, with the − level being the current metal carrier and the + level being the new rubberized carrier. The design matrix with the 16 runs in standard order, the results of each run (only a single run was taken at each experimental condition), and the estimated effects are shown in Table 4.7.
You may want to check the value of the 4-factor interaction. Obtain the calculation column RTCD by multiplying the signs of the four design columns, apply the signs of this calculation column to the observations, and divide the result by the number of plus signs, which is 8. You will find that RTCD = 1.500.
There are four 3-factor interactions and one 4-factor interaction. If we assume that each of these interactions is negligible (i.e., the true (mean) effects are zero) we can use these five estimated effects to obtain an estimate of the standard error of an effect. We can think of these five values as a sample from a distribution with mean 0 and unknown, but common standard deviation (the standard error of an effect). To estimate the variance, we take each value minus 0, square the result, sum the 5 squared deviations, and divide by 5. Since we assume that the mean is 0, we do not use the sample average in the calculation of the variance estimate, and we divide by 5, the number of observations, rather than 5 − 1. Our estimate has 5 degrees of freedom.
estimated var(effect) = $s^2_{\text{effect}} = \frac{(1.5 - 0)^2 + (-1.75 - 0)^2 + (0.25 - 0)^2 + (0.75 - 0)^2 + (1.5 - 0)^2}{5} = \frac{8.1875}{5} = 1.6375$

TABLE 4.7
Design Matrix, Results, and Estimated Effects for Cracked Pots Problem with Four Factors

Run

[)

1'1
16
K
22
19

l'crccntJge ol
Cracked Pots

\7
211

7
K

IX
I

9
10

J J

+-

12

JI)

12

jj

14
15
16

\()

I\

_\()

Estimated effects:
Effect       Estimate         Effect       Estimate
Average       17.625          TC            −0.250
R             12.500          TD             0.500
T              1.000          CD             1.000
C             14.500          RTC            1.500
D             −8.250          RTD           −1.750
RT             1.250          RCD            0.250
RC             5.250          TCD            0.750
RD             0.500          RTCD           1.500
NOTE: Significant effects are shown in boldface.

standard error(effect) = $s_{\text{effect}} = \sqrt{1.6375} = 1.28$

The 97.5th percentile of a t-distribution with 5 degrees of freedom is given by 2.571. Hence, the 95% confidence interval for each effect is obtained by adding and subtracting (2.571) × (1.28) = 3.29 from the estimate. Any effect with absolute value greater than 3.29 is statistically significant. We find that R, C, D, and RC are significant, and they are shown in boldface in Table 4.7. The rubberized carrier (D) reduces the percentage of cracked pots by 8.25 percentage points.
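The calculation based on negligible higher-order interactions can be sketched as follows. The five interaction estimates and the t percentile (2.571) are taken from the text; only the squares of the interaction estimates enter the calculation, so their signs do not affect the result. The dictionary of lower-order estimates is a partial selection chosen here for illustration.

```python
from math import sqrt

# Estimated 3- and 4-factor interactions from the four-factor experiment.
# Only their squares are used below, so the signs are immaterial here.
higher_order = {"RTC": 1.5, "RTD": -1.75, "RCD": 0.25, "TCD": 0.75, "RTCD": 1.5}

# The mean is assumed to be 0, so we divide by the number of values (5),
# not by 5 - 1 as in an ordinary sample variance.
variance_of_effect = sum(v ** 2 for v in higher_order.values()) / len(higher_order)
standard_error = sqrt(variance_of_effect)

t_critical = 2.571  # 97.5th percentile of t with 5 d.f., as quoted in the text
margin = t_critical * standard_error

# A few of the lower-order estimates from the four-factor experiment.
estimates = {"R": 12.5, "C": 14.5, "D": -8.25, "RT": 1.25, "RC": 5.25}
significant = [name for name, value in estimates.items() if abs(value) > margin]
print(round(variance_of_effect, 4), round(standard_error, 2), round(margin, 2), significant)
```

Of the estimates listed here, R, C, D, and RC exceed the margin of about 3.29, while RT does not.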

4.4.4 Normal Probability Plot


How can we determine which effects are significant if there are no replications? In that case, a simple graphical procedure called a normal probability plot will be useful. Assume that the response does not depend on any of the factors, and that observations vary around a constant level. Then all main and interaction effects, which are linear combinations of the responses, should vary around zero. Furthermore, because of the central limit effect, their distribution should be approximately normal, because estimated effects average the observations (with half of the weights at −1 and half of the weights at +1). A dot plot or a histogram of the effects is useful as such plots can highlight big effects that do not follow a normal distribution around zero. However, with seven estimated effects (in the factorial with k = 3 factors) or 15 estimated effects (in the factorial with k = 4), the construction of a histogram and a check of whether it is bell shaped around zero are not very reliable, given that there are just too few observations. A normal probability plot of the effects, on the other hand, provides a useful tool.
Let us denote the m estimated effects by $f_1, f_2, \ldots, f_m$. In general, there will be $m = 2^k - 1$ effects. For example, for k = 3, there will be seven estimated effects. The procedure is the following. First, we order the effects from smallest to largest. Next, as described below, we plot the observed (empirical) cumulative probabilities associated with the estimated effects against the estimated effects. The x-axis represents the effects, and the y-axis represents the cumulative probabilities. The scale of the y-axis is constructed in such a way that if the data points follow a normal distribution, the cumulative probabilities will plot as a straight line. For effects that are from a normal distribution with mean zero, the plot of the effects should approximate a straight line with the line passing through the point (x = 0, y = 0.5). Significant effects much different from 0 will fall away from this line. Effects that are unusually small or large and fail to follow the straight line pattern are judged to be significant. Statistical software packages can readily construct such normal probability plots.
To illustrate the procedure, consider the $2^3$ design in Table 4.4, from which we estimate the seven effects shown in Table 4.6. We order the seven effects from small to large: TC = −1.5, RTC = −0.5, T = 0.5, RT = 1.5, RC = 5.5, R = 13.5, C = 14.5. The smallest among the seven effects (TC = −1.5) represents a cumulative probability between 0 and 1/7 and is assigned a cumulative probability (y-value) at the midpoint of that interval, which is 0.0714. The second smallest among the seven effects (RTC = −0.5) represents a cumulative probability between 1/7 and 2/7 and is assigned a y-value at the midpoint of that interval, which is 0.2143. The third smallest effect is assigned a cumulative probability at the midpoint of the interval 2/7 to 3/7, and so forth. In general, with m effects, the ith smallest effect is plotted at a cumulative probability of (i − 0.5)/m.
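The ordering and midpoint rule can be sketched in Python. To place the points on a probability-scaled y-axis, the sketch also converts each cumulative probability to the corresponding standard normal quantile with the standard library's `statistics.NormalDist`; that conversion is the usual way such an axis is built and is included here as an assumption about the plotting step, not as a description of any particular software package.

```python
from statistics import NormalDist

# Estimated effects from the 2^3 cracked pots experiment.
effects = {"R": 13.5, "T": 0.5, "C": 14.5,
           "RT": 1.5, "RC": 5.5, "TC": -1.5, "RTC": -0.5}

# Step 1: order the effects from smallest to largest.
ordered = sorted(effects.items(), key=lambda item: item[1])
m = len(ordered)

# Step 2: the i-th smallest effect gets cumulative probability (i - 0.5) / m.
probabilities = [(i - 0.5) / m for i in range(1, m + 1)]

# Step 3: convert each probability to a standard normal quantile, so that
# normally distributed effects fall close to a straight line.
quantiles = [NormalDist().inv_cdf(p) for p in probabilities]

for (name, value), p, z in zip(ordered, probabilities, quantiles):
    print(f"{name:>3} {value:6.1f}  p = {p:.4f}  z = {z:6.3f}")
```

Plotting the quantiles against the ordered effects gives the normal probability plot; the points for RC, R, and C lie far to the right of the line suggested by the four effects near zero.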
Now let us return to the cracked pots example in which four factors were tested, the original three factors plus factor D, the type of carrier. Table 4.7 lists the data and the 15 effects that can be estimated from this unreplicated experiment. Which effects are significant? As noted above, a simple dot plot of the 15 effects (there are 15 effects in a $2^4$ design) could highlight big effects that are far from zero, but a better approach is to construct a normal probability plot. Figure 4.6 shows the normal probability plot created by Minitab. Note that Minitab uses a slightly different definition of empirical percentiles; the $i$th smallest effect is plotted at a cumulative probability (y-value) of $(i - 0.3)/(m + 0.4)$, instead of at $(i - 0.5)/m$. However, this change is of little practical significance.

Figure 4.6 Normal Plot of Effects in the Cracked Pots Problem: 16 Runs, with 4 Factors at 2 Levels Each
(Minitab normal probability plot of the effects; response is % cracks, alpha = 0.05. Horizontal axis: effect. Vertical axis: percent. Points are marked as significant or not significant, and the plot reports Lenth's PSE.)

Observe that the scale on the y-axis (cumulative probability, in percent) in Figure 4.6 is not linear. This is where the normal distribution comes into play. With $m$ effects, the $i$th smallest effect is associated with a cumulative probability of $(i - 0.3)/(m + 0.4)$. The percentile of order $100(i - 0.3)/(m + 0.4)$ implied by the standard normal distribution is called the normal score or z-score of the $i$th smallest effect. A normal probability plot is a plot of normal scores against the estimated effects. For illustration, the normal score of the smallest effect, D = -8.25, with cumulative probability 0.0455 is given by -1.690 (the 4.55th percentile from the standard normal). The normal score of the second smallest among the 15 effects, with cumulative probability 0.1104, is -1.224, and so on. The label on the y-axis denotes the cumulative probabilities multiplied by 100. Minitab makes it easy to distinguish between insignificant effects (with mean 0) and significant effects by fitting a straight line to the middle portion of the graph. It also adds information on Lenth's (1989) PSE, a method that we explain in Appendix 4.3. The effects R, C, D, and RC are significant, which is consistent with the results we obtained by assuming that 3- and 4-factor interactions are zero.
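The normal scores quoted above can be reproduced with Python's standard library. This is a sketch under the assumption that the plotting positions are $(i - 0.3)/(m + 0.4)$, as described in the text; the function name `normal_scores` is ours, and the code makes no claim about how Minitab computes its plot internally.

```python
from statistics import NormalDist


def normal_scores(m, a=0.3, b=0.4):
    """Standard normal percentiles at the positions (i - a)/(m + b), i = 1..m."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - a) / (m + b)) for i in range(1, m + 1)]


# 15 effects, as in the 2^4 cracked pots experiment.
scores = normal_scores(15)
print(round(scores[0], 2), round(scores[1], 2))
```

The first two scores should come out close to the values -1.690 and -1.224 given in the text; small differences in the third decimal arise because the text rounds the cumulative probabilities before looking up the percentile.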

4.5 CASE STUDY: A DIRECT MAIL CREDIT CARD OFFER

The financial industry, including insurance, investment, credit card, and banking firms, was among the first to use experimental design techniques for marketing testing. The project described here is from a leading Fortune 500 financial products and services firm. The company name and proprietary details have been removed, but the test strategy, designs, results, and insights are accurate.
The focus of this particular experiment was on increasing the response rate: the proportion of people who respond to a credit card offer. The marketing team decided to study the effects of interest rates and fees, using the four factors shown in Table 4.8. These factors are the annual fee, a fee for opening an account, the initial interest rate, and the long-term interest rate. The company wanted to test the effects of lowering the annual fee and initiating an account-opening fee. Although the account-opening fee was likely to reduce the response, one manager thought the fee would give an impression of exclusivity that would mitigate the magnitude of the response decline. The team also wanted to test the effect of a small increase in the long-term interest rate. At the same time, they wanted to test the effect of two different initial interest rates, both lower than the long-term rate.

Table 4.8 Description of the Four Factors

Factor                         (-) Control    (+) New Idea
A  Annual fee                  Current        Lower
B  Account-opening fee         No             Yes
C  Initial interest rate       Current        Lower
D  Long-term interest rate     Low            High
To study interactions along with all main effects, the consultant recommended a $2^4$ design. The marketing team used columns A-D of the test matrix in Table 4.9 to create the 16 mail packages. Many advertisements had to be sent to targeted customers because the average response rate to such mail ads is only in the 2-3% range. Each of the 16 different mailings was sent to 7,500 customers, requiring a total of 120,000 mailings. The numbers of orders received and the order rates are given in Table 4.9. In total, 2,837 orders were received, resulting in an overall order rate of 100(2,837/120,000) = 2.364%.
The +/- combinations in the 11 interaction (product) columns are used solely for the statistical analysis of the results. The 15 main effects and interactions (the four main effects, the six 2-factor interactions, the four 3-factor interactions, and the one 4-factor interaction) are obtained by calculating linear combinations of the response rates using the weights (±1) in the design and interaction columns, and dividing the results by 8. For example, the main effect of factor A is given by

(A) = [-2.45 + 3.36 - 2.16 + 2.29 - 2.49 + ... - 2.04 + 2.03]/8 = 0.4075.

The (ABC) interaction is obtained by first forming the product column ABC (which is given by -, +, +, -, +, -, -, +, -, +, +, -, +, -, -, +) and calculating

(ABC) = [-2.45 + 3.36 + 2.16 - 2.29 + 2.49 - ... - 2.04 + 2.03]/8 = -0.0525.
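The sign-column arithmetic can be sketched in Python as follows. We assume the 16 runs are listed in standard order (factor A alternating fastest), which is consistent with the ABC column shown above. The helper names are ours, and the response vector in the demonstration is hypothetical, not the data of Table 4.9.

```python
from itertools import product
from math import prod

FACTORS = "ABCD"


def standard_order_design(k):
    """Rows of a 2^k design in standard order (first factor alternates fastest)."""
    # itertools.product varies the last position fastest, so reverse each row.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]


def sign_column(design, term):
    """Product column of +/-1 weights for a term such as "A" or "ABC"."""
    idx = [FACTORS.index(f) for f in term]
    return [prod(row[j] for j in idx) for row in design]


def effect(design, responses, term):
    """Signed sum of the responses divided by half the number of runs."""
    signs = sign_column(design, term)
    return sum(s * y for s, y in zip(signs, responses)) / (len(design) / 2)


design = standard_order_design(4)
abc = sign_column(design, "ABC")
print(abc)  # -, +, +, -, +, -, -, + repeated twice

# Hypothetical response rates (NOT Table 4.9): 2.0% baseline, plus 0.5 when A is at +.
responses = [2.0 + 0.5 * (row[0] == 1) for row in design]
print(effect(design, responses, "A"), effect(design, responses, "ABC"))
```

With 16 runs the divisor is 8, as in the text. In the hypothetical data only factor A matters, so its estimated effect is 0.5 and every other effect is zero.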

We have put parentheses around the effects to distinguish them from the factors and calculation columns. The estimated effects are shown in Table 4.10. Significant effects, those exceeding two standard errors, are highlighted in the table. The calculation of standard errors is explained in Section 4.6.4.
Figure 4.7 graphs the effects in the order of their absolute magnitudes. The broken line indicates estimates that are larger than two standard errors. For simplicity, we use the factor 2 to approximate the 97.5th percentile of the standard normal distribution (1.96).
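As a rough check on the two-standard-error cutoff, the sketch below treats each effect as a difference between two independent sample proportions of 60,000 people each and uses the pooled overall order rate. This pooled formula is our assumption for illustration; the derivation in Section 4.6.4 may differ in detail. The function name is ours.

```python
from math import sqrt


def effect_standard_error(p_overall, n_per_level):
    """Standard error of a difference of two independent sample proportions,
    using one pooled estimate of the response rate (an assumption)."""
    return sqrt(p_overall * (1 - p_overall) * (2 / n_per_level))


p = 2837 / 120000                       # overall order rate from the case study
se_percent = 100 * effect_standard_error(p, 60000)
threshold = 2 * se_percent              # cutoff of two standard errors

print(round(se_percent, 4), round(threshold, 3))
```

Under these assumptions the standard error comes out near 0.0877 percentage points and the cutoff near 0.175, which matches the values used with Table 4.10.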

Table 4.10 Estimated Effects

Effect       Estimate (%)
Constant       2.3642
(A)            0.405 *
(B)           -0.518 *
(C)            0.252 *
(D)           -0.498 *
(AB)          -0.302 *
(AC)           0.002
(AD)           0.108
(BC)           0.048
(BD)           0.102
(CD)           0.158   (close to two standard errors, 0.175)
(ABC)         -0.052
(ABD)          0.088
(ACD)          0.008
(BCD)          0.108
(ABCD)         0.052

NOTE: Estimates that exceed 2 standard errors, (2)(0.0877) = 0.175, are marked with an asterisk.

Figure 4.7 Estimated Effects Ordered by Their Absolute Magnitudes
(Bar chart of the 15 effects; horizontal axis: effect in percentage points, from 0.000 to 0.500. Significant effects, above the broken line: B: Account-opening fee, -0.518; D: Long-term interest rate, -0.498; A: Annual fee, +0.405; AB, -0.302; C: Initial interest rate, +0.252.)

As shown in Figure 4.7, all four main effects and one (and perhaps a second) interaction (the AB and the CD interactions) are significant. Note that the CD interaction is just slightly smaller than 2 times the standard error.
B-: No account-opening fee. One manager thought that charging an initial fee would give the impression of exclusivity. However, this fee had the largest negative effect, reducing the response rate by 0.518 percentage points.


initial rate has a large impact, with the response changing from 1.91% to 2.32%. In contrast to the main effects that suggest both interest rates should be low, these results, followed by additional analysis using the company's financial models, showed that a lower long-term rate coupled with the current (higher) initial rate would be the most profitable.
Overall, these interactions give important insight into the true relationship among the factors and help to better quantify their effects on profitability. By combining all main effects and significant interactions into one model, the marketing team could analyze different combinations and estimate response rates and profits more accurately.

4.6 ADDITIONAL ISSUES IN DESIGNING AND ANALYZING FACTORIAL EXPERIMENTS

In this section we discuss some additional important issues related to factorial designs. We begin with a discussion of how to set factor levels such as the plus and minus values for kiln temperature. Next, we show how a factorial design can be represented as a regression model, and how this model can be used to predict the response as a function of the factor settings. We then define and discuss an important mathematical property, orthogonality, which 2-level factorial designs have, and which leads to the independent estimation of effects. Also in this section, we consider the special characteristics of experiments such as direct mail, where the response variable is the fraction of people who respond. We show how to determine the needed sample size and we explain how to assess which effects are statistically significant. Two-level factorial designs are linear models that will be inadequate if the relationship between the response variable and one or more factors is nonlinear. We end the section by showing how to check for curvature by adding runs to the original design.

4.6.1 Choosing the Levels for Each Factor

It is standard notation to use plus (+) and minus (-) to denote the high and low levels of the factors. The units, of course, depend on the particular situation at hand. For a continuous factor such as the price of a relatively inexpensive product like a hamburger, the low and high levels may represent prices that are 10 cents lower/higher than the usual price. For an expensive product such as a car, the low and high levels may be prices that are $1,000 lower/higher than normal. In advertising, the low/high levels may represent advertising expenditures that are 10% lower and 10% higher than the standard. Usually it takes careful thought to specify the levels of continuous factors. The levels should not be too close, because then not much of a change in response would be expected, and its magnitude might be overwhelmed by the inherent variability. On the other hand, the levels should not be too different, either, because then the effect might no longer be linear over the studied range. Also, note that in 2-level experiments it is not possible to detect nonlinearity, since with just two levels only a straight line can be fit to the responses. In Section 4.6.6, we discuss how to use additional runs to check for nonlinearity.
In general, these considerations do not apply to unordered categorical factors such as type of ad copy, font, color, and catalyst, because levels in-between the categories are not
tions and leads to the standard error of the estimates. This is equivalent to our approach in Section 4.4.3 of assuming that higher-order interactions are negligible.
The regression approach has some advantages. It is more flexible for analyzing experiments that include missing observations. Also, the regression approach is needed when factors are continuous with more than two levels, and if one wishes to model the functional relationship between the response and the factors. Imagine an experiment that assesses the relationship between the sale of a magazine and its cover price. Experiments at four different price levels, $1.0, $1.5, $2.0, and $3.0, may have been conducted, and one may wish to model the functional relationship between sales and price, determine whether the sales-price relationship is linear or quadratic, and find the price at which sales are maximized. For that, one needs regression.

4.6.3 Orthogonality
Definition. A design is orthogonal if for any two design factors, each factor-level combination has the same number of runs.
The $2^k$ factorial design is an orthogonal design. In the $2^k$ factorial design each pair of design factors is studied at four possible combinations, and at each of these combinations, $2^{k-2}$ runs are carried out. Consider the $2^3$ design for factors A, B, and C shown in Table 4.11. The four level combinations of factors A and B, for example, are (-1, -1), (+1, -1), (-1, +1), and (+1, +1), and 2 runs are conducted at each combination. The same is true for the other two pairs: factors A and C, and factors B and C.
Now, ignore the response column, and consider any two columns (design, as well as calculation columns) in the matrix in Table 4.11. Multiply the entries in each row of the two selected columns, and sum the products. It will give zero for any pair of columns. For illustration, take the product of columns C and ABC; you obtain the sum +1 - 1 - 1 + 1 + 1 - 1 - 1 + 1 = 0. This characteristic is a property of orthogonal 2-level designs.
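Both statements, equal run counts for every pair of factors and zero sums of products for every pair of columns, can be verified by brute force. The sketch below builds a $2^3$ design in standard order; we assume, but cannot confirm from the text here, that Table 4.11 uses the same run order. Variable names are ours.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

# 2^3 design in standard order: columns are factors A, B, C.
design = [tuple(reversed(r)) for r in product((-1, 1), repeat=3)]

# Design columns and calculation (product) columns.
terms = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
columns = {t: [prod(row["ABC".index(f)] for f in t) for row in design] for t in terms}


def dot(u, v):
    """Sum of the row-by-row products of two columns."""
    return sum(a * b for a, b in zip(u, v))


# Every pair of distinct columns has a zero sum of products.
all_orthogonal = all(dot(columns[s], columns[t]) == 0 for s, t in combinations(terms, 2))

# Each (A, B) level combination occurs in the same number of runs.
ab_counts = Counter((row[0], row[1]) for row in design)

# Row-by-row products of columns C and ABC.
c_times_abc = [c * abc for c, abc in zip(columns["C"], columns["ABC"])]

print(all_orthogonal, dict(ab_counts), c_times_abc)
```

The products of columns C and ABC are +1, -1, -1, +1, +1, -1, -1, +1, which sum to zero, and each of the four (A, B) combinations appears in exactly 2 runs.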
Because of this orthogonal design structure, effects are estimated independently. For example, the main effect of A does not depend on the main effect of B because the corresponding columns are uncorrelated. It is easy to see why this is the case. Whenever A is at +1, B is equally likely to be at +1 or -1. Similarly, whenever A is at -1, B is equally likely to be at +1 or -1. As a result, any change in one effect is canceled out in the estimate of any other effect. For example, suppose whenever A is +1 the response is increased by some amount Δ. In calculating the main effect of B, the amount Δ will be added for each of the


4.6.5 Determining the Required Sample Size When the Response Is a Proportion

In the case study of Section 4.5, the total sample size was 120,000. A fundamental question in problems of this type is how large a sample size is needed. Statistical packages, including Minitab and JMP, provide useful software for making this determination. Appendix 2.1 discusses the theory behind their calculations. Suppose that, based on prior experience in mailings, the financial services company described in the case study estimated that the overall response rate would be about 0.025 or 2.5%. Further, suppose the firm decided that a change in response of 0.25% was economically meaningful. Hence, the firm wanted to be able to detect such a change (either an increase from 0.025 to 0.0275, or a decrease from 0.025 to 0.0225) with high probability.
Here we illustrate Minitab's power and sample size routines. Minitab includes a function for determining the sample size in the comparison of two proportions ("Stat > Power and Sample Size > 2 Proportions"). In the case study, the total sample size was 120,000 with 7,500 people receiving the package mailing defined by each of the 16 runs. Each effect is the difference in two sample proportions ($p_+ - p_-$), with 60,000 people exposed to the + level and 60,000 people exposed to the - level. This means that in estimating each effect we are comparing two independent samples of size 60,000 each. We enter 60,000 for the sample size, 0.025 for proportion 1, and "not equal" in the options for the alternative hypothesis. This setup tests the null hypothesis $H_0: \pi_1 = \pi_2 = 0.025$, or $\pi_1 - \pi_2 = 0$, against the alternative hypothesis $H_1: \pi_1 - \pi_2 \neq 0$. We use a 0.05 significance level when testing the null hypothesis that the two (population) proportions equal 0.025, and that there is no difference between the proportions.
The power of a test is the probability of rejecting the null hypothesis if the alternative hypothesis is true; it is 1 minus the probability of a Type II error. We want the probability of rejecting the null hypothesis to be large if there are economically meaningful differences in the response rates at the plus and minus levels. In this case, we want to be able to detect a difference of 0.25%. To find the power of the test, we first enter 0.0225 for proportion 2, which corresponds to $\pi_1 - \pi_2 = 0.0025$ (or 0.25%). The resulting power is 0.812. We then repeat the procedure by entering 0.0275 for proportion 2, which corresponds to $\pi_1 - \pi_2 = -0.0025$. The resulting power is 0.773. Thus, a total sample size of 120,000 (for each effect, 60,000 people are exposed to the + level, and 60,000 people are exposed to the - level) results in a power of about 80%. Hence, we are about 80% likely to detect economically meaningful changes.
This particular Minitab function is very useful. Here, we have specified the sample size and used Minitab to calculate the power of the test. Alternatively, the user can specify the desired power, and the program will provide the required sample size. Similar functions are included in other programs such as JMP.
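For readers without access to Minitab, the power figures can be approximated with the standard library. The sketch below uses the textbook normal approximation for a two-sided test of two proportions, with the unpooled standard error under the alternative. That this is the right approximation is our assumption; we make no claim about the algorithm Minitab actually uses. The function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test of p1 = p2 with n subjects per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                 # about 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # unpooled standard error
    shift = (p1 - p2) / se                            # true difference in SE units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power_decrease = power_two_proportions(0.025, 0.0225, 60000)
power_increase = power_two_proportions(0.025, 0.0275, 60000)
print(round(power_decrease, 3), round(power_increase, 3))
```

Under this approximation the two values come out close to the 0.812 and 0.773 reported above. The power is lower for the increase because the variance of a proportion grows as the proportion moves from 0.0225 toward 0.0275.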
4.6.6 Checking for Curvature in the Response
One advantage of 2-level factorial designs is that they are very efficient in terms of the number of runs that need to be carried out. A disadvantage of these designs is that in the case of continuous factors, the effects are assumed to be linear, and there is no way to check whether this assumption is reasonable without adding runs to the design. To illustrate, consider a single continuous factor with two levels. With two response averages, one at the low and one at the high level, we can fit a straight line perfectly, but we cannot check whether the linear model is appropriate.
If we want to fit models with linear and quadratic effects, we need at least three factor levels. However, often this is costly in terms of the number of required runs. At the initial stage, where usually one starts with many factors that may or may not have an effect on the response, this is not a practical approach. Factorials with factors at three or more levels may be appropriate at a later stage, after the experimenter has reduced the number of factors to a smaller set.
At the initial screening stage, only a simple check for nonlinearity is needed. This can be achieved by adding to the factorial design one or more runs at the center point. The center point of a 2-level factorial experiment sets each factor equal to the average of its low and high levels. In coded units, it is the run with $x_1 = x_2 = \cdots = x_k = 0$. A center point is appropriate for experiments with continuous factors, but not for categorical factors where an in-between level has no meaning.
Assume that we collect $n_c$ independent replications at the center point and obtain their average $\bar{y}_c$ and standard deviation $s_c$. The standard error of the average of the $n_c$ observations at the center point is given by $s_c/\sqrt{n_c}$.
Next, consider the average $\bar{y}$ of the measurements at the $2^k$ factor-level combinations. If the response function is linear, then this average will also be an estimate of the level (mean) at the center point. However, this is not the case if the response function is nonlinear.
The difference between the two averages $\bar{y}$ and $\bar{y}_c$ is a measure of the curvature in the response function. A large nonzero difference points to a nonlinear relationship. The standard error of the difference is needed to assess the statistical significance. It is reasonable to assume that the variability of individual responses at the factorial design points is similar to the variability at the center point. Hence, the standard error of $\bar{y}$, the average of the observations at the $2^k$ factor-level combinations, is given by $s_c/\sqrt{2^k}$. Because of the independence of the two averages, the standard error of their difference is given by

$$\text{standard error}(\bar{y} - \bar{y}_c) = s_c \sqrt{\frac{1}{2^k} + \frac{1}{n_c}}$$

Comment. Observe that the calculation of the standard error of the difference requires information on the variability of individual responses. Often, as is done above, the standard deviation is obtained from independent replications at the center point. Sometimes, as shown in the following example, the standard deviation is obtained from replications at all design points.
absolute value greater than (2)(0.0877) = 0.175%.

Example. Case 2 (Magazine Price Test) is used as an illustration. You should refer to the case study appendix for a detailed discussion of the experiment and the resulting data. A $2^3$ factorial experiment with three continuous factors, cover price ($3.99 and $5.99), subscription price ($1 and $3), and number of newsstand copies (1/3 less than current, and 1/3 more than current), was carried out with the objective of assessing the impact of these factors on magazine sales. A center point, with a $4.99 cover price, a $2 subscription price, and the currently used number of newsstand copies, was also considered. Each of the nine combinations was run over a 5-week period, and the resulting average weekly percent changes in sales are shown in Table 4.12. The week-to-week variation was used as a measure of experimental error. Variances among weekly percent changes, calculated for each of the nine runs, were averaged, resulting in an estimated standard deviation of the change in weekly sales. This standard deviation was calculated to be 5%.

Table 4.12 Results of a Magazine Price Test
(Columns: A, Cover Price; B, Subscription Price; C, Copies on Newsstand; Percent Change in Sales. Nine rows: the eight factorial runs and the center point.)

Main effects and interactions are estimated in the case, and the main effects plots shown there illustrate mostly linear relationships between sales and the three studied factors. The average of the responses at the eight factorial points is $\bar{y} = 3.63$, and its distance to the response at the center point is 3.63 - 2.08 = 1.55. The percent changes listed in Table 4.12 are averages of $n = 5$ weekly observations with standard deviation 5, and hence their standard deviation is given by $5/\sqrt{5} = 2.236$. This becomes the estimate $s_c$ from the earlier discussion. With a single response at the center point ($n_c = 1$), the standard error of this difference is

$$\text{standard error}(\bar{y} - \bar{y}_c) = 2.236 \sqrt{\frac{1}{8} + \frac{1}{1}} = 2.37$$

The standard error exceeds the observed difference 3.63 - 2.08 = 1.55, and thus there is no evidence of curvature in the response. If the observed difference were large (say, 2 times the standard error), that would be evidence that the linear model is inadequate. In that case, an experiment with each factor at 3 levels (we discuss these designs in Chapter 7) would be needed.
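The curvature check for the magazine example amounts to a few lines of arithmetic, sketched here in Python. The function name and argument names are ours; the inputs (factorial average 3.63, center response 2.08, standard deviation 5 for weekly observations, 5 weeks per run, one center run) are taken from the text.

```python
from math import sqrt


def curvature_check(y_factorial_mean, y_center_mean, s, k, n_center):
    """Difference between factorial and center averages, its standard error,
    and their ratio, for a 2^k design with n_center center-point responses."""
    diff = y_factorial_mean - y_center_mean
    se = s * sqrt(1 / 2**k + 1 / n_center)
    return diff, se, diff / se


# Each tabulated response is an average of 5 weekly observations with
# standard deviation 5, so the standard deviation of a response is 5/sqrt(5).
s_run = 5 / sqrt(5)
diff, se, ratio = curvature_check(3.63, 2.08, s_run, k=3, n_center=1)
print(round(diff, 2), round(se, 2), round(ratio, 2))
```

The ratio of the difference to its standard error is well below 2, in line with the conclusion that the data give no evidence of curvature.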

4.7 NOBODY ASKED US, BUT . . .

As discussed in this chapter, experience has shown that there is a hierarchical ordering of effects with main effects larger than 2-factor interactions, 2-factor interactions larger than 3-factor interactions, and so forth. We noted that for continuous factors the Taylor series expansion of a continuous response function provides theoretical support for this finding. But this is only true for continuous factors with smooth response functions. With categorical factors, there is no such theoretical justification. In some cases, for example, it may be true that a 3-factor interaction is just as large as (or even larger than) main effects and 2-factor interactions. To illustrate how this might occur, consider plant growth as the response (a continuous variable), and the three categorical factors: water (no/yes), fertilizer (no/yes), and temperature (0 degrees/25 degrees). Only one of the eight factor-level combinations (water, fertilizer, and temperature of 25 degrees) leads to plant growth, resulting in a large 3-factor interaction among water, fertilizer, and temperature. This and similar examples do not mean that hierarchical ordering does not apply to qualitative factors at all. Experience has shown that in general it does. It just means that when the factors under investigation are qualitative (red background/blue background, headline 1/headline 2, and so forth) some caution is needed before one automatically assumes that 3-factor (and higher-order) interactions will be negligible.
In Section 4.4.1, we described how replicated runs are used to estimate the variance of the experimental error (variance of a run) and to find the standard error of an effect to determine which effects are statistically significant. It is essential that the repeated runs at a particular combination of factor settings be genuine independent replications. For example, in the cracked pots example a production batch consisted of 100 pots. Simply following one batch with a second would not constitute a genuine independent replication. The variation in the percentage of broken pots between these two batches is very likely to underestimate the experimental error. A true replicate requires that each setup procedure in the process be done independently before each run. This would mean carrying out the steps needed to set the peak temperature in the kiln, setting the conveyor speeds that determine the cooling rate, and choosing a batch of clay material from the appropriate supply (low or high coefficient of expansion) in a fashion that reflects the variability that exists in this raw material. In general, a common mistake in manufacturing processes is to take repeated measurements from the same run and treat each as a replicate. But this only captures measurement error, which is typically only a small part of the total experimental error.
Randomization is important in experimentation and no less so when replicates are included in the experiment. Carrying out 2 runs in succession at the same factor settings would likely lead to underestimating the experimental error because these responses would be more alike than if the order of the 2 runs were determined randomly.
In the cracked pots example, the 8-run factorial design was entirely replicated, resulting in a total of 16 runs. For a factorial design with 5 factors or even 4, replicating all of the runs might be uneconomical. For example, in a 4-factor, 16-run factorial design, the experimenter might randomly choose (say) 8 of the 16 runs to replicate. In that case, the calculation of effects and their standard errors would have to be done using regression because only half of the 16 experimental conditions would be run twice. Also, there would be 8 degrees of freedom associated with the standard error of an effect compared to the 16, if all 16 runs were repeated. The resulting confidence intervals for the effects would be wider, because a t-value from a distribution with 8 degrees of freedom would be larger.
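The plant growth illustration earlier in this section can be made concrete with a small calculation. In the sketch below the growth value of 10 units is hypothetical, chosen only for illustration; the point is that when growth occurs at a single factor-level combination, every estimated effect, including the 3-factor interaction, has the same size.

```python
from itertools import product
from math import prod

# 2^3 design: columns are water, fertilizer, temperature (-1 = no/low, +1 = yes/high).
design = [tuple(reversed(r)) for r in product((-1, 1), repeat=3)]

# Hypothetical response: growth of 10 units only when all three factors are at +.
growth = [10.0 if row == (1, 1, 1) else 0.0 for row in design]

terms = {"W": (0,), "F": (1,), "T": (2,),
         "WF": (0, 1), "WT": (0, 2), "FT": (1, 2), "WFT": (0, 1, 2)}

# Each effect is the signed sum of the responses divided by 4 (half of 8 runs).
effects = {
    name: sum(prod(row[j] for j in idx) * y for row, y in zip(design, growth)) / 4
    for name, idx in terms.items()
}
print(effects)
```

All seven effects equal 2.5, so in this constructed case the 3-factor interaction is exactly as large as each main effect.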


Designed experiments are part of a scientific learning process. The goals of this process are to confirm or refute prior knowledge and to suggest new hypotheses for future study. Clearly, it is important that this experimental approach be efficient and lead to the right answers. We showed that factorial designs where factors are varied simultaneously are more efficient and provide more information than experiments that vary each factor one at a time.

Figure 4.A.1 Illustration of the Approach of Changing One Factor at a Time in the Factorial Experiment
(Diagram of responses at the factor-level combinations; axes: Factor 1 and Factor 2.)

Starting with ($x_1 = +$, $x_2 = +$), the approach of changing one factor at a time would set the first factor at its low level (80 is larger than 70). Locking in the low level of factor 1 and varying factor 2, one would select the low level of factor 2 (90 is larger than 80). However, the combination ($x_1 = -$, $x_2 = -$) with $y = 90$ has not located the overall maximum $y = 110$ at ($x_1 = +$, $x_2 = -$). The approach of changing one factor at a time fails because of the interaction between factors 1 and 2.
It is not possible to estimate an interaction with the data generated under the approach of changing one factor at a time. We cannot compare the main effects of a factor at 2 levels of the other factor if we don't have data at all four factor-level combinations.
Furthermore, the main effects estimates from the approach of changing one factor at a time cannot be generalized because they are main effects at specific levels of the other factors. Since we are uncertain whether there is an interaction, we cannot generalize these effects to other levels of the factors.
The same points can be made in the context of a $2^3$ factorial design, which can be visualized as the vertices of a cube; see Figure 4.A.1. Four runs at each of the low and the high settings of each factor are used to estimate main effects. The approach that changes one factor at a time does not consider all 8 level combinations, but only the 4 that are outlined in Figure 4.A.1. Again, we learn the following: (1) The approach of changing one factor at a time is inefficient in terms of the number of runs. For the same precision, we need 8 runs for establishing the main effect of the first factor and the best level of the first factor; we need 4 more to establish the level of the second factor; and 4 more for the third factor. Hence, we need a total of 16 runs to estimate the main effects with the same precision that is achieved by the factorial design. (2) It is not possible to estimate interaction terms. (3) In the presence of interaction, the procedure of changing one factor at a time can miss the optimum. The objective in Figure 4.A.1 is to find the maximum. Here, we start by varying factor 3 first, with factor 2 second. The method misses the optimum $y = 120$ at ($x_1 = +$, $x_2 = +$). Note that the correct optimum is reached if we vary factor 1 first. However, the appropriate order is not known. (4) Main effects at the studied settings cannot be extended to other settings.
How does this generalize to comparisons with $k$ factors? For the same precision, the approach of changing one factor at a time needs a total of $2^k + 2^{k-1}(k - 1) = 2^k(1 + (k - 1)/2)$ runs. The number of runs is increased by a factor of $(1 + (k - 1)/2)$. For $k = 4$, this factor amounts to 2.5. Moreover, there is no guarantee that this procedure will find the optimum.
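The run-count comparison is easy to tabulate. The short sketch below evaluates the formula $2^k + 2^{k-1}(k - 1)$ given above; the function names are ours.

```python
def ofat_runs(k):
    """Runs needed by the one-factor-at-a-time approach to match the precision
    of a 2^k factorial: 2^k for the first factor, 2^(k-1) for each later one."""
    return 2**k + 2**(k - 1) * (k - 1)


def inflation_factor(k):
    """Ratio of one-factor-at-a-time runs to the 2^k runs of the factorial."""
    return ofat_runs(k) / 2**k


for k in (2, 3, 4, 5):
    print(k, 2**k, ofat_runs(k), inflation_factor(k))
```

For $k = 3$ this gives the 16 runs counted in the text (against 8 for the factorial), and for $k = 4$ it gives 40 runs, a factor of 2.5.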

102

TWO - LEVI ' ! f'ACTOIUAL EXPF.IUMEN 'IS

52 ~

( ~A. 2)

n- p
wherL'

= 2

is the number of regression coefficients; here, t here arc two coefficients, {3 11

and (3 1
Another import ant result specifics the covariance matrix o r the regression estimates

/J.

It can be shown that

Cov(~u,$ 1 ) ]

s2(X'X) 1

( 4A.3 )

V({3 1 )
The 1ariances of the regression estimates arc in the diagonal of this matrix. Thci r square roots,
the sundard errors of the estimates, are used to comtruct con t1dcnu.' intervals for the regrcs sion coefficients. A diagonal covariance matrix implies that the estimates arc uncorrelated.
A nice feature of these matrix exprcssiom is the fact that th ey work fur general models.
Consider the gencr <:1 linear regression model with k rcgressor, (factors I; that is,

Define the observations of the ith case (the ith experimental run) as y, (for the response),
and x 11 , x; 2,

x,k (for the studied values of the k factors ) . Nute that the t1rst subscript in

.. ,

this double subscript notation is the index for the run (I, 2, ... , n) , and the second sub script is the index for the factor ( l , 2, ... , k ). The n X ( k

X1

X11

X1 2

X2 1

Xn

+ I ) matrix X is given by

xkJ

X2

Xn

1, 1

x ,1

x,,

l .l

xn~

The equations for the least squares estimates in equation (4A.1) (now there are $p = k + 1$ estimates, $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$) and for the covariance matrix in equation (4A.3) (which is now a $(k + 1) \times (k + 1)$ matrix, with variances in the diagonal) carry over to the general case. The computational aspects (in particular, taking the inverse $(X'X)^{-1}$) are more difficult, but nothing that a computer program cannot handle.
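To make the matrix expressions concrete, here is a minimal pure-Python sketch that computes $\hat{\beta} = (X'X)^{-1}X'y$, $s^2$, and $s^2(X'X)^{-1}$. The helper names and the data are hypothetical and chosen only for illustration; in practice one would rely on statistical software, as the text notes.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def fit(x_rows, y):
    """Return (beta_hat, s2, cov) for y = b0 + b1*x1 + ... + bk*xk + error."""
    X = [[1.0] + list(r) for r in x_rows]            # n x (k + 1) matrix X
    XtX_inv = inverse(matmul(transpose(X), X))       # (X'X)^{-1}
    Xty = matmul(transpose(X), [[v] for v in y])     # X'y
    beta = [row[0] for row in matmul(XtX_inv, Xty)]  # least squares estimates
    resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - len(beta))   # SSE / (n - k - 1)
    cov = [[s2 * v for v in row] for row in XtX_inv]        # s^2 (X'X)^{-1}
    return beta, s2, cov


# Hypothetical data: a replicated 2^2 design with invented responses.
x_rows = [(-1, -1), (1, -1), (-1, 1), (1, 1)] * 2
y = [10, 14, 11, 17, 12, 16, 9, 15]
beta, s2, cov = fit(x_rows, y)
print(beta, s2, [cov[j][j] for j in range(3)])
```

Because the columns of this design are orthogonal, the covariance matrix comes out diagonal, which previews the discussion of orthogonality later in this appendix.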

The fitted values from the regression fit, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}$, are obtained by replacing the parameters in the model equation by their estimates. The residuals are the differences between the observations and the fitted values, $y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})$. An estimate of the variance $\sigma^2$ is obtained from

$$s^2 = \frac{\sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})]^2}{n - k - 1}$$

The numerator $SSE = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})]^2$ is referred to as the error sum of squares; it measures the variability that is not explained by the model. The sum of the squared distances of the observations from their sample mean, $SST = \sum_{i=1}^{n} [y_i - \bar{y}]^2$, is called the total sum of squares. It expresses the variability in the observations, without any adjustment for the explanatory variables. The difference between the total sum of squares and the error sum of squares expresses the sum of squares that is explained by the regression. It is called the regression sum of squares,

$$SSR = SSR(x_1, \ldots, x_k) = SST - SSE = \sum [y_i - \bar{y}]^2 - \sum [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik})]^2$$

The coefficient of determination, $R^2 = SSR / SST$, measures the proportion of the variation that is explained by the regression model.
Computer software, such as Minitab, JMP, and even Excel, provides detailed regression output including the estimates and their standard errors, the sums of squares, $s^2$ and $R^2$, and the fitted values and residuals. All a user has to do is enter the data into a worksheet and specify the column of the response and the columns containing the regressor variables.
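The decomposition of the total sum of squares can be sketched in a few lines of Python. The observations and fitted values below are hypothetical (the fitted values are assumed to come from a least squares fit with an intercept), and the function name is illustrative.

```python
def sums_of_squares(y, fitted):
    """Return (SST, SSE, SSR, R^2) for observations y and least squares fitted values."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sst - sse                                         # regression sum of squares
    return sst, sse, ssr, ssr / sst


# Hypothetical responses and their fitted values from a least squares fit.
y = [10, 14, 11, 17, 12, 16, 9, 15]
fitted = [10.5, 15.5, 10.5, 15.5, 10.5, 15.5, 10.5, 15.5]
sst, sse, ssr, r2 = sums_of_squares(y, fitted)
print(sst, sse, ssr, round(r2, 4))
```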
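Special Case. Consider how these matrix results specialize for the linear regression model with a single regressor (k = 1). Multiplying the transpose

$$X' = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}$$

with the matrix X leads to the product

$$(X'X) = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}.$$

Furthermore,

$$X'y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}.$$

All sums go from i = 1 through i = n. The inverse of (X'X) is

$$(X'X)^{-1} = \frac{1}{n \sum x_i^2 - \left(\sum x_i\right)^2} \begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}.$$

Multiplying the inverse $(X'X)^{-1}$ with $X'y$ leads, after some algebra, to the least squares estimates

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Substituting the inverse $(X'X)^{-1}$ shown above into the expression $V(\hat{\beta}) = s^2 (X'X)^{-1}$ leads, again after some algebra, to the variances of the least squares estimates

$$V(\hat{\beta}_1) = \frac{s^2}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad V(\hat{\beta}_0) = s^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right].$$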
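As a check on the single-regressor formulas, the following minimal sketch evaluates the closed-form estimates and their variances directly. The data are invented for illustration and the function name is hypothetical.

```python
def simple_regression(x, y):
    """Closed-form least squares results for y = b0 + b1*x + error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                        # p = 2 coefficients
    var_b1 = s2 / sxx
    var_b0 = s2 * (1 / n + xbar ** 2 / sxx)
    return b0, b1, s2, var_b0, var_b1


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s2, var_b0, var_b1 = simple_regression(x, y)
print(b0, b1, s2, var_b0, var_b1)
```

The square roots of `var_b0` and `var_b1` are the standard errors used for confidence intervals.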

The orthogonality of the factorial design (see the discussion in Section 4.6.3) implies a diagonal X'X matrix. You can check this with the X matrix in Table 4.11 that results from the $2^3$ design. The multiplication of the transpose X' with the matrix X leads to an 8 × 8 diagonal matrix with 8 in the diagonal.
For the general $2^k$ factorial design, the entries in the diagonal of X'X are all equal to $2^k$, and all off-diagonal elements of X'X are zero. The diagonal structure of X'X implies that $(X'X)^{-1}$ is diagonal with diagonal elements $1/2^k$. Hence the estimate of an element of $\beta$ is given by

$$\hat{\beta} = \frac{1}{2^k} \sum_{i} w_i y_i = \frac{1}{2} (\text{effect})$$

where the weights $w_i$ (each $+1$ or $-1$) are the elements in the corresponding design/calculation column.
Apart from the different normalization, these estimates coincide with our previous definition of main and interaction effects in Section 4.3. The only difference is the factor 1/2. The definition of effects in Section 4.3 looks at the difference in the average responses at the high and low settings of a factor. The regression estimates cut this in half; the coefficients in the regression equation represent the slope or the change in the response per unit change of the factor.
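The diagonal structure of X'X and the "one-half of the effect" relationship can be verified with a short sketch. The responses are hypothetical, and the column ordering (intercept, 1, 2, 3, 12, 13, 23, 123) is an assumption made for this illustration.

```python
from itertools import product

# 2^3 design in standard order (factor 1 changes fastest).
runs = [(x1, x2, x3) for x3, x2, x1 in product((-1, 1), repeat=3)]


def model_row(x1, x2, x3):
    """Row of X: intercept, main-effect columns, and interaction (calculation) columns."""
    return [1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]


X = [model_row(*r) for r in runs]
n = len(X)

# X'X should be diagonal with 2^3 = 8 on the diagonal.
XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(8)] for a in range(8)]

y = [12, 18, 11, 19, 14, 22, 13, 25]  # hypothetical responses, for illustration only

# With a diagonal X'X, each coefficient is (column . y) / 2^k.
coef = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(8)]


def effect(j):
    """Average response at the plus level minus average at the minus level of column j."""
    plus = [y[i] for i in range(n) if X[i][j] == 1]
    minus = [y[i] for i in range(n) if X[i][j] == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


for j in range(1, 8):
    print(j, coef[j], effect(j))
```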
The orthogonality of the design has several fortunate consequences as far as estimation is concerned.
1. The estimates of the effects are uncorrelated. The diagonal X'X matrix implies a diagonal covariance matrix $V(\hat{\beta})$.

2. The estimates do not change when we omit factors from the model. Let us illustrate this with data from the factorial experiment that includes 3 factors and the 8 runs in Table 4.4. Let us ignore the third factor and consider the regression model with just factors 1 and 2,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon.$$

In this case, the X matrix has fewer columns (only four as compared to the eight if factor 3 is included). The matrix X'X is still diagonal, although of smaller dimension (4 × 4), and its diagonal elements are still 8. The inverse $(X'X)^{-1}$ is diagonal with diagonal elements 1/8, and the estimates in $\hat{\beta} = (X'X)^{-1}X'y$, consisting of $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_{12}$, are the same as the estimates in the model that includes all eight regressors.

3. In orthogonal designs, the joint (combined) regression sum of squares of the effects can be partitioned into the sum of the regression sums of squares of the individual effects. That is,

$$SSR(x_1, \ldots, x_k, x_{12}, \ldots, x_{12 \cdots k}) = SSR(x_1) + \cdots + SSR(x_k) + SSR(x_{12}) + \cdots + SSR(x_{12 \cdots k}).$$

The regression sums of squares are additive. The regression sums of squares from regressing the response vector y on each single column x of the design matrix X separately can be added to obtain the regression sum of squares of the complete model. This decomposition does not work for nonorthogonal designs with nondiagonal X'X matrices.
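The additivity of the regression sums of squares can be illustrated numerically. The sketch below uses a $2^3$ design with invented responses and fits the reduced model in factors 1 and 2 only; the helper name is hypothetical.

```python
from itertools import product

# 2^3 design in standard order (factor 1 changes fastest).
runs = [(x1, x2, x3) for x3, x2, x1 in product((-1, 1), repeat=3)]
y = [12, 18, 11, 19, 14, 22, 13, 25]  # hypothetical responses, for illustration only

n = len(y)
ybar = sum(y) / n
sst = sum((v - ybar) ** 2 for v in y)


def ssr_single(col):
    """SSR from regressing y on an intercept and one +/-1 column that sums to zero."""
    b = sum(c * v for c, v in zip(col, y)) / n  # regression coefficient
    return n * b * b


x1 = [r[0] for r in runs]
x2 = [r[1] for r in runs]
x12 = [a * b for a, b in zip(x1, x2)]

# Joint fit of the model with x1, x2, and x1*x2; the diagonal X'X lets us
# compute each coefficient separately.
coef = [sum(c * v for c, v in zip(col, y)) / n for col in (x1, x2, x12)]
fitted = [ybar + coef[0] * a + coef[1] * b + coef[2] * c
          for a, b, c in zip(x1, x2, x12)]
sse = sum((v - f) ** 2 for v, f in zip(y, fitted))

ssr_joint = sst - sse
ssr_sum = ssr_single(x1) + ssr_single(x2) + ssr_single(x12)
print(ssr_joint, ssr_sum)  # the two values agree for an orthogonal design
```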
EXERCISES

Exercise 1  Consider Case 1 (Eagle Brands) from the case study appendix.

(a) Assume that there is one factor that increases average store sales by $100. You want to be 80% confident that a 5% significance test can detect such a large increase. Determine the sample size. Use computer software such as Minitab or JMP to check your calculations.
What if you wanted to be 70% confident to detect a change as large as $100?
Hint: Use the approach outlined in Appendix 2.1. However, note that here the effect compares two averages, instead of two proportions. Assume that the standard deviation of individual sales is given by $\sigma$. The variance $\sigma^2$ replaces $\pi(1 - \pi)$, the variance of the 0/1 random variable in Appendix 2.1. This substitution leads to the expression for the required sample size that is shown here:

$$n = \frac{2\sigma^2 \left[ z_{1-\alpha/2} + z_{1-\beta} \right]^2}{\delta^2}$$

Also note that n is the sample size of each of the two groups. The sample size of the factorial experiment is obtained by multiplying the above expression by 2.
(b) Now that you know the sample size, discuss the advantages of a multifactor experiment over the approach of changing one factor at a time.
(c) Eagle Brands wants to learn about the effects of six factors. A full $2^6$ factorial in 64 runs could be considered. Discuss the advantages and disadvantages of such a design.

(d) Discuss the protocol that you would use to carry out the experiment.
(e) Discuss whether one should analyze absolute or relative (proportional) changes in sales.
Exercise 2  Consider Case 2 (Magazine Price Test) from the case study appendix.

(a) Consider sales. Estimate the main effects (A, B, C) and the interaction effects (AB, AC, BC, ABC), and construct a normal probability plot. Assess the significance of the estimated effects. Note that Minitab will calculate Lenth's (1989) PSE (see Appendix 4.1) and help you with the assessment.
Note: This amounts to assessing the significance of effects from an unreplicated design. You will notice that A, C, and AC are large and significant.

(b) Obtain the coefficients in the regression model of sales on main effects and interactions and convince yourself that the coefficients are one-half of the estimated effects. Run the regression twice: once with the eight factorial responses and once with all 9 runs including the response at the center point. You will notice that all coefficients in these two regressions, except the intercept, are the same. Explain these findings.

Hint: Use the regression formulation in Appendix 4.4. The intercept is the average response; hence, the intercept in the regression that includes the center point is 8/9 (average response from factorial runs) + 1/9 (response at center point).
(c) Consider subscriptions. Estimate the main effects (A, B, C) and the interaction effects (AB, AC, BC, ABC), and construct a normal probability plot. Assess the significance of the estimated effects.
Note: You will notice that A, B, and AB are large and significant.
(d) Recreate the main and interaction plots that are given in this case.
(e) Averages in the table are calculated from five weekly percent changes. The case also provides an estimate of the standard deviation of weekly percent changes: 5% for sales changes, and 15% for subscription changes. Use these estimates to obtain standard errors of the estimated effects for both sales and subscriptions (see Section 4.4.2). Check whether these standard errors change the conclusions you reached in (a) and (b). Discuss the assumptions that one makes when using week-to-week changes to estimate the variability.
(f) Use the standard deviations of weekly percent changes to test for curvature in both sales and subscriptions (see Section 4.6.6).


Exercise 3  A $2^2$ factorial experiment with two independent replications at each of the four design points was conducted.


llllnr

9. 12
2n, 22
I;, 19
.l2, 27

(a) Estimate the main effects and the interaction.

(b) Use the independent replications to obtain a standard error of the effects and assess the significance of the effects.

(c) Test the hypothesis that the two main effects are the same.
Hint: Use the fact that this design is orthogonal and that the estimates are statistically independent. This implies that var(effect 1 - effect 2) = var(effect 1) + var(effect 2).

Exercise 4  Consider three categorical factors at two levels each. Assume that only one of the eight experimental conditions has an effect on the response (response is 10 at (+, +, +)), while the seven others have no effect (response is zero). Analyze the data. Estimate main and interaction effects. Discuss our comment in Section 4.7 that effect sparsity and effect hierarchy, which are useful design principles for experiments that involve continuous factors, may have less applicability for categorical factors.
Exercise 5  Montgomery (1996, p. 543) used a $2^4$ factorial experiment in developing a nitride etch process on a single wafer plasma etcher. The etching process uses C2F6 (perfluoroethane) as the reactant gas. Four factors can be varied: the gas flow, the power applied to the cathode, the pressure in the reaction chamber, and the gap between the anode and the cathode. The response variable is the etch rate for silicon nitride (in angstroms per minute). Each factor is varied at a high- and a low-level setting. The objective is to find the factor level settings that maximize the etch rate. The levels for gap (factor A) are 0.8 and 1.2 cm; the levels for pressure (factor B) are 450 and 550 mTorr; the levels for the C2F6 flow (factor C) are 125 and 200 sccm (standard cc/minute); the levels for power (factor D) are 275 and 325 watts. For further background on the etching process and details of the experiment, you can consult the original source for this exercise, Yin and Jillie (1987).
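Run   A (gap)   B (pressure)   C (flow)   D (power)   Response (etch rate)
1     -         -              -          -           550
2     +         -              -          -           669
3     -         +              -          -           604
4     +         +              -          -           650
5     -         -              +          -           633
6     +         -              +          -           642
7     -         +              +          -           601
8     +         +              +          -           635
9     -         -              -          +           1,037
10    +         -              -          +           749
11    -         +              -          +           1,052
12    +         +              -          +           868
13    -         -              +          +           1,075
14    +         -              +          +           860
15    -         +              +          +           1,063
16    +         +              +          +           729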

Analyze the results of the $2^4$ factorial experiment. Find the important main effects and interactions. Assess their significance by using normal probability plots and/or Lenth's (1989) PSE approach. How would you select the factor level settings so that you achieve a high etch rate?
Exercise 6  Meredith Corporation, the publisher of Ladies' Home Journal magazine, sends more than a million letters each year to potential subscribers, hoping to secure as many subscriptions as possible. The marketing team looks for the right mix of promotional materials, and it experiments constantly with various aspects of the brochure, order card, enclosed testimonials, and offers. The June 2005 campaign, for example, tested different versions of the front page of the brochure, and different messages on the front and the back side of the order card.
Front side of brochure. One version (level -1) shows a radiant-looking Kelly Ripa (the star of the ABC show Live with Regis and Kelly), while the other version (level +1) features Dr. Phil (known from his nationally syndicated TV show, and publications on life strategies and relationships).
Front side of the order card. Level 1 (-1) highlights the message "Double our Best Offer," while level 2 (+1) draws attention to the message "We never had a bigger sale."
Back side of the order card. Level 1 (-1) emphasizes "Two extra years free," while level 2 (+1) features magazine covers of previous issues.
The results (number of letters sent and the number of orders that were received) are shown below.
Brochure   Order Card Front   Order Card Back   Letters Sent   Orders   Proportion

1'i,042
I 'i,042
I 'i,01:'

'.)/)

61'1
'i6.l

I ,,!M2
I 'i,1112
J 'i,lll)

nln

O.OJHOlJ.l
0.042RI l
0.0.\7428.
0.0109'i20

;().t

o.or.J<J'>o

'i'i()

I 'i,042
I 'i,1112

'; ""'.';

0.0\i>'iM l
ll.0.lH226 l
0.0.lh7n.F

';_\

Analyze the data. Estimate main and interaction effects. Display the effects graphically through main effects and interaction plots.
Assess the significance of the effects, using the approach discussed in Section 4.6.1.
Summarize your conclusions.

Exercise 7  This exercise applies the general regression results in Appendix 4.4 to the special model without intercept, $y = \beta x + \varepsilon$, that relates a response vector y to a single regressor vector x. Show that

(a) $\hat{\beta} = \sum x_i y_i / \sum x_i^2$

(b) $SSR(x) = \left(\sum x_i y_i\right)^2 / \sum x_i^2$

Hint: The matrix X in Appendix 4.4 is the n × 1 vector x. The total sum of squares in a model without an intercept is given by $\sum y_i^2$. The error sum of squares is given by $\sum (y_i - \hat{\beta} x_i)^2$. The regression sum of squares is the difference, $SSR(x) = \sum y_i^2 - \sum (y_i - \hat{\beta} x_i)^2$. Result (b) follows after substituting result (a) into this equation.

Exercise 8  This exercise was inspired by a real problem described in the article, "Strategic Testing Stops Leaky Litter Cartons in Their Tracks" (Packaging Digest, August 2001). The exercise resembles what the actual company did, but the data are not real.
The makers of "Cats Love It" cat litter are facing a serious problem. Retail customers are reporting that cartons of the firm's premium brand cat litter are leaking the product onto store shelves. The company realizes that while cat lovers are used to cleaning stray sprays of litter tracked through the house, they are not willing to put up with cartons that leak on the way home.
Management has determined that the problem is with the carton-sealing process. Cartons are filled and sealed on a production line run by 20 workers. The company decides to perform a 3-factor factorial experiment. A run consists of filling and sealing 200 cartons.
The factors to be tested and levels of each are shown below. Factor A is line speed with the minus level at 22 cartons per minute and the plus level at 30 cartons per minute. Factor B is the pressure applied by the gluing machine, with the minus level being lower pressure and the plus level being higher pressure. Factor C is the amount of glue used, with the plus level being the current amount and the minus level being 40% less glue.
The design matrix and the estimated effects are shown below. The response is the proportion of cartons that leak.

Factor                  - level    + level
A: Line speed           Slow       Fast
B: Glue pressure        Lower      Higher
C: Amount of glue       Less       More

Run

Response
8

I'
+

47
JO

,,

8
10
II
8

Estimation results:
Average

25.875

A
A Ii
(<11

3.2'1

1.25
14.75

AC

Ji(

0.75

'\Ji(

) :"

--)

(a) What is the estimated main effect of factor A? What is the estimated AC interaction?

(b) Suppose each response is the average of 2 replicated runs (note that the numbers have been rounded). Suppose the pooled estimate of the variance of the response of an individual run is equal to 16. Based on 95% confidence intervals for the effects, which effects are significant?

(c) Based on the results of this experiment, what levels would you recommend for each factor?

(d) What is the regression prediction equation, and what is the predicted response (proportion of leaking cartons) if your recommended settings are used?
Exercise 9  A $2^4$ factorial experiment is to be conducted. The variance of the response of an individual run is known to be equal to 4 from previous experiments. Suppose that we want the width of a 95% confidence interval for the mean of an effect to be 1.8 or smaller. How many runs need to be made for each test condition, and how many runs are needed in total? Assume we have the same number of runs for each test condition.

TWO-LEVEL FRACTIONAL FACTORIAL DESIGNS

TABLE 5.1 Factors and Levels in the Coffee Experiment

Factor                       - level    + level
1  Initial temperature       Lower      Higher
2  Flame temperature         Lower      Higher
3  Color                     Lighter    Darker
4  Supplier                  Current    New
5  Machine                   Current    New

The 5 factors and levels are shown in Table 5.1. For initial temperature, based on experience and the operating ranges recommended by equipment makers, the levels are set an equal distance above and below the normal setting. For flame temperature, the minus level is the minimum temperature required to roast the beans, while the plus level is the highest temperature in the operating range. In practice, the chief roaster varies the color of roasted beans depending on the variety of the coffee, but favors lighter roasting, recognizing that coffee roasted too dark will have a bitter and burnt taste. For the color factor, the minus level corresponds to his normal color for Kenya AA, while the plus level is considerably darker. The two suppliers for the test are the existing one and a well-regarded competitor. The last factor is the roasting machine. The company has a small (5-pound capacity) roasting machine that it uses to test new sources of green beans. The chief roaster wants to evaluate another small machine made by a different manufacturer and sees this experiment as an opportunity to do so.
Suppose the company is willing and able to do only 16 runs rather than the 32 runs of a full $2^5$ factorial design. What is the best design for doing so, and what is lost by carrying out only 16 runs?

5.2.1 Constructing the Design Matrix


We build the 16-run design in Table 5.2. We begin by writing down the full factorial design for 4 factors, 1, 2, 3, and 4, which consists of 16 runs. The four columns are shaded to highlight the familiar pattern of pluses and minuses that make up the $2^4$ factorial design. For now, ignore the shaded column for factor 5. The remaining columns represent the 11 interactions in the $2^4$ design. There are six 2-factor interactions, four 3-factor interactions, and one 4-factor interaction. The signs for these columns, also referred to as the calculation columns in Chapter 4, were found by multiplying the signs of the design columns. For example, the signs for column 123 were found by multiplying the signs of columns 1, 2, and 3. As we discussed in Chapter 4, the 15 columns are pairwise orthogonal and consequently each of the corresponding 15 effects in the $2^4$ factorial design is estimated independently of every other effect.
TABLE 5.2 Constructing a Fractional Factorial Design for 5 Factors in 16 Runs

        FACTOR             CALCULATION COLUMNS FOR THE DESIGN IN FACTORS 1, 2, 3, AND 4
Run   1  2  3  4  5      12  13  14  23  24  34  123  124  134  234  1234
1     -  -  -  -  +      +   +   +   +   +   +   -    -    -    -    +
2     +  -  -  -  -      -   -   -   +   +   +   +    +    +    -    -
3     -  +  -  -  -      -   +   +   -   -   +   +    +    -    +    -
4     +  +  -  -  +      +   -   -   -   -   +   -    -    +    +    +
5     -  -  +  -  -      +   -   +   -   +   -   +    -    +    +    -
6     +  -  +  -  +      -   +   -   -   +   -   -    +    -    +    +
7     -  +  +  -  +      -   -   +   +   -   -   -    +    +    -    +
8     +  +  +  -  -      +   +   -   +   -   -   +    -    -    -    -
9     -  -  -  +  -      +   +   -   +   -   -   -    +    +    +    -
10    +  -  -  +  +      -   -   +   +   -   -   +    -    -    +    +
11    -  +  -  +  +      -   +   -   -   +   -   +    -    +    -    +
12    +  +  -  +  -      +   -   +   -   +   -   -    +    -    -    -
13    -  -  +  +  +      +   -   -   -   -   +   +    +    -    -    +
14    +  -  +  +  -      -   +   +   -   -   +   -    -    +    -    -
15    -  +  +  +  -      -   -   -   +   +   +   -    -    -    +    -
16    +  +  +  +  +      +   +   +   +   +   +   +    +    +    +    +

Factor 5 completes the design. We made the signs in column 5 identical to the signs for the 1234 interaction. We will explain this choice in a moment. But first, suppose instead we had made the signs in column 5 identical to the signs in column 1. This would mean that whenever factor 5 was at its minus level, factor 1 would also be minus, and whenever factor 5 was at its plus level, factor 1 would be plus as well. In this case, the average of the responses when 5 (= 1) is at the plus level ($\bar{y}_+$) minus the average of the responses when 5 (= 1) is at the minus level ($\bar{y}_-$) is actually an estimate of the main effect of 5 plus the main effect of 1. With this arrangement, the two main effects are said to be confounded, and it is not possible to separate them. The main effect of factor 5 and the main effect of factor 1 are called aliases of each other. The calculated effect ($\bar{y}_+ - \bar{y}_-$) might be due to the main effect of factor 5, the main effect of factor 1, or some combination of the two main effects. Confounding two main effects in this way would be a poor choice since main effects tend to be the largest and most important effects.
The best choice is to confound the main effect of factor 5 with an effect that is least likely to be important, which is the 4-factor interaction 1234. Therefore we set 5 = 1234, confounding the main effect of 5 with the 1234 interaction. Taking the average of the responses when 5 (= 1234) is at the plus level ($\bar{y}_+$) minus the average of the responses when 5 (= 1234) is at the minus level ($\bar{y}_-$) estimates the main effect of factor 5 plus the 4-factor interaction 1234. Since 4-factor interactions are almost certain to be negligible, we are left with an estimate of the main effect of factor 5.
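The construction just described (write down the full factorial in factors 1 to 4, then set column 5 equal to the product of those four columns) can be sketched as follows. The function name is illustrative, and standard run order with factor 1 changing fastest is an assumption of this sketch.

```python
from itertools import product


def half_fraction():
    """Build the 16-run design for 5 factors with generator 5 = 1234."""
    design = []
    # product() varies its last element fastest, so unpack in reverse
    # to make factor 1 change fastest (standard order).
    for x4, x3, x2, x1 in product((-1, 1), repeat=4):
        x5 = x1 * x2 * x3 * x4          # column 5 = column 1234
        design.append((x1, x2, x3, x4, x5))
    return design


design = half_fraction()
for run, levels in enumerate(design, start=1):
    print(run, " ".join("+" if v > 0 else "-" for v in levels))
```

Each of the five columns contains eight pluses and eight minuses, and every pair of columns is orthogonal, which is what allows the effects to be estimated independently.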

Effects Are Confounded in Pairs. By setting 5 = 1234, we not only confound these two effects, but all other effects become confounded in pairs as well. For example, consider in Table 5.2 the column of signs representing the 12 interaction. Writing this column as a row to save space, we have

12:   + - - + + - - + + - - + + - - +

Now multiply the signs for factors 3, 4, and 5 to obtain a column representing the 345 interaction. It is

345:  + - - + + - - + + - - + + - - +

The two columns are identical; the effects 12 and 345 are confounded. Taking the average of the responses when the signs in column 12 (= 345) are plus ($\bar{y}_+$) minus the average of the responses when the signs in column 12 (= 345) are minus ($\bar{y}_-$) results in an estimate that is the sum of the 2-factor interaction 12 and the 3-factor interaction 345.
The entire confounding pattern can be found in the same way, by multiplying columns of signs for every interaction and identifying pairs of columns that are identical. However, this brute force approach is not necessary; in Section 5.2.3 we will present a much simpler method for determining which columns have identical signs.
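The brute force approach mentioned above is easy to carry out by computer. The following minimal sketch multiplies the sign columns for every effect in the 16-run design and groups effects whose columns are identical; the variable and function names are hypothetical.

```python
from itertools import combinations, product
from math import prod

# 16-run design with generator 5 = 1234, standard order (factor 1 fastest).
design = [(x1, x2, x3, x4, x1 * x2 * x3 * x4)
          for x4, x3, x2, x1 in product((-1, 1), repeat=4)]


def column(word):
    """Column of signs for an effect such as '12' or '345'."""
    return tuple(prod(run[int(d) - 1] for d in word) for run in design)


def as_signs(col):
    return "".join("+" if v > 0 else "-" for v in col)


# All 31 main effects and interactions of the 5 factors.
words = ["".join(map(str, c)) for r in range(1, 6)
         for c in combinations(range(1, 6), r)]

# Group effects that share an identical column of signs.
groups = {}
for w in words:
    groups.setdefault(column(w), []).append(w)

print(as_signs(column("12")), as_signs(column("345")))
for col, members in groups.items():
    print(" = ".join(members))
```

Fifteen of the groups contain exactly two effects (the confounded pairs); the remaining group holds only the 5-factor interaction 12345, whose column consists entirely of plus signs.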
5.2.2 The Design Matrix, Confounding Pattern, and Results of the Coffee Experiment


Table 5.3 shows the 16-run design that is used in the coffee experiment. The runs are performed in random order and a sample of roasted beans is taken from each run and brewed using the same coffee maker. A blind taste test is carried out, with each sample rated on a scale from 1 (lowest quality) to 10 (highest quality). The last column in Table 5.3 shows the quality ratings of the brewed coffee that resulted from the various roasts.
The lower part of the table shows the 15 effects that are independently estimated and the confounding patterns that arise from this design. Each main effect is confounded with a 4-factor interaction, and each 2-factor interaction is confounded with a 3-factor interaction. In showing the confounding pattern, we introduce some new notation.
Notice that the column of signs associated with each estimate is identical to one of 15 columns in the $2^4$ factorial design. To calculate each effect, we apply the signs in its column to the observations and divide the result by the number of plus signs, 8. Since each is a linear function of the observations and compares two averages (response averages at the plus and the minus levels of that column), we refer to the estimated effect as a linear contrast. We use the letter l to denote the estimate (l for linear), and we use the column label as a subscript to identify the column that is involved. For example, the estimate $l_5$ applies the signs in column 5 to the responses, obtains the sum, and divides the sum by 8. The estimated effect

$$l_5 = \bar{y}_+ - \bar{y}_- = 5.5 - 7.5 = -2.0$$

is a difference (contrast) between the two averages at the plus and minus levels of column 5.
We use the notation $l_5 \rightarrow 5 + 1234$ to show that $l_5$ estimates the main effect of factor 5 plus the 1234 interaction. The arrow means "estimates." Similarly, to calculate $l_{12}$, for example, we apply the signs in column 12 to the responses, sum them, and divide by 8. This contrast estimates 12 + 345, and we indicate this by writing $l_{12} \rightarrow 12 + 345$.
In this design, if we assume that 3-factor and 4-factor interactions are negligible, which is very likely, we are left with clear estimates of all main effects and 2-factor interactions.

TABLE 5.3 Design Matrix, Estimated Effects, and Confounding Patterns in the Coffee Experiment

Effects that may be estimated and their confounding pattern:
l0 = 6.5 -> average
l1 -> 1 + 2345          l12 -> 12 + 345        l24 -> 24 + 135
l2 -> 2 + 1345          l13 -> 13 + 245        l25 -> 25 + 134
l3 = 4.00 -> 3 + 1245   l14 -> 14 + 235        l34 -> 34 + 125
l4 -> 4 + 1235          l15 -> 15 + 234        l35 -> 35 + 124
l5 = -2.00 -> 5 + 1234  l23 -> 23 + 145        l45 -> 45 + 123

NOTE: The signs in the factor 5 column are identical to the signs in the 1234 calculation column. Significant effects are shown in boldface.

And this is accomplished with only 16 runs, compared to the $2^5 = 32$ runs that would be required in a full factorial design.
To determine which estimated effects are significant, we examine the normal probability plot of the estimated effects in Figure 5.1 that was generated with the Minitab software. Two effects are significant: the main effect of factor 3 (color), and the main effect of factor 5 (machine). A change in the color of the roasted beans from lighter to darker increases the taste rating by 4 points on average, while a change to the new machine decreases the rating by 2 points.
The average of the 16 responses is 6.5. Given this average and our estimates of the two significant effects, the implied regression prediction equation is $\hat{y} = 6.5 + 2x_3 - 1x_5$. At the best settings, when factor 3 is at + (darker color) and factor 5 is at - (current machine), the predicted taste rating is $\hat{y} = 6.5 + 2(+1) - 1(-1) = 9.5$.
The conclusions from this experiment are clear: Keep the current machine and, most important, roast the coffee to the darker color. The chief roaster realized immediately that he had been roasting the Kenya AA too light, and from that point on, he began roasting it to the darker color.
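The prediction equation can be evaluated at all four combinations of color and machine. This is a minimal sketch; the coefficients are one-half of the estimated effects (4 and -2) reported in the text, and the function name is illustrative.

```python
def predict(x3, x5):
    """Predicted taste rating; x3 (color) and x5 (machine) are coded -1 or +1."""
    return 6.5 + 2 * x3 - 1 * x5


for x3 in (-1, 1):
    for x5 in (-1, 1):
        color = "darker" if x3 == 1 else "lighter"
        machine = "new" if x5 == 1 else "current"
        print(f"color={color:7s} machine={machine:7s} predicted rating={predict(x3, x5)}")
```

The largest prediction occurs at the darker color with the current machine, in line with the conclusion above.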
This story began when a competitor's coffee was judged superior in a blind taste test. All concerned parties at the roasting company were extremely pleased when, in the next blind taste test against a fresh batch of the competitor's coffee, the chief roaster's experimentally designed Kenya AA was judged best.

Figure 5.1 Normal Probability Plot of the Estimated Effects in the Coffee Experiment (response is rating, Alpha = 0.05; Lenth's PSE = 0.375)

5.2.3 Finding the Confounding Pattern: Generator and Defining Relation


The 16-run fractional design for the coffee experiment was found by first writing out the design matrix for a 4-factor factorial design and then setting column 5 equal to column 1234. The relation 5 = 1234 is called the generator of the design. The generator is used to find the complete confounding pattern for the design. Before we describe this procedure, we introduce several basic and important rules for multiplying columns of signs.
1. The capital letter I denotes a column of all plus signs. Multiplying any column (of plus and minus signs) by itself results in I. Multiplying a plus entry by itself yields a plus, and multiplying a minus entry by itself yields a plus, also.
2. Multiplying a column by I leaves the column unchanged. This is analogous to multiplication by 1 in ordinary arithmetic.
3. When multiplying columns together, the order of multiplication does not matter. For example, 2 × 123 = 2 × 213 = 213 × 2.
Now we proceed to find the confounding pattern for the 5-factor 16-run design with generator 5 = 1234. Multiplying both sides of the generator by column 5, we obtain

5 × 5 = 1234 × 5
I = 12345

I = 12345 is called the defining relation of the design.

We use the defining relation to find the confounding pattern among the 15 independent effect estimates (linear combinations of the responses) that can be calculated in this design. For example, to find what is confounded with 1 (the main effect of factor 1), we multiply both sides of the defining relation by column 1:

I(1) = (1)(12345)
1 = 2345

because I(1) = 1 and (1)(1) = I. Thus the signs in column 1 and the 2345 interaction column are identical, and the main effect of factor 1 is confounded with the 2345 interaction. To check that this is correct, multiply the signs of columns 2, 3, 4, and 5 and show that they are identical to the signs in column 1.
Similarly, multiplying both sides of the defining relation by (column) 34 results in

34 = 125

The right-hand side is 125 because 34 × 34 = I, and I × 125 = 125. The 2-factor interaction 34 is confounded with the 3-factor interaction 125. The average of the responses when 34 (= 125) is plus ($\bar{y}_+$) minus the average of the responses when 34 (= 125) is minus ($\bar{y}_-$) estimates the sum of the 34 and the 125 interactions.
The 5-factor design is called a half fraction of the full $2^5$ factorial design. It consists of half of the 32 runs that are required for a full factorial design with 5 factors. We use the notation $2^{5-1}$ (2 to the power of 5 minus 1) to denote this design. The 2 denotes that there are 2 levels for each factor, the 5 indicates there are 5 factors, and the 1 in the exponent tells us that it is a half fraction involving one generator. The notation also expresses the fact that there are 16 runs ($2^{5-1} = 2^4 = 16$).
The Importance of Maintaining Orthogonality. Setting 5 = 1234 is the best choice in this case, but we could have set the signs of column 5 equal to the signs of any one of the 15 orthogonal columns in the 2^4 factorial design. By choosing one of these 15 columns, we maintain the important property of orthogonality. Each of the 15 effect estimates in the fractional design uses one of the 15 columns in the 2^4 design, which means that they are independently estimated. But now each estimated effect is actually the sum of two effects, either a main effect and a 4-factor interaction or a 2-factor interaction and a 3-factor interaction. The price we pay for reducing the number of runs from 32 to 16 is the introduction of confounding. But in this case, since 3- and 4-factor interactions are likely to be negligible, we have lost very little.
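
The multiplication rule used above can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper names (multiply, alias, sign_column) are hypothetical, and effects are written as strings of factor numbers. It multiplies words by cancelling any factor that appears twice, and then confirms the aliases by comparing the actual columns of signs in the 16-run design.

```python
from itertools import product

DEFINING_WORD = "12345"  # from the generator 5 = 1234


def multiply(word_a, word_b):
    """Multiply two columns written as words; a factor appearing twice cancels (x * x = I)."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def alias(effect):
    """Effect confounded with `effect` in the half fraction with I = 12345."""
    return multiply(effect, DEFINING_WORD)


# The 16 runs: a full factorial in factors 1-4, with column 5 set equal to 1234.
runs = [
    {"1": a, "2": b, "3": c, "4": d, "5": a * b * c * d}
    for a, b, c, d in product([-1, 1], repeat=4)
]


def sign_column(word):
    """Column of signs for an effect: the product of its factor columns."""
    column = []
    for run in runs:
        sign = 1
        for factor in word:
            sign *= run[factor]
        column.append(sign)
    return column


print(alias("1"))   # 2345
print(alias("34"))  # 125
print(sign_column("1") == sign_column("2345"))   # True
print(sign_column("34") == sign_column("125"))   # True
```

The word arithmetic and the sign columns give the same answer, which is the point of the defining relation: the aliases can be read off without writing out the design.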

5.3 CRACKED POTS REDUX

Let us return to the cracked pots example in Chapter 4. In that problem we examined 4 factors in a 2^4 factorial design. The factors were cooling rate, temperature of the kiln, coefficient of expansion, and carrier (metal or rubberized). The results of that experiment are shown in Table 4.7 of Chapter 4. We found very large main effects for cooling rate and coefficient of expansion and a significant interaction between these two factors. We also found a significant main effect for carrier.

TABLE 5.4
The Cracked Pots Problem: Design Matrix, Estimated Effects, and Confounding Pattern for a 4-Factor Experiment in 8 Runs

Factors: 1 = Cooling Rate, 2 = Temperature of Kiln, 3 = Coefficient of Expansion, 4 (= 123) = Carrier

              FACTOR              INTERACTION COLUMNS FOR THE          RESPONSE
                                  CALCULATION OF REMAINING EFFECTS
Run    1    2    3    4 (= 123)   12    13    14 (= 23)                Percentage of Cracked Pots
1      -    -    -    -           +     +     +                        15
2      +    -    -    +           -     -     +                         8
3      -    +    -    +           -     +     -                         3
4      +    +    -    -           +     -     -                        18
5      -    -    +    +           +     -     -                        12
6      +    -    +    -           -     +     -                        34
7      -    +    +    -           -     -     +                        21
8      +    +    +    +           +     +     +                        27

Effects that may be estimated and their confounding pattern:
l_0  = 17.25 -> average
l_1  =  9.0* -> 1 + 234
l_2  =  0.0  -> 2 + 134
l_3  = 12.5* -> 3 + 124
l_4  = -9.5* -> 4 + 123
l_12 =  1.5  -> 12 + 34
l_13 =  5.0* -> 13 + 24
l_14 =  1.0  -> 14 + 23

NOTE: Significant effects are marked with an asterisk.

5.3.1 Testing 4 Factors in 8 Runs


Suppose the company had decided to do only 8 runs rather than the 16 that were carried out. What design would have been best? We label the factors 1, 2, 3, and 4 for cooling rate, temperature, coefficient of expansion, and carrier, respectively. The design matrix for an 8-run experiment is shown in Table 5.4. Using an approach that is analogous to the one we used in the coffee experiment, we start with the first three columns of the 8-run 2^3 design. The generator 4 = 123 sets the signs for factor 4 equal to the signs for the 123 interaction column (the highest order interaction in the 2^3 design). The defining relation is I = 1234.
From the defining relation, we can easily find the confounding pattern using the method described above. For example, 2 = 134 (the main effect of 2 is confounded with the 134 interaction), while 13 = 24 (the 2-factor interaction 13 is confounded with the 2-factor interaction 24). You can multiply signs to confirm that these are correct. The table also shows the interaction columns in this design that are needed to calculate three of the effects. In this design, each main effect is confounded with a 3-factor interaction, and each 2-factor interaction is confounded with another 2-factor interaction. Compared to the fractional design in the coffee experiment (5 factors in 16 runs), the confounding in this design is worse. Here if we assume that 3-factor interactions are negligible, we have clear estimates of each main effect. But now each 2-factor interaction is confounded with another 2-factor interaction.

5.3.2 Results of the Experiment


Assume the experiment was carried out with the results of each run (% cracked pots) shown in the last column of the table. We calculate each effect estimate from its column of signs. For example we have

l_1 = (8 + 18 + 34 + 27)/4 - (15 + 3 + 12 + 21)/4 = 21.75 - 12.75 = 9.0

l_13 = (15 + 3 + 34 + 27)/4 - (8 + 18 + 12 + 21)/4 = 19.75 - 14.75 = 5.0

Suppose that based on previous experiments, the company is confident that the variance of the response of a run is 8.5. In Section 4.4.1 of Chapter 4, we showed that the estimated variance of an effect is s_effect^2 = (4/N)s^2, where N is the total number of runs, which in this case is 8. Thus we have that s_effect^2 = (4/8)(8.5) = 4.25 and s_effect = sqrt(4.25) = 2.06. A 95% confidence interval for the mean of each effect is given by the estimated effect ± 2.06(1.96). Effects larger in absolute value than 2.06(1.96) = 4.04 are statistically significant. There are four significant effects: (1 + 234), (3 + 124), (4 + 123), and (13 + 24). Assuming that 3-factor interactions are zero, we would conclude that the first three significant effects are estimates of the main effect of 1 (cooling rate), the main effect of 3 (coefficient of expansion), and the main effect of 4 (carrier), respectively. The three estimates have values that are close to what we found in the 2^4 = 16 runs of the experiment of Chapter 4.
In the full factorial experiment, there was no confounding of course, and we found that there was a significant interaction between cooling rate and coefficient of expansion, here labeled as the 13 interaction. But in this 8-run design, there is some uncertainty. The 13 interaction is confounded with the 24 interaction (an interaction between kiln temperature and carrier). If this 8-run experiment had been run rather than the 16-run full factorial, would we have been able to identify with confidence the interaction between cooling rate and coefficient of expansion (the 13 interaction as we have labeled it here)? It is hard to say for sure. The fact that the two main effects, cooling rate and coefficient of expansion, are large would lead us to believe that the significant effect (13 + 24) is due to the 13 interaction, but an interaction between kiln temperature and carrier (factors 2 and 4) is also conceivable. We have gained by cutting the required number of runs in half, but we have introduced confounding and thus some uncertainty in the interpretation of the results.
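
The calculations in this section can be reproduced in a few lines. The sketch below is illustrative only: it assumes the eight responses are listed in standard run order (factor 1 alternating fastest), builds column 4 from the generator 4 = 123, and uses the run variance of 8.5 from the text. The helper names (times, effect) are hypothetical.

```python
from math import sqrt

# Responses (% cracked pots) for runs 1-8, assumed to be in standard order.
y = [15, 8, 3, 18, 12, 34, 21, 27]

col_1 = [-1, 1, -1, 1, -1, 1, -1, 1]
col_2 = [-1, -1, 1, 1, -1, -1, 1, 1]
col_3 = [-1, -1, -1, -1, 1, 1, 1, 1]


def times(*columns):
    """Element-wise product of columns of signs."""
    result = []
    for signs in zip(*columns):
        p = 1
        for s in signs:
            p *= s
        result.append(p)
    return result


def effect(signs):
    """Average response at the plus level minus average response at the minus level."""
    return sum(s * v for s, v in zip(signs, y)) / (len(y) / 2)


col_4 = times(col_1, col_2, col_3)  # generator 4 = 123

effects = {
    "1": effect(col_1),
    "2": effect(col_2),
    "3": effect(col_3),
    "4": effect(col_4),
    "12": effect(times(col_1, col_2)),
    "13": effect(times(col_1, col_3)),
    "14": effect(times(col_1, col_4)),
}

run_variance = 8.5
standard_error = sqrt(4 / len(y) * run_variance)  # sqrt((4/N) s^2)
threshold = 1.96 * standard_error                 # about 4.04

significant = [name for name, value in effects.items() if abs(value) > threshold]
print(effects)
print(significant)  # ['1', '3', '4', '13']
```

Each label stands for a confounded sum: "13" is really (13 + 24), and "4" is really (4 + 123), so the code identifies which contrasts are large, not which of the aliased effects is responsible.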

5.4 DESIGN RESOLUTION

We have discussed two fractional factorial designs, one with 5 factors and another with 4. The full factorial 2^5 design requires 32 runs, and there is no confounding among the estimated effects. The half-fraction 2^(5-1) design is more economical with only 16 runs, but it induces confounding: Main effects are confounded with 4-factor interactions, while 2-factor interactions are confounded with 3-factor interactions. The 2^(4-1) design is a half-fraction of the 2^4 = 16-run full factorial design. Compared to the 2^(5-1) design, its confounding pattern is worse. For this 8-run design for 4 factors, main effects are confounded with 3-factor interactions, and 2-factor interactions are confounded with other 2-factor interactions.
The resolution R of a fractional design is an index (usually written as a roman numeral) that expresses the degree of confounding.
1. A design of resolution R = III confounds main effects with 2-factor interactions. (We will see examples of these designs shortly.)
2. A design of resolution R = IV confounds main effects with 3-factor interactions, and 2-factor interactions with other 2-factor interactions.
3. A design of resolution R = V confounds main effects with 4-factor interactions, and 2-factor interactions with 3-factor interactions.
The 16-run design for 5 factors with generator 5 = 1234 has resolution V. It confounds main effects with 4-factor interactions, and 2-factor interactions with 3-factor interactions. We denote this design by 2_V^(5-1). It consists of 2^(5-1) = 2^4 = 16 runs. The "1" in the exponent expresses the fact that there is a single generator. The subscript "V" denotes its resolution.
The 8-run design for 4 factors with generator 4 = 123 has resolution IV. It confounds main effects with 3-factor interactions, and each 2-factor interaction with another 2-factor interaction. We denote it by 2_IV^(4-1).

In constructing our 2^(5-1) design, we could have used any interaction or main effect column to accommodate the 5th factor. We chose the generator 5 = 1234, which yields the half-fraction with the highest possible resolution. (Similarly, in the 2^(4-1) design we set 4 = 123.) Suppose we had set 5 = 123 instead. Then the defining relation would be I = 1235, and the design would be resolution IV. In general, with k factors, the generator k = 123 ... (k - 1) produces a half-fraction with highest possible resolution. For example, with k = 3, the best generator would be 3 = 12.
The resolution of a design can be determined directly from its defining relation. Each term in the defining relation to the right of I is called a "word." For example, for the 2^(5-1) design with generator 5 = 1234, the defining relation I = 12345 consists of the single word 12345. For the 2^(4-1) design with generator 4 = 123, the defining relation I = 1234 consists of the single word 1234. Shortly, we will see examples of designs with defining relations that consist of more than one word. The resolution of a design is the length of the shortest word in the defining relation. In the first case (I = 12345), the length of the single word is 5 (it consists of 5 numbers), and the design is resolution V. In the second case (I = 1234), the length of the (shortest) word is 4 (it consists of 4 numbers), and the design is resolution IV.
In choosing a design, the experimenter must consider both design resolution and cost. Higher-resolution designs have more attractive confounding patterns but require more runs and are therefore more costly. Resolution V designs are especially useful because we can obtain unambiguous estimates of main effects and 2-factor interactions if we assume that interactions of order three and higher are negligible. In instances where the costs associated with each run are relatively small, these designs are particularly attractive, because the experimenter is confident beforehand of achieving unambiguous results.
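
The rule "resolution equals the length of the shortest word" is easy to automate. The following is a rough sketch under stated assumptions: generators are passed as a dictionary mapping each added factor to its interaction column (a hypothetical convention chosen for this illustration), and the complete defining relation is formed from all products of the generator words.

```python
from itertools import combinations

ROMAN = {3: "III", 4: "IV", 5: "V"}


def multiply(*words):
    """Product of words; any factor that appears an even number of times cancels."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))


def defining_relation(generators):
    """All words of the defining relation, e.g. {"5": "1234"} gives ["12345"]."""
    base_words = [multiply(factor, column) for factor, column in generators.items()]
    words = []
    for size in range(1, len(base_words) + 1):
        for group in combinations(base_words, size):
            words.append(multiply(*group))
    return words


def resolution(generators):
    """Length of the shortest word in the defining relation."""
    return min(len(word) for word in defining_relation(generators))


print(ROMAN[resolution({"5": "1234"})])  # V
print(ROMAN[resolution({"4": "123"})])   # IV
print(ROMAN[resolution({"5": "123"})])   # IV
print(ROMAN[resolution({"3": "12"})])    # III
```

With a single generator there is only one word, so the resolution is simply one more than the number of factors in the generator's interaction column; the function becomes more useful for designs with several generators.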

5.5 FRACTIONAL DESIGNS IN 8 RUNS

Table 5.5 shows fractional designs for 4, 5, 6, and 7 factors. All 8-run fractional designs are constructed from the same building block: the design matrix and the calculation (interaction) columns of the 2^3 factorial design. In each case, the design matrix starts with columns A, B, and C (the darker shaded area in Table 5.5) and is completed by associating each additional factor in the design with a column of signs from the lighter shaded area containing the four interaction columns. The column for each generator is set equal to one of the four interaction columns (AB, AC, BC, or ABC).
The lower part of the table has one row for each design. The first column identifies the design and its resolution, the second column shows the generator(s), and the last seven columns show the seven effects that are independently estimated in the design.

5.5.1 Five Factors in 8 Runs: The 2^(5-2) Design


The 2_III^(5-2) resolution III design (main effects are confounded with 2-factor interactions) has two generators: D = AB and E = AC (the "2" in the exponent denotes this). From these two generators, we construct the defining relation. Multiplying both sides of the first generator by D, and both sides of the second generator by E, results in the equation

I = ABD = ACE

But since ABD = I and ACE = I, it is also true that their product equals I; that is, I = ABD x ACE = BCDE. Hence the defining relation is given by

I = ABD = ACE = BCDE

It consists of three words. The length of the shortest word is three (letters), which shows that this is a resolution III design.
To find the confounding pattern for this or any other 8-run fractional design, those in the table or designs that use other generators, we always follow the same procedure. We multiply the defining relation by each of the seven columns A, B, C, AB, AC, BC, ABC that make up the design matrix and the calculation columns of the building block, the 2^3 factorial design. For example, multiplying by A, we have

AI = AABD = AACE = ABCDE
A = BD = CE = ABCDE

Multiplying by BC, we have

BCI = BCABD = BCACE = BCBCDE
BC = ACD = ABE = DE

Multiplying by ABC, we have

ABCI = ABCABD = ABCACE = ABCBCDE
ABC = CD = BE = ADE


TABLE 5.6
A Fractional Design for 5 Factors in 8 Runs with Generators D = AB and E = AC, and Its Confounding Pattern

             FACTORS
Run    A    B    C    D = AB    E = AC    Response
1      -    -    -    +         +         y_1
2      +    -    -    -         -         y_2
3      -    +    -    -         +         y_3
4      +    +    -    +         -         y_4
5      -    -    +    +         -         y_5
6      +    -    +    -         +         y_6
7      -    +    +    -         -         y_7
8      +    +    +    +         +         y_8

Effects that may be estimated and their confounding pattern (3-factor and higher-order interactions are assumed to be zero):
l_0   -> average
l_A   -> A + BD + CE
l_B   -> B + AD
l_C   -> C + AE
l_AB  -> D + AB
l_AC  -> E + AC
l_BC  -> BC + DE
l_ABC -> CD + BE

The confounding pattern for each design (assuming that 3-factor and higher-order interactions are zero) is given in Table 5.5, with each estimated effect shown with its aliases. For the 2_III^(5-2) design, the seven estimated effects are shown as (A + BD + CE), (B + AD), (C + AE), (D + AB), (E + AC), (BC + DE), and (BE + CD).
The design matrix, estimated effects, and confounding pattern for this 5-factor design in 8 runs are shown in Table 5.6. Each of the seven contrasts uses one of the columns of signs in the 2^3 building block. We emphasize that important fact in labeling the contrasts.
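
The procedure of multiplying the defining relation by each building-block column, then dropping aliases of order three and higher, can be sketched as follows. This is an illustration rather than the authors' method: the function names and the dictionary convention for generators (added factor mapped to its interaction column) are assumptions made for this example.

```python
from itertools import combinations


def multiply(*words):
    """Product of words; any factor that appears an even number of times cancels."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))


def defining_relation(generators):
    """All words obtained as products of the generator words (ABD, ACE, ...)."""
    base_words = [multiply(factor, column) for factor, column in generators.items()]
    return [
        multiply(*group)
        for size in range(1, len(base_words) + 1)
        for group in combinations(base_words, size)
    ]


def alias_pattern(columns, generators, max_order=2):
    """For each building-block column, list the effects of order <= max_order it estimates."""
    words = defining_relation(generators)
    pattern = {}
    for column in columns:
        effects = [column] + [multiply(column, word) for word in words]
        kept = [e for e in effects if len(e) <= max_order]
        pattern[column] = sorted(kept, key=lambda e: (len(e), e))
    return pattern


BUILDING_BLOCK = ["A", "B", "C", "AB", "AC", "BC", "ABC"]

five_factor = alias_pattern(BUILDING_BLOCK, {"D": "AB", "E": "AC"})
for column, effects in five_factor.items():
    print(f"l_{column} -> {' + '.join(effects)}")
```

Passing a different generator dictionary, for example {"D": "AB", "E": "AC", "F": "BC"}, gives the pattern for another 8-run design built on the same 2^3 block.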

5.5.2 Six Factors in 8 Runs: The 2^(6-3) Design


The 6-factor 2_III^(6-3) resolution III design has three generators: D = AB, E = AC, and F = BC. For more than two generators, as in this case, the procedure for finding the defining relation is analogous to the procedure used when there are two generators. We multiply both sides of the first generator by D, both sides of the second by E, and both sides of the third by F. The result is that each right-hand side is equal to I, forming the first three terms (or words) in the defining relation,

I = ABD = ACE = BCF

However, there are more terms in the defining relation, as all products of these three terms, (ABD)(ACE), (ABD)(BCF), (ACE)(BCF), and (ABD)(ACE)(BCF), are I. The complete defining relation consists of seven words,

I = ABD = ACE = BCF = BCDE = ACDF = ABEF = DEF

The length of the shortest word is three; hence this is a resolution III design. The confounding pattern shown in Table 5.5 is found by multiplying the defining relation by each of the seven columns A, B, C, AB, AC, BC, ABC that make up the design matrix and the calculation columns of the 2^3 factorial building block design. For example

BC = ACD = ABE = F = DE = ABDF = ACEF = BCDEF

and

ABC = CD = BE = AF = ADE = BDF = CEF = ABCDEF

5.5.3 Seven Factors in 8 Runs: The 2^(7-4) Design


The final 8-run design shown in the last row of Table 5.5 is the 7-factor 2_III^(7-4) resolution III design. Its four generators use all four available interaction columns (AB, AC, BC, and ABC) of the 2^3 factorial building block design. Because of this, the design is said to be saturated. We will discuss this design in detail in Section 5.7.

5.6 FRACTIONAL DESIGNS IN 16 RUNS

Table 5.7 shows a set of 16-run fractional factorial designs with the number of factors ranging from 5 to 15. The table is constructed in the same fashion as Table 5.5. In that table for 8-run designs, the building block was the 8-run 2^3 factorial design with its four interaction columns. Here, the building block is the 16-run 2^4 factorial design and its 11 interaction columns. For each 16-run fractional design, the design matrix starts with columns A, B, C, and D (the darker shaded area in Table 5.7) and is completed by associating each additional factor in the design with a column of signs from the lighter shaded area containing the 11 interaction columns.
The confounding patterns shown in Table 5.7 ignore 3-factor and higher-order interactions. Table 5.7 shows the generators for each of these designs, but to save space, we have omitted the confounding patterns for designs with 10 through 14 factors. Software such as Minitab and JMP provides the generators and the confounding patterns for all of the designs in Tables 5.5 and 5.7 automatically. The user simply enters the number of factors and the number of runs. In Section 8.3 of Chapter 8, we discuss the capabilities of these software programs in more detail.

5.6.1 Seven Factors in 16 Runs: The 2^(7-3) Design


Table 5.7 specifies the generators of this design as E = ABC, F = BCD, and G = ACD. The defining relation is given by

I = ABCE = BCDF = ACDG = ADEF = BDEG = ABFG = CEFG

The last four words are obtained by forming all products of the first three words of the defining relation. The length of the shortest word is four; hence, this is a resolution IV design.
The 15 estimated effects and their confounding pattern are shown in the lower part of Table 5.7. To find the confounding pattern for this or any 16-run fractional design, we always follow the same procedure. We multiply its defining relation by each of the 15 effect columns A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, and ABCD that make up the design matrix and the calculation columns of the 2^4 factorial design. For example, multiplying by ABD we have

ABD = CDE = ACF = BCG = BEF = AEG = DFG = ABCDEFG

The contrast l_ABD estimates ABD plus the sum of six other 3-factor interactions plus a 7-factor interaction. However, none of these interactions are visible in Table 5.7 because aliases of order 3 and higher are not shown. Since 3-factor interactions are usually negligible, this contrast is an estimate of the noise.
The design matrix, estimated effects, and confounding pattern for this design are shown in Table 5.8. We have labeled each contrast to emphasize the fact that each effect uses one of the 15 columns of signs of the 2^4 building block. By using this approach and listing the effects in the order of the 15 columns of the building block, we also provide a systematic way to identify each of the 15 effects.

5.6.2 Testing 15 Factors in 16 Runs: The 2^(15-11) Saturated Fractional Factorial Design


This is a very efficient design in terms of the number of runs, as 15 effects are estimated from just 16 runs. Table 5.7 shows that all 11 interaction (calculation) columns of the 2^4 full factorial building block design are assigned to the additional factors (E through P). Because all interaction columns are used to define the generators, it is a saturated design. There are 11 generators, and the defining relation consists of many words (in fact, 2^11 - 1 words). This is a resolution III design, and as shown in the last row of Table 5.7, each main effect is confounded (aliased) with a string of seven 2-factor interactions. This design is useful in screening experiments where it is reasonable to ignore 2-factor interactions in the initial stage of an investigation. In these cases the experimenter wants to test many factors in relatively few runs to identify a few important ones for further study. Also, as we discuss in the next section, it is an easy matter to add 16 runs to this design to create a 32-run design of resolution IV.
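
The counts quoted for the saturated design (2^11 - 1 words, resolution III, seven 2-factor aliases per main effect) can be verified by brute force. The sketch below is an illustration under an explicit assumption: the assignment of factors E through P to particular interaction columns is hypothetical, since the generators of Table 5.7 are not reproduced here. The counts themselves do not depend on which factor is assigned to which column.

```python
from itertools import combinations

BASE = "ABCD"
ADDED = "EFGHJKLMNOP"  # 11 added factors; the letter I is reserved for the identity


def multiply(word_a, word_b):
    """Multiply two words; a factor that appears in both cancels."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


# The 11 interaction columns of the 2^4 building block: AB, AC, ..., ABCD.
interaction_columns = ["".join(c) for size in (2, 3, 4) for c in combinations(BASE, size)]

# Hypothetical assignment of added factors to interaction columns.
generators = dict(zip(ADDED, interaction_columns))

# Build the complete defining relation: each new generator word is multiplied
# by every word found so far, which doubles the list (plus one) at each step.
words = []
for factor, column in generators.items():
    new_word = multiply(factor, column)
    words += [new_word] + [multiply(word, new_word) for word in words]

# A 3-letter word containing X pairs X with one aliased 2-factor interaction.
two_factor_aliases = {
    factor: sorted(multiply(factor, word) for word in words if len(word) == 3 and factor in word)
    for factor in BASE + ADDED
}

print(len(words))                     # 2047
print(min(len(w) for w in words))     # 3
print(two_factor_aliases["A"])
```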

5.7 RESOLVING AMBIGUITIES IN FRACTIONAL FACTORIAL DESIGNS

Experimentation often proceeds sequentially, starting with an initial experiment followed by an additional set of runs to resolve the ambiguities that arise from the confounding of effects. Frequently, a series of experiments would begin with a resolution III design, which is quite economical in terms of the number of runs required. For example, we saw earlier that 7 factors could be studied in 8 runs, and 15 factors in 16 runs. But resolution III designs confound main effects and 2-factor interactions, and at the conclusion of the experiment, the decision maker would not know whether a significant effect is the result of a main effect or one or several 2-factor interactions among other factors. If experiments were only one-shot affairs, these resolution III designs would have less value. But as we show in this section, it is possible and desirable to augment an initial experiment with additional runs that can clarify open questions.


TABLE 5.8
A 2_IV^(7-3) Design with Generators E = ABC, F = BCD, and G = ACD

Run    A    B    C    D    E = ABC    F = BCD    G = ACD
1      -    -    -    -    -          -          -
2      +    -    -    -    +          -          +
3      -    +    -    -    +          +          -
4      +    +    -    -    -          +          +
5      -    -    +    -    +          +          +
6      +    -    +    -    -          +          -
7      -    +    +    -    -          -          +
8      +    +    +    -    +          -          -
9      -    -    -    +    -          +          +
10     +    -    -    +    +          +          -
11     -    +    -    +    +          -          +
12     +    +    -    +    -          -          -
13     -    -    +    +    +          -          -
14     +    -    +    +    -          -          +
15     -    +    +    +    -          +          -
16     +    +    +    +    +          +          +

Effects that may be estimated and their confounding pattern (3-factor and higher-order interactions are assumed to be zero):
l_0    -> average
l_A    -> A
l_B    -> B
l_C    -> C
l_D    -> D
l_AB   -> AB + CE + FG
l_AC   -> AC + BE + DG
l_AD   -> AD + CG + EF
l_BC   -> BC + AE + DF
l_BD   -> BD + CF + EG
l_CD   -> CD + AG + BF
l_ABC  -> E
l_ABD  -> 3-factor interactions
l_ACD  -> G
l_BCD  -> F
l_ABCD -> AF + BG + DE

5.7.1 An Example: Improving Online Learning


A university that specializes in online learning wants to improve the effectiveness of its 8-week experimental design course. To accomplish this goal, it seems natural to conduct an experiment to test some key factors. Professor John Pesky, head of the statistics group, identifies 7 factors to test at 2 levels each, as shown in Table 5.9. The effectiveness of each change will be measured by final exam scores. (Note that there have been some major experiments examining the relationship between class size and learning. But except for these studies, in searching the literature we found few examples of statistical experiments in education and even fewer that studied more than one factor. We believe there are many opportunities to use experimental design methods to improve the effectiveness of education. The example in this section is not an actual one, but it demonstrates the kind of experiments that could be performed.)

The Factors and Levels. Factor A is the textbook, and Pesky wants to compare a new book to the one he has been using. He also wants to see if additional readings would improve performance (factor B). Factor C is the amount of homework, and the two levels are the current 5 hours per week (which students have complained about) and a less demanding 3 hours per week. Factor D's two levels compare a new software package to the existing one, while factor E is the number of lectures. Currently there are 4 one-hour video lectures per week, but Professor Pesky is under pressure from the administration to cut back the number to reduce costs. His superior Dean Takahashi believes that three sessions per week would be just as effective. Finally, the last two factors will test two other changes to the course, adding an online review session for the final (factor F) and the addition of a set of lecture notes (factor G).

TABLE 5.9
Factors and Levels in the Online Learning Example

                          LEVEL
Factor                -             +
A  Textbook           Current       New
B  Readings           No            Yes
C  Homework           5 hours       3 hours
D  Software           Current       New
E  Sessions           4 per week    3 per week
F  Review             No            Yes
G  Lecture notes      No            Yes

The Design. The professor decides to use the 8-run 2_III^(7-4) saturated design shown in Table 5.10. The generators are D = AB, E = AC, F = BC, G = ABC, and the defining relation consists of 15 words.

I = ABD = ACE = BCF = ABCG = AFG = BEG = CDG = DEF
  = ABEF = ACDF = ADEG = BCDE = BDFG = CEFG = ABCDEFG

The first four words in the defining relation correspond to the four generators. The other words were determined by multiplying these four words, taking two at a time (six combinations), three at a time (four combinations), and all four together. The confounding pattern (ignoring 3-factor and higher-order interactions) is shown in the last row of Table 5.5. This is a resolution III design with main effects confounded with 2-factor interactions.
Each of the 8 runs defines the characteristics of a section, and 20 students are randomly assigned to each of the eight sections. At the end of the course, each student takes a final exam. The response variable shown in Table 5.10 is the average score for each section of 20 students.

Which Effects Are Significant? The sample variance calculated from the test scores of the same section provides an estimate of the variability of an individual test score. The variances can be averaged across the eight sections to obtain an even better estimate of the variability, resulting in the pooled estimate s^2 with (8)(19) = 152 degrees of freedom. Section 4.4.1 of Chapter 4 showed that the estimated variance of an effect is given by s_effect^2 = (4/N)s^2, where N = (8)(20) = 160 is the total number of students in the experiment.
Professor Pesky performs this calculation and finds that s^2 = 106.4. Hence the variance of an effect is s_effect^2 = (4/N)s^2 = (4/160)(106.4) = 2.66, and the standard error of an effect is s_effect = sqrt(2.66) = 1.63. A 95% confidence interval for the mean of an effect is the estimated effect ± (1.96)(1.63); an estimated effect is statistically significant if its absolute value is greater than (1.96)(1.63) = 3.19. Two estimated effects exceed this threshold: (A + BD + CE + FG) and (E + AC + BG + DF).
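
The significance screen described above amounts to a few arithmetic steps. The sketch below is illustrative: the seven estimates are the rounded values reported for the initial experiment, the pooled variance of 106.4 is taken from the text, and the variable names are hypothetical.

```python
from math import sqrt

sections = 8
students_per_section = 20
pooled_variance = 106.4  # s^2, pooled across the eight sections

n_total = sections * students_per_section                    # N = 160
degrees_of_freedom = sections * (students_per_section - 1)   # 8 x 19 = 152

effect_variance = 4 / n_total * pooled_variance   # (4/N) s^2, about 2.66
standard_error = sqrt(effect_variance)            # about 1.63
threshold = 1.96 * standard_error                 # about 3.2

# Estimated effects from the initial 8-run experiment (rounded values).
estimates = {"A": 10.8, "B": 1.1, "C": 0.0, "D": 2.2, "E": -5.8, "F": 1.0, "G": -1.2}

significant = [name for name, value in estimates.items() if abs(value) > threshold]
print(round(standard_error, 2))
print(significant)  # ['A', 'E']
```

The unrounded threshold is slightly above the 3.19 quoted in the text, which multiplies 1.96 by the already rounded standard error of 1.63; the difference has no bearing on which effects are flagged.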

TABLE 5.10
Online Learning Example: Results of the Initial 2_III^(7-4) Experiment

                          FACTOR                                  Response
Run    A    B    C    D = AB    E = AC    F = BC    G = ABC       Average Score
1      -    -    -    +         +         +         -             63.6
2      +    -    -    -         -         +         +             76.8
3      -    +    -    -         +         -         +             60.3
4      +    +    -    +         -         -         -             80.3
5      -    -    +    +         -         -         +             67.2
6      +    -    +    -         +         -         -             71.3
7      -    +    +    -         -         +         -             68.3
8      +    +    +    +         +         +         +             74.3

Effects (rounded to one decimal place) that may be estimated and their confounding patterns (3-factor and higher-order interactions are assumed to be zero):
l_A = 10.8 -> A + BD + CE + FG
l_B =  1.1 -> B + AD + CF + EG
l_C =  0   -> C + AE + BF + DG
l_D =  2.2 -> D + AB + CG + EF
l_E = -5.8 -> E + AC + BG + DF
l_F =  1.0 -> F + AG + BC + DE
l_G = -1.2 -> G + AF + BE + CD

Suppose we (and Professor Pesky) assume that in (A + BD + CE + FG) and (E + AC + BG + DF), the six 2-factor interactions are negligible. Then we would have estimates of the two main effects: A and E. With this interpretation the best levels are + for factor A (textbook) and - for factor E (sessions per week). Changing to the new book increases the average score by almost 11 points, while reducing the number of sessions per week from the current four sessions to three sessions decreases the average score by nearly 6 points. A graduate student leaks the news to Dean Takahashi, who is not pleased to hear it.
But what if the 2-factor interactions are not negligible? Then the observed estimates might be due to one or more 2-factor interactions rather than the main effects. Because of these uncertainties, Professor Pesky decides to do a second set of 8 runs designed to clarify the initial results.
The experimental design course is about to be offered again. Pesky creates eight sections by randomly assigning 20 students to each run in this second design, and at the end of the course, he calculates the final exam averages for each section.
A Second 8-Run Experiment Switching the Signs of Column A. Table 5.11 shows the original design matrix (runs 1-8) followed by the design matrix for the second experiment (runs 9-16). This second set of 8 runs was constructed from the original design matrix by switching the signs in column A while leaving the other columns unchanged.
Reversing the signs in column A means that the signs for each interaction column involving A are reversed as well. For example, as Table 5.10 shows, in the original design the signs of columns B, AD, CF, and EG are identical, and l_B (the average of the responses when B is at the plus level minus the average of the responses when B is at the minus level) estimates (B + AD + CF + EG). In the second design, the signs for columns B, CF, and EG are still identical, but the signs for column AD are now reversed. As a result, the average of the responses when B is at the plus level minus the average of the responses when B is at the minus level, which we denote by l_B^f (with superscript f for "follow-up"), estimates (B - AD + CF + EG). As shown in the table, each of the other five contrasts (l_C^f through l_G^f) that include a 2-factor interaction involving A is changed in the same manner.
Now, consider column A. In the original design the signs of columns A, BD, CE, and FG are identical, and l_A estimates (A + BD + CE + FG). In the second design, the signs for columns BD, CE, and FG are still identical. But now in every run, the signs in these three 2-factor interaction columns are the opposite of the sign in column A. As a result, l_A^f (the average of the responses when A is plus minus the average of the responses when A is minus) estimates (A - BD - CE - FG).

TABLE 5.11
The 2_III^(7-4) Design (Runs 1-8) Joined by the Design That Switches the Signs of Factor A: Online Learning Example

                                      FACTORS                     Response
                        Run    A    B    C    D    E    F    G    Average Score
2_III^(7-4)             1      -    -    -    +    +    +    -    63.6
                        2      +    -    -    -    -    +    +    76.8
                        3      -    +    -    -    +    -    +    60.3
                        4      +    +    -    +    -    -    -    80.3
                        5      -    -    +    +    -    -    +    67.2
                        6      +    -    +    -    +    -    -    71.3
                        7      -    +    +    -    -    +    -    68.3
                        8      +    +    +    +    +    +    +    74.3

2_III^(7-4) with        9      +    -    -    +    +    +    -    79.1
signs of factor A       10     -    -    -    -    -    +    +    58.7
switched                11     +    +    -    -    +    -    +    77.1
                        12     -    +    -    +    -    -    -    63.1
                        13     +    -    +    +    -    -    +    72.7
                        14     -    -    +    -    +    -    -    70.2
                        15     +    +    +    -    -    +    -    69.4
                        16     -    +    +    +    +    +    +    65.4

Estimated effects: Original 2_III^(7-4)
l_A = 10.8 -> A + BD + CE + FG
l_B =  1.1 -> B + AD + CF + EG
l_C =  0   -> C + AE + BF + DG
l_D =  2.2 -> D + AB + CG + EF
l_E = -5.8 -> E + AC + BG + DF
l_F =  1.0 -> F + AG + BC + DE
l_G = -1.2 -> G + AF + BE + CD

Estimated effects: Follow-up (signs of A switched)
l_A^f = 10.2 -> A - BD - CE - FG
l_B^f = -1.4 -> B - AD + CF + EG
l_C^f = -0.1 -> C - AE + BF + DG
l_D^f =  1.2 -> D - AB + CG + EF
l_E^f =  7.0 -> E - AC + BG + DF
l_F^f = -2.6 -> F - AG + BC + DE
l_G^f = -2.0 -> G - AF + BE + CD

Combining the estimated effects:
(1/2)(l_A + l_A^f) = 10.5 -> A                  (1/2)(l_A - l_A^f) =  0.3 -> BD + CE + FG
(1/2)(l_B + l_B^f) = -0.2 -> B + CF + EG        (1/2)(l_B - l_B^f) =  1.3 -> AD
(1/2)(l_C + l_C^f) =  0   -> C + BF + DG        (1/2)(l_C - l_C^f) =  0.1 -> AE
(1/2)(l_D + l_D^f) =  1.7 -> D + CG + EF        (1/2)(l_D - l_D^f) =  0.5 -> AB
(1/2)(l_E + l_E^f) =  0.6 -> E + BG + DF        (1/2)(l_E - l_E^f) = -6.4 -> AC
(1/2)(l_F + l_F^f) = -0.8 -> F + BC + DE        (1/2)(l_F - l_F^f) =  1.8 -> AG
(1/2)(l_G + l_G^f) = -1.6 -> G + BE + CD        (1/2)(l_G - l_G^f) =  0.4 -> AF
Combining the Estimates from the Two 8-Run Experiments. We use two simple algebraic operations (addition and subtraction) to combine the two sets of estimated effects and to reveal the confounding pattern for the entire 16-run experiment. Consider l_A and l_A^f. From Table 5.11,

l_A   -> A + BD + CE + FG
l_A^f -> A - BD - CE - FG

Hence

(l_A + l_A^f)/2 -> [(A + BD + CE + FG) + (A - BD - CE - FG)]/2 = A

(l_A - l_A^f)/2 -> [(A + BD + CE + FG) - (A - BD - CE - FG)]/2 = BD + CE + FG

These two operations separate A from (BD + CE + FG). In our example (1/2)(l_A + l_A^f) = (10.8 + 10.2)/2 = 10.5 is an estimate of the main effect of A, and (1/2)(l_A - l_A^f) = (10.8 - 10.2)/2 = 0.3 is an estimate of (BD + CE + FG).

In Table 5.11, we perform these two operations repeatedly to combine the estimates from the two 8-run experiments. The result is not only an estimate of the main effect of A that is no longer confounded, but clear estimates of all 2-factor interactions involving A as well.
The two largest effects are A = 10.5 and AC = -6.4. We obtained these estimates by combining estimates from the two 8-run experiments. Equivalently, we can obtain the estimates directly from the combined 16-run design. For example, to estimate AC, we determine the signs for the AC column by multiplying the signs in columns A and C and then apply these signs to the responses. We have

l1t

63.6

76.8

60.3 71 .3 + 74.3

58.7

6-1. I

80.) t 67.2 I 68.3 + 7':!. l t 77.1

72.7

-r-

70.2

6':!.4

65.4

= 66.675 - 73.075 = -6.4


The interaction diagram in Figure 5.2 shows the nature of the interaction between A (textbook) and C (homework). Notice that the new textbook is always better than the old one. If the new textbook is used, more homework is better, but with the old textbook, more homework is worse. One explanation is that with the current textbook, assigning more homework increases frustration and hurts final exam performance. The new book with more homework appears to be the best option. The other effects are not significant. In particular, class notes, additional readings, and a review class for the final apparently offer no benefits.
There is one more very important finding. A main effects interpretation of the original experiment identified the main effect of E (number of sessions) as significant, with four sessions per week being better than three sessions per week. But that conclusion was wrong. As we determined after the second experiment, it was the AC interaction, not the main effect of E, that was responsible for the statistically significant estimate of (E + AC + BG + DF). In the combined results of Table 5.11, the estimate of (E + BG + DF) is very small (0.6). As Dean Takahashi will be happy to learn, the results show that there is actually no difference in student performance if three rather than four lectures are given each week.

[Interaction plot between A (textbook) and C (homework). Horizontal axis: C (homework), 5 hours and 3 hours; vertical axis ticks from 60 to 80; one line for the new textbook and one for the old textbook.]

Figure 5.2 Interaction Diagram Between Factors A (textbook) and C (homework) Using Results of All 16 Runs

5.7.2 Foldover: Switching the Signs of Every Column


Suppose in the previous example in carrying out 8 more runs, we switched the signs in every column. The original $2^{7-4}_{III}$ design is shown in the top panel of Table 5.12 (runs 1-8) with the second design (runs 9-16) shown below it. We refer to the second design as a (complete) foldover. What is the confounding pattern for the foldover design, and what is the confounding pattern for the combined 16-run design?
In the original design, each main effect is confounded with three 2-factor interactions. For example, as shown in Table 5.10, $l_A$ estimates A + BD + CE + FG. In the foldover design, switching the signs of all columns leaves the signs of every 2-factor interaction column unchanged, and now the signs in column A are the opposite of those in the three interaction columns. As a result $l'_A$ estimates A - BD - CE - FG. Using the same procedure as in Table 5.11, we combine the two estimates: $(1/2)(l_A + l'_A) \rightarrow A$ and $(1/2)(l_A - l'_A) \rightarrow BD + CE + FG$. The main effect of A is no longer confounded with the three 2-factor interactions. Combining each of the estimates in this way, the result is that we now have clear (unconfounded) estimates of all main effects, while 2-factor interactions remain confounded as before.
The original 8-run design was resolution III, while the 16-run combined design is resolution IV. The generators for the combined design are E = BCD, F = ACD, and G = ABC. The defining relation for this design consists of the following seven words:

I = BCDE = ACDF = ABCG = ABEF = ADEG = BDFG = CEFG

Notice that these are 7 of the 15 words in the defining relation for the original $2^{7-4}_{III}$ design, which we showed at the beginning of this section. They remain in the defining relation for the combined design, whereas the other eight words in the original defining relation drop

TABLE 5.12
The $2^{7-4}_{III}$ Design (Runs 1-8) Joined by the Design That Switches the Signs of All Columns (Runs 9-16)

[Design matrix: runs 1-8 are the original $2^{7-4}_{III}$ design in factors A through G; runs 9-16 are the same design with the signs of all factors switched.]

Estimated effects: Original $2^{7-4}_{III}$ design

$l_A \rightarrow A + BD + CE + FG$
$l_B \rightarrow B + AD + CF + EG$
$l_C \rightarrow C + AE + BF + DG$
$l_D \rightarrow D + AB + CG + EF$
$l_E \rightarrow E + AC + BG + DF$
$l_F \rightarrow F + AG + BC + DE$
$l_G \rightarrow G + AF + BE + CD$

Estimated effects: Follow-up (foldover of all signs)

$l'_A \rightarrow A - BD - CE - FG$
$l'_B \rightarrow B - AD - CF - EG$
$l'_C \rightarrow C - AE - BF - DG$
$l'_D \rightarrow D - AB - CG - EF$
$l'_E \rightarrow E - AC - BG - DF$
$l'_F \rightarrow F - AG - BC - DE$
$l'_G \rightarrow G - AF - BE - CD$

Combining the estimated effects

$(1/2)(l_A + l'_A) \rightarrow A$
$(1/2)(l_B + l'_B) \rightarrow B$
$(1/2)(l_C + l'_C) \rightarrow C$
$(1/2)(l_D + l'_D) \rightarrow D$
$(1/2)(l_E + l'_E) \rightarrow E$
$(1/2)(l_F + l'_F) \rightarrow F$
$(1/2)(l_G + l'_G) \rightarrow G$

$(1/2)(l_A - l'_A) \rightarrow BD + CE + FG$
$(1/2)(l_B - l'_B) \rightarrow AD + CF + EG$
$(1/2)(l_C - l'_C) \rightarrow AE + BF + DG$
$(1/2)(l_D - l'_D) \rightarrow AB + CG + EF$
$(1/2)(l_E - l'_E) \rightarrow AC + BG + DF$
$(1/2)(l_F - l'_F) \rightarrow AG + BC + DE$
$(1/2)(l_G - l'_G) \rightarrow AF + BE + CD$

out. Why? The seven words that stay in all have four letters (an even number), while the eight words that drop out such as ABD and ACE have an odd number of letters (seven have three letters each while one, ABCDEFG, has seven letters). Switching the signs of all columns leaves the signs of the four-letter words unchanged, and they remain in the defining relation of the foldover design.
But in the defining relation of the foldover design the eight words with an odd number of letters appear with their signs changed. For example, I = -ABD. In the combined design ABD is no longer equal to I (ABD = I in runs 1-8, but ABD = -I in runs 9-16). As a result, ABD and the other words with an odd number of letters drop out of the defining relation for the combined design.
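
The even/odd argument can be checked numerically. The sketch below builds the 8-run design from the generators D = AB, E = AC, F = BC, G = ABC, folds it over, and multiplies the sign columns of a few words. The helper names and the run order (standard order, A changing fastest) are assumptions made for illustration.

```python
from itertools import product
from math import prod

FACTORS = "ABCDEFG"

def base_design():
    """8-run 2^(7-4) design: full factorial in A, B, C with D=AB, E=AC, F=BC, G=ABC."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):   # A changes fastest (assumed run order)
        runs.append({"A": a, "B": b, "C": c,
                     "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})
    return runs

def word_column(runs, word):
    """Product of the sign columns named in `word`, one value per run."""
    return [prod(run[f] for f in word) for run in runs]

original = base_design()
foldover = [{f: -run[f] for f in FACTORS} for run in original]  # switch every sign
combined = original + foldover

for word in ("ABD", "ACE", "ABCDEFG", "BCDE", "ABCG"):
    # A word stays in the combined defining relation only if its column is all +1
    stays = all(v == 1 for v in word_column(combined, word))
    print(word, "stays" if stays else "drops out")
```

Words with an even number of letters keep a constant +1 column across all 16 runs, while odd-length words are +1 in runs 1-8 and -1 in runs 9-16.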
From the defining relation for the combined 16-run design we can find the entire confounding pattern in the usual way. For example, multiplying the defining relation by AD and ignoring interactions of order 3 or higher, we have that AD = CF = EG, which is identical to what we found by combining the two 8-run designs as shown in Table 5.12. In estimating the effects for the combined design, we can combine the estimates from the two 8-run designs as shown in Table 5.12. But the simplest approach is to estimate the effects directly from the combined 16-run design. For example, for AD (= CF = EG) we multiply the signs in columns A and D to determine the signs for the AD column. Then we calculate $l_{AD}$, the difference between the average response when AD is at the plus level and the average response when AD is at the minus level. This is an estimate of AD + CF + EG.
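
Multiplying an effect by every word of the defining relation is mechanical, so it is easy to sketch in code. The function names below are hypothetical; the seven words are those listed in the text for the combined design.

```python
def multiply(w1, w2):
    """Multiply two effect words; squared letters cancel (symmetric difference)."""
    return "".join(sorted(set(w1) ^ set(w2)))

# Defining relation of the combined 16-run design, as listed in the text
DEFINING_WORDS = ["BCDE", "ACDF", "ABCG", "ABEF", "ADEG", "BDFG", "CEFG"]

def aliases(effect, max_order=2):
    """Aliases of `effect`, ignoring interactions with more than `max_order` factors."""
    products = [multiply(effect, w) for w in DEFINING_WORDS]
    return sorted(p for p in products if len(p) <= max_order)

print("AD is aliased with:", aliases("AD"))
print("A is aliased with:", aliases("A"))
```

For AD the only low-order products are CF and EG, and for the main effect A there are none, which matches the resolution IV property of the combined design.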
Suppose we had followed this complete foldover procedure in the online learning example of the last section. What would the result have been? In the original 8 runs, there were two significant estimates: $l_A = 10.8 \rightarrow A + BD + CE + FG$ and $l_E = -5.8 \rightarrow E + AC + BG + DF$. Switching the signs of column A gave us clear estimates of the main effect of A and the AC interaction, which were the two significant effects.
A complete foldover would have given us a clear estimate of the main effect of A. In addition, it would have separated E from (AC + BG + DF). The estimate of the main effect of E would have been small, while the estimate of (AC + BG + DF) would have been similar to -5.8. Given that the main effect of A was large, we might have guessed that the AC interaction was responsible for this significant estimate, but we would not have been certain. In this case, switching the sign of column A turned out to be a better choice.
Implementing the Sequential Approach. We have discussed two distinct methods for resolving the ambiguities in a resolution III design: switching the signs of one column and switching the signs of all columns (foldover). A foldover increases the resolution of the design from resolution III to resolution IV. The decision regarding which method to use should be made after the results of the initial experiment have been obtained. If the initial experiment shows that a number of estimated effects are significant, a foldover is probably the preferred choice. On the other hand, if the initial experiment shows that only one estimate (as in our online learning example) or perhaps two are significant, switching the signs of the factor associated with the largest estimate is likely to be the best choice because this will unconfound all the effects involving that seemingly important factor.
Decisions about the appropriate follow-up design should be made after the results of the initial experiments have been analyzed. For example, it does not make much sense to decide on a foldover before the results are known, since in that case the experimenter could have implemented a 16-run resolution IV design in the first place.
A complete foldover does not make sense if the initial experiment is already a resolution IV design since in this case the main effects are already isolated. Switching the column signs for one factor, however, may be useful as it will unconfound all interactions involving that factor.

5.8 $2^{8-4}_{IV}$ DESIGN: IMPROVING E-MAIL ADVERTISING

With consistent growth and solid profit from their stores, an office supplies company that we call ABC decided to expand an e-mail program that directs potential small-business customers to its Web site. After brainstorming ideas and trimming the list to the boldest ideas, the marketing team identified 8 factors and selected two different versions of each factor to
TABLE 5.13
E-mail Advertising Experiment: Factors and Levels

Factor                        Control (-)                  New Idea (+)
A  Link to online catalog     No                           Yes
B  Design of e-mail           Simple                       Stronger brand image
C  Partner promotions         None                         Offers from two partner companies
D  Navigation bar on side     Current                      Additional buttons
E  Background color           White                        Blue
F  Discount offer             15% off                      No discount
G  Subject line               Exclusive e-mail offer       Special offer for our customers
H  Free gift                  None                         Free pen and pencil set

test. For each factor, the company refers to the current setting as the control. Table 5.13 summarizes the control and the new idea to be tested for each factor.

A: Link to online catalog. The e-mail included a "Shop our catalog online" button near the bottom of the message. The team felt that an obvious link to the Web site would encourage customers to browse through the selection of products.

B: Design of e-mail. In the past, e-mails used a basic font with a small company logo at the top. The team wanted to test a stronger brand image, with a larger logo, more stylized font, and greater use of the company's brand colors.

C: Partner promotions. The marketing team believed that promoting several well-known brand-name products would encourage customers to make a purchase. They decided to promote two specific brands in two bright boxes under "Offers from our partners" at the bottom of the e-mail.

D: Navigation bar on side. E-mails currently went out with a sidebar similar to the navigation bar on the company Web site, but with a shorter list of links. The company decided to test the current navigation bar versus one with more choices.

E: Background color. All e-mails were sent with dark text on a white background. The creative director thought that changing to a blue background might help the e-mail stand out.

F: Discount offer. The Internet director had gone back and forth between offering a special e-mail discount or not. He thought the discount helped but never had quantified whether it generated enough sales to justify the lower margin.

G: Subject line. The Internet director had been testing different e-mail subject lines. Until this test, "Exclusive e-mail offer from ABC" was the winner. Since he knew the subject line was important, he wanted to test another version, "Special offer for our best customers."

H: Free gift. They had never before offered a free gift with online orders. They knew other companies were doing so, and decided it was worth trying, selecting an attractive but low-cost pen-and-pencil set as the free gift.

The firm used the 16-run $2^{8-4}_{IV}$ design in Table 5.14 to study the impact of these 8 factors on the order rate. Each version of the e-mail was sent to 1,000 addresses randomly chosen from

TABLE 5.14
The Design and the Results

                                   FACTOR
       A      B        C         D            E       F          G         H      Response:
Run    Link   Design   Partner   Navigation   Color   Discount   Subject   Gift   Purchase
                                 Bar                             Line             Rate (%)
 1     -      -        -         -            -       -          -         -      2.23
 2     +      -        -         -            +       +          +         -      1.47
 3     -      +        -         -            +       +          -         +      1.81
 4     +      +        -         -            -       -          +         +      2.38
 5     -      -        +         -            +       -          +         +      2.03
 6     +      -        +         -            -       +          -         +      1.62
 7     -      +        +         -            -       +          +         -      1.28
 8     +      +        +         -            +       -          -         -      1.98
 9     -      -        -         +            -       +          +         +      1.78
10     +      -        -         +            +       -          -         +      2.30
11     -      +        -         +            +       -          +         -      2.30
12     +      +        -         +            -       +          -         -      1.53
13     -      -        +         +            +       +          -         -      1.11
14     +      -        +         +            -       -          +         -      1.81
15     -      +        +         +            -       -          -         +      1.93
16     +      +        +         +            +       +          +         +      1.82

Effect                    Estimate
Average                    1.8363
A                          0.0550
B                          0.0850
C                         -0.2775 *
D                         -0.0275
E                          0.0325
F                         -0.5675 *
G                          0.0450
H                          0.2450 *
AB + CE + DF + GH          0.0425
AC + BE + DG + FH          0.1650 *
AD + BF + CG + EH          0.0300
AE + BC + DH + FG          0.0250
AF + BD + CH + EG          0.0600
AG + BH + CD + EF         -0.0325
AH + BG + CF + DE          0.0875

NOTE: Significant effects are marked with an asterisk (*).

a list that the firm had purchased. The response variable, the proportion of customers that ordered from the Internet site, is given in the last column. The average response was 1.84%.
The first four columns of the design represent a full $2^4$ factorial in the factors A, B, C, and D. The levels of the remaining four factors in the adjacent columns are obtained from the four generators

E = ABC    F = ABD    G = ACD    H = BCD

These are the generators in Table 5.7, but with F and H interchanged.
The design is resolution IV. Ignoring interactions of order three and higher, this design allows clear (unconfounded) estimates of all eight main effects. The 2-factor interactions are confounded in strings (groups) of four.
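
The strings of four can be derived directly from the generators. The following sketch rewrites each 2-factor interaction in terms of the base factors A, B, C, D and groups interactions that share the same column; the helper name `to_base` is hypothetical.

```python
from itertools import combinations

# Generators stated in the text
GENERATORS = {"E": "ABC", "F": "ABD", "G": "ACD", "H": "BCD"}

def to_base(word):
    """Rewrite an effect word in terms of the base factors A, B, C, D."""
    letters = set()
    for factor in word:
        letters ^= set(GENERATORS.get(factor, factor))  # squared letters cancel
    return "".join(sorted(letters))

# Group the 28 two-factor interactions by the base column they share
strings = {}
for pair in combinations("ABCDEFGH", 2):
    name = "".join(pair)
    strings.setdefault(to_base(name), []).append(name)

for column, members in sorted(strings.items()):
    print(column, "->", " + ".join(members))
```

The 28 interactions fall into 7 groups of 4, one per 2-factor (or ABCD) column of the base factorial, and none of them shares a column with a main effect.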

In Section 4.6.4 of Chapter 4 we showed how to find the standard error of an estimated effect when the responses are proportions, as is the case in this example. The standard error is given by

$$\text{standard error(effect)} = \sqrt{\frac{p(1-p)}{N/2} + \frac{p(1-p)}{N/2}} = \sqrt{\frac{4p(1-p)}{N}}$$

where p = 0.00184 is the overall success proportion and N = 16,000 is the total sample size. This results in

$$\text{standard error(effect)} = \sqrt{\frac{4(0.00184)(1 - 0.00184)}{16{,}000}} = 0.000678 \text{ or } 0.068\%$$

At the 5% significance level, an estimated effect is significant if its absolute value is greater than 0.068(1.96) = 0.133%.
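
As a rough check of this screening step, the sketch below recomputes the standard error and applies the 1.96 cutoff to the estimates in Table 5.14. The values of p and N are taken exactly as quoted in the text, and the variable names are illustrative only.

```python
from math import sqrt

p = 0.00184      # overall success proportion, as quoted in the text
N = 16_000       # total sample size, as quoted in the text

se = sqrt(4 * p * (1 - p) / N)      # standard error of an effect, as a proportion
cutoff_pct = 1.96 * se * 100        # 5% significance cutoff, in percentage points

# Estimated effects (percentage points) from Table 5.14
effects = {
    "A": 0.0550, "B": 0.0850, "C": -0.2775, "D": -0.0275,
    "E": 0.0325, "F": -0.5675, "G": 0.0450, "H": 0.2450,
    "AB+CE+DF+GH": 0.0425, "AC+BE+DG+FH": 0.1650, "AD+BF+CG+EH": 0.0300,
    "AE+BC+DH+FG": 0.0250, "AF+BD+CH+EG": 0.0600, "AG+BH+CD+EF": -0.0325,
    "AH+BG+CF+DE": 0.0875,
}

significant = [name for name, est in effects.items() if abs(est) > cutoff_pct]
print(round(se, 6), round(cutoff_pct, 3), significant)
```

With these inputs the cutoff is about 0.133 percentage points, and the estimates that exceed it are C, F, H, and the AC + BE + DG + FH string.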
The factors C (partner promotion), F (discount offer), and H (free gift) are significant. Making available offers from two partner companies reduces the response rate by 0.28 percentage points. The team theorized that additional offers may have confused the message and given customers too many disjointed offers to choose from. Not offering a 15% discount decreases the response by 0.57 percentage points. The team calculated that the loss of margin by offering a 15% discount is more than covered by the increase in the number of orders. A free pen-and-pencil gift set increases the purchasing rate by 0.25 percentage points. Analyzing profitability, the cost of the gift was easily covered by the increase in orders.
The fourth largest effect, $l_{AC} = 0.165$, is an estimate of AC + BE + DG + FH. These four 2-factor interactions are confounded, and we only know that their sum is 0.165. However, we do know that factors F and H are significant. Experience has shown that significant 2-factor interactions tend to involve factors that have significant main effects, an experimental design principle that is called effect heredity. Since F and H were identified as significant main effects, effect heredity suggests that the estimated effect is most likely due to an interaction between F (discount) and H (free offer).
The interaction diagram for factors F and H is shown in Figure 5.3. The interaction supports the main effects: the 15% discount (F-, both points on the left) is always better, and the free gift (H+, the top line) increases the response over no free gift. The interaction can be understood by comparing both points on the left with both points on the right. On the right (with no discount), offering the free pen-and-pencil set gives a large jump in response versus offering no free gift; the response changes by 0.41 percentage points from 1.35% to 1.76%. In contrast, the points on the left show that, with the 15% discount (F-), the free gift increases response only slightly (from 2.08% to 2.16%). Overall, this interaction shows that the 15% discount is great, the free gift is good, but both together may be overkill: the free gift adds little to the benefit of the discount offer. These data helped the marketing team gain deeper insight into customer behavior, showing that one strong incentive is valuable, but additional incentives are probably unnecessary.
The company decided to offer the 15% discount and avoid the partner promotions. In addition, they planned to run further experiments to study the interaction between smaller discounts and the free gift offer.
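
The four points of the F-by-H interaction diagram are cell averages of the 16 responses. The sketch below recomputes them; it assumes the runs are in standard order (A changing fastest) and uses the purchase rates as recovered from Table 5.14, so both the ordering and the variable names should be read as assumptions.

```python
from itertools import product

# Purchase rates (%) for runs 1-16, assumed standard order with A changing fastest
responses = [2.23, 1.47, 1.81, 2.38, 2.03, 1.62, 1.28, 1.98,
             1.78, 2.30, 2.30, 1.53, 1.11, 1.81, 1.93, 1.82]

cells = {}
for (d, c, b, a), y in zip(product((-1, 1), repeat=4), responses):
    f = a * b * d                      # generator F = ABD
    h = b * c * d                      # generator H = BCD
    cells.setdefault((f, h), []).append(y)

means = {key: sum(ys) / len(ys) for key, ys in cells.items()}
labels = {-1: "-", 1: "+"}
for (f, h), m in sorted(means.items()):
    print(f"F{labels[f]} H{labels[h]}: {m:.4f}")
```

Each cell averages four runs. Under these assumptions the cell means are 2.08 and 2.16 with the 15% discount (F-) and about 1.35 and 1.76 without it (F+), the values cited in the discussion of the diagram.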

[Interaction plot between F (discount rate) and H (free gift). Horizontal axis: F (discount rate), 15% discount and no discount; vertical axis ticks from 1.3 to 2.2; one line for H (free gift) = Yes and one for H (free gift) = No.]

Figure 5.3 Interaction Diagram Between Factors F (discount) and H (free gift)

5.11 NOBODY ASKED US, BUT . . .

A strategy of carrying out one single, all-encompassing experiment is usually ill-conceived because it leaves no room for subsequent experimentation, and shortcuts the accumulation of knowledge. Learning is sequential, and as R.A. Fisher has said so well, the best time to plan an experiment is after you have done it. An approach that spends a portion of the available resources (perhaps 50%) on the initial experiment and saves the remainder for follow-up runs is more efficient. Two-level fractional designs and their foldovers (of a single factor or all factors) are ideally suited for this purpose.
At the outset of a study, one often encounters a large number of conflicting theories and numerous factors that are thought to have an effect on the response. At that stage, 2-level fractional designs are especially useful for screening purposes, to separate what Joseph Juran has called the "vital few" factors from the "trivial many."
In a resolution III fractional factorial design it is possible that a significant main effect and its confounded 2-factor interaction might have about the same magnitude but opposite signs and cancel each other out. In that case the experimenter would miss two significant effects. It is unlikely, but to paraphrase a once-popular bumper sticker, "stuff happens." The approaches in this chapter are powerful but not foolproof. The key is to experiment. Missing something occasionally will be of small consequence compared to the accumulation of insights over time.
In Chapter 4, we discussed estimating the variance (and standard error) of an effect by assuming that higher-order interactions are negligible and represent experimental error. But that approach does not apply in general to fractional factorial designs. The problem is that higher-order interactions are usually confounded with main effects or 2-factor interactions. In the $2^{7-3}_{IV}$ design, we pointed out that one of the 15 estimated effects is a string of higher-order interactions and could be used to estimate the variance of an effect, but this estimate would only have 1 degree of freedom. Similarly, in the $2^{6-2}_{IV}$ design shown in Table 5.7, two of the 15 estimates are strings of higher-order interactions and could be used to estimate the variance of an effect with 2 degrees of freedom. With 32-run experiments, there are more opportunities to use this approach. For example, a $2^{6-1}$ 32-run experiment with generator 6 = 12345 would be resolution VI with main effects confounded with 5-factor interactions and 2-factor interactions confounded with 4-factor interactions. These leave each 3-factor interaction confounded with one other 3-factor interaction. There are 10 such pairs, which could be used to estimate the variance of an effect with 10 degrees of freedom.
In the $2^{5-2}_{III}$ fractional design we gave the generators as D = AB and E = AC. A natural question is, Why not set one of the generators equal to the column of signs of the 3-factor interaction ABC? For example, we might use D = AB and E = ABC. If you work out the confounding pattern for this choice of generators you will see that it is equivalent to the pattern for the generators that we have given. For the $2^{6-2}_{IV}$ design we gave the generators as E = ABC and F = BCD. In this case, you might ask, Why not E = ABCD? The defining relation with E = ABCD rather than E = ABC is I = ABCDE = BCDF = AEF. The shortest word has three letters, so this design would have resolution III, while the one we listed is resolution IV.
In general, we want to choose generators to achieve the highest possible resolution, but there may be a number of choices that lead to designs with the same resolution. How do we choose among them? For example, consider a $2^{7-2}$ design for 7 factors in 32 runs. Two of the choices are design 1 with generators 6 = 123 and 7 = 1245 and design 2 with generators 6 = 123 and 7 = 145. For design 1 the defining relation is I = 1236 = 12457 = 34567, while for design 2, it is I = 1236 = 1457 = 234567. The length of the shortest word is 4 in each case, so these designs are both resolution IV, with 2-factor interactions confounded with other 2-factor interactions. But design 2 has six pairs of confounded 2-factor interactions: (12, 36), (13, 26), (16, 23), (14, 57), (15, 47), (17, 45), whereas design 1 has only three pairs that are aliased: (12, 36), (13, 26), (16, 23). As a result, design 1 is preferable to design 2. Notice that design 2 has two 4-letter words in its defining relation, whereas design 1 has only one. For design 1, the 4-letter word is 1236, and only pairs of 2-factor interactions involving these four factors will be confounded. But design 2 has a second 4-letter word, 1457, and pairs of 2-factor interactions involving these four factors will be confounded as well. So the tie breaker between these two (or more) resolution IV designs is the number of 4-letter words in each defining relation. Both designs are resolution IV designs, so they each must have at least one 4-letter word, but design 1 has the fewest 4-letter words and is called the minimum aberration design. Note that all of the designs in Tables 5.5 and 5.7 are minimum aberration designs.
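
The resolution and the 4-letter-word tie breaker both follow from the full defining relation, which can be generated from the generator words. The sketch below does this for the designs discussed above; the function names are hypothetical, and each generator such as 6 = 123 is entered as the word 1236.

```python
from itertools import combinations

def defining_relation(generator_words):
    """All words of the defining relation: products of every nonempty subset of generator words."""
    words = []
    for k in range(1, len(generator_words) + 1):
        for subset in combinations(generator_words, k):
            letters = set()
            for word in subset:
                letters ^= set(word)        # squared factors cancel
            words.append("".join(sorted(letters)))
    return words

def summary(generator_words):
    """Defining relation, resolution (shortest word), and number of 4-letter words."""
    words = defining_relation(generator_words)
    lengths = [len(w) for w in words]
    return {"words": words, "resolution": min(lengths), "four_letter_words": lengths.count(4)}

design1 = summary(["1236", "12457"])   # generators 6 = 123 and 7 = 1245
design2 = summary(["1236", "1457"])    # generators 6 = 123 and 7 = 145
alt = summary(["ABCDE", "BCDF"])       # E = ABCD, F = BCD
listed = summary(["ABCE", "BCDF"])     # E = ABC,  F = BCD

for name, s in [("design 1", design1), ("design 2", design2),
                ("E = ABCD", alt), ("E = ABC", listed)]:
    print(name, s["words"], "resolution", s["resolution"],
          "4-letter words:", s["four_letter_words"])
```

Designs 1 and 2 both come out as resolution IV, but design 1 has one 4-letter word and design 2 has two, which is the comparison that identifies design 1 as the minimum aberration choice.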
In Chapter 3, we introduced the important idea of blocking. The same concept applies to factorial and fractional factorial designs. For example, in a $2^3$ experiment (with factors labeled A, B, C), it might be necessary to perform 4 runs on one day and 4 runs on another. Randomizing the assignment of runs to days will result in a valid experiment, but if there is a day effect, it will lead to an increase in experimental error. An alternative is to perform the experiment in two blocks of 4 runs each. The runs in the first block (day 1) would be those for which the signs of the ABC column are plus, with the runs in the second block (day 2) being the runs for which ABC is minus. This arrangement will confound the (possible) day effect with the 3-factor interaction that is very likely to be zero and can be ignored. In the online experiment that we discussed in Section 5.7.1, the two 8-run experiments were run at different times, which also introduces a block effect, as the students might perform better during one time period compared to the other. But in this case, the same 8-run experiment is repeated in the second block. It may well be that the scores in one block are higher than the scores in the other, but any block effect would be added to each of the 8 runs in the block and would have no influence on the estimated effects. The difference between this and the first example is that in the first, a block effect would influence only 4 of the runs. The first situation would be analogous to the online experiment if the complete $2^3$ experiment were repeated on the second day.
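
A small numerical sketch may make the blocking argument concrete. It assigns the runs of a $2^3$ design to days by the sign of ABC, adds a day effect to the second block, and compares the estimated effects. The responses and the size of the day effect are hypothetical numbers chosen only for illustration.

```python
from itertools import product

def effects(design, y):
    """Average response at + minus average response at -, for each contrast column."""
    out = {}
    for name in design[0]:
        plus = [yi for run, yi in zip(design, y) if run[name] == 1]
        minus = [yi for run, yi in zip(design, y) if run[name] == -1]
        out[name] = sum(plus) / len(plus) - sum(minus) / len(minus)
    return out

# Full 2^3 design with all interaction columns
design = []
for c, b, a in product((-1, 1), repeat=3):
    design.append({"A": a, "B": b, "C": c,
                   "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c})

y = [60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0]   # hypothetical responses
day_shift = 5.0                                         # hypothetical day effect

# Day 2 is the block of runs where ABC is minus; every run in it is shifted
y_blocked = [yi + (day_shift if run["ABC"] == -1 else 0.0)
             for run, yi in zip(design, y)]

before = effects(design, y)
after = effects(design, y_blocked)
for name in before:
    print(name, round(before[name], 3), round(after[name], 3))
```

Only the ABC contrast changes (by the size of the day effect); the main effects and 2-factor interactions are unaffected because each of their columns is balanced within each block.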
EXERCISES

Exercise 1 An experiment is to be performed with 4 factors, labeled A, B, C, and D, each at 2 levels. The design consists of 8 runs with the generator chosen to have the highest possible resolution. Write down the design matrix for this design, and show the complete confounding pattern ignoring 3-factor and higher-order interactions.
Raw material, which is not one of the 4 factors being tested, is available from two different suppliers. There is only enough raw material from supplier 1 to do four runs. The other four runs have to use raw material from supplier 2. Management is concerned that the difference in raw material between suppliers might affect the results. Suppose management is certain that the AD and BC interactions are both zero. Which of the 4 runs in your design matrix should use material from supplier 1, and which should use material from supplier 2? Explain why you made this choice.
Exercise 2 Two Master Black Belts, Bill and Karen, meet at an online dating site and go out on a date. Karen brings along a piece of paper showing a $2^{6-2}$ design with generators E = ABC and F = BCD. In a passionate discussion, Bill argues that replacing the generator E = ABC with the generator E = ABCD will result in a better confounding pattern. "We would confound E with a 4-factor interaction so this pattern is obviously going to be better." Karen replies, "No Bill, you are wrong, the pattern is going to be equivalent." Comment on the opinions expressed by these two lovebirds.
Exercise 3 Karen Donegan, a staffer at the business school, is thinking about buying some new golf equipment to improve her game in preparation for the annual golf tournament. One day she comes across some notes on experimental design that she finds on top of a file cabinet near her office. "Hmm," she thinks to herself as she begins to read them, "this looks interesting. Maybe this will help me decide what equipment to purchase."
Karen has been thinking about replacing her current steel-shaft, small-head driver. The company that makes her clubs has three other drivers that she is considering. The first has the same steel shaft but a newly designed very large head. The second has a graphite shaft with the same small head that is on her current driver, while the third has the graphite shaft with the newly designed very large head. She is also wondering whether a new pair of golf shoes might help, as well as switching to an expensive golf ball (at $3.00 per ball) rather than sticking with her discount store special (at $0.75 per ball). She also has her eye on a rather expensive new golf sweater, which should put her in the right frame of mind to hit some really long drives. Finally, she does not wear a golf glove when she plays, and her husband pointed out to her that she is no Fred Couples, one of the few golf professionals who does not wear a glove. After spending an evening reading the notes, Karen concludes that "this is pretty clear" and decides to perform the experiment whose design matrix is shown below. She includes 6 factors and decides to do 16 runs, all to be performed at the local driving range. Each run is performed in random order.
Karen buys the necessary golf balls and the glove, but is able to borrow the shoes and golf clubs from a local store. She has her own steel-shaft, small-head driver, and she has 10 days to return the sweater for a full refund.
The design matrix and the results of the experiment are shown below. Each run is a single drive (shot), and the response variable is the distance the shot carries in yards. Karen's goal is to find the equipment that will maximize her distance, but she doesn't want to spend money on anything that will not help in that regard.
                    LEVEL
Factor   -                  +
A        Small club head    Large club head
B        Steel shaft        Graphite shaft
C        Cheap ball         Expensive ball
D        No glove           Glove
E        Old shoes          New shoes
F        Old sweater        Expensive new sweater

DESIGN WITH GENERATORS: E = ABC, F = BCD

Run   A   B   C   D   E   F   Response
 1    -   -   -   -   -   -   182
 2    +   -   -   -   +   -   157
 3    -   +   -   -   +   +   155
 4    +   +   -   -   -   +   226
 5    -   -   +   -   +   +   184
 6    +   -   +   -   -   +   166
 7    -   +   +   -   -   -   177
 8    +   +   +   -   +   -   218
 9    -   -   -   +   -   +   178
10    +   -   -   +   +   +   152
11    -   +   -   +   +   -   135
12    +   +   -   +   -   -   232
13    -   -   +   +   +   -   173
14    +   -   +   +   -   -   156
15    -   +   +   +   -   +   154
16    +   +   +   +   +   +   223

Estimated effects: Average = 179.25

$l_A = 24.00 \rightarrow A$
$l_B = 21.50 \rightarrow B$
$l_C = 4.25 \rightarrow C$
$l_D = -7.75 \rightarrow D$
$l_{AB} = 45.50 \rightarrow AB + CE$
$l_{AC} = -5.25 \rightarrow AC + BE$
$l_{AD} = 6.75 \rightarrow AD + EF$
$l_{BC} = 1.75 \rightarrow BC + AE + DF$
$l_{BD} = -0.25 \rightarrow BD + CF$
$l_{CD} = -2.00 \rightarrow CD + BF$
$l_{ABC} = l_E = -9.25 \rightarrow E$
$l_{ABD} = 6.75 \rightarrow$ 3-factor interactions
$l_{ACD} = 0.50 \rightarrow$ 3-factor interactions
$l_{BCD} = l_F = 1.00 \rightarrow F$
$l_{ABCD} = l_{AF} = 0.00 \rightarrow AF + DE$

(a) Verify the estimated effects and their confounding patterns (assume interactions of order 3 or higher are negligible).
(b) Find all the effects including higher-order interactions that are confounded with C, with CD, with AB.
(c) Suppose that based on prior extensive experience at the driving range measuring the length of her drives, Karen has determined that the standard deviation of a single drive is 13 yards. What is the 95% confidence interval for an effect? Which effects are significant?
(d) Based on your analysis of the results, what are the most reasonable and likely conclusions you can draw without doing any additional runs? What should Karen do? Show square diagrams, if appropriate. What is the regression prediction equation?
(e) How much additional yardage could Karen expect to get if she followed your advice?
(f) Suppose instead of doing all 16 runs on one day, Karen decides to do 8 of the runs on one day and the other 8 runs the next day. Karen's husband Harold suggests she do the even-numbered runs on one day and the odd-numbered runs on the other. Is this a good idea? Explain. A professor at the business school suggests she do runs 1, 3, 6, 8, 10, 12, 13, and 15 on one day and the others on the next day. Karen asks, why? The professor says, "Look at the column of signs for the ACD interaction. The runs with - signs should be run on one day and the runs with + signs should be run on the other day. That's how I picked the runs for each day." Explain why the professor's suggestion is better than Harold's. It may help to think of the day as the "seventh factor" in the experiment.
Exercise 4 The Natural Delight Food Company makes a range of frozen foods. One product is the Natural Delight Soy Burger, a nonmeat product. The company is interested in testing a number of factors related to this product.
Factor A: Location. The choice is between the natural foods freezer case and the freezer
case where beef hamburgers are sold. The supermarket has offered the same amount
of shelf space at either location, and the company has to decide which location to
choose. In the past, the product has been sold in the natural foods case. The company wonders whether the higher customer flow past the hamburger freezer case
might lead to higher sales.
Factor B: Package color. The existing package is green for the environment. A marketing manager suggests red lettering on a white background.

Factor C: An in-store special display located halfway between the two alternative freezer
locations. The display would show a happy person eating a Natural Delight Soy

Burger. The company logo would be displayed, and the following words boldly
shown: "Oh Boy, you will love our Soy Burger!"

Factor D: Free samples. The brand manager feels strongly that setting up a table next to
the location of the display, with an extremely attractive store employee cooking and
offering shoppers samples of the product, would be very helpful. "If we can just get
people to try our product, they will buy it," he says.

Factor E: Sticker. The company has been adding a fancy sticker to the package with the
words "stay healthy." They would like to test whether or not the sticker has an effect
on sales. The sticker would match the package color.

Factor F: Package lettering. Currently, the lettering on the package uses a modern
font. A summer intern, the marketing manager's nephew, suggests testing a more
traditional-style font.
The company has identified a group of 16 stores, all of very similar size, with about the
same weekly sales of the Soy Burger. The experiment will be run over a 4-week period from
mid-July to mid-August. The response variable is dollar sales of the product over the 4-week
period. As shown below, the design matrix consists of 16 runs, each a specific setting of each
of the 6 factors. Each of the 16 runs is randomly assigned to one of the 16 stores.
Based on an extensive analysis of a very large amount of historical data for the test stores,
the company estimates that the variance of the response of a single run (the variance of dollar sales at a store over the 4-week period) is $800.
The design matrix and the experimental results are shown below.
                                    LEVEL
Factor    -                               +
A         Natural foods case              Beef hamburger frozen foods case
B         Green package                   Red lettering on a white background
C         No display                      Display
D         No free samples table           Free samples table
E         "Stay healthy" sticker          No sticker
F         Current modern lettering        Traditional lettering

Run

J)

4
5

-+

7
8
9
10
11
12
13
14
15
16

+
+

+
+

1ZesprJ11't'
l,l 10
l I 120
970
l ,o25
I ,000
980
l I 125
1,095
%(1
Y75
9.\()

'15U
9.HI

Y7~

+-

960

+-

back~round

950

I
(a) Obtain the estimated effects and determine the confounding patterns (see
Exercise 3).
(b) Based on a 95% confidence interval, which effects are significant?
(c) What settings of the factors would you recommend? Explain why you picked these
settings.
(d) What is the regression prediction equation and what are the predicted 4-week sales
given your answer to part c?
(e) What additional comments, suggestions, or observations, if any, do you have?
Exercise 5 Consider Exercise 1 in Chapter 4. Eagle Brands studies the effects of 6 factors.
Consider the $2^{6-1}$, $2^{6-2}$, and $2^{6-3}$ fractional factorial designs. Discuss the confounding patterns and the resolution of these designs.


Exercise 6

Consider Case

I Mother Jones

( B)) from the case study appendix.

(a) Analyze the results of the experiment. Which effects are statistically significant at
the 5% level? At the 10% level?
(b) What settings for the factors would you recommend?
(c) What is the regression prediction equation and the predicted response if significant
factors are set at their best levels?
Exercise 7 Consider Case 5 (Office Supplies E-mail Test) from the case study appendix.
(a) In the section on Planning the Test, it is stated that with 35,000 names and an average response rate of 1%, an effect would have to change the response by about 20%
(from 1.00 to 1.20) to have a 50:50 chance of being found significant. Confirm
this statement by applying the appropriate sample size calculations. Use computer
software such as Minitab or JMP. What magnitude of change could you detect if
you wanted to be 80% confident?
(b) Ignoring the three customer segments, discuss the advantages/disadvantages of the
16-run $2^{13-9}$ and 32-run $2^{13-8}$ fractional factorial designs. Discuss achievable resolutions and the implied confounding patterns. Use design software such as Minitab
or JMP if available.
(c) Explore the differences among the three customer segments. Can you conclude
whether or not the effects of the 13 studied factors depend on the customer
segment?
(d) Which design (and which run size) would give you unconfounded estimates of
all 2-factor interactions among the 13 factors? Is there a smaller design you
could use if you were only interested in one specific interaction (say, the KL
interaction)?

(e) Obtain the standard errors of the estimated effects, using the approach described
in Section 4.6.4.
(e1) Note that the sample sizes are not the same. Nevertheless, suppose that they
were and assume that the N = 34,060 names were divided equally among the
32 cells.
(e2) Use the variance $\mathrm{var}(p_i) = p_i(1 - p_i)/n_i$ in deriving the variance of the estimated effects. Use the fact that an estimated effect is the difference of average
proportions (of size 16) at the low and high settings.
(e3) Compare your results in (e1) and (e2) with those given in the case.
(f) The variance of the observed proportion, $\mathrm{var}(p_i) = \pi_i(1 - \pi_i)/n_i$, depends on the
true proportion and the sample size, which both vary across the design runs. This
violates the assumption that responses are equally reliable, a fact that is needed for
the equal weighting of the proportions when calculating estimated effects.
Logistic regression provides the appropriate approach of analyzing categorical
response data such as the number of successes among a given number of trials.
Abraham and Ledolter (2006, Chap. 11) discuss this approach in detail. Most
statistical software packages such as Minitab include routines for logistic regression. Use logistic regression to replicate the results given in this case.

Exercise 8 Consider Case 7 [Peak Electronics: The Broken Tent Problem (B)] from the
case study appendix.
(a) Analyze the results. Estimate the effects, and obtain their significance by comparing the estimated effects with their standard error.
(b) Which are the significant effects? What are the best settings? How well did Lou Pagentine predict which variables would be significant? What is the regression prediction equation for the number of broken tents on the panel? Estimate by how
much the number of broken tents would be reduced by using the best settings as
compared to the current settings.
Exercise 9 Discuss applications of factorial and fractional factorial designs to questions
that arise in your field of study (marketing, operations management, management information systems, economics, engineering, etc.). Discuss the factors, the response, and the
process that you would follow to conduct such experiments.
Search the literature in your field of study and find applications where these design methods have been used.
Exercise 10 Paper helicopter experiment. This experiment was first recommended by
George Box. A discussion of this experiment and construction guidelines for paper helicopters are given in Ledolter and Swersey (1997). Also, there are numerous references to
this experiment on the Web.
Construct a paper helicopter by varying the length and the width of the blades. Vary the
weight of the helicopter by using different paper stock and/or adding paper clips to the helicopter. Drop the helicopter from a high location (say, 12 feet), and determine the flying


time. The objective is to maximize the flying time. Carry out the experiment, preferably
with replications. Analyze the data and discuss your findings.
Exercise 11 Consider the 8-run $2^3$ factorial design. Suppose that you conduct the 8 runs
in standard order. However, it turns out that only 4 runs can be conducted on a single day.
You are concerned that the experimental conditions change from day to day, and that the
mean levels of runs carried out on different days are not the same.
(a) Would this fact affect the estimates that you obtain from your experiment? For example, would it affect the main effect of factor 1, factor 2, factor 3?
(b) Think about another run order that would not affect the main effects.
Hint: What about running the four experiments with 123 = - on day 1, and
the four experiments with 123 = + on day 2? Why would this be a better strategy?
Exercise 12 Investigate the effects of the following 5 factors on the expansion of pinto
beans:

Soaking fluid: water (-) or beer (+)
Salinity: no salt (-) or salt (+)
Acidity: no vinegar (-) or vinegar (+)
Soaking temperature: refrigerator temperature (-) or room temperature (+)
Soaking time: 2 hours (-) or 6 hours (+)

Carry out the following experiment. Select a pinto bean, measure its "size," put the bean
into a soaking fluid, and after a certain amount of elapsed time measure its size again.
Use five tablespoons of soaking fluid to soak each bean. Make sure that the liquid covers the
bean. Use regular beer because light beer might act like water. For salt, add 1/4 teaspoon to
the soaking fluid. For vinegar, add 1 teaspoon to the soaking fluid.
(a) Discuss how you measure the "size of a pinto bean" and its "expansion." Give a detailed description of your measurement procedure, so that it can be carried out by
other people; that is, give an operational definition.
(b) Design, set up, and execute a $2^{5-1}$ fractional factorial experiment. Conduct two
replications, which may be run concurrently. Analyze the effects of the 5 factors.
Write a short report that summarizes your findings. Support your findings with appropriate graphs and calculations. What have you learned? What was the most difficult part of your experiment? If you had to do it over again, what would you change?

Note: In carrying out this experiment, you need (16)(2) = 32 small paper containers for soaking the beans.
Exercise 13 Eibl, Kess, and Pukelsheim (1992) discussed how the response variable, paint
coat thickness, depends on a set of 6 input factors. Their objective was to find factor settings
that achieve a desired target value for paint coat thickness of 0.8 mm.


They consider the following 6 input factors A through F (listed here in decreasing order
of their assumed importance): belt speed, tube width, pump pressure, paint viscosity, tube
height, and heating temperature. All factors could be varied continuously. Level 0 stands for
the standard operating condition. All factors were scaled so that levels between -3 and +3
were technically feasible, without increasing cost.
(a) The first experiment varied the factor levels between -1 and +1; it was expected
that this experiment could detect the linear effect of these changes. The table given
below lists the observed paint thickness (in mm) for a 2-level fractional factorial
experiment with four replications at each factor-level combination. The order of
the 32 experiments was fully randomized.
(a1) Show that this design is a $2^{6-3}$ fractional factorial design. Discuss the confounding patterns.
Hint: Notice that factors A, B, and D form a full $2^3$ factorial design. Write
out the calculation columns, and discuss how the levels of the remaining
factors C, E, and F were selected. You will find that C = BD, E = AD, and
F = AB.
(a2) Calculate the averages from the replications, and analyze the averages. Find
the important effects, calculate their standard errors, and interpret your
findings.
A

Ii

('

[)

I b1cknc" ul t'.11111 (.udl (llllll)


I (,)
l.L''
J

.~J

U.~b

- I

I 49
I 12
I (1'>
1.29
l ..SI

1.41
U.7 ..!

0.98

2V
I.' l

2.17
I 46

1.4~

U.81

1.71
1.04
I .S'i
11.7'!
2.36
1.42

I .SY
U.88

1.76
Ul
I .4ll
U.8J

2.12
1.40

(b) A follow-up experiment focused on the first 4 factors (factors A through D). The
results are given below. The levels of the factors were changed because of the findings in the initial experiment. When analyzing the data, you can transform the factor levels into -1 and +1; the -1 in your new coding of factor A corresponds to
-1.5 on the original scale; the +1 in the new coding corresponds to 0.5 on the
original scale. You can do the same for the other factors.
(b1) Show that this design is a $2^{4-1}$ fractional factorial. Determine the generator of
the design and the confounding patterns. Can you think of another, and better, $2^{4-1}$ fractional factorial design?
(b2) Calculate the averages from the replications, and analyze the averages. Find
the important effects, calculate their standard errors, and interpret your
findings.

I WO

I I VI I

/)

lHM I ICl'IAI

()

I >l
I 71
I.I'>
I 71
ll.YI

()

()

DI

<,J(;N~~

_ __

I h1cknc'"
l.'>I
O.hI
I ,.J

II

lACTORIAI

()
()

I IK
{) 78
1.<JH
1.06
l.Hl
1.2'!
I hi
1..lll

(c) Another follow-up experiment was conducted with just the first 3 factors. The results from this $2^3$ factorial experiment are given below.

I II

I II
I
I.II

I 'i
1.0
l.'i

< .akula!c

;~

0.;

()

11. 'i I

n.hh

0.h2

II "I

ll.h'l

II l'l

o. ')

0. iH
0.M
I IH
tl.74

0.; l,
11.-9

11.78

Calculate the averages from the replications, and analyze the averages. Find the
important effects, calculate their standard errors, and interpret your findings.
(d) Summarize your findings from all three experiments. Can you find factor settings
that achieve the desired target value for paint coat thickness (0.8 mm)?

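TABLE 6.1 Plackett-Burman Design for up to 11 Factors in N = 12 Runs

                        FACTOR
Run    1   2   3   4   5   6   7   8   9  10  11
  1    +   +   -   +   +   +   -   -   -   +   -
  2    -   +   +   -   +   +   +   -   -   -   +
  3    +   -   +   +   -   +   +   +   -   -   -
  4    -   +   -   +   +   -   +   +   +   -   -
  5    -   -   +   -   +   +   -   +   +   +   -
  6    -   -   -   +   -   +   +   -   +   +   +
  7    +   -   -   -   +   -   +   +   -   +   +
  8    +   +   -   -   -   +   -   +   +   -   +
  9    +   +   +   -   -   -   +   -   +   +   -
 10    -   +   +   +   -   -   -   +   -   +   +
 11    +   -   +   +   +   -   -   -   +   -   +
 12    -   -   -   -   -   -   -   -   -   -   -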
6.2 THE 12-RUN PLACKETT-BURMAN DESIGN

Table 6.1 shows the design matrix for the 12-run Plackett-Burman design. The rows in this
matrix represent the runs (N = 12) and the columns represent up to 11 factors. As in all
Plackett-Burman designs, the entire design matrix is constructed from an initial row of plus
and minus signs that is given in Appendix 6.1. The last entry in row 1 (-) is placed in the
first position of row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. The third row is generated from the second row using the same
method, and the process continues until the next to the last row is filled in. A row of all minus signs is then added to complete the design.
The design is orthogonal, since for any two factors (columns) the number of runs at each
of the four factor-level combinations (- -), (- +), (+ -), (+ +) is the same (3 runs). Because the design is orthogonal, each of the 11 linear contrasts is independently estimated.
We obtain each estimate in the usual way, by taking the average of the responses when the
column entry is at the plus sign ($\bar{y}_+$) minus the average of the responses when the column
entry is at the minus sign ($\bar{y}_-$).
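The construction and the orthogonality claim can be sketched in a few lines of Python. This is an illustration only: the first row used here is the commonly tabulated 12-run Plackett-Burman generator, which we assume is the row listed in Appendix 6.1, and the function names (`plackett_burman`, `is_orthogonal`, `effect`) are our own.

```python
from itertools import combinations

# Assumed first row (commonly tabulated generator for the 12-run design).
FIRST_ROW = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Build the design by cyclic shifts, then append a row of minus signs."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])  # last entry moves to the first position
    rows.append([-1] * len(first_row))
    return rows


def column(design, j):
    """Signs of factor j (factors are numbered from 1)."""
    return [row[j - 1] for row in design]


def is_orthogonal(design):
    """True if every pair of columns shows each sign combination equally often."""
    n_factors = len(design[0])
    for a, b in combinations(range(1, n_factors + 1), 2):
        counts = {}
        for pair in zip(column(design, a), column(design, b)):
            counts[pair] = counts.get(pair, 0) + 1
        if sorted(counts.values()) != [len(design) // 4] * 4:
            return False
    return True


def effect(design, j, y):
    """Average response at the plus level minus average at the minus level."""
    col = column(design, j)
    plus = [yi for s, yi in zip(col, y) if s == +1]
    minus = [yi for s, yi in zip(col, y) if s == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


design = plackett_burman(FIRST_ROW)
plus_runs_col1 = [i + 1 for i, s in enumerate(column(design, 1)) if s == +1]

# Made-up, noise-free responses in which only factor 1 matters (effect of 4 units).
y = [10 + 2 * row[0] for row in design]

print(is_orthogonal(design))
print(plus_runs_col1)
print(effect(design, 1, y), effect(design, 2, y))
```

With this assumed first row, column 1 is at its plus level in runs 1, 3, 7, 8, 9, and 11, and the made-up responses return the full effect for factor 1 and zero for factor 2, as orthogonality implies.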

Confounding in Plackett-Burman Designs

Plackett-Burman designs are resolution III. Main effects are confounded with 2-factor
interactions, but the nature of the confounding is different from fractional factorial designs.
(Appendix 6.2 gives a general discussion of confounding in Plackett-Burman designs.) As
we have seen, in fractional factorial designs two effects are confounded if for each run the
signs in each effect column are either the same or opposite. For confounded effects, the correlation between the signs of the two columns is always perfect (±1).
In Plackett-Burman designs, a main effect column and its confounded 2-factor interaction column contain signs that are correlated, but not perfectly. For example, consider the main effect of factor 1 and the 26 interaction for the N = 12 run design. We use +1 and -1 to represent the column signs, and we multiply the entries in columns 2 and 6 to obtain the entries


in column 26. Writing each column as a row to save space, and listing the run numbers
above the entries, we have

Run:          1    2    3    4    5    6    7    8    9   10   11   12
Column 1:    +1   -1   +1   -1   -1   -1   +1   +1   +1   -1   +1   -1
Column 26:   +1   +1   -1   -1   -1   -1   +1   +1   -1   -1   +1   +1

Both columns have the same number of plus and minus signs, and they add up to zero. Furthermore, the sum of the squares of the entries in each column is 12, the number of runs N.
The columns are correlated. In 8 of the 12 runs, the column signs match; whereas for 4 runs,
they are opposite. The correlation between these two mean zero columns (let us call them x
and z) is given by

$$\rho = \frac{\sum x_i z_i}{\sqrt{\sum x_i^2 \sum z_i^2}} = \frac{8 - 4}{12} = \frac{1}{3}$$

This correlation of 1/3 indicates that there is some linkage between the signs of these two
columns, but the correlation is fairly weak.
In fractional factorial designs, if two effects are confounded, the correlation between column entries is either $\rho = +1$ or $\rho = -1$. In Plackett-Burman designs, if two effects are
confounded, the absolute value of the correlation between column entries is strictly less
than 1. One says that the effects are partially confounded or partially aliased.
Calculating the correlation between each main effect column and each 2-factor interaction column, we find that for each factor its main effect is partially confounded with all
2-factor interactions not involving that factor. In addition, we find that for all 11 factors, the
correlation between column signs for a factor's main effect and each of its confounded interactions is either +1/3 or -1/3.
The correlation between each main effect column and each 2-factor interaction column
that includes the main effect factor is zero. You can see this, for example, by looking at the
column for factor 1 and the interaction column 12 that are listed below.
Run:          1    2    3    4    5    6    7    8    9   10   11   12
Column 1:    +1   -1   +1   -1   -1   -1   +1   +1   +1   -1   +1   -1
Column 12:   +1   -1   -1   -1   +1   +1   -1   +1   +1   -1   -1   +1

The column signs match in half of the runs, and the correlation between these two columns
is given by

$$\rho = \frac{6 - 6}{12} = 0$$

Table 6.2 lists all confounding (correlation) coefficients among main effects and 2-factor interactions in the N = 12 Plackett-Burman design with 8 factors. The first eight columns in
Table 6.1 are used for the levels of the 8 factors, but any other set of eight columns could
have been taken. In showing the confounding coefficients, we limit ourselves to 8 factors,
because this makes it easier to display the confounding coefficients in a compact table.
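The confounding coefficients for one main effect can be enumerated directly. The sketch below does this for factor 1 and the first 8 columns of the 12-run design. It assumes the commonly tabulated first row for the 12-run Plackett-Burman design (taken here to be the row in Appendix 6.1); the helper names are ours, and exact fractions are used so that 1/3 is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Assumed first row (commonly tabulated generator for the 12-run design).
FIRST_ROW = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Cyclic shifts of the first row plus a final row of minus signs."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])
    rows.append([-1] * len(first_row))
    return rows


def column(design, j):
    return [row[j - 1] for row in design]


def interaction(design, a, b):
    """Entry-by-entry product of columns a and b."""
    return [x * z for x, z in zip(column(design, a), column(design, b))]


def correlation(x, z):
    """Correlation of two mean-zero +1/-1 columns: sum of products over N."""
    return Fraction(sum(a * b for a, b in zip(x, z)), len(x))


design = plackett_burman(FIRST_ROW)
main_1 = column(design, 1)

# Interactions among factors 2..8 (none of them involves factor 1).
alias_1 = {
    (a, b): correlation(main_1, interaction(design, a, b))
    for a, b in combinations(range(2, 9), 2)
}
# An interaction that does involve factor 1.
own_12 = correlation(main_1, interaction(design, 1, 2))

for pair, rho in sorted(alias_1.items()):
    print(pair, rho)
print("corr(1, 12) =", own_12)
```

Under this assumption every coefficient in `alias_1` is +1/3 or -1/3, while the correlation with the 12 interaction is zero, in line with the description of Table 6.2.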


Consider the linear contrast for column 1, $l_1 = \bar{y}_+ - \bar{y}_-$, which compares the average responses at the plus level of column 1 (runs 1, 3, 7, 8, 9, 11) with the average responses at the
minus level of column 1 (runs 2, 4, 5, 6, 10, 12). The first row of Table 6.2 shows that it is
an estimate given by

$$l_1 \rightarrow 1 + \tfrac{1}{3}(26 + 34 + 37 + 45 + 46 + 68 + 78) - \tfrac{1}{3}(23 + 24 + 25 + 27 + 28 + 35 + 36 + 38 + 47 + 48 + 56 + 57 + 58 + 67)$$

In general, the contrast ($\bar{y}_+ - \bar{y}_-$) associated with each design column is an estimate of the
main effect plus the weighted sum of the 2-factor interactions that are confounded with that
main effect (we are ignoring 3-factor and higher-order interactions). The weight applied to
each 2-factor interaction is the correlation (shown in Table 6.2) between the entries in the
main effect column and the entries in that interaction column. (In fractional factorial designs, the weights applied to each confounded 2-factor interaction are also the correlations,
which are either +1 or -1.)

Implications of Plackett-Burman Confounding

Now let us return to considering the 12-run design for 11 factors. Although each main
effect is confounded with many 2-factor interactions, in most applications, we would expect
very few 2-factor interactions to be important (effect sparsity) and those that are to be
smaller in magnitude compared to the main effects (hierarchical ordering). In many situations, especially in the early stages of an experimental investigation, it is appropriate to ignore 2-factor interactions and assume that each of the 11 estimated effects represents a main
effect. This is reasonable as long as the magnitudes of 2-factor interactions are relatively
small compared to main effects. But what if this is not the case?
Suppose we analyze the results of the experiment under the assumption that each of the
11 estimates represents a main effect and a single 2-factor interaction (unknown to us) is
fairly large. What effect will this have on our main effect estimates? Consider $l_1$ again. There
are 45 two-factor interactions partially confounded with the main effect of factor 1. Suppose that just one of these (say, 37) is large, while the rest are negligible (zero). Then $l_1$ estimates 1 + 1/3(37). If we always take $l_1$ to be an estimate of 1 (the main effect of 1), then our
estimate of 1 will be biased by an amount equal to one third of the magnitude of the 37 interaction. If the 37 interaction is positive, our estimate of 1 will be too large, while if the 37
interaction is negative our estimate of 1 will be too small. As a result, we may mistakenly believe that the main effect of 1 is significant when it is not, or that it is not significant when it
actually is. And because the 37 interaction is confounded with 8 other main effects (all but
the main effects of factors 3 and 7), its presence will bias all of those main effect estimates as
well. The presence of a second 2-factor interaction (e.g., 24) would increase or decrease the
main effect bias depending on its sign.
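A small noise-free calculation makes the size of this bias concrete. The sketch below assumes the commonly tabulated first row for the 12-run design (taken to be the row in Appendix 6.1) and uses made-up values: a main effect of factor 1 equal to 2 units and a 37 interaction equal to 3 units. All names and numbers are illustrative.

```python
# Assumed first row (commonly tabulated generator for the 12-run design).
FIRST_ROW = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Cyclic shifts of the first row plus a final row of minus signs."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])
    rows.append([-1] * len(first_row))
    return rows


def contrast(design, j, y):
    """Average response at the plus level of factor j minus average at the minus level."""
    plus = [yi for row, yi in zip(design, y) if row[j - 1] == +1]
    minus = [yi for row, yi in zip(design, y) if row[j - 1] == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


design = plackett_burman(FIRST_ROW)

MAIN_1 = 2.0   # hypothetical main effect of factor 1
INT_37 = 3.0   # hypothetical 37 interaction, unknown to the analyst

# Noise-free responses: half an effect per coded unit, as in the regression form.
y = [5.0 + (MAIN_1 / 2) * row[0] + (INT_37 / 2) * row[2] * row[6] for row in design]

l1 = contrast(design, 1, y)   # estimates 1 + (1/3)(37)
l2 = contrast(design, 2, y)   # factor 2 has no real effect, yet picks up part of 37
l3 = contrast(design, 3, y)   # factor 3 is part of 37, so its contrast is untouched

print(l1, l2, l3)
```

With these made-up values the contrast for factor 1 comes out as 2 + 3/3 = 3 rather than 2, an inert factor such as factor 2 shows an apparent effect of one third of the interaction, and the contrasts for factors 3 and 7 are unaffected.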
In contrast, with a resolution III fractional factorial design, each 2-factor interaction is
confounded with a single main effect. [The fact that a 2-factor interaction cannot be confounded with two different main effects is easy to show. Assume that a 2-factor interaction
(say, 12) were confounded with two main effects (say, 3 and 4). Then the defining relation
would contain the words 123, 124, and (123)(124) = 34, making the fractional factorial a
resolution II design.] But in this case, the bias in the main effect estimate will equal the full
magnitude of the interaction, not just a fraction as in the Plackett-Burman design.
Because of the extensive partial confounding in Plackett-Burman designs, the option of
a complete foldover can be especially useful. Adding 12 more runs to the existing design by switching the signs in all factor columns will result in a 24-run resolution IV design with
main effects no longer confounded with 2-factor interactions.
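The effect of the foldover can be verified numerically. The sketch below, again assuming the commonly tabulated first row for the 12-run design, appends the 12 sign-reversed runs and checks the sum of products between every main effect column and every 2-factor interaction column. Variable names are ours.

```python
from itertools import combinations

# Assumed first row (commonly tabulated generator for the 12-run design).
FIRST_ROW = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman(first_row):
    """Cyclic shifts of the first row plus a final row of minus signs."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])
    rows.append([-1] * len(first_row))
    return rows


def largest_cross_product(design):
    """Largest |sum of x_m * x_a * x_b| over all main effects m and interactions ab."""
    k = len(design[0])
    worst = 0
    for m in range(k):
        for a, b in combinations(range(k), 2):
            total = sum(row[m] * row[a] * row[b] for row in design)
            worst = max(worst, abs(total))
    return worst


original = plackett_burman(FIRST_ROW)
# Complete foldover: repeat every run with all signs switched.
folded = original + [[-s for s in row] for row in original]

print(len(folded))
print(largest_cross_product(original))  # nonzero: partial confounding (4/12 = 1/3)
print(largest_cross_product(folded))    # zero: main effects clear of 2-factor interactions
```

In the original 12 runs the largest sum of products is 4 (a correlation of 1/3); in the 24-run foldover every such sum is zero, because each triple product changes sign in the mirrored runs.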
Given the complex confounding patterns of resolution III Plackett-Burman designs, it
may seem at first glance that they would not provide any useful information about 2-factor
interactions. But as we show in the following example, under some circumstances it may be
possible to estimate one or more 2-factor interactions from the results of a Plackett-Burman
design.

6.3 PLACKETT-BURMAN DESIGN FOR N = 20 RUNS: A DIRECT MAIL CREDIT CARD CAMPAIGN

A leading Fortune 500 financial products and services firm uses direct mail to reach new
customers. [For a detailed discussion, see Case 9 (Experimental Design on the Front Lines
of Marketing: Testing New Ideas to Increase Direct Mail Sales) in the case study appendix.]
But as competition increased over the years, response to the firm's offers had declined
steadily. In an effort to reverse this trend, the company hired a consultant to help with the
planning and execution of a large mailing of a credit card offer.
The marketing team identified 19 factors to test (Table 6.3), and the consultant specified
the 20-run Plackett-Burman design shown in Table 6.4. Here, as in the 12-run design, the
entire design matrix is generated from the first row listed in Appendix 6.1. The procedure
is exactly the same as before. The last entry in row 1 (-) is placed in the first position of
row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. This process continues until the next to the last row is filled in. A row of
all minus signs is then added to complete the design.
Factors A-E were approaches aimed at getting more people to look inside the envelope,
while the remaining factors related to the offer inside. Factor G (sticker) refers to the peel-off sticker at the top of the letter to be applied by the customer to the order form. The firm's
marketing staff believed that a sticker increases involvement and is likely to increase the
number of orders. Factor N (product selection) refers to the number of different credit card
images that a customer could choose from, and the term "buckslip" (factors Q and R) describes a small separate sheet of paper that highlights product information.
A total of 100,000 people, randomly chosen from a list of potential customers, participated in the experiment. Each of the 20 runs in Table 6.4 describes a test package that was
sent to 5,000 people. The response variable is the fraction of people who respond to the
credit card offer.


TABLE 6.3 The 19 Factors and Their Levels

     Factor                              (-) Control         (+) New Idea
A    Envelope teaser                     General offer       Product-specific offer
B    Return address                      Blind               Add company name
C    "Official" ink-stamp on envelope    Yes                 No
D    Postage                             Preprinted          Stamp
E    Additional graphic on envelope      Yes                 No
F    Price graphic on letter             Small               Large
G    Sticker                             Yes                 No
H    Personalize letter copy             No                  Yes
I    Copy message                        Targeted            Generic
J    Letter headline                     Headline 1          Headline 2
K    List of benefits                    Standard layout     Creative layout
L    Postscript on letter                Control version     New postscript
M    Signature                           Manager             Senior executive
N    Product selection                   Many                Few
O    Value of free gift                  High                Low
P    Reply envelope                      Control             New style
Q    Information on buckslip             Product info        Free gift
R    Second buckslip                     No                  Yes
S    Interest rate                       Low                 High

TABLE 6.4 The 20-Run Plackett-Burman Design and the Results of the Experiment

[Table: 20 runs by 19 factor columns (A through S) of + and - signs, followed by two response columns, the number of orders (out of 5,000 mailed per run) and the response rate (%). Orders range from 30 to 134 per run (response rates from 0.60% to 2.68%) and total 1,298.]

The resulting response rates are shown in Table 6.4. The estimated effects, which are differences between average responses at the plus and minus levels of the factor columns, are
listed in Table 6.5. A Pareto chart, where estimated effects are ordered according to their absolute magnitudes, is shown in Figure 6.1. Significance of each effect is determined by comparing the estimate with (twice) its standard error. In Chapter 4, Section 4.6.4, we showed
how to calculate the standard error of an estimated effect when the response is a proportion.


In this case, 1,298 people (or 1.298%) placed an order. Using this estimated proportion, the
standard error of each estimated effect is

$$\text{standard error(effect)} = 100\sqrt{\frac{4(0.01298)(0.98702)}{100{,}000}} = 0.0717 \text{ (percent)}$$
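The calculation can be reproduced as follows. This is a minimal sketch: the function name is ours, and the effect magnitudes are the five values quoted in the discussion of the results. The factor 4 arises because an estimated effect is the difference of two averages, each based on half of the N people, so its variance is p(1 - p)/(N/2) + p(1 - p)/(N/2) = 4p(1 - p)/N.

```python
import math


def effect_standard_error_percent(orders, n_total):
    """Standard error (in percentage points) of an effect estimated from proportions."""
    p = orders / n_total
    return 100 * math.sqrt(4 * p * (1 - p) / n_total)


se = effect_standard_error_percent(orders=1298, n_total=100_000)
threshold = 2 * se  # an effect is called significant if it exceeds twice its standard error

# Magnitudes (percentage points) of the five effects discussed in the text.
effects = {"S": 0.864, "G": 0.556, "R": 0.304, "I": 0.296, "J": 0.192}
significant = {name: size > threshold for name, size in effects.items()}

print(round(se, 4), round(threshold, 4))
print(significant)
```

The standard error comes out at roughly 0.07 percentage points, so an effect needs to exceed about 0.14 percentage points in magnitude to be called significant; all five quoted effects clear that threshold.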

6.3.1 Main Effects Interpretation of the Results

All Plackett-Burman designs have complex confounding patterns, with each main effect
partially confounded with many 2-factor interactions. We will discuss the confounding pattern for the 20-run design in the next section. In interpreting the results of this experiment,
we initially ignore these possible 2-factor interactions and assume that each estimate is attributable to the main effect.
The following 5 factors had a significant effect on the response rate:

S-: Low interest rate. Increasing the credit card interest rate reduces the response by
0.864 percentage points. In addition, it was very clear based on the firm's financial
models that the gain from the higher rate would be much less than the loss due to
the decrease in the number of customers.

G-: Sticker. The sticker (G-) increases the response by 0.556 percentage points,
resulting in a gain much greater than the cost of the sticker.

R-: No second buckslip. A main effect interpretation shows that adding another buckslip reduces the number of buyers by 0.304 percentage points. One explanation offered for this surprising result was that the buckslip added unnecessary information
and obscured the simple "buy now" offer. A more compelling explanation that we
discuss in the next section is that the significant effect is not the result of the main
effect of factor R, but is due to an interaction between two other factors.

I+: Generic copy message. The targeted message (I-) emphasized that a person could
choose a credit card design that reflected his or her interests, while the generic message (I+) focused on the value of the offer. The creative team was certain that appealing to a person's interests would increase the response, but they were wrong.
The generic message increased the response by 0.296 percentage point.

J-: Letter headline #1. The result showed that all "good" headlines were not equal. The
best wording increased the response by 0.192 percentage point.
6.3.2 Evidence of a 2-Factor Interaction Between S (Interest Rate) and G (Sticker)

The main effect of each factor in the 19-factor, 20-run Plackett-Burman design is confounded with all 2-factor interactions not involving that factor, a total of 153 two-factor interactions (18!/(2! 16!)). By enumerating all correlations among design and interaction columns, we find that of the 153 interactions confounded with each main effect, 144 have
correlations -0.2 or +0.2, and 9 have correlations -0.6.

Each 2-factor interaction appears in the confounding pattern of 17 main effects. For 16
of these main effects, the correlation with this interaction is -0.2 or +0.2, whereas for a
single main effect, the correlation with this interaction is -0.6. This is important because it
implies that a large 2-factor interaction will create a large bias (-0.6 times the value of the
interaction) in the estimate of one particular main effect.
Factors S (interest rate) and G (presence of a sticker) are by far the largest effects in
Table 6.5. The correlation between the main effect of R (second buckslip) and the SG
interaction is -0.6. You can check this by calculating the correlation between the column
entries, as we did in the previous section. Hence, a significant SG interaction would bias the
estimate of the main effect of R by -0.6 times the value of the interaction. This suggests that
it may not be the main effect of factor R that is important, but the 2-factor interaction between S and G. This interpretation is supported by the principle of effect heredity, as the
main effects of S and G are the most important factors. As one might expect, at the high interest rate the effect of having a sticker is small (a change from 0.776% to 0.956% is implied
by the results in Table 6.4), but at the low interest rate, the effect of having the sticker is
much larger (a change from 1.264% to 2.024%). The sticker is most effective when the customer receives a more attractive offer.

6.3.3 Further Analysis of the Credit Card Experiment


The confounding of main effects and interactions introduces some uncertainty into our interpretation of the results. Of course, a straightforward approach for obtaining unconfounded main effects is a foldover of the original Plackett-Burman design. The combination of a Plackett-Burman design and its complete foldover creates a resolution IV design where main effects are not confounded with 2-factor interactions. Of course, 2-factor interactions are still confounded in a rather complicated fashion. In the credit card experiment, a foldover was not carried out (with 40 runs, it would have greatly increased the operational complexity of the mailing), and hence we cannot be certain which combinations of main effects and interactions are responsible for the significant estimates in Table 6.5.
Plackett-Burman designs have traditionally been used as main-effects designs, and generally they should be avoided if the experimenter is concerned about the presence of fairly large interactions. Nevertheless, these designs allow the investigator to identify, in certain circumstances, selected 2-factor interactions. Box and Tyssedal (1996) showed that a Plackett-Burman design produces, for any 3 factors, a complete factorial arrangement, with some combinations replicated. We discuss this concept, called projectivity, in more detail in Appendix 6.2.
We use this projectivity idea to provide more evidence that the apparent main effect of R (second buckslip) is actually a consequence of the bias created by the SG interaction. Consider the 3 factors S, G, and R. Of the 20 runs in Table 6.4, there is at least one run at each of the eight factor-level combinations of these 3 factors. In specifying each combination, we let the first sign indicate the level of S, the second sign represent the level of G, and the last sign represent the level of R. There are 4 runs at each of the four combinations (- - -), (+ + -), (+ - +), (- + +), and one run at each of the remaining four combinations. Because we have at least one response at each combination, we have a full factorial arrangement in factors S, G, and R (ignoring the other factors). Because the number of runs at each combination is not the same, we must use regression to estimate the effects. Doing so, we


TABLE 6.6
Three Regression Models Relating the Response Rate to Factors S (Interest Rate),
G (Sticker), R (Second Buckslip), I (Copy Message), and
J (Letter Headline) (Minitab Output)

(A) REGRESSION OF RESPONSE RATE ON S, G, R, AND THEIR INTERACTIONS

The regression equation is
Rate = 1.33 - 0.386 S - 0.320 G - 0.0613 R + 0.151 SG
       - 0.0700 SR + 0.0762 GR + 0.0450 SGR

Predictor       Coef   SE Coef       T       P
Constant     1.32500   0.06602   20.07   0.000
S           -0.38625   0.06602   -5.85   0.000
G           -0.32000   0.06602   -4.85   0.000
R           -0.06125   0.06602   -0.93   0.372
SG           0.15125   0.06602    2.29   0.041
SR          -0.07000   0.06602   -1.06   0.310
GR           0.07625   0.06602    1.16   0.271
SGR          0.04500   0.06602    0.68   0.508

S = 0.236185   R-Sq = 90.2%   R-Sq(adj) = 84.6%

(B) REGRESSION OF RESPONSE RATE ON S, G, AND SG

The regression equation is
Rate = 1.30 - 0.432 S - 0.278 G + 0.188 SG

Predictor       Coef   SE Coef       T       P
Constant     1.29800   0.05245   24.75   0.000
S           -0.43200   0.05245   -8.24   0.000
G           -0.27800   0.05245   -5.30   0.000
SG           0.18800   0.05245    3.58   0.002

S = 0.234585   R-Sq = 87.2%   R-Sq(adj) = 84.8%

(C) REGRESSION OF RESPONSE RATE ON S, G, I, J, AND SG

The regression equation is
Rate = 1.30 - 0.432 S - 0.278 G + 0.151 SG + 0.118 I - 0.0657 J

Predictor       Coef   SE Coef       T       P
Constant     1.29800   0.04407   29.46   0.000
S           -0.43200   0.04407   -9.80   0.000
G           -0.27800   0.04407   -6.31   0.000
SG           0.15130   0.04594    3.29   0.005
I            0.11774   0.04501    2.62   0.020
J           -0.06574   0.04501   -1.46   0.166

S = 0.197073   R-Sq = 92.1%   R-Sq(adj) = 89.3%

find that the three significant effects are S, G, and SG, confirming that it is the SG interaction, not the main effect of R, that is significant.
Table 6.6(a) shows the Minitab output when regressing the response rate on the main and interaction effects of the three factors S, G, and R. The standard errors of the estimated regression coefficients use the pooled variance from the eight factor-level combinations, assuming that the other factors have no effect on the response. The t-ratios and the probability values of the regression coefficients listed in this table indicate that S, G, and SG are significant, while all other effects (including the main effect of factor R) are insignificant. Table 6.6(b) lists the results of the regression on the significant effects S, G, and SG. The regression explains 87.2% of the variability in the response rate.
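The identical standard errors within Table 6.6(a) and within Table 6.6(b) follow from the run counts alone, and a short sketch can reproduce them. It assumes that the 20 runs fall into the eight S, G, R cells with 4 runs in each cell whose sign product is negative and 1 run in each remaining cell (the choice of sign is an assumption; the standard errors do not depend on it), and it takes the residual standard deviations s from the Minitab output. The helper names `invert` and `standard_errors` are illustrative.

```python
from itertools import product
from math import sqrt

# Assumed layout of the 20 runs projected onto S, G, R: four runs in each cell
# whose sign product is negative, one run in each remaining cell.
runs = []
for s, g, r in product((-1, 1), repeat=3):
    runs.extend([(s, g, r)] * (4 if s * g * r < 0 else 1))


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(columns, sigma):
    """Return sigma * sqrt(diag((X'X)^-1)) for the model matrix built from columns."""
    x = [[f(*run) for f in columns] for run in runs]
    p = len(columns)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    return [sigma * sqrt(inv[i][i]) for i in range(p)]


# Model (a): constant, S, G, R, SG, SR, GR, SGR
model_a = [
    lambda s, g, r: 1, lambda s, g, r: s, lambda s, g, r: g, lambda s, g, r: r,
    lambda s, g, r: s * g, lambda s, g, r: s * r, lambda s, g, r: g * r,
    lambda s, g, r: s * g * r,
]
# Model (b): constant, S, G, SG
model_b = model_a[:3] + [model_a[4]]

se_a = standard_errors(model_a, 0.236185)  # s from Table 6.6(a)
se_b = standard_errors(model_b, 0.234585)  # s from Table 6.6(b)
print([round(v, 5) for v in se_a])
print([round(v, 5) for v in se_b])
```

In model (b) the four columns are mutually orthogonal, so every standard error reduces to s divided by the square root of 20; in model (a) the unequal cell counts couple the columns in pairs, which inflates the standard errors equally.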

Cheng (1995) showed that in the 20-run Plackett-Burman design, for any 4 factors, estimates of the four main effects and the six 2-factor interactions involving these four factors can be obtained when their higher order interactions (3- and 4-factor interactions) are assumed to be negligible. Having eliminated factor R, we apply Cheng's finding and consider a model that includes the four factors that were significant in our initial main effects analysis: S, G, I, and J, together with their six 2-factor interactions. The result of this regression shows that all 2-factor interactions except SG are insignificant, leading to a model with the four main effects and the SG interaction. The fitting results for the model with S, G, SG, and the two main effects of I and J are shown in Table 6.6(c). The five effects explain 92.1% of the variation, a rather modest improvement over the 87.2% that is explained by S, G, and SG. It is clear that factors S (interest rate) and G (sticker) and their interaction SG are the main drivers of the response rate.

6.4 NOBODY ASKED US, BUT . . .


In a Plackett-Burman design, each of the calculated effects estimates a main effect plus a weighted sum of the 2-factor interactions that are confounded with that main effect, with the weight (coefficient) of each 2-factor interaction being the correlation between its column signs and the column signs of the main effect. It is not too difficult to see why this is so. In Section 6.2, we showed the columns of signs of the main effect of 1 and the 26 interaction and found the correlation between them to be 1/3. For simplicity, assume that the main effect of factor 1 is confounded only with the 26 interaction. Looking once again at those two columns of signs, we can write $l_1$ in the following way,

\[
l_1 = \frac{2}{3}\,\bigl[\bar{y}_{+} - \bar{y}_{-}\bigr]_{\text{same}} + \frac{1}{3}\,\bigl[\bar{y}_{+} - \bar{y}_{-}\bigr]_{\text{opposite}}
\]

The first term in brackets is $(\bar{y}_{+} - \bar{y}_{-})$ for the 8 runs (2/3 of the total number of runs) when the signs in the two columns are the same. It estimates 1 + 26 (the main effect of 1 plus the 26 interaction). The second term in brackets is $(\bar{y}_{+} - \bar{y}_{-})$ for the remaining 4 runs (1/3 of the total number of runs) when the signs in the two columns are opposite. It estimates 1 - 26 (the main effect of 1 minus the 26 interaction). The linear contrast $l_1$ combines these two estimates;

\[
l_1 \rightarrow \frac{2}{3}\,[1 + 26] + \frac{1}{3}\,[1 - 26] = 1 + \frac{1}{3}\,26
\]

giving us an estimate of the main effect plus 1/3 of the 26 interaction.
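The argument can be checked numerically. The sketch below is only an illustration under stated assumptions: it builds the standard cyclic 12-run Plackett-Burman design (Table 6.1 may label or code its columns differently, so the correlation may come out as -1/3 rather than +1/3), and it generates noise-free responses from hypothetical values for the main effect of factor 1 and the 26 interaction. The contrast $l_1$ then equals the main effect plus the correlation times the interaction.

```python
# Assumed generator row of the cyclic 12-run Plackett-Burman construction.
FIRST_ROW = "++-+++---+-"


def plackett_burman_12():
    """Return the 12 x 11 design as a list of rows with entries +1/-1."""
    signs = [1 if c == "+" else -1 for c in FIRST_ROW]
    k = len(signs)
    rows = [[signs[(j - i) % k] for j in range(k)] for i in range(k)]  # cyclic shifts
    rows.append([-1] * k)  # final run with every factor at its low level
    return rows


design = plackett_burman_12()
n_runs = len(design)

x1 = [row[0] for row in design]            # column of factor 1
x26 = [row[1] * row[5] for row in design]  # product column of factors 2 and 6

# Both columns are balanced +1/-1 columns, so the correlation is the mean product.
rho = sum(u * v for u, v in zip(x1, x26)) / n_runs

# Hypothetical effects (change in response from the low to the high level).
main_effect = 4.0
interaction = 3.0
y = [10.0 + 0.5 * main_effect * u + 0.5 * interaction * v for u, v in zip(x1, x26)]

# Usual main effect estimate: average at the high level minus average at the low level.
high = [obs for obs, u in zip(y, x1) if u == 1]
low = [obs for obs, u in zip(y, x1) if u == -1]
l1 = sum(high) / len(high) - sum(low) / len(low)

print("correlation:", rho)
print("contrast l1:", l1)
print("main effect + correlation * interaction:", main_effect + rho * interaction)
```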


Plackett and Burman created their designs in response to the problem of reducing the failure rate of an explosive weapon used by the British in World War II. The key to the development of these designs was the fact that the weapon consisted of 22 components. To improve it, the engineering approach was to make a single change to each component and experiment to see which changes reduced the weapon's failure rate. This involved testing 22 factors, each at two levels (old version of the component or new version). A 32-run fractional factorial design would have meant testing 32 different versions of the weapon in each experiment. Using Plackett and Burman's 24-run design, the experiments and improvement of the weapon were carried out more quickly, a great benefit during times of war.


columns $x_{jr}$. This implies that the usual main effect estimate of factor i is an estimate of

\[
\beta_i + \sum\sum_{j<r} \rho_{i(jr)}\,\beta_{jr} \qquad \text{(A6.4)}
\]

The confounding coefficient between the main effect of factor i and the 2-factor interaction among factors j and r is given by $\rho_{i(jr)}$, the correlation coefficient between the design vector $x_i$ and the interaction (calculation) column $x_{jr}$.
Discussion
1. The result implies that a main effect is unconfounded with all interactions that contain the main effect factor. The column of products of the elements in a design column $x_i$ and an interaction (calculation) column $x_{ir}$ containing the main effect is identical to the column $x_r$ (as the product of a column with itself leads to a column of ones). The sum of such a column is zero, implying $\rho_{i(ir)} = 0$.
2. A main effect in most Plackett-Burman designs is confounded with all other interactions that do not contain the main effect as a factor. Depending on the run size of the Plackett-Burman design and the particular main and interaction effect being considered, this correlation can take on various values.
Consider the N = 12 Plackett-Burman design in Table 6.1, and the design column $x_1$ (for factor 1) and the interaction column $x_{23}$ (which one gets by row-wise multiplication of the entries in $x_2$ and $x_3$). The correlation between these columns is -1/3. The correlation between $x_1$ and $x_{26}$ is +1/3. Only main effects and interactions that contain the main effect are uncorrelated. All other confounding coefficients are either -1/3 or +1/3.
For the N = 20-run Plackett-Burman design, the confounding coefficients are either -0.2, +0.2, or -0.6. For example, the confounding coefficient between the main effect of A and the BC interaction in Table 6.4 is -0.2; it is +0.2 for the main effect of A and the BD interaction, and -0.6 for the main effect of R and the SG interaction. Only main effects and interactions that contain the main effect are uncorrelated.
For the N = 24-run Plackett-Burman design, the nonzero correlations are either -1/3 or +1/3. In contrast to the N = 12- and N = 20-run designs, not every main effect is confounded with interactions that do not contain that main effect; some correlations are in fact zero.
Projectivity Properties
Plackett-Burman designs are useful in screening situations where the objective is to identify important factors for more detailed study. The principle of "effect sparsity" suggests that, most likely, only a few factors among a large pool of potential factors are important. When choosing a design for factor screening, it is important to consider projections of the design into small subsets of factors. Box and Tyssedal (1996) define a factorial design to be of projectivity p if it produces, for any subset of p factors, a complete factorial (possibly with some combinations replicated).
Box and Tyssedal (1996) show that (most) Plackett-Burman designs attain projectivity 3. This is true for the N = 12 and N = 20 designs considered in this chapter. Exceptions to this rule, with projectivity less than 3, are the Plackett-Burman designs in N = 40 and N = 56 runs.

Convince yourself of this fact by considering the first 3 factors in the 12-run Plackett-Burman design in Table 6.1. Consider the eight factor-level combinations of these 3 factors, and show that each one has at least one run; four of the eight factor-level combinations (the ones at (- - -), (+ + -), (+ - +), (- + +)) have two runs. The general result by Box and Tyssedal (1996) implies that this projectivity holds for any subset of 3 factors, not just the first 3.
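The check suggested above can also be automated. The following sketch assumes the standard cyclic construction of the 12-run Plackett-Burman design (its column order and coding may differ from Table 6.1, so which four cells hold two runs may differ as well) and counts the runs in each of the eight cells for every subset of 3 factors. The names `cell_counts` and `has_projectivity_3` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Assumed generator row of the cyclic 12-run Plackett-Burman construction.
FIRST_ROW = "++-+++---+-"


def plackett_burman_12():
    """Return the 12 x 11 design as a list of rows with entries +1/-1."""
    signs = [1 if c == "+" else -1 for c in FIRST_ROW]
    k = len(signs)
    rows = [[signs[(j - i) % k] for j in range(k)] for i in range(k)]  # cyclic shifts
    rows.append([-1] * k)  # final run with every factor at its low level
    return rows


design = plackett_burman_12()
n_factors = len(design[0])


def cell_counts(factors):
    """Count the runs at each sign combination of the chosen factors."""
    return Counter(tuple(row[f] for f in factors) for row in design)


# Projectivity 3: every subset of 3 factors covers all 2^3 = 8 combinations.
all_counts = {triple: cell_counts(triple) for triple in combinations(range(n_factors), 3)}
has_projectivity_3 = all(len(counts) == 8 for counts in all_counts.values())

first_three = all_counts[(0, 1, 2)]
for cell in sorted(first_three):
    print(cell, first_three[cell])
print("projectivity 3:", has_projectivity_3)
```

Replacing the design with the 16-run fraction of Table 5.7 and repeating the loop is one way to see why a resolution III fractional factorial fails the same test.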
Plackett-Burman designs are main effects designs, and they should be avoided if we are concerned about possibly large interactions. Nevertheless, the projectivity of these designs allows the investigator to identify, in certain circumstances, selected 2-factor interactions. Assuming that there are no more than 3 active factors, one can estimate the main effects and the interactions among these 3 factors. Furthermore, while the Plackett-Burman design in N = 20 runs will not project into full $2^4$ factorials, Cheng (1995) shows that the projection of this design onto any 4 factors has the property that all main effects and 2-factor interactions of these 4 factors can be estimated when their higher order (order 3 and 4) interactions are assumed negligible. This is a remarkable result, as it shows that one can estimate all main effects and 2-factor interactions of 4 factors, and one can do so without specifying a priori which 4 factors are important.
In terms of their projectivity, Plackett-Burman designs have an advantage over resolution III fractional factorial designs that have projectivity of only 2. Consider, for example, the $2^{15-11}$ design generated by associating factors 5-15 with the interaction columns of a full factorial in factors 1-4. It is easy to check that the 16 runs of factors 1, 2, and 5 = 12 in Table 5.7 will not generate runs at all eight factor-level combinations in these three factors.
Because of their complicated alias structures, experimenters have sometimes been reluctant to use Plackett-Burman designs for experimentation (see Draper, 1985). However, the interesting projective properties of Plackett-Burman designs provide a compelling rationale for their use.
EXERCISES

Exercise 1 Consider Case 8 (Experiments in Retail Operations: Design Issues and Application) from the case study appendix.
(a) In this case, we study 10 factors through a 24-run design that consists of a 12-run Plackett-Burman design and its foldover. The $2^{10-5}$ fractional factorial design in 32 runs would be another potential design. Is it possible to achieve a resolution IV design in 32 runs? If so, discuss the generators and the confounding patterns (assuming that interactions of order 3 or higher are negligible).
(b) Confirm the estimated effects in the test result section.
(c) Obtain standard errors of the estimated effects, using the following approaches:
(c1) Use the percent changes of week 1 and week 2 as independent replications, calculate a variance estimate at each factor-level combination, average the 24 variances, and substitute the pooled variance estimate into the equation for the standard error in Section 4.4. Check whether the observations for weeks 1 and 2 are uncorrelated.
Hint: Calculate the number of runs for which week 2 has larger sales than week 1. Under the null hypothesis, you expect 24/2 = 12 runs. You can use the binomial distribution with N = 24 and π = 0.5 (expressing the fact that there is a 50:50 chance of increasing sales) to assess the probability value of the sample result.
(c2) Determine the significance of the estimated effects through normal probability plots and Lenth's PSE.
(c3) Explain how regression can be used to calculate the standard errors of the estimated effects. Hint: The regression relates the 24 × 1 vector of responses to a constant and the three design vectors A, D, and F. This leaves 24 - 4 = 20 degrees of freedom for the standard deviation of the error, s = 0.1092. This estimate is used in the covariance matrix $V(\hat{\beta}) = s^2 (X'X)^{-1}$. The square roots of the diagonal elements are one-half the standard errors of the estimated effects. Note that none of these calculations are needed when using computer software to execute the regression.

(c4) Discuss whether our earlier conclusion about the significance of the effects is affected by the three different approaches of calculating the standard errors of the effects.
(d) Discuss problems that may arise in the planning and execution of this experiment.
Exercise 2 Consider Case 9 (Experimental Design on the Front Lines of Marketing: Testing New Ideas to Increase Direct Mail Sales) from the case study appendix.
(a) Consider the Plackett-Burman design in Table A9.2. Check that the design is orthogonal. Consider any two factors, say factors A and B, and verify that the design includes 5 runs at each of the four factor-level combinations. Repeat this check for other pairs of factors.
(b) Reanalyze the data from the Plackett-Burman and the factorial designs in Tables A9.2 and A9.6.
(c) A binomial approximation (see Section 4.6.4) was used to determine the standard errors of the estimated effects. Use alternative approaches:
(c1) Construct a normal probability plot of the estimated effects and determine the significance of the factors that way.
(c2) Use Lenth's PSE approach to assess the significance. Discuss the similarities and differences among these three approaches.
Note: The underlying true sign-up proportion π depends on the advertisement scenario for that particular run. Decisions (yes/no) by individuals in that group are the results of Bernoulli trials with success probability π. This implies that the variance of the sample proportion calculated from the n individuals in that group is given by var(p) = π(1 - π)/n.
This derivation assumes that the same proportion π applies to all subjects in the group, an assumption that may not be correct. Sign-up rates may vary across subjects, $\pi_i = \pi + \delta_i$, where $\delta_i$ expresses the subject variability. One can show that the heterogeneity across subjects increases the variance of a proportion and that var(p) > π(1 - π)/n. Discuss the effect of heterogeneity on the standard error of an estimated effect.

(d) The variance of the observed proportion, var(p) = π(1 - π)/n, depends on the true proportion that varies across the design runs. This violates the assumption that responses are equally reliable, a fact that is needed for the equal weighting of the proportions when calculating estimated effects.
It has been shown that the square root and the arcsine transformations stabilize the variability, and their use has been suggested when working with proportions. Hence, instead of calculating the effects from the proportions, one transforms the proportions by taking their square roots and obtains the estimated effects from the transformed proportions. Similarly, one can apply the arcsine (or inverse sine, or $\sin^{-1}$) transformation, using calculator keys or commands in statistical software programs. The arcsine is usually given in radians and needs to be transformed into degrees by multiplying the result in radians with the factor 360/(2 × 3.14159).
An even better approach of analyzing binary (yes/no) data is to use logistic regression. Chapter 11 of Abraham and Ledolter, Introduction to Regression Modeling (2006), discusses this approach in detail.

(e) Determine the partial confounding in the 20-run Plackett-Burman design in Table A9.2. Follow the approach in Appendix 6.2. Assume a model with 2-factor interactions, and determine the bias of the main effect estimates. For example, consider the estimate that corresponds to factor A (column 1). Consider the interaction between factors G and H, represented by the column of their products (column 2). The correlation between these two columns expresses the confounding factor between GH and the main effect of A. Of course, many interactions will confound the main effect of A. You can use the approach in Appendix 6.2 to determine the complete confounding pattern. Alternatively, using an Excel spreadsheet, you can enumerate the correlations between each main effect and its confounded 2-factor interactions.
(f) In the section on key metrics and sample size, we claim that an overall sample size of 100,000 (and a sample size of 5,000 in each of the 20 cells) implies a certain power (80%) of detecting a 17% shift (from 1% to 1.17%) in the average response rate. How was this determined? Recreate the steps that are involved in this analysis. Use computer software such as Minitab or JMP.
(g) Consider a 32-run $2^{19-14}$ fractional factorial design to study the 19 factors.
(g1) Is it possible to achieve a resolution IV design?
(g2) If only resolution III is possible, discuss the advantages and disadvantages of the 20-run Plackett-Burman and the 32-run $2^{19-14}$ fractional factorial designs.
(g3) Is it possible to construct a resolution IV design in N = 40 runs? Discuss.

Exercise 3 Obtain a half-fraction of a $2^4$ factorial design by associating the levels of factor 4 with (i) the 123 interaction, (ii) the 12 interaction.
(a) Which of the two fractions has the more preferable confounding pattern? Discuss.
(b) Consider the projectivity properties of the two fractions. Assume that you have reasons to believe that only 3 factors are important, but you do not know which 3 factors. Does either fraction result in a full 3-factor factorial of any 3 factors? Discuss.
Exercise 4 Consider the study in Section 6.3. Suppose instead of a 20-run Plackett-Burman design for 19 factors, the experimenters had eliminated 4 factors and had chosen the $2^{15-11}$ fractional factorial design shown in Table 5.7 of Chapter 5. Show that in contrast to the 20-run Plackett-Burman design, which has projectivity 3, this fractional factorial design has projectivity less than 3. To do so, you need to show that not all choices of 3 factors have at least one run at each of the eight factor-level combinations.

170

EXPERIMENTS WITH FACTORS AT THREE OR MORE LEVELS

analyses when factors are continuous and explains how to test the linearity of the response relationship. Section 7.5 considers an experiment with three continuous factors where two factors are studied at two levels each, and one factor is studied at three levels.
Experiments with k > 2 factors at three or more levels require many runs. Three-level factorial experiments, for example, require $3^k$ runs; 9 runs are required for k = 2 factors; 27 runs, for k = 3 factors; 81 runs, for k = 4 factors; and so on. Such designs become nonparsimonious very quickly, because they require more runs than the experimenter is usually able or willing to carry out. Hence, it is important that the experimenter has screened the factors beforehand and has reduced the number of factors so that a few important ones can be studied at more than two levels. If there are still too many factors, one can reduce the number of runs in 3-level designs by considering orthogonal fractions. We introduce fractional 3-level factorial designs in Section 7.6.

7.2 THE ANALYSIS OF THE GENERAL 2-FACTOR FACTORIAL EXPERIMENT
Consider 2 factors A and B. Factor A is studied at a levels, while factor B is studied at b levels. We assume that the same number of runs, n, is carried out at all ab factor-level combinations. Factor-level combinations are sometimes referred to as cells. The experiment results in a total of abn responses, $y_{ijr}$, for i = 1, 2, ..., a (factor A); j = 1, 2, ..., b (factor B); and r = 1, 2, ..., n (replication).


The analysis that follows assumes that the abn observations are independent and that the standard deviation of the experimental error is the same for each observation. It is important that the experiment is carried out in a way such that these assumptions are satisfied. Assume that you study the sales of a supermarket item as a function of its price and the level of its advertising. If you want to use the analysis discussed in this section, it is important that the observations come from abn different supermarkets, and that the assignment of the supermarkets to the treatment combinations is done at random. The analysis of the data changes if the same stores are used across all levels of one factor (say, advertising) as a "store effect" may make the observations dependent.
It is important to recognize when these assumptions are not met and when it is not appropriate to use the methods that are discussed in this section. For example, in agricultural experiments on the yield effect of type of seed and fertilizer, one typically randomizes seed and fertilizer within subplots of several larger (whole) plots of land. Such experiments are referred to as split-plot experiments. Although one obtains a total of abn observations from a different seeds and b different fertilizers, the analysis explained in this section, which assumes equal precision and independence of the observations, will most likely not be appropriate. One can expect the responses to be more alike within each whole plot, and less alike from one whole plot to another. While the correct analysis of such data is straightforward, it is more complicated than the analysis we describe in this section.
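We write the response as

\[
y_{ijr} = \bar{y}_{\cdot\cdot\cdot} + (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}) + (y_{ijr} - \bar{y}_{ij\cdot})
\]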


where $\bar{y}_{\cdot\cdot\cdot}$ is the overall mean of the abn observations, $\bar{y}_{i\cdot\cdot}$ is the mean of the bn responses when factor A is at its ith level, $\bar{y}_{\cdot j\cdot}$ is the average of the an observations when factor B is at its jth level, and $\bar{y}_{ij\cdot}$ is the average of the n observations at the (i, j) factor-level combination. Recall the dot notation introduced in Section 3.3 of Chapter 3. Dots in place of subscripts indicate that we average the observations over these subscripts.
The difference $(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})$ measures the effect of factor A, while averaging over all levels of factor B. There is no main effect of factor A if the differences $(\bar{y}_{1\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}), (\bar{y}_{2\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}), \ldots, (\bar{y}_{a\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})$ are zero, or nearly so. Large differences imply a main effect of factor A.
The difference $(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})$ measures the effect of factor B, while averaging over all levels of factor A.
The third component in the expression on the right-hand side, $\bar{y}_{ij\cdot} - [\bar{y}_{\cdot\cdot\cdot} + (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})] = (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})$, measures the interaction between factors A and B. It is the difference between the observed mean response at the (i, j) factor-level combination and the predicted mean response that is implied by the main effects of factors A and B alone. Interaction is negligible if these differences are small. Large differences imply an interaction between factors A and B.
The last component $(y_{ijr} - \bar{y}_{ij\cdot})$ represents the error. It is the deviation of the observation from its respective cell mean.
Similar to our discussion of the analysis of variance (ANOVA) in Chapter 3, we can partition the total sum of squares into several (four) components: the sum of squares of factor A (main effect of A), the sum of squares of factor B (main effect of factor B), the sum of squares of the interaction AB, and the sum of squares of error.

\[
\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{r=1}^{n} (y_{ijr} - \bar{y}_{\cdot\cdot\cdot})^2
= bn\sum_{i=1}^{a} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
+ an\sum_{j=1}^{b} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
+ n\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2
+ \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{r=1}^{n} (y_{ijr} - \bar{y}_{ij\cdot})^2
\]

\[
SS(\text{total}) = SS(A) + SS(B) + SS(AB) + SS(\text{error})
\]

We obtain this decomposition by squaring $y_{ijr} - \bar{y}_{\cdot\cdot\cdot}$ and summing over i, j, and r. For an equal number of observations in each cell, all sums of crossproducts of the components are zero. Similarly, we can partition the degrees of freedom of the total sum of squares into the degrees of freedom of its components,

\[
abn - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1) + ab(n - 1)
\]

These entries are displayed in the ANOVA in Table 7.1. The mean squares are obtained by dividing the sums of squares by their respective degrees of freedom. The F-ratios in the ANOVA table, MS(AB)/MS(error), MS(A)/MS(error), and MS(B)/MS(error), are used to test the presence of interaction, main effect of A, and main effect of B. The probability

TABLE 7.1
ANOVA Table for the Two-Factor Factorial Experiment

Source of      Sum of Squares     Degrees of         Mean Square                           F                             Probability
Variation      SS                 Freedom df         MS                                                                  Value
Factor A       SS(factor A)       a - 1              SS(factor A)/(a - 1)                  MS(factor A)/MS(error)        Prob value (factor A)
Factor B       SS(factor B)       b - 1              SS(factor B)/(b - 1)                  MS(factor B)/MS(error)        Prob value (factor B)
Interaction    SS(interaction)    (a - 1)(b - 1)     SS(interaction)/[(a - 1)(b - 1)]      MS(interaction)/MS(error)     Prob value (interaction)
Error          SS(error)          ab(n - 1)          SS(error)/[ab(n - 1)]
Total          SS(total)          abn - 1

values of the F-ratios, which computer programs usually list in the last column, express the statistical significance of these components. The probability values are obtained from the F distributions as follows:

\[
\begin{aligned}
\text{probability value(interaction)} &= P[F((a-1)(b-1),\, ab(n-1)) > MS(AB)/MS(\text{error})] \\
\text{probability value(main effect of } A) &= P[F(a-1,\, ab(n-1)) > MS(A)/MS(\text{error})] \\
\text{probability value(main effect of } B) &= P[F(b-1,\, ab(n-1)) > MS(B)/MS(\text{error})]
\end{aligned}
\]

A main effect or an interaction is considered significant if its associated probability value is smaller than the significance level that is usually taken as 0.05. The significance of the interaction needs to be established first, as main effects have little meaning if interactions are present. Interactions are visualized through interaction plots, similar to the ones we considered for the 2-level designs in the previous chapters. Main effects are displayed by graphing response averages against the levels of factors A and B. Neither the ANOVA calculations nor the main effects and interaction plots need to be carried out by hand, as commonly available statistical software such as Minitab includes convenient functions that perform these analyses without much effort.
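For readers who want to see the arithmetic behind the ANOVA table, the decomposition of Section 7.2 can be sketched in plain Python. This is a minimal illustration for a balanced design; the function name `two_way_anova` and the small data set are hypothetical (the ratings below are invented for the example and are not the data of Table 7.2). Probability values are omitted because they require the F distribution from a statistics library.

```python
from itertools import product


def two_way_anova(y):
    """Sums of squares, degrees of freedom, mean squares, and F-ratios for a
    balanced layout; y[i][j] lists the n replicates at level i of A and level j of B."""
    a, b, n = len(y), len(y[0]), len(y[0][0])

    def mean(values):
        return sum(values) / len(values)

    grand = mean([v for row in y for cell in row for v in cell])
    mean_a = [mean([v for cell in y[i] for v in cell]) for i in range(a)]
    mean_b = [mean([v for i in range(a) for v in y[i][j]]) for j in range(b)]
    mean_ab = [[mean(y[i][j]) for j in range(b)] for i in range(a)]
    cells = list(product(range(a), range(b)))

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in mean_a),
        "B": a * n * sum((m - grand) ** 2 for m in mean_b),
        "AB": n * sum((mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i, j in cells),
        "error": sum((v - mean_ab[i][j]) ** 2 for i, j in cells for v in y[i][j]),
    }
    ss_total = sum((v - grand) ** 2 for i, j in cells for v in y[i][j])
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    f_ratio = {source: ms[source] / ms["error"] for source in ("A", "B", "AB")}
    return ss, ss_total, df, ms, f_ratio


# Hypothetical ratings: 3 levels of A, 3 levels of B, 3 replicates per cell.
ratings = [
    [[0, 1, 1], [2, 3, 2], [3, 4, 3]],
    [[2, 2, 3], [5, 6, 5], [4, 4, 5]],
    [[3, 2, 3], [4, 4, 3], [1, 2, 2]],
]

ss, ss_total, df, ms, f_ratio = two_way_anova(ratings)
for source in ("A", "B", "AB", "error"):
    print(source, round(ss[source], 3), df[source], round(ms[source], 3))
print("F-ratios:", {k: round(v, 2) for k, v in f_ratio.items()})
```

The four sums of squares add up to the total sum of squares only because every cell holds the same number of replicates; with unequal cell counts the crossproduct terms no longer vanish and a regression approach is needed.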


Establishing the statistical significance of interactions and main effects is a useful first step, but by no means the only important one. It is essential to know that the magnitude of an observed effect is larger than its chance variation. Otherwise, one would start attaching practical significance to differences that are not well estimated. However, once the significance of the effects has been established, it becomes important to learn how the groups differ. If interactions are small, it is appropriate to look at the main effects separately and compare the response averages across the considered factor levels. Simple plots of the averages, together with their standard errors and confidence intervals, tell us how response averages differ across the levels. For example, the average response at each level of factor A is an average of bn observations, and the standard error of such an average is given by $s/\sqrt{bn}$, where $s = \sqrt{SS(\text{error})/[ab(n-1)]}$. For continuous factors, it also makes sense to explore whether the relationship is linear. This is discussed in Section 7.4. Of course, an interpretation of main effects alone is not meaningful if interactions are present. In such a situation, one needs to interpret the interaction plot. We illustrate this in the following example.

7.3

EXAMPLE: BAKING A CAKE

Results of a $3^2$ factorial experiment in a study on how to best bake a cake are given in
Table 7.2. A commercially available cake mixture is baked by varying the baking temperature
(factor A) and the baking time (factor B). Three different temperatures are chosen: level 0
represents the temperature recommended by the instructions on the package, whereas levels -1
and +1 represent temperatures 10% below and 10% above the recommended level.
Three different times are chosen, with level 0 representing the recommended time setting,
and levels -1 and +1 representing the time settings 10% below and 10% above the recommended
level. Three independent replications are made at each temperature and time combination,
and the finished cakes are tasted by experts and rated on a 0 to 6 quality scale.
The ANOVA table is shown in Table 7.3. The entries in the table can be calculated from
the equations in Section 7.2. Although this is not particularly difficult, the calculations are
tedious given that they involve calculating factor A averages, factor B averages, and average
ratings for each of the ab factor level combinations, as well as the calculation of the various
sums of squares. Fortunately, computing software is available for the calculations. Statistical
software such as Minitab and JMP includes routines that compute the ANOVA table
and conduct tests of significance for main effects and interaction. All that is needed is for
the user to enter the information into a spreadsheet: responses into one column (say, column 1),
the levels of factor A into a second (column 2), and the levels of factor B into a third
(column 3). In this example, there are 27 rows. The first row contains 0 (the response),
-1 (level of A), and -1 (level of B). The second row contains 0, -1, -1; ...; the last (27th)
row contains 2, 1, and 1. The particular order of the rows does not matter.
There is quite a strong interaction effect between time and temperature. The F-statistic
for interaction, F = 7.34, is highly significant when compared to percentiles of the F(4, 18)


TABLE 7.2

Cake Ratings

Temperature   Time   Response
    -1         -1    0, 0, 3
    -1          0    0, 2, 4
    -1          1    4, 5, 6
     0         -1    2, 3, 4
     0          0    3, 6, 6
     0          1    1, 2, 3
     1         -1    4, 5, 6
     1          0    1, 3, 5
     1          1    0, 1, 2

TABLE 7.3

ANOVA Table: Cake Ratings

Source        DF    SS        MS       F       P
Temperature    2     2    1.0000    0.47   0.630
Time           2     2    1.0000    0.47   0.630
Interaction    4    62   15.5000    7.34   0.001
Error         18    38    2.1111
Total         26   104


Figure 7.1 Interaction Plot: Cake Ratings
[Interaction diagram: baking time on the horizontal axis, one line for each temperature level (-1, 0, 1).]

distribution; the probability value (0.001) is small, much smaller than the usually adopted
significance level of 0.05. The interaction diagram shown in Figure 7.1 demonstrates the
nature of the interaction. For a temperature lower than the recommendation (level -1),
the quality of the cake is increased by increasing the baking time. For a higher than
recommended temperature (level +1), the baking time, not surprisingly, should be reduced.
At the recommended temperature, quality suffers if the baking time is either lower or higher
than the recommended value. The cake turns out best if the recommendations for time and
temperature are followed. But the cake mix is "robust" in the sense that two other settings
(lower temperature, but longer time; and higher temperature, but shorter time) are equally
acceptable. A cake baked at a lower temperature has to be baked longer, whereas a cake
baked at a higher temperature requires a shorter baking time.
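
The ANOVA entries for this example can also be checked outside of Minitab or JMP. The following is a minimal sketch in Python with NumPy (our choice of tool, not one used in the text). The ratings are those of Table 7.2 as we read them; the array layout `y[i, j, k]` and all variable names are assumptions made for illustration.

```python
import numpy as np

# y[i, j, k]: temperature level i, time level j (each ordered -1, 0, +1), replicate k.
# The placement of the nine cells is an assumption based on our reading of Table 7.2.
y = np.array([
    [[0, 0, 3], [0, 2, 4], [4, 5, 6]],   # temperature -1
    [[2, 3, 4], [3, 6, 6], [1, 2, 3]],   # temperature  0
    [[4, 5, 6], [1, 3, 5], [0, 1, 2]],   # temperature +1
], dtype=float)

a, b, n = y.shape
grand = y.mean()
cell = y.mean(axis=2)            # a x b table of cell averages
mean_a = y.mean(axis=(1, 2))     # averages at the levels of factor A
mean_b = y.mean(axis=(0, 2))     # averages at the levels of factor B

ss_a = b * n * np.sum((mean_a - grand) ** 2)
ss_b = a * n * np.sum((mean_b - grand) ** 2)
ss_ab = n * np.sum((cell - grand) ** 2) - ss_a - ss_b
ss_error = np.sum((y - cell[:, :, None]) ** 2)
ss_total = np.sum((y - grand) ** 2)

df_ab, df_error = (a - 1) * (b - 1), a * b * (n - 1)
ms_error = ss_error / df_error
f_ab = (ss_ab / df_ab) / ms_error

# Standard error of an average at one level of factor A: s / sqrt(b * n)
s = np.sqrt(ms_error)
se_level_mean = s / np.sqrt(b * n)

print(ss_a, ss_b, ss_ab, ss_error, ss_total, round(f_ab, 2), round(se_level_mean, 3))
```

Under these assumptions the sketch gives the sums of squares 2, 2, 62, 38, and 104 and the interaction F-statistic 7.34 of Table 7.3. Because the two main-effect sums of squares are equal, interchanging the roles of temperature and time in the array would not change any of these numbers.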

7.4

A USEFUL INTERPRETATION OF EFFECTS IN FACTORIAL

EXPERIMENTS WITH CONTINUOUS FACTORS

7.4.1 Orthogonal Polynomials


The sum of squares of a main effect in the $3^2$ factorial experiment has two degrees of
freedom; see Table 7.3. The F-test for the significance of a main effect assesses whether the three
means at the low, middle and high levels of a factor are the same (i.e., $\mu_{-1} = \mu_0 = \mu_1$). This
amounts to testing whether the three differences between the group means and the overall
mean $\mu = (\mu_{-1} + \mu_0 + \mu_1)/3$ are zero; that is, $\mu_{-1} - \mu = \mu_0 - \mu = \mu_1 - \mu = 0$. Since
deviations from a mean always sum to zero, two zero differences are sufficient to establish
that the three means are the same. This fact explains the two degrees of freedom in the main
effects test in Table 7.3. The restrictions among the means that are tested by the F-test (here
there are two, $\mu_{-1} - \mu = 0$ and $\mu_1 - \mu = 0$) are commonly referred to as the contrasts.


Three group means can be represented in many different, equivalent ways. One can express
them by their differences from the overall average (this was done above). Or, one can
express them by their differences from the mean of a reference group (e.g., the group where
the factor is at its low level). Or, one can express the means in a way that helps us test
whether the functional relationship between the means and the continuous factor is linear.
Statisticians refer to the different ways of representing the means as "parameterizations."


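Consider the following two equivalent parameterizations of the three group means:

$$
\begin{pmatrix} \mu_{-1} \\ \mu_0 \\ \mu_1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} \mu_{-1} \\ \mu_0 \\ \mu_1 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & -2 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix}
$$

We have written them in matrix form. In the first representation, $\mu_{-1} = \beta_0$, $\mu_0 = \beta_0 + \beta_1$,
and $\mu_1 = \beta_0 + \beta_2$. This implies $\beta_0 = \mu_{-1}$, $\beta_1 = \mu_0 - \mu_{-1}$, and $\beta_2 = \mu_1 - \mu_{-1}$. The
parameter $\beta_0 = \mu_{-1}$, the mean of the low group, becomes the standard against which the
other means are compared. The parameters $\beta_1$ and $\beta_2$ are the differences between $\mu_0$ and
$\mu_1$ and this standard. A test of the equality of the three means amounts to testing whether
$\beta_1 = \beta_2 = 0$. Here the low group is taken as the reference group, but any other group could
have been used.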


In the second representation, the coefficients are such that $\alpha_0 = (\mu_{-1} + \mu_0 + \mu_1)/3$,
$\alpha_1 = (\mu_1 - \mu_{-1})/2$, and $\alpha_2 = (\mu_{-1} - 2\mu_0 + \mu_1)/6$. We obtain these expressions by solving
the equations that relate the means to the alpha coefficients. The alpha coefficients have
a nice interpretation if the factor is continuous, and when the distances between the levels
are meaningful. The coefficient $\alpha_0 = (\mu_{-1} + \mu_0 + \mu_1)/3$ is the overall mean, and $\alpha_1$ and
$\alpha_2$ represent the linear and quadratic components of the relationship among the mean
response and the coded factor levels. [The coded levels (-1, 0, 1) may represent unequally
spaced levels in the original metric, say, temperatures at 1,500, 2,000, and 3,000 degrees.]
One can see this as follows: Assume a quadratic model between the mean and the factor
levels, $\mu_i = a + bi + ci^2$, with mean responses at the three levels ($i = -1, 0, 1$) given by
$\mu_{-1} = a - b + c$, $\mu_0 = a$, and $\mu_1 = a + b + c$. The coefficient $\alpha_1 = (\mu_1 - \mu_{-1})/2 = b$
represents the linear term in the relationship, while the coefficient
$\alpha_2 = (\mu_{-1} - 2\mu_0 + \mu_1)/6 = [(a - b + c) - 2(a + 0 + 0) + (a + b + c)]/6 = c/3$
is proportional to the quadratic component. Factor A has no effect if $\alpha_1 = \alpha_2 = 0$.
The association is linear if $\alpha_2 = 0$.

The second representation of the group means in terms of linear and quadratic components
is useful when describing the relationship between the mean response and the levels
of a continuous factor. It has the additional advantage that the columns in the matrix that
relates the means $\mu_{-1}, \mu_0, \mu_1$ to the coefficients $\alpha_0, \alpha_1, \alpha_2$,

$$
\begin{pmatrix} \mu_{-1} \\ \mu_0 \\ \mu_1 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & -2 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix}
$$

are orthogonal; you can check that all pairwise vector products formed with the three columns
of the matrix are zero. [The vector product of the first two columns is
(1)(-1) + (1)(0) + (1)(1) = 0; of the first and third column, (1)(1) + (1)(-2) + (1)(1) = 0; and of
the second and third column, (-1)(1) + (0)(-2) + (1)(1) = 0.] The columns (-1, 0, 1)
and (1, -2, 1) are known as the orthogonal linear and quadratic polynomials for a factor with
three levels.
Here, we have discussed the situation when each factor has three levels, and we partition
the effect into a linear and a quadratic component. Orthogonal polynomials for continuous
factors with 4 and 5 levels are shown in Appendix 7.1. For 4 levels, one parameterizes the
4 means in terms of an overall mean and linear, quadratic and cubic components.
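
The orthogonality of the three columns, and the expressions for the alpha coefficients, are easy to verify numerically. The sketch below uses Python with NumPy (an assumption on our part; any matrix software would do). The group means in `mu` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Rows: factor levels -1, 0, +1. Columns: constant, linear (-1, 0, 1), quadratic (1, -2, 1).
X = np.array([[1, -1,  1],
              [1,  0, -2],
              [1,  1,  1]], dtype=float)

# All off-diagonal entries of X'X are the pairwise vector products of the columns.
gram = X.T @ X

# Hypothetical group means at the low, middle, and high level (illustration only).
mu = np.array([2.0, 5.0, 6.0])

# Solve mu = X @ alpha for the coefficients alpha_0, alpha_1, alpha_2.
alpha = np.linalg.solve(X, mu)

# Closed-form expressions from the text.
alpha_closed = np.array([
    (mu[0] + mu[1] + mu[2]) / 3,
    (mu[2] - mu[0]) / 2,
    (mu[0] - 2 * mu[1] + mu[2]) / 6,
])

print(gram)
print(alpha, alpha_closed)
```

The matrix `gram` is diagonal, which is the orthogonality property, and the solved coefficients agree with the closed-form expressions for the overall mean and the linear and quadratic components.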

7.4.2 Partitioning Sums of Squares into Interpretable Components


Assume that both factors in the $3^2$ factorial design are continuous. We use the orthogonal
linear and quadratic polynomials to partition the sum of squares of each factor into a
linear and a quadratic component, each with one degree of freedom. This becomes useful
for testing whether the relationship between the response and the levels of a continuous
factor is linear.
For that, we construct columns for the linear and quadratic components. The column of
the linear component, A(lin), is assigned the value -1 when A is at the low level, value 0
when A is at the middle (0) level, and value +1 when A is at the high level; these are the
coefficients in the linear polynomial. The column of the quadratic component, A(qua), is
assigned the coefficients in the quadratic polynomial: the value +1 when A is either at the low
or high level, and value -2 when A is at the middle level. The same procedure is used to
construct B(lin) and B(qua), the linear and quadratic components for factor B. The four
columns are listed in Table 7.4. The length of these columns is determined by the number
of observations (runs).
The AB interaction sum of squares has four degrees of freedom. This sum of squares
can also be partitioned into four orthogonal components: the linear by linear, the linear by
quadratic, the quadratic by linear, and the quadratic by quadratic interaction components.
TABLE 7.4
Regression Formulation of the $3^2$ Factorial Design, with Linear and Quadratic Main and Interaction Effects

DESIGN FACTORS | REGRESSOR COLUMNS
               | MAIN EFFECT OF FACTOR A | MAIN EFFECT OF FACTOR B | INTERACTION

  A    B   A(lin)  A(qua)  B(lin)  B(qua)  AB(lin × lin)  AB(lin × qua)  AB(qua × lin)  AB(qua × qua)
 -1   -1     -1       1      -1       1          1             -1             -1              1
  0   -1      0      -2      -1       1          0              0              2             -2
  1   -1      1       1      -1       1         -1              1             -1              1
 -1    0     -1       1       0      -2          0              2              0             -2
  0    0      0      -2       0      -2          0              0              0              4
  1    0      1       1       0      -2          0             -2              0             -2
 -1    1     -1       1       1       1         -1             -1              1              1
  0    1      0      -2       1       1          0              0             -2             -2
  1    1      1       1       1       1          1              1              1              1


We create the columns AB(lin × lin), AB(lin × qua), AB(qua × lin) and AB(qua × qua).
The elements in column AB(lin × lin) are the products of the elements in A(lin) and B(lin),
the elements in the column AB(lin × qua) are the products of the elements in A(lin) and
B(qua), and so on. These columns are also shown in Table 7.4.
We consider a regression model that relates the response vector to these orthogonal columns.
That is,

$$
y = \beta_0 + \beta_1 A(\text{lin}) + \beta_2 A(\text{qua}) + \beta_3 B(\text{lin}) + \beta_4 B(\text{qua}) + \beta_5 AB(\text{lin} \times \text{lin})
+ \beta_6 AB(\text{lin} \times \text{qua}) + \beta_7 AB(\text{qua} \times \text{lin}) + \beta_8 AB(\text{qua} \times \text{qua}) + \varepsilon
$$

Standard regression software can be used to obtain the estimates and the regression sums of
squares that are explained by each of these regressor columns. The orthogonality of the
regressor columns has important consequences. We pointed out in an appendix to Chapter 4 that
each regression estimate and the regression sum of squares of each column are not affected by the
presence of other components in the model, and that the individual regression sums of
squares are additive.
Exercise 7 in Chapter 4 shows how to calculate the regression sum of squares that is
explained by a single column, say, x with entries $x_1, x_2, \ldots, x_n$. The regression sum of
squares is given by $SSR(x) = (\sum x_i y_i)^2 / \sum x_i^2$. Given the responses $y_1, y_2, \ldots, y_n$, it is easy
to calculate the regression sum of squares for each of the regressor columns in Table 7.4. For
example, the column A(lin) is used to calculate SSR(A(lin)), and the column A(qua) is used
to calculate SSR(A(qua)). The regression sum of squares of the main effect of factor A,
SS(A), which is calculated in Sections 7.2 and 7.3, turns out to be the sum of these two
components: SS(A) = SSR(A(lin)) + SSR(A(qua)). This shows how much of the sum of squares
of the main effect of A can be attributed to the linear association, and how much to the
quadratic one.
The same procedure can be applied to the main effect of B, as well as the interaction
between A and B. We find that SS(B) = SSR(B(lin)) + SSR(B(qua)), and SS(Interaction) =
SSR(AB(lin × lin)) + SSR(AB(lin × qua)) + SSR(AB(qua × lin)) + SSR(AB(qua × qua)).
The following example makes use of this decomposition.
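
Before turning to that example, the additivity of the single-column regression sums of squares can be illustrated with the cake ratings of Section 7.3. This is a sketch in Python with NumPy; the helper names `ssr`, `lin`, and `qua` are hypothetical, and the placement of the ratings in the array follows our reading of Table 7.2.

```python
import numpy as np

def ssr(x, y):
    """Regression sum of squares explained by a single column x: (sum x*y)^2 / sum x^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float((x @ y) ** 2 / (x @ x))

def lin(levels):
    """Linear polynomial coefficients (-1, 0, 1) for coded levels -1, 0, 1."""
    return np.asarray(levels, dtype=float)

def qua(levels):
    """Quadratic polynomial coefficients (1, -2, 1) for coded levels -1, 0, 1."""
    return np.where(np.asarray(levels) == 0, -2.0, 1.0)

# ratings[i, j, k]: temperature level i, time level j, replicate k (assumed layout).
ratings = np.array([
    [[0, 0, 3], [0, 2, 4], [4, 5, 6]],
    [[2, 3, 4], [3, 6, 6], [1, 2, 3]],
    [[4, 5, 6], [1, 3, 5], [0, 1, 2]],
], dtype=float)

coded = np.array([-1, 0, 1])
A = np.repeat(coded, 9)               # temperature level of each of the 27 observations
B = np.tile(np.repeat(coded, 3), 3)   # time level of each of the 27 observations
y = ratings.ravel()

ss_A = ssr(lin(A), y) + ssr(qua(A), y)
ss_B = ssr(lin(B), y) + ssr(qua(B), y)
ss_AB = sum(ssr(fa(A) * fb(B), y) for fa in (lin, qua) for fb in (lin, qua))

print(ss_A, ss_B, ss_AB)
```

With these data the linear and quadratic pieces add up to the main-effect and interaction sums of squares of Table 7.3 (2, 2, and 62), which is the decomposition stated above.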

7.4.3 Example: Sales of Apple Juice


A medium-sized supermarket was selected to study the impact of price and display on
the sales of several store brand products. Products with stable sales (i.e., no trends) and
limited seasonality were selected for this study. Here we focus on the sales of the store brand
apple juice.
A complete $3^2$ factorial experiment is carried out to assess the effects of price and display.
Three price levels are considered: the cost price (low level -1), which is the cost to the
supermarket; the regular price (high level +1), which is the recommended retail price to
customers as listed in the regional warehouse price manual; and the reduced price (level 0),
which is the price halfway between the recommended retail price and the cost to the
supermarket. The three display choices are normal display space (level 0), as determined at


TABLE 7.5

Sales of Apple Juice for Changing Price and Display

Display   Price   Sales (replication 1)   Sales (replication 2)
  -1       -1           40.8                    34.2
   0       -1           44.2                    53.5
   1       -1           91.5                    70.5
  -1        0           32.0                    31.4
   0        0           50.2                    34.9
   1        0           85.7                    59.3
  -1        1            9.0                    18.0
   0        1           24.9                    24.9
   1        1           55.9                    31.9

the beginning of the experiment on the stock manager's recommendation; a reduced
display (level -1), which amounts to one-half of the normal display; and an expanded
display (level +1), which amounts to twice the normal display area.
With three display options and three price levels, the design calls for nine treatment
combinations. The design is replicated once. Eighteen weeks are needed for this study, and the
time arrangement of the experimental conditions is randomized. Furthermore, each
experimental week is preceded and followed by a base week (which is a week where the product
is priced at its regular price and displayed at the normal shelf position). For this reason, and
because of holiday weeks that are not used, the experiment spans roughly 40 consecutive
weeks. The response is the number of units (divided by 10) that sold between Wednesday
noon and Sunday 9 p.m. of each experimental week. The design and the observations are
shown in Table 7.5.
The interaction plot and the two main effects plots are given in Figure 7.2. The interaction
plot reveals very little interaction because the lines connecting average sales for different
prices but from the same display are almost parallel. The absence of an interaction
makes it appropriate to study the main effects. The sales effects of both price and display are
(roughly) linear.
The graphical analysis is supported by the ANOVA in Table 7.6. The sums of squares for
display, price, interaction, and error can be obtained from the expressions in Table 7.1 and
Section 7.2. The calculations are tedious. It is much simpler to obtain the ANOVA through
the Minitab "ANOVA > Two-Way" command. For this, one enters the data into three
columns of a spreadsheet. Row 1, for example, contains the entries 40.8, -1, -1; the last
(18th) row contains 31.9, 1, 1. The result is shown in Table 7.6.


A decomposition into linear and quadratic components is appropriate in this example as
the factors are continuous and the distances between the levels are meaningful. The Minitab
command "ANOVA > Two-Way" does not provide the decomposition into orthogonal
components automatically. For that one needs to construct the spreadsheet shown in
Table 7.7. It contains the regressor columns of Table 7.4 and the responses of Table 7.5. The
data for week 1 are in rows 1-9, while the data for week 2 are in rows 10-18. Rows 10-18
are identical to rows 1-9, except that the responses are different.
Below, we illustrate the calculation of SSR(D(qua)), the regression sum of squares that is
explained by the quadratic component of display. D(qua) is the fourth column in Table 7.7,


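TABLE 7.7
Regression Formulation: Sales of Apple Juice

DESIGN FACTORS | DISPLAY | PRICE | INTERACTION | RESPONSE

  D    P   D(lin)  D(qua)  P(lin)  P(qua)  DP(lin × lin)  DP(lin × qua)  DP(qua × lin)  DP(qua × qua)  Response
 -1   -1     -1       1      -1       1          1             -1             -1              1          40.8
  0   -1      0      -2      -1       1          0              0              2             -2          44.2
  1   -1      1       1      -1       1         -1              1             -1              1          91.5
 -1    0     -1       1       0      -2          0              2              0             -2          32.0
  0    0      0      -2       0      -2          0              0              0              4          50.2
  1    0      1       1       0      -2          0             -2              0             -2          85.7
 -1    1     -1       1       1       1         -1             -1              1              1           9.0
  0    1      0      -2       1       1          0              0             -2             -2          24.9
  1    1      1       1       1       1          1              1              1              1          55.9
 -1   -1     -1       1      -1       1          1             -1             -1              1          34.2
  0   -1      0      -2      -1       1          0              0              2             -2          53.5
  1   -1      1       1      -1       1         -1              1             -1              1          70.5
 -1    0     -1       1       0      -2          0              2              0             -2          31.4
  0    0      0      -2       0      -2          0              0              0              4          34.9
  1    0      1       1       0      -2          0             -2              0             -2          59.3
 -1    1     -1       1       1       1         -1             -1              1              1          18.0
  0    1      0      -2       1       1          0              0             -2             -2          24.9
  1    1      1       1       1       1          1              1              1              1          31.9

Regression SS:  4385.4   250.7   2411.2   213.6      85.8            9.9           29.0            5.3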

and its elements are used in the following calculation. We find:

$$
\sum x_i^2 = (1)^2 + (-2)^2 + (1)^2 + (1)^2 + (-2)^2 + (1)^2 + \cdots + (1)^2 + (-2)^2 + (1)^2 = 36
$$

$$
\begin{aligned}
\sum x_i y_i ={} & 1(40.8) - 2(44.2) + 1(91.5) + 1(32.0) - 2(50.2) + 1(85.7) \\
& + 1(9.0) - 2(24.9) + 1(55.9) + 1(34.2) - 2(53.5) + 1(70.5) \\
& + 1(31.4) - 2(34.9) + 1(59.3) + 1(18.0) - 2(24.9) + 1(31.9) = 95
\end{aligned}
$$

$$
SSR(D(\text{qua})) = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2} = \frac{(95)^2}{36} = 250.7
$$

This is the number that is reported at the bottom of Table 7.7 and in the ANOVA in Table 7.6.
The sums of squares of the other components can be obtained in a similar fashion.
The F-statistic for interaction in Table 7.6, F = (130.1/4)/(1,079.7/9) = 0.27, is small and
insignificant (probability value 0.889), confirming what we had seen in the interaction plot.
Price (F = 10.94 with probability value 0.004) and display (F = 19.32 with probability value
0.001) are both highly significant. Furthermore, the ANOVA shows insignificant quadratic
components for both price (F = (213.6/1)/(1,079.7/9) = 1.78 and probability value 0.215)
and display (F = 2.09 and probability value 0.182). We conclude that the relationships
between sales and price and between sales and display are linear.
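
The decomposition for the apple juice data can be reproduced with a few lines of code. The following sketch uses Python with NumPy rather than the Minitab spreadsheet described above. The sales figures are those of Table 7.5 as we read them; the array layout and the helper names `ssr` and `qua` are assumptions made for illustration.

```python
import numpy as np

# sales[p, d, r]: price level p and display level d (each ordered -1, 0, +1), replicate r.
# Values as we read them from Table 7.5; the layout of the array is an assumption.
sales = np.array([
    [[40.8, 34.2], [44.2, 53.5], [91.5, 70.5]],   # price -1 (cost price)
    [[32.0, 31.4], [50.2, 34.9], [85.7, 59.3]],   # price  0 (reduced price)
    [[ 9.0, 18.0], [24.9, 24.9], [55.9, 31.9]],   # price +1 (regular price)
])

coded = np.array([-1.0, 0.0, 1.0])
P = np.repeat(coded, 6)               # price level of each of the 18 observations
D = np.tile(np.repeat(coded, 2), 3)   # display level of each of the 18 observations
y = sales.ravel()

def ssr(x, y):
    """Regression sum of squares of a single column: (sum x*y)^2 / sum x^2."""
    return float((x @ y) ** 2 / (x @ x))

def qua(v):
    """Quadratic polynomial coefficients (1, -2, 1) for coded levels -1, 0, 1."""
    return np.where(v == 0, -2.0, 1.0)

ssr_d_lin, ssr_d_qua = ssr(D, y), ssr(qua(D), y)
ssr_p_lin, ssr_p_qua = ssr(P, y), ssr(qua(P), y)

# Error sum of squares from the two replications in each of the nine cells (9 df).
cell_means = sales.mean(axis=2, keepdims=True)
ss_error = float(np.sum((sales - cell_means) ** 2))
ms_error = ss_error / 9

f_d_qua = ssr_d_qua / ms_error
f_p_qua = ssr_p_qua / ms_error

print(round(ssr_d_qua, 1), round(ssr_p_qua, 1), round(ss_error, 1))
print(round(f_d_qua, 2), round(f_p_qua, 2))
```

Under these assumptions the quadratic components come out as 250.7 for display and 213.6 for price, with error sum of squares 1,079.7, in agreement with the values quoted from Table 7.6.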

7.5

A FACTORIAL EXPERIMENT WITH TWO FACTORS

AT TWO LEVELS, AND ONE FACTOR AT THREE LEVELS

Two-level and 3-level designs can be combined. Consider, for example, the experiment
where factors A and B have two levels each, while factor C is studied at three levels. We call
this a $2^2 3^1$ factorial experiment.
Such a $2^2 \times 3$ design was used to improve the consistency of a bottle-filling operation
(Montgomery, 2005, p. 184). Process engineers controlled three factors: the operating
pressure in the filler (factor A), the number of bottles produced per minute (line speed,
factor B), and the percent carbonation (factor C). All three factors are continuous. For the
purposes of this experiment, the engineer selected 2 levels for pressure [25 and 30 pounds/
(inch)$^2$], 2 levels for line speed (200 and 250 bottles/minute), and 3 levels for carbonation
(10, 12, and 14%). The response is the (average) deviation of the actual fill height from the
targeted fill height. Positive deviations represent fill heights above the target, and negative
numbers are fill heights below the target. The average deviation is calculated from all bottles
within the same production run. The experiment was replicated once.
The design and the resulting data are given in Table 7.8. The levels of the 12 runs are
shown in columns 2-4. The design is orthogonal; you can check that each level combination
of any two factors is studied with the same number of runs.
With the data from such an experiment, we can obtain the sum of squares of the main
effects of factors A and B (with 1 degree of freedom each), the main effect of factor C (with
2 degrees of freedom), the AB interaction (with 1 degree of freedom), the AC and BC
interactions (each with 2 degrees of freedom), the ABC interaction (with 2 degrees of freedom),
and, since there are replications, the sum of squares of error. We have not provided the
detailed calculation equations for the sum of squares in the 3-factor experiment, as we
expect you to use computer software for the computations. The ANOVA table in Table 7.9,
without breaking down the 3-level factor C (carbonation) into its linear and quadratic
components, can be obtained from the Minitab command "ANOVA > General Linear
Model." (Here we have 3 factors, more than the 2 factors allowed in the Minitab command

TABLE 7.8

Bottle-Filling Operation

        Pressure     Speed       Carbonation   Response
Run    (factor A)   (factor B)   (factor C)    (deviation)
  1       -1           -1           -1          -3, -1
  2        1           -1           -1          -1,  0
  3       -1            1           -1          -1,  0
  4        1            1           -1           1,  1
  5       -1           -1            0           0,  1
  6        1           -1            0           2,  3
  7       -1            1            0           2,  1
  8        1            1            0           6,  5
  9       -1           -1            1           5,  4
 10        1           -1            1           7,  9
 11       -1            1            1           7,  6
 12        1            1            1          10, 11

TABLE 7.9

ANOVA Table: Bottle-Filling Operation

Analysis of Variance for Response

Source            DF        SS        MS        F       P
A: Pressure        1    45.375    45.375    64.06   0.000
B: Speed           1    22.042    22.042    31.12   0.000
C: Carbonation     2   252.750   126.375   178.41   0.000
  C(lin)           1   248.062   248.062   350.21   0.000
  C(qua)           1     4.687     4.687     6.62   0.024
AB                 1     1.042     1.042     1.47   0.249
AC                 2     5.250     2.625     3.71   0.056
  AC(lin)          1     5.063     5.063     7.15   0.020
  AC(qua)          1     0.187     0.187     0.26   0.620
BC                 2     0.583     0.292     0.41   0.671
  BC(lin)          1     0.563
  BC(qua)          1     0.021
ABC                2     1.083     0.542     0.76   0.487
  ABC(lin)         1     0.063
  ABC(qua)         1     1.021
Error             12     8.500     0.708
Total             23   336.625

"ANOVA > Two-Way." The command "ANOVA > General Linear Model" provides the
ANOVA for the factorial experiment with more than 2 factors. We must specify three
columns that identify the levels of the 3 factors, and request the sum of squares contributions
for the main effects of A, B, C, and the interactions AB, AC, BC, and ABC.)
For the additional decomposition into linear and quadratic components, we construct
the spreadsheet in Table 7.10. It contains orthogonal polynomials for the main effect and the
interactions involving the 3-level factor C. Its effect on the response is expressed through
the columns C(lin) and C(qua) that use the coefficients of the linear (-1, 0, 1) and the
quadratic (1, -2, 1) polynomials. The interactions that involve the 3-level factor C can be
parameterized similarly. The columns that represent the linear and quadratic components of the
2-factor interactions AC and BC are obtained by multiplying the elements in columns A and
B with the elements in C(lin) and C(qua). The columns representing the linear and quadratic
components of the 3-factor interaction are obtained by multiplying column A with BC(lin)
and BC(qua). The columns in Table 7.10 are of length 24 as there are 24 observations. The
procedure employed in the previous section is used to obtain the regression sums of squares
that are associated with each of these columns. The regression sums of squares are listed at the
bottom of the table, and they have been added to the ANOVA in Table 7.9.
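
The regressor columns just described are straightforward to build in code. The following Python sketch (NumPy assumed) constructs them and computes the single-column regression sums of squares. The responses are those of the bottle-filling experiment as we read them from Table 7.8; the array layout and all names are hypothetical choices for illustration.

```python
import numpy as np

# fill[c, a, b, r]: carbonation c (10, 12, 14%), pressure a (25, 30), speed b (200, 250),
# replicate r. Values as we read them from Table 7.8; the layout is an assumption.
fill = np.array([
    [[[-3, -1], [-1, 0]], [[-1, 0], [1, 1]]],     # carbonation -1
    [[[0, 1], [2, 1]],    [[2, 3], [6, 5]]],      # carbonation  0
    [[[5, 4], [7, 6]],    [[7, 9], [10, 11]]],    # carbonation +1
], dtype=float)

# Coded level of each factor for every one of the 24 observations.
c_idx, a_idx, b_idx, _ = np.indices(fill.shape)
C = np.array([-1.0, 0.0, 1.0])[c_idx].ravel()
A = np.array([-1.0, 1.0])[a_idx].ravel()
B = np.array([-1.0, 1.0])[b_idx].ravel()
y = fill.ravel()

c_lin = C
c_qua = np.where(C == 0, -2.0, 1.0)

columns = {
    "C(lin)": c_lin,           "C(qua)": c_qua,
    "AC(lin)": A * c_lin,      "AC(qua)": A * c_qua,
    "BC(lin)": B * c_lin,      "BC(qua)": B * c_qua,
    "ABC(lin)": A * B * c_lin, "ABC(qua)": A * B * c_qua,
}
ssr = {name: float((x @ y) ** 2 / (x @ x)) for name, x in columns.items()}

# Error sum of squares from the two replications of each run (12 degrees of freedom).
ss_error = float(np.sum((fill - fill.mean(axis=3, keepdims=True)) ** 2))
ms_error = ss_error / 12
f_stat = {name: value / ms_error for name, value in ssr.items()}

for name in columns:
    print(f"{name:9s} SSR = {ssr[name]:8.3f}   F = {f_stat[name]:7.2f}")
```

Under these assumptions the sketch gives 248.06 and 4.69 for C(lin) and C(qua), 5.06 for AC(lin), and an error sum of squares of 8.5, which match the corresponding entries of Table 7.9.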
The 3-factor interaction (ABC) and the three 2-factor interactions (AB, AC, BC) are
insignificant. The largest 2-factor interaction is between pressure and carbonation (AC, with
F = 3.71 and borderline significant probability value = 0.056). The lines in the interaction
plot in Figure 7.3 that connect averages for different carbonation (C) arising under identical
pressure (A) are almost parallel. This is another indication that the AC interaction is
negligible. The main effects of all 3 factors are highly significant. Increased pressure, speed,
and carbonation increase the average deviation from the target fill height. Since carbonation
is studied at 3 levels, we can assess whether the effect is linear or whether a quadratic
component is needed. The main-effects plots in Figure 7.3 show that the effect of carbonation is

Figure 7.3 Main Effects and Interaction Plots: Bottle-Filling Operation
[Interaction plot for response: carbonation on the horizontal axis, one line for each pressure level. Main effects plot for response: panels including pressure and carbonation.]

roughly linear; this assessment is confirmed by the very large and highly significant linear
component of factor C (F = 350, with probability value = 0.000). The quadratic component
of C is much smaller and not nearly as significant (F = 6.62 and probability value = 0.024).

7.6

THREE-LEVEL FRACTIONAL FACTORIAL DESIGNS

The number of runs in 3-level factorial experiments grows quite rapidly with the number
of factors k. Even with only 3 factors, 27 runs are required. Orthogonal fractions of $3^k$
factorial experiments can be constructed. This reduces the number of runs but, depending
on the particular fraction, confounds certain main and interaction effects. We describe a few
simple $3^{k-p}$ fractional factorial designs in this section. In particular, we discuss the $3^{3-1}$
and $3^{4-2}$ fractional factorials for studying 3 and 4 factors in 9 runs.

TABLE 7.11

The Graeco-Latin Square Design with Four Factors at Three Levels Each

                      FACTOR B
Factor A      Level -1   Level 0   Level 1
Level -1         aα         bβ        cγ
Level 0          bγ         cα        aβ
Level 1          cβ         aγ        bα

TABLE 7.12

The $3^{4-2}$ and $3^{3-1}$ Fractional Factorial Designs

            DESIGN
  A     B     C     D
 -1    -1    -1    -1
  0    -1     0     1
  1    -1     1     0
 -1     0     0     0
  0     0     1    -1
  1     0    -1     1
 -1     1     1     1
  0     1    -1     0
  1     1     0    -1

A $3^{4-2}$ fractional factorial in 9 runs can be constructed from the Graeco-Latin square
design in Table 7.11. The design involves 4 factors (A, B, C, and D), at 3 levels each. The rows
and columns in this table represent the coded factor levels (-1, 0, +1) for factors A and B.
Factor levels for factor C are given by the Latin letters a, b, and c, where a represents level
-1, b represents level 0, and c represents level 1. The factor levels for factor D are given by
the Greek letters α, β, γ, where α corresponds to level -1, β corresponds to level 0, and γ
corresponds to level 1. Graeco-Latin square designs have the property that each letter (Latin,
as well as Greek) appears exactly once in each row and each column. Also, each Latin letter
appears with each Greek letter exactly once.
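
These properties can be checked mechanically. The sketch below, in plain Python, writes down a 3 × 3 Graeco-Latin square (the arrangement is our reading of Table 7.11 and should be treated as an assumption), converts it into the 9 runs of a 4-factor design, and verifies that every pair of factors is balanced. The function name `balanced` and the spelled-out Greek letter names are hypothetical.

```python
from itertools import product

# Rows: levels -1, 0, 1 of factor A; columns: levels -1, 0, 1 of factor B.
# The arrangement below is an assumption based on our reading of Table 7.11.
latin = [["a", "b", "c"],
         ["b", "c", "a"],
         ["c", "a", "b"]]
greek = [["alpha", "beta", "gamma"],
         ["gamma", "alpha", "beta"],
         ["beta", "gamma", "alpha"]]

levels = [-1, 0, 1]
code_c = {"a": -1, "b": 0, "c": 1}
code_d = {"alpha": -1, "beta": 0, "gamma": 1}

# One run (A, B, C, D) per cell of the square; factor A varies fastest.
runs = [(levels[i], levels[j], code_c[latin[i][j]], code_d[greek[i][j]])
        for j in range(3) for i in range(3)]

def balanced(runs, f, g):
    """True if every level combination of factors f and g occurs exactly once."""
    return sorted((r[f], r[g]) for r in runs) == sorted(product(levels, levels))

orthogonal = all(balanced(runs, f, g) for f in range(4) for g in range(f + 1, 4))

for run in runs:
    print(run)
print("orthogonal:", orthogonal)
```

Dropping the last entry of each run gives a 9-run design in the three factors A, B, and C, which remains balanced for every pair of its factors.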
The 9 runs of the $3^{4-2}$ fractional factorial design are listed in Table 7.12. The first two
columns list the runs in a full $3^2$ factorial experiment. The levels for factors C and D result from
the Graeco-Latin letters in Table 7.11. For example, the first run with letter combination
aα is described by (A = -1, B = -1, C = -1, D = -1); the next run with letter
combination bγ is (A = 0, B = -1, C = 0, D = +1); ...; the last run with letter combination
bα is (A = +1, B = +1, C = 0, D = -1). The $3^{3-1}$ fractional factorial design in Table 7.12
is obtained by omitting the factor D. Both designs are orthogonal. You can check that each
level combination of any two factors is studied with the same number of runs (namely, one).
The 9 runs of the $3^{4-2}$ design in Table 7.12 allow us to estimate the sums of squares of all
four main effects. Each sum of squares has 2 degrees of freedom, because comparisons of
3 levels are involved. Since the design is orthogonal, one can obtain the sum of squares of a
factor by averaging over the other 3 factors. (Minitab's "Stat > ANOVA > General Linear
Model" provides the ANOVA table. Replications are needed to obtain an error sum of
squares and test the significance of the four main effects.) Of course, the main effects are

TABLE 7.13

A $3^{4-1}$ Fractional Factorial Design in 27 Runs

           FACTOR
  A     B     C     D
 -1    -1    -1    -1
  0    -1    -1     0
  1    -1    -1     1
 -1     0    -1     0
  0     0    -1     1
  1     0    -1    -1
 -1     1    -1     1
  0     1    -1    -1
  1     1    -1     0
 -1    -1     0     0
  0    -1     0     1
  1    -1     0    -1
 -1     0     0     1
  0     0     0    -1
  1     0     0     0
 -1     1     0    -1
  0     1     0     0
  1     1     0     1
 -1    -1     1     1
  0    -1     1    -1
  1    -1     1     0
 -1     0     1    -1
  0     0     1     0
  1     0     1     1
 -1     1     1     0
  0     1     1     1
  1     1     1    -1

confounded with 2-factor interactions, which limits the usefulness of such designs to initial
screening experiments, where the aim is to identify important factors for further study. A
$3^{4-1}$ fractional factorial design in 27 runs is needed if the experimenter wants a 3-level
design that does not confound the four main effects with 2-factor interactions. This resolution
IV design is shown in Table 7.13.
Similar to $2^{k-p}$ fractional factorial designs, we can construct $3^{k-p}$ designs by first writing
down a full 3-level factorial in (k - p) factors and generating the remaining p columns of
the design matrix from specified generators. The generators imply a defining relationship,
and from the defining relationship, we can obtain the confounding structure. However, the
steps are more complicated in 3-level designs, because they involve modulus-3 arithmetic.
Modular arithmetic is a system of arithmetic for integers, where numbers "wrap around"
after they reach a certain value, the modulus. Two integers a, b are said to be the same
modulo 3 if their difference is divisible by 3. In this case, we write a = b (mod 3). Consider
the sum S of the (coded) levels of the 3 factors A, B, and C in the full $3^3$ factorial design in
Table 7.13. The sum can take any one of seven possible values, S = -3 (if all three factors
are at -1), -2, -1, 0, 1, 2, 3 (if all three factors are at +1). We have selected the level of the
fourth factor D as D = -1 if S = 0 (mod 3); that is, if S is -3, 0, or 3. We have selected
D = 0 if S = 1 (mod 3); that is, if S is -2 or 1. And we have selected D = +1 if S = 2 (mod 3);
that is, S is -1 or 2. These are the levels shown in Table 7.13.
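
The modulus-3 rule translates directly into code. The following plain-Python sketch generates the 27 runs from the full factorial in A, B, and C and checks two balance properties; the run order (factor A varying fastest) and all names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

levels = (-1, 0, 1)

# Level of D as a function of S mod 3, where S = A + B + C (rule described in the text).
# Python's % operator returns a value in {0, 1, 2} for negative S as well.
d_level = {0: -1, 1: 0, 2: 1}

# Full 3^3 factorial in A, B, C with factor A varying fastest (assumed run order).
design = [(a, b, c, d_level[(a + b + c) % 3])
          for c, b, a in product(levels, repeat=3)]

# Every pair of factors: each of the 9 level combinations should occur 3 times.
pair_balanced = all(
    len(counts) == 9 and set(counts.values()) == {3}
    for counts in (Counter((run[f], run[g]) for run in design)
                   for f, g in combinations(range(4), 2))
)

# Every triple of factors: all 27 level combinations should occur exactly once.
triple_full = all(
    len({(run[f], run[g], run[h]) for run in design}) == 27
    for f, g, h in combinations(range(4), 3)
)

print(len(design), pair_balanced, triple_full)
```

That any three of the four columns form a full $3^3$ factorial is consistent with the statement that this fraction does not confound main effects with 2-factor interactions.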

7.7

NOBODY ASKED US, BUT . . .

Designs that study continuous factors at more than two levels are useful if one wishes to
explore the functional relationship between the response and the factors. Such designs allow
us to fit quadratic models, which will tell us the levels of the factors that maximize (or
minimize) the response. The literature refers to this area as response surface analysis.
The 3-level factorial and fractional factorial designs are useful, but other designs, such as
central composite and simplex designs, have been employed. The central composite design
in 2 factors, for example, adds to the four runs of the $2^2$ factorial design [i.e., (-1, -1), (+1,
-1), (-1, +1), (+1, +1)], a center point (0, 0) and four "star" points [i.e., (-w, 0), (w, 0),
(0, -w), (0, w)]. This design studies each factor at 5 different levels, namely -w, -1, 0, 1,
and w, with w specified by the experimenter. These designs are included in most
experimental design software programs such as Minitab and JMP. For more on these designs, see
Box, Hunter, and Hunter (2005).
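
As a small illustration of the 2-factor central composite design just described, the sketch below lists its nine design points in plain Python. The function name `central_composite_2` and the value w = 1.5 are hypothetical; the experimenter chooses w.

```python
def central_composite_2(w):
    """Design points of a 2-factor central composite design with star distance w."""
    factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]   # runs of the 2^2 factorial
    center = [(0, 0)]                                  # center point
    star = [(-w, 0), (w, 0), (0, -w), (0, w)]          # four "star" points
    return factorial + center + star

design = central_composite_2(1.5)   # w = 1.5 is an arbitrary illustrative choice

# Distinct levels at which the first and second factor are studied.
levels_x1 = sorted({x1 for x1, _ in design})
levels_x2 = sorted({x2 for _, x2 in design})

print(len(design), levels_x1, levels_x2)
```

Each factor is studied at the five levels -w, -1, 0, 1, and w, as stated in the text.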


had access to a set of friends? Does your procedure satisfy the assumptions in Section 7.2 that justify the analysis in Section 7.3? Can you expect that the observations are independent and of equal precision?

Exercise 2

Consider Case 10 (Piggly Wiggly) from the case study appendix.

(a) For each of the two products, complete the ANOVA table. In particular:
(a1) Specify the degrees of freedom and calculate the mean squares and the appropriate F-statistics.
(a2) Calculate an estimate of the standard deviation of the error.
(a3) Assess whether one needs to consider the 3-factor interaction.
(a4) Assess whether one needs to consider any of the 2-factor interactions.
(b) Give graphical interpretations of the results. If you find significant 2-factor interactions, show the relevant interaction diagrams. Display the main effects of factors that do not interact. Summarize your conclusions. How do price, display, and advertising affect sales?
Hints: For White House apple juice, only main effects are relevant. For Mahatma rice, you will also notice a Price x Display interaction. For White House apple juice, it makes sense to calculate average responses for the levels of price (and for the levels of display, and advertising). You can obtain these averages from the given cell averages (each average comes from an equal number of observations; here two). Use the estimate of the standard deviation in (a2) to obtain an estimate of the standard error of these averages. Display the averages on a dot diagram and superimpose on these graphs the distribution of these averages. The distribution of the averages is sometimes referred to as a sliding reference distribution. One can imagine sliding the distribution along the x-axis trying to "cover" the averages. The main effect of a factor is insignificant if all averages can be covered by the reference distribution.
A similar graph can be made for Advertising in the Mahatma rice data as Advertising does not appear to interact with either Price or Display. Also, one can collapse the table of averages and obtain cell averages for Price and Display. These averages can be represented in an interaction diagram. Error bands can be calculated after obtaining the standard error of the averages (which, similar to the earlier discussion, can be calculated from the estimate in (a2); one just needs to keep track of the number of observations that go into each average).
(c) Discuss the following issues:
(c1) Any weaknesses of the study. Do you think that the experimenters have done a good job?
(c2) How seasonality (if present) would affect the results.
(c3) How one could guard against seasonality at the design stage.
Hint: Randomization of assignment; but even better, blocking for season.
(c4) How one could check for seasonality at the analysis stage.
Hint: Construct indicator variables to denote season (quarter) and incorporate the indicator variables into the ANOVA/regression analysis.
(c5) Keeping track of the data: Should experimenters have kept track of the number of weekly customers, and should they have analyzed unit sales per customer? Why or why not?
(d) The experiment used just one store. How would you proceed if you had more stores available? What about if the stores were of different sizes?
(e) Discuss the practical difficulties of carrying out such an experiment. Did this study do a reasonable job?

Exercise 3

Consider Case 11 (United Dairy Industries) from the case study appendix.

(a) Consider the Latin square design for the test markets in Part 1. Show that the Latin square is orthogonal (i.e., same number of runs at each level combination of any two factors). Orthogonality implies that one can ignore time and location when obtaining the effects of advertising. Ignoring these two factors, the observations for the four advertising groups are as follows.

0 cents (A):    7360   13153   11852   7557   Ave    9,981
3 cents (B):    7364   11258   12089   7900   Ave    9,653
6 cents (C):    8049   13880   11800   8501   Ave   10,558
9 cents (D):    9010   13147   11450   7776   Ave   10,346

Use the ANOVA calculations for the completely randomized one-factor experiment in Section 3.2, and calculate the sum of squares due to Advertising. Check that it is identical to the one listed in the ANOVA table in Part 1 of the case study. Repeat this for the factor City. Ignoring advertising and time, obtain the ANOVA for the completely randomized one-factor experiment (with factor City). Show that the sum of squares due to City is the same as the one listed in the ANOVA table in Part 1. Discuss.
(b) Compare the ANOVA results in Part 1 of the case study and the results in Table 3.10 of Chapter 3. Discuss the significance of advertising. Why are the two tests different? Which test is more relevant?
(c) The analysis in Part 2 is the same as the analysis in the randomized complete block experiment in Section 3.3. Repeat the analysis using your statistical software of choice.
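For part (a), the sum of squares due to Advertising can be computed directly from the four rows of observations. The Python sketch below uses the standard between-group formula for a one-factor layout, which we take to be the calculation Section 3.2 describes; the dictionary and function names are ours.

```python
# Observations from the exercise, grouped by advertising level (time and city ignored).
sales = {
    "A (0 cents)": [7360, 13153, 11852, 7557],
    "B (3 cents)": [7364, 11258, 12089, 7900],
    "C (6 cents)": [8049, 13880, 11800, 8501],
    "D (9 cents)": [9010, 13147, 11450, 7776],
}

def treatment_sum_of_squares(groups):
    """Between-group sum of squares: sum over groups of n_i * (group mean - grand mean)^2."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    return sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())

group_means = {name: sum(ys) / len(ys) for name, ys in sales.items()}
ss_advertising = treatment_sum_of_squares(sales)
```

The value of `ss_advertising` is the number to compare with the Advertising line of the ANOVA table in Part 1 of the case study. The same function applies to the factor City once the observations are regrouped by city.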

Exercise 4 Orthogonal fractions of 3-level factorial designs are discussed in Section 7.6. We used Graeco-Latin square arrangements to construct these designs. This strategy can be extended to factors with more than 3 levels.
Consider factors with 5 levels. Construct orthogonal fractional factorial designs that allow you to study the main effects of four 5-level factors in just 25 runs.
(a) Write down a table of 5 rows and 5 columns. Add Latin letters to the 25 cells. The first row consists of letters a, b, c, d, and e. Rearrange the letters cyclically, similar to the strategy we used to construct the runs in Plackett-Burman designs (Chapter 6). Shift the row of letters to the left, and move the letter in the far-left position of a row to the far-right position of the subsequent one. Similarly, for the Greek letters α, β, γ, δ, ε. But, now shift the letters to the right, and move the letter in the far-right position of a row to the far-left position of the subsequent one. Write down the 25 runs that you obtain from this arrangement. Check that the numbers of runs at each factor-level combination of any 2 factors are the same, making this an orthogonal design.
(b) From this orthogonal design, you can obtain the sums of squares of 4 factors. How many degrees of freedom are associated with each sum of squares? How do you obtain the sum of squares of error, and how many degrees of freedom are associated with it? Discuss how you would determine the significance of a main effect.
(c) Assume that 2-factor interactions are important. Would this affect the main effects analysis?
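The cyclic construction in part (a) can be sketched in a few lines of Python. The code treats row, column, Latin letter, and Greek letter as the four 5-level factors; this reading of the exercise, and the names `runs` and `is_orthogonal`, are our assumptions.

```python
from collections import Counter
from itertools import combinations

LATIN = "abcde"
GREEK = "αβγδε"
n = 5

# Row i, column j: Latin letters shift one place left per row, Greek letters one place right.
runs = [(i, j, LATIN[(j + i) % n], GREEK[(j - i) % n]) for i in range(n) for j in range(n)]

def is_orthogonal(design):
    """True if, for every pair of factors, each level combination occurs equally often."""
    n_factors = len(design[0])
    for f, g in combinations(range(n_factors), 2):
        counts = Counter((run[f], run[g]) for run in design)
        if len(counts) != n * n or len(set(counts.values())) != 1:
            return False
    return True
```

Calling `is_orthogonal(runs)` checks the balance condition stated in part (a) for all six pairs of factors at once.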
Exercise 5 Peter W. M. John (1990) described an 18-run experiment that involves 5 controllable factors, each studied at three levels.
/l

I
\9.08
11.K~

>9 77
42. I 'i

2
_\

2
}

_,
I

3
I

2
3

4n.R2
4'1.05
46.28

!ti.RO
1'i.n7
\9,)()

12.hS

11 \ l
>9 'I I
l'i.21
l'i.'il
1.H7
41\.07
46.1,~

(a) Show that this is an orthogonal design.
(b) Consider a main-effects model and determine the sums of squares of the five main effects (each with 2 degrees of freedom). Obtain the main effects plots. Assume that you want to maximize the response. Determine the best values for each of the five factors. Statistical software such as Minitab's "Stat > ANOVA > General Linear Model" can be used for the calculation of the ANOVA table and the plots of the main effects.

NONORTHOGONAL DESIGNS AND COMPUTER SOFTWARE

Section 8.2 discusses an interesting case study that involves a nonorthogonal design with many factors and different numbers of factor levels. Section 8.3 talks about useful computer software for design construction and the analysis of the resulting data.
8.2 THE PHONEHOG CASE
We would like to thank Mark Wachen, CEO of Optimost, for providing the data and for sharing his modeling insights with us. Optimost (www.optimost.com) is a technology and services company based in New York that specializes in comprehensive real-time testing and conversion-rate marketing. We would also like to thank Phil Nadel, CEO of Gulfstream Internet (the parent company of PhoneHog), for allowing us to use this case.
PhoneHog.com, owned and operated by Gulfstream Internet (http://www.phonehog.com), is a subscription-based service through which consumers get free long-distance phone calls. Participants sign up for the program, then earn phone minutes by visiting Internet sites, entering sweepstakes, or trying new products and services. As of November 2001, the system had more than 1 million members.
Since PhoneHog is a subscription-based service, it is important that its Web site can attract customers to sign up for its program. PhoneHog needs to learn which advertising copies, offers, and images increase the sign-up rate of Internet surfers who come in contact with its Web site. Experiments are run continually to determine modifications to the current standard (baseline) strategy with the hope of improving the sign-up rate. This case focuses on 10 different areas of the PhoneHog Web site displayed in Figure 8.1: A-top and A-bottom (image of the headline), B (subheadline), C (main copy), D (form), E (privacy copy), F (submit button), G (how it works section), H (main image on right-hand side), and I (footer). The choices for each area are described in Figure 8.2. For example, there are four different choices for the top image of the headline (A-top): the baseline showing pictures of five people making calls and the word PhoneHog written next to them, and three experimental versions (picture of a hog's head peeking through the "O" on white background; same picture on blue background; pictures of five people calling with PhoneHog's logo to the right). There are 10 choices for the bottom area of the headline (A-bottom): the baseline "Let our advertisers pay for your long distance calls," and nine experimental versions. Area H describes the main image on the right-hand side of the Web page; in addition to the baseline picture showing a rotating flash image of a woman on the telephone, three experimental pictures are considered: a hog standing in a telephone booth extending a phone, no image at all, and a picture showing a brief summary of the highlights (yes/no) of the program.

8.2.1 Design and Resulting Data


PhoneHog experiments with its Web page continually, and many studies, called "waves," have been carried out in the past. In the specific experiment that is discussed here, several of the test levels in Figure 8.2 were not studied. For example, only the baseline 1 and the five test versions 4, 6, 8, 9 and 10 were considered for the bottom area of the main headline (A-bottom). The active levels for the other factors are indicated in the last column of Table 8.1.

[Figure 8.1 PhoneHog's Web Page: Test Area Diagram]

A factorial experiment that considers all level combinations of these 10 factors requires (6)(4)(3)(4)(6)(4)(6)(5)(4)(2) = 1,658,880 runs. Of course, it is impossible to study all combinations. Only a small fraction of the factorial experiment can be considered. An experiment with just 45 runs was carried out. The runs (which are referred to as "creatives") are listed in Table 8.2. Creative 45 uses the baseline level for each of the 10 areas; the other 44 runs are test versions where one or more baselines are changed to test versions.
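The size of the full factorial is simply the product of the numbers of active levels. A short Python check, with variable names of our own choosing:

```python
from math import prod

# Numbers of active levels of the 10 areas, as they appear in the product above.
levels = [6, 4, 3, 4, 6, 4, 6, 5, 4, 2]

full_factorial_runs = prod(levels)           # every level combination of the 10 areas
fraction_studied = 45 / full_factorial_runs  # share of combinations covered by the 45 creatives
```

The 45 creatives cover only a tiny share of all possible combinations, which is why the choice of the fraction matters so much.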

The 44 test runs were offered to Internet users randomly and with equal probabilities, while the probability of the baseline (creative 45) was four times as large. The 45 different creatives were made available over a 2-week period. During this period, PhoneHog recorded the number of distinct visitors to the PhoneHog Web site (VISITS) and the number of times visitors clicked the subsequent page to obtain additional information (CLICKS). The click-through rate, CTR = CLICKS/VISITS, measures the success of a run. The results are shown in Table 8.2.
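The allocation rule fixes the display probabilities: if each test creative is shown with probability p and the baseline with probability 4p, then 44p + 4p = 1. The Python sketch below works this out and defines the click-through rate in percent. The function name and the counts in the usage note are hypothetical and are not taken from Table 8.2.

```python
from fractions import Fraction

n_test = 44          # test creatives, each shown with the same probability p
baseline_weight = 4  # the baseline creative is shown with probability 4p

# The probabilities must sum to one: 44p + 4p = 1.
p_test = Fraction(1, n_test + baseline_weight)
p_baseline = baseline_weight * p_test

def click_through_rate(clicks, visits):
    """CTR in percent, computed as CLICKS / VISITS * 100."""
    return 100.0 * clicks / visits
```

For example, a creative with 50 clicks from 200 visits (made-up counts) would have `click_through_rate(50, 200)` equal to 25.0.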

[Figure 8.2 Baseline and Test Versions for the 10 Test Areas on PhoneHog's Web Page. Panels: Top Image in headline; Bottom Image in headline]

In Chapter 5 we learned how to find good orthogonal fractions of 2-level factorial designs. However, the situation is different here as factors with many different levels are involved. This makes it difficult to find orthogonal fractions that have a reasonable number of runs. Other design criteria, different from orthogonality, must be used to determine the fractions.

[Figure 8.2 Continued. Panels: Subheadline; Main Copy]

The experiment involves 10 categorical factors, with varying numbers of factor levels. The numbers of factor levels are given in the last column of Table 8.1. Considering main effects alone, we need to estimate 1 + (6 - 1) + (4 - 1) + (3 - 1) + (4 - 1) + (6 - 1) + (4 - 1) + (6 - 1) + (5 - 1) + (4 - 1) + (2 - 1) = 35 effects. Considering creative #45 as the baseline, the regression formulation of the main-effects model includes a constant, 5 indicator variables for A-bottom (ab4 = 1 if test version 4 is used and 0 otherwise, ab6 = 1 for version 6, ab8 = 1 for version 8, ab9 = 1 for version 9, and ab10 = 1 for version 10), 3 indicators for A-top (at2 = 1 for version 2, at3 = 1 for version 3, at4 = 1 for version 4), and so on. A minimum of 35 runs is needed to estimate the 35 coefficients. Of course, more runs are required to get better estimates of the main effects, but the cost of too many additional runs may be prohibitive.
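The coefficient count and the indicator coding can be illustrated with a short Python sketch. Only the active levels of A-bottom (1, 4, 6, 8, 9, 10) and the indicator names ab4 through ab10 come from the text; assigning the remaining level counts to areas B through I in the order of the sum above is our assumption, as is the helper name `indicators`.

```python
# Active levels per area; level 1 is the baseline. The assignment of counts to areas
# follows the order of the sum in the text (an assumption for areas B through I).
level_counts = {"A-bottom": 6, "A-top": 4, "B": 3, "C": 4, "D": 6,
                "E": 4, "F": 6, "G": 5, "H": 4, "I": 2}

n_coefficients = 1 + sum(k - 1 for k in level_counts.values())  # constant + indicators

def indicators(prefix, test_levels, level):
    """0/1 indicator variables for one factor; the baseline level sets all of them to 0."""
    return {f"{prefix}{t}": int(level == t) for t in test_levels}

ab_baseline = indicators("ab", [4, 6, 8, 9, 10], level=1)
ab_version8 = indicators("ab", [4, 6, 8, 9, 10], level=8)
```

A creative that uses the baseline headline has all five A-bottom indicators equal to 0, so its fitted value is carried by the constant and the indicators of the other areas.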


The experiment in Table 8.2 contains 45 runs: a control run (creative #45) and 44 test runs. We want to select the levels of the test runs such that the runs "cover" the region of interest and allow us to estimate the main effects with maximum precision. D-optimality and A-optimality are two design criteria that select the levels of the design matrix with exactly this objective in mind. We discuss these two criteria, the connections between them, and computer programs to construct such designs in Appendix 8.1. JMP, through its custom design feature, is able to generate a D-optimal design for a given number of runs (see Section 8.3).

[Figure 8.2 Continued. Panel: Form]

Observe that the resulting design in Table 8.2 is no longer orthogonal, and that this fact
has consequences for the subsequent data analysis. Also, note that the main-effects model
ignores all interactions. The main-effects interpretation could be seriously wrong if interactions are present.
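Appendix 8.1 gives the details of the two criteria. As a rough illustration only, the Python sketch below uses the common textbook definitions, which we assume match the appendix: a D-optimal design maximizes det(X'X), and an A-optimal design minimizes trace((X'X)^-1), where X is the model matrix. The two 4-run designs and all function names are hypothetical.

```python
from fractions import Fraction

def information_matrix(X):
    """X'X for a model matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(Fraction(r[i] * r[j]) for r in X) for j in range(p)] for i in range(p)]

def det_and_inverse(M):
    """Determinant and inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            raise ValueError("X'X is singular: the model cannot be estimated")
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # a row swap flips the sign of the determinant
        piv = A[c][c]
        det *= piv
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return det, [row[n:] for row in A]

def d_and_a_criteria(X):
    """Return (det(X'X), trace((X'X)^-1)); larger D and smaller A are better."""
    det, inv = det_and_inverse(information_matrix(X))
    return det, sum(inv[i][i] for i in range(len(inv)))

# Model matrix columns: constant, x1, x2 (two 2-level factors, main effects only).
orthogonal = [(1, -1, -1), (1, 1, -1), (1, -1, 1), (1, 1, 1)]
nonorthogonal = [(1, -1, -1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]
```

Comparing `d_and_a_criteria(orthogonal)` with `d_and_a_criteria(nonorthogonal)` shows the orthogonal design doing better on both criteria in this toy case. Design software searches over candidate runs for the best attainable values when no orthogonal design of the desired size exists.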

[Figure 8.2 Continued. Panels: Privacy Copy; Submit Button]

8.2.2 Analysis of the Data


We use Minitab's "Stat > ANOVA > General Linear Model" command to estimate a main-effects model for the click-through rate. Other statistical software programs (such as JMP) work in a similar fashion. In this particular example the data set consists of 11 columns: the column of the response (click-through rate), and 10 columns that contain the categories (levels) for each of the 10 factors (A-bottom, A-top, ..., I). For example, the column for A-bottom contains the six categories (levels) 1, 4, 6, 8, 9 and 10, just like the one in Table 8.2. Minitab produces the analysis of variance table in Table 8.3.
The analysis of variance table looks slightly different from the tables in Chapters 3 and 7, as it shows two different sums of squares. The sequential sums of squares (Seq SS) measure the explanatory contribution of each factor as factors are added sequentially to the model. For example, A-bottom explains 8.8764 of the total variation in the click-through rate (SST = 101.2383) when it is the only factor in the model. The model with A-bottom and A-top explains 9.7248 of the variation, implying that the additional contribution of A-top is 0.8484. Or, to say this differently, A-top explains 0.8484 of the variation when it is added to the model with A-bottom. The factor B explains an additional 10.9232 when factor B is added to the model with A-bottom and A-top, and so on. Sequential sums of squares always add up to the regression sum of squares that is explained by the largest model with all factors; the regression sum of squares of the model that includes all factors is SST - SS(error) = 101.2383 - 7.4203 = 93.8180.
For orthogonal designs, the sum of squares of a factor does not change if other factors are present in the model. Hence, we can assess the importance of a factor by comparing its regression contribution (which is unconditional of other factors) to the total variability. In the nonorthogonal situation this is no longer possible, as the contribution of a factor changes with the factors that are already present in the model.

[Figure 8.2 Continued. Panels: How it works section; Footer]

TABLE 8.3
ANOVA (Regression) Results for CTR = CLICKS/VISITS

Analysis of Variance for CTR=CLICKS/VISITS(%), using Adjusted SS for Tests

Source   DF     Seq SS    Adj SS    Adj MS       F       P
A-bot     5     8.8764    8.6401    1.7280    2.33   0.120
A-top     3     0.8484    0.0146    0.0049    0.01   0.999
B         2    10.9232    5.9607    2.9803    4.02   0.052
C         3     6.3376    9.3502    3.1167    4.20   0.036
D         5     7.0235    7.3990    1.4798    1.99   0.165
E         3    27.1948   14.9875    4.9958    6.73   0.009
F         5     1.3796    3.5688    0.7138    0.96   0.484
G         4     7.3321    5.6330    1.4083    1.90   0.187
H         3     4.8428    3.6671    1.2224    1.65   0.240
I         1    19.0596   19.0596   19.0596   25.69   0.000
Error    10     7.4203    7.4203    0.7420
Total    44   101.2383

S = 0.861413   R-Sq = 92.67%   R-Sq(adj) = 67.75%

The partial (or adjusted) sum of squares (Adj SS) measures the explanatory contribution of a factor as this factor is added last to the model. For example, the adjusted sum of squares of factor B (5.9607) in Table 8.3 is the contribution of factor B when it is added to the model that does not include factor B (i.e., the model with A-bottom, A-top, C through I). Because of the nonorthogonality, the partial (adjusted) sum of squares is different from the sequential sum of squares (which is 10.9232 for factor B).
The degrees of freedom of each factor sum of squares correspond to the number of indicator variables that are needed to represent the levels of that factor. For a factor with a levels, the degrees of freedom are a - 1. The adjusted mean squares are obtained by dividing the partial (adjusted) sums of squares by their degrees of freedom.
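The arithmetic that links the columns of Table 8.3 can be retraced in a few lines of Python. The numbers are copied from the table; the variable names are ours, and the F-statistic is taken to be the adjusted mean square divided by the error mean square, as the table heading "using Adjusted SS for Tests" indicates.

```python
# Values copied from Table 8.3: (DF, Seq SS, Adj SS) per factor.
table = {
    "A-bot": (5, 8.8764, 8.6401), "A-top": (3, 0.8484, 0.0146),
    "B": (2, 10.9232, 5.9607),    "C": (3, 6.3376, 9.3502),
    "D": (5, 7.0235, 7.3990),     "E": (3, 27.1948, 14.9875),
    "F": (5, 1.3796, 3.5688),     "G": (4, 7.3321, 5.6330),
    "H": (3, 4.8428, 3.6671),     "I": (1, 19.0596, 19.0596),
}
error_df, error_ss, total_ss = 10, 7.4203, 101.2383

regression_ss = sum(seq for _, seq, _ in table.values())  # sequential SS add up
error_ms = error_ss / error_df
f_stats = {name: (adj / df) / error_ms for name, (df, _, adj) in table.items()}
r_squared = regression_ss / total_ss
```

The sequential sums of squares add up to SST - SS(error), while the adjusted sums of squares do not, which is a direct symptom of the nonorthogonality of the design.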
A sensible strategy for assessing the importance of the various factors is to consider the adjusted mean squares and their associated F-statistics and probability values. We notice that the factors A-bottom, B, C, E, and I affect the sign-up rate. The weakest component among these five factors is A-bottom, with probability value 0.12.
We need to find out which levels of these factors are beneficial. The main-effects plots in Figure 8.3 show that the best level for A-bottom is 4 ("Stop paying for long distance calls"). Level 3 works best for subheadline B ("Earning free long distance calls with PhoneHog is easy. Just click, register or try a new product. Then start calling for free!"). The simple invitation, "Please take a minute to join now," works best for the main copy (level 2 of factor C). The small-font privacy line (level 2 of E) works better than all others, perhaps because it does not highlight the "fine print." Level 2 of factor I (providing link buttons to get to more information) is quite successful in enticing viewers to visit the subsequent Web pages. Hence the best factor-level combination is given by
(A-bottom = 4, B = 3, C = 2, E = 2, I = 2)

The main-effects plots were obtained as an option in Minitab's "Stat > ANOVA > General Linear Model" command. Minitab displays the fitted means from the main-effects regression model with an intercept and 0/1 indicator variables for the absence/presence of the various levels (35 coefficients in total; see Section 8.2.1). The fitted means are different from the means that are obtained by averaging over all other factors. However, the differences are usually minor as long as the design is not too different from an orthogonal design.

[Figure 8.3 Main Effects Plots for Click-Through Rate]
The results of the main-effects regression model with the 5 identified factors are shown in Table 8.4. Substituting (A-bottom = 4, B = 3, C = 2, E = 2, I = 2) into the estimated equation leads to the predicted click-through rate
17.3671 + 0.5336 + 0.8932 + 0.7673 + 1.4663 + 1.5314 = 22.56.

This represents a 35% improvement over the click-through rate of the current baseline (creative #45; CTR = 16.65).
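The prediction can be reproduced from the coefficients of Table 8.4. In the Python sketch below the coefficients are copied from that table, with the baseline level 1 of each factor given a coefficient of 0; the dictionary layout and function name are ours.

```python
# Coefficients copied from Table 8.4; level 1 is the reference level with coefficient 0.
constant = 17.3671
coef = {
    "A-bottom": {1: 0.0, 4: 0.5336, 6: -0.9047, 8: 0.2427, 9: -0.5153, 10: 0.0556},
    "B": {1: 0.0, 3: 0.8932, 6: 0.3752},
    "C": {1: 0.0, 2: 0.7673, 5: -0.1604, 6: -0.1648},
    "E": {1: 0.0, 2: 1.4663, 3: -0.1946, 4: 0.0347},
    "I": {1: 0.0, 2: 1.5314},
}

def predicted_ctr(levels):
    """Fitted click-through rate (%) of the main-effects model for one level per factor."""
    return constant + sum(coef[factor][level] for factor, level in levels.items())

# Without interactions, the best combination takes the largest coefficient in each factor.
best = {factor: max(by_level, key=by_level.get) for factor, by_level in coef.items()}
best_ctr = predicted_ctr(best)
```

Because the model has no interaction terms, the best level of each factor can be chosen separately. This shortcut is only as good as the main-effects assumption itself.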
Comment. Experimentation at PhoneHog is a continual activity that tries to improve on the current best results. Our best factor-level combination becomes a baseline for the next set of runs. Also note that the winner among the 45 studied runs is creative #32. Its click-through rate (22.72%) and its factor levels are quite similar to the predicted click-through rate and the factor levels of the best strategy. It certainly makes sense to also include this winning strategy (A-bottom = 8, B = 3, C = 5, E = 2, I = 2) as one of the creatives in the next wave.
There is a lot of uncertainty in our analysis. There are many factors, the factors are categorical, and there are many factor levels. Our assumed main-effects model may be incorrect. It may well be that nothing works except for one single specific combination of factor levels. If we are lucky to have this particular combination as part of the experiment, its result will stand out as the clear winner. A situation such as the one described here introduces complicated interactions, and a main-effects analysis and its implied best levels could be seriously flawed. So, it is a good idea also to include the winner as one of the creatives in the next wave.

TABLE 8.4
Model Coefficients: Model with A-bottom, B, C, E, and I (baseline 1 selected as reference level)

The regression equation is
CTR(%) = 17.4 + 0.534 ab4 - 0.905 ab6 + 0.243 ab8 - 0.515 ab9 + 0.056 ab10
         + 0.893 b3 + 0.375 b6 + 0.767 c2 - 0.160 c5 - 0.165 c6 + 1.47 e2
         - 0.195 e3 + 0.035 e4 + 1.53 i2

Predictor       Coef   SE Coef       T       P
Constant     17.3671    0.5368   32.35   0.000
Factor A-bottom (baseline 1)
ab4           0.5336    0.5120    1.04   0.306 (largest)
ab6          -0.9047    0.5244   -1.73   0.095
ab8           0.2427    0.5220    0.46   0.645
ab9          -0.5153    0.5287   -0.97   0.338
ab10          0.0556    0.5146    0.11   0.915
Factor B (baseline 1)
b3            0.8932    0.3576    2.50   0.018 (largest)
b6            0.3752    0.3472    1.08   0.288
Factor C (baseline 1)
c2            0.7673    0.4111    1.87   0.072 (largest)
c5           -0.1604    0.4037   -0.40   0.694
c6           -0.1648    0.4056   -0.41   0.687
Factor E (baseline 1)
e2            1.4663    0.4118    3.56   0.001 (largest)
e3           -0.1946    0.4132   -0.47   0.641
e4            0.0347    0.3977    0.09   0.931
Factor I (baseline 1)
i2            1.5314    0.2968    5.16   0.000 (largest)

S = 0.938145   R-Sq = 73.9%   R-Sq(adj) = 61.7%
Alternatively, interactions may not matter and the result may be affected only by main effects. Furthermore, because the experiment represents a very small portion of all possible level combinations, it is very likely that the best combination is not part of the studied runs of the experiment. In this case, our main-effects model and its implied best factor-level combination will outperform the winner in the experiment, supporting a strategy of including the best creative in the next wave of experiments.

To be on the safe side, we recommend that both the best and the winning combinations are included in the next stage of experimentation. Our approach to designing experiments is powerful but not foolproof. The key is to experiment. Missing something occasionally or including something that eventually turns out to be unnecessary will be of small consequence compared to the accumulation of insights over time.

8.3

COMPUTER SOFTWARE FOR DESIGN CONSTRUCTION

AND DATA ANALYSIS

Useful tools for the construction of designs and the analysis of the resulting data are included in most statistical software packages. Our discussion in this section focuses on two general statistics software programs with strong process control and design components: JMP (The Statistical Discovery Software, http://www.JMPdiscovery.com) is SAS Institute's package for exploratory data analysis. Minitab (http://www.minitab.com) is distributed by Minitab Statistical Software. The emphasis of the following discussion is on design-related aspects of these programs.
Many other useful software packages specifically targeted to the construction of experimental designs and the analysis of the resulting data are also available. Design-Ease and Design-Expert, distributed by Stat-Ease (http://www.statease.com), share many of the features found in JMP and Minitab.

8.3.1. Minitab
Minitab makes it easy for the user to construct 2-level factorial and fractional factorial designs. The user enters the number of factors and is then presented with a list of full and fractional designs and their run sizes. After deciding on the number of runs, the user can construct the design either through default generators that optimize the resolution of the design, or by stipulating specific generators. In either situation, Minitab obtains the design columns, displays if desired a randomized arrangement of the runs, and indicates the confounding patterns of the particular fraction that is selected. If default generators are used, Minitab displays the adopted generators. The user can add center points, replicate the design, block the experiment by specifying blocking generators, and modify the design by considering foldovers. These can be complete (full) foldovers where the signs of all factors are changed, or foldovers of individual factors. Minitab displays the confounding pattern of the modified design.
Minitab can also construct Taguchi orthogonal array designs for designs with factors at three or more levels. These designs include 3-level factorial and fractional factorial designs, Latin square and Graeco-Latin square designs, and mixed-level designs such as an 8-run design with two 2-level factors and one 4-level factor. However, Minitab does not specify the implied confounding patterns for these designs. The user needs to understand that the Latin and Graeco-Latin square designs as well as the mixed-level design mentioned here are orthogonal main-effects designs that leave main effects and 2-factor interactions confounded; refer to the discussion in Section 7.6.
Minitab also constructs response surface designs (central composite and Box-Behnken designs) and mixture designs (simplex and lattice designs). Response surface designs are useful for fitting quadratic models that are subsequently used to determine the optimum conditions of the response. For further discussion, see Box and Draper (1987).
Once the design has been carried out, Minitab facilitates an efficient analysis of the data. The analysis of 2-level factorial and fractional factorial designs includes estimates of the effects (as well as the regression coefficients, which are one half of the effects), standard errors of the estimated effects if replications are available, and the ANOVA table. In the unreplicated situation the user can omit model coefficients (e.g., by specifying a model that contains only certain selected main effects and interactions). Minitab combines the omitted effects into an estimate of the variance and calculates standard errors of the estimates. Normal probability plots and Pareto plots for assessing the importance of the effects, as well as main-effects and interaction plots for assessing the nature of the relationships, are readily available.
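The statement that regression coefficients are one half of the effects can be checked by hand on a small example. The following Python sketch uses a hypothetical 2 x 2 factorial with invented responses (the numbers are assumptions chosen for illustration, not data from any case in the book). It computes each effect as the average response at the high level minus the average at the low level, and each regression coefficient by least squares in coded (-1, +1) units.

```python
# Hypothetical 2^2 factorial in coded units; the responses are invented for illustration.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [20.0, 30.0, 24.0, 38.0]


def effect(column, response):
    """Average response at +1 minus average response at -1."""
    high = [r for c, r in zip(column, response) if c == 1]
    low = [r for c, r in zip(column, response) if c == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def coefficient(column, response):
    """Least squares coefficient for a +/-1 column of a balanced orthogonal design.

    With orthogonal columns X'X = n*I, so each coefficient is sum(c*y)/n.
    """
    return sum(c * r for c, r in zip(column, response)) / len(response)


columns = {
    "A": [a for a, b in runs],
    "B": [b for a, b in runs],
    "AB": [a * b for a, b in runs],
}
effects = {name: effect(col, y) for name, col in columns.items()}
coefficients = {name: coefficient(col, y) for name, col in columns.items()}

for name in columns:
    print(name, effects[name], coefficients[name])
```

The factor of one half arises because moving a factor from -1 to +1 is a change of two coded units, while a regression coefficient measures the change per single unit.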
Programs for determining the sample size are also available in Minitab. In addition to the sample size determination for 2-sample comparisons of means and proportions (which we discussed in Appendix 2.1), one can obtain the sample sizes in the one-way ANOVA model (see Section 3.2) and 2-level factorial and Plackett-Burman designs.

8.3.2 JMP
JMP is an equally versatile and useful software package for the construction and the analysis of a variety of experimental designs. It can construct full factorial designs for a specified number of factors with different numbers of factor levels. The analysis of the resulting experimental data includes the ANOVA table for testing the significance of main and interaction effects. JMP's output is very similar to Minitab's ANOVA output.
The screening designs in JMP are particularly useful. JMP allows the user to construct 2-level factorial and fractional factorial as well as Plackett-Burman designs. For a specified number of 2-level factors, the software offers a list of available 2-level arrangements with their implied numbers of runs. The software allows the user to review and change the generators of the fractions and the available blocking arrangements. The software displays the confounding structure that is implied by the selected generators. Center points can be added, the design can be replicated, the order of the runs can be randomized, and the design or parts of it can be augmented by various foldovers.
JMP facilitates the estimation of user-specified models once the data from the experiment have become available. Similar to Minitab, the user can omit certain main effects or interactions from the model, calculate a variance estimate by pooling the omitted effects, and compute standard errors of the estimates. JMP has excellent graphical capabilities; the prediction profiler allows the study of main and interaction effects, and normal probability plots of the effects are easily constructed.
Screening designs provide the user the option to construct full and fractional 3-level, and mixed 2- and 3-level designs. Similar to Minitab, JMP does not provide information on the confounding structure of the 3-level (or mixed 2- and 3-level) fractional factorial designs.
JMP includes procedures for constructing and analyzing response surface designs, mixture designs, and Taguchi designs. It includes programs for sample size determination in the one-way ANOVA situation, but it does not determine the sample size in 2-level factorial and Plackett-Burman designs (as is done by Minitab).
Another useful feature of JMP is the construction of custom designs. After entering the number of factors and their levels (which can be either categorical or continuous), and after specifying the desired model (in terms of its desired main effects and interaction components), JMP determines the minimum number of runs that are needed to estimate the model coefficients (this is referred to as the minimum solution). It also calculates the number of runs that are needed when combining all possible factor levels into a factorial arrangement (this is called the grid solution). Furthermore, for a given specified number of runs (not less than the minimum number of runs needed) JMP constructs designs that are optimal with respect to certain optimality criteria (such as D-optimality). A D-optimal design minimizes the determinant of the covariance matrix of the estimated model coefficients and thereby guarantees efficient estimation of the model coefficients; see Appendix 8.1 for further discussion.

8.4 NOBODY ASKED US, BUT . . .

Web site design provides an ideal area for applying experimental design methods. In the PhoneHog case, the consultant and the company decided to test 10 factors with each factor at many levels, leading to a nonorthogonal design that required a more complex analysis compared to 2-level fractional factorial or Plackett-Burman designs. In the online setting, this was a sensible approach. Although it meant creating 45 different Web pages, doing so was not prohibitively difficult or expensive. But 2-level designs would be useful also in this setting. Many factors could be screened in a large resolution III design to identify the likely few important ones. Then supplementary 2-level experiments could be carried out to test additional alternatives on key factors, or the methods of Chapter 7 could be used to test some of these factors at more than two levels while maintaining an orthogonal design.
Computer software has simplified the construction of suitable designs and the efficient analysis of the resulting data. Computer software avoids tedious hand (or calculator) computations and it simplifies the construction of graphical displays. Use the computer to your advantage, but do not trust its outcomes blindly. Check the reasonableness of the results, as results depend on model assumptions that may be violated. There is no substitute for common sense.
Excel has become the standard computer software for business analysis, and you probably use it. Excel is appropriate and useful for simple analyses, but it is deficient in situations that require a more sophisticated approach. Fortunately, good statistical software packages (such as Minitab, SAS, JMP, SPSS, and R) are available.
Much can be learned by studying a textbook, reading case studies, and solving end-of-chapter exercises. However, it has been our experience that to really master the material, you must apply the methods in the real world. Get out and experiment! Discovering the unexpected is more important than confirming what you know.
... important. For example, assume that one desires to study the main effects of three factors, each with two levels (-1 and +1). It is straightforward to write down an orthogonal design in N = 8 runs, and among 8-run designs, this design allows us to estimate the main effects with the least variability. However, assume that one wants to estimate the main effects from the results of just N = 6 runs. An orthogonal design in 6 runs does not exist, and one needs another criterion to determine the optimal design. One can select a D-optimal design and use available computer software (such as the custom design feature in JMP) to determine the levels. The note below on computer software explains the algorithms that are used by the programs to find them. Using JMP, we obtain the following levels for the 6 runs:
[Table: levels of the three factors in each of the 6 runs]

The matrix X is obtained by adding to the three columns of levels a column of ones. Check that the matrix X'X is no longer diagonal, and verify that the determinant of its inverse is 1/1,024. Convince yourself that any other arrangement will result in a larger determinant. For example, change the level of the first factor in the first run to its other level. Repeat the matrix algebra, and you will find that the determinant of the resulting (X'X)^-1 is 1/448 and larger than 1/1,024.
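The matrix algebra described above can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions: the 6-run design in `design` is one arrangement chosen here for the example and is not necessarily the one returned by JMP, and the helper names `det` and `det_xtx` are hypothetical. Because the determinant of (X'X)^-1 equals 1 divided by the determinant of X'X, the code works with det(X'X); a larger value means a smaller determinant of the inverse. It also searches all possible 6-run designs exhaustively, which is feasible only because the problem is tiny.

```python
from itertools import combinations_with_replacement, permutations, product


def det(matrix):
    """Exact determinant of a small integer matrix (Leibniz formula)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total


def det_xtx(runs):
    """det(X'X) for the main-effects model: a column of ones plus the factor columns."""
    x = [(1,) + tuple(run) for run in runs]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    return det(xtx)


# The 8 possible level combinations of three 2-level factors.
points = list(product((-1, 1), repeat=3))

# Illustrative 6-run design (an assumption, not necessarily the JMP design).
design = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1), (-1, 1, 1), (-1, -1, -1)]

# Same design with factor 1 in the fifth run changed from -1 to +1.
modified = design[:4] + [(1, 1, 1)] + design[5:]

# Exhaustive search: run order does not matter, so multisets of 6 points suffice.
best = max(det_xtx(runs) for runs in combinations_with_replacement(points, 6))

print("8-run full factorial:", det_xtx(points))
print("illustrative 6-run design:", det_xtx(design))
print("after changing one level:", det_xtx(modified))
print("best over all 6-run designs:", best)
```

In this sketch the illustrative design attains the best value found by the exhaustive search, so its inverse has determinant 1/1,024, while the single level change lowers det(X'X) and therefore raises the determinant of the inverse.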
Our discussion emphasizes the important role of orthogonality in design of experiments. An orthogonal design is also a D- (and A-) optimal main-effects design. However, for certain problems and run sizes orthogonal designs may not exist, and in these situations D- and A-optimality become useful design criteria.
Observe that A- and D-optimality criteria are model specific, as they look at the precision of the coefficients in a certain specified model. Here we discuss the main-effects model, but extensions to models that include interactions or quadratic components of factors with more than two levels are possible, and software for finding the optimal designs is readily available.
Categorical Design Factors with More Than Two Levels
A main-effects design with categorical factors at more than two levels can be parameterized in terms of a regression model with an intercept term and indicator variables that express the absence/presence of the various levels of the design factors. Consider the special situation of 2 categorical factors with 3 and 4 levels. One pair of levels (e.g., the first level of factor 1 and the first level of factor 2) becomes the standard against which all other factor levels are compared. The regression model includes k = (3 - 1) + (4 - 1) = 5 indicator variables that express the presence/absence of factor levels 2 and 3 for factor 1, and factor levels 2 through 4 for factor 2. The vector of regression coefficients β is of dimension 6, and we need a minimum of 6 runs (N = 6) to estimate the coefficients. Of course, better estimates of the main effects could be obtained if more runs were available. A 12-run full factorial design with a single run at each level-combination would be an excellent choice. This design is orthogonal, and it is optimal in terms of minimizing the variability of the resulting estimates. But let us assume that our resources allow for only N = 8 runs. An orthogonal design in 8 runs is not possible, and hence one needs another criterion to select the runs. D-optimality is a reasonable criterion as it leads to the most precise main-effects estimates.
Consider the PhoneHog example in Section 8.2 as a second illustrative example. There we study 10 factors with 6, 4, 3, 4, 6, 4, 6, 5, 4, 2 levels, respectively. A full factorial is orthogonal and hence D-optimal, but its number of runs is prohibitive. Assume that we want to estimate the main effects of these factors as precisely as possible and are looking for a D-optimal design that uses a certain small number of runs. We can write down a regression formulation for the main-effects model. It includes an intercept and 34 regressor columns. One particular combination of 10 levels (one for each factor) becomes the standard against which the other levels are compared. The 34 indicators express the absence/presence of the other levels in the considered runs. At a minimum, we need 35 runs to estimate the main effects. Additional runs would help estimate the parameters more precisely. PhoneHog was looking for a design that estimated the parameters (i.e., the main effects) as accurately as possible. D-optimality turned out to be a reasonable criterion for generating the design.
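The parameter counts in the two examples above follow from simple arithmetic on the numbers of factor levels. The short Python sketch below reproduces them; the function names are hypothetical and chosen only for this illustration. Each factor with m levels contributes m - 1 indicator columns, the intercept adds one more coefficient, and the full factorial size is the product of the numbers of levels.

```python
from math import prod

# Numbers of levels taken from the two examples in the text.
levels_two_factor = [3, 4]
levels_phonehog = [6, 4, 3, 4, 6, 4, 6, 5, 4, 2]


def indicator_columns(levels):
    """One indicator column per non-baseline level of each factor."""
    return sum(m - 1 for m in levels)


def minimum_runs(levels):
    """Intercept plus indicator columns: the fewest runs that can estimate the model."""
    return 1 + indicator_columns(levels)


def full_factorial_runs(levels):
    """Number of runs when every level combination is used once."""
    return prod(levels)


for name, levels in [("two-factor example", levels_two_factor),
                     ("PhoneHog", levels_phonehog)]:
    print(name,
          indicator_columns(levels),
          minimum_runs(levels),
          full_factorial_runs(levels))
```

For the PhoneHog factors the full factorial would need more than 1.6 million runs, which is why a small nonorthogonal design is attractive there.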

Note on Software

Several iterative algorithms for determining A- and D-optimal designs are proposed in the literature and they have been implemented in easy-to-use software packages. The custom designer in JMP, for example, starts with a random design of the desired run size with each of the runs satisfying the restrictions of the design. An iterative algorithm called cyclical coordinate exchange (Meyer and Nachtsheim, 1995) is used to improve the design. Each iteration of the algorithm involves testing every value of every factor in the design to determine if replacing that value increases the optimality criterion. If so, the new value replaces the old. Iteration continues until no replacement occurs in an entire iterate. To avoid converging to a local optimum, the whole process is repeated several times using a different random start. The custom designer displays the best of these designs.
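The steps just described can be sketched in a few lines of Python. This is a simplified illustration and not JMP's implementation: it assumes a main-effects model, uses det(X'X) as the optimality criterion, and the function names, the default of 10 random starts, and the seed are all assumptions made for the example.

```python
import random
from itertools import permutations


def det(matrix):
    """Exact determinant of a small integer matrix (Leibniz formula)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= matrix[i][perm[i]]
        total += term
    return total


def det_xtx(runs):
    """D-criterion for the main-effects model: det(X'X) with an intercept column."""
    x = [(1,) + tuple(run) for run in runs]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    return det(xtx)


def coordinate_exchange(n_runs, n_factors, levels=(-1, 1), n_starts=10, seed=0):
    """Simplified coordinate exchange with several random starts."""
    rng = random.Random(seed)
    best_design, best_value = None, -1
    for _ in range(n_starts):
        # Random starting design.
        design = [[rng.choice(levels) for _ in range(n_factors)]
                  for _ in range(n_runs)]
        value = det_xtx(design)
        improved = True
        while improved:  # stop after a full pass without any replacement
            improved = False
            for i in range(n_runs):
                for j in range(n_factors):
                    current = design[i][j]
                    for candidate in levels:
                        if candidate == current:
                            continue
                        design[i][j] = candidate
                        new_value = det_xtx(design)
                        if new_value > value:
                            # Keep the replacement: the criterion increased.
                            value, current, improved = new_value, candidate, True
                        else:
                            design[i][j] = current  # undo the trial change
        if value > best_value:
            best_design, best_value = [tuple(run) for run in design], value
    return best_design, best_value


design, value = coordinate_exchange(n_runs=6, n_factors=3)
print(design)
print(value)
```

Each random start ends at a design that no single coordinate change can improve, which may be only a local optimum; keeping the best result over several starts is the safeguard mentioned in the text. For the 6-run, three-factor case the result can be compared with the appendix example, where the determinant of the inverse is 1/1,024.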
A recent article by Kuhfeld and Tobias (2005) describes combinatorial and heuristic optimization methods for constructing D-optimal factorial designs. The authors discuss Fedorov's approach of iteratively exchanging candidate/design pairs (where runs from the list of possible design runs are swapped) and the coordinate exchange algorithm of Meyer and Nachtsheim (1995) (where coordinates of runs are swapped), and they apply simulated annealing optimization techniques to improve the performance of these two methods. Useful SAS Macros are available from the first author's Web site. An earlier paper by Kuhfeld, Tobias, and Garratt (1994) describes useful marketing applications of D-optimal designs.
EXERCISES

Exercise 1

Consider Case 12 (Almquist & Wyner) from the case study appendix.

(a) Consider the design in Table A12.2 of that case. Show that it is balanced (i.e., same number of runs at each level of every factor) and orthogonal (i.e., same number of runs at each level combination of any two factors). Show that the main effects of Message and Promotion are not confounded with the Message by Promotion interaction.
Hint: Use the regression approach and show that the interaction column is orthogonal to the columns for Message and Promotion.

(b) Using statistics software of your choice, confirm the regression output in Table A12.4. Using results in Appendix 4.4 (Brief Primer on Regression), explain how the program obtains the value 0.0707 and the standard errors in Table A12.4. Fit the regression on just Message and Promotion, and explain why the regression coefficients for Message and Promotion are unchanged.
(LI ( om1dcr the design in Lihlc t\ 12.'.'. I orrnul;ltc ,1 regression model with an 111IL'I
cept, three m.1111effcL1' for the 2 level faLtors (Subject, Action, and C:los111g), .111d
linear .md quadratil L<lmponcnts for the two 3-lcvcl factors (Salut.ition, Promo
tion); sec Section 7.4. SpeLify the 16 X 8 design matrix. Imagine fitting two regrL's
s1om. One nwciLI com1dcr., all eight rcgressor<,, while the second uintains the 111
tcrcept and only the three main effects of the 2 level factors. Would the estimate.,
nf the three main effects stay the s.ime 1 'v\'hy nr why not?
(d) Using design software of your choice, obtain a 16-run D-optimal design with three 2-level factors and two 3-level factors. Discuss the software's approach of obtaining such designs.
Exercise 2

Consider Case 11 (PhoneHog) from the case study appendix.

(a) Use Minitab or any other statistics software such as JMP or SPSS, fit the main-effects model with all 10 factors, and confirm the results for the click-through rate in Table 8.3. Second, enter the 10 factors in a different order and observe that the sequential regression sums of squares change. This is a consequence of nonorthogonality. For orthogonal designs, the sums of squares would stay the same.
(b) Focus on the model with just three factors, H, E, and I. Define indicators for the levels of the 3 factors: 3 indicators for H, 4 for E, and 2 for I, for a total of 9 indicator columns, h1, h3, h6, e1, e2, e3, e4, i1, i2. Estimate the model that includes a constant and the indicators h3, h6, e2, e3, e4, and i2. The constant represents the mean response when all factors are at their baseline values (level 1 of H, level 1 of E, level 1 of I). Obtain the least squares estimates and their standard errors. Find the best levels of each factor and obtain an estimate of the click-through rate of the best factor level combination. Compare this to the prediction (22.56) that you got in Section 8.2.2.
(c) Analyze the action rate using the same approach that we used for click-through rates.

TABLE A1.1 Factors and Levels

FACTOR LEVEL
Regular package              Deluxe package
No in-store samples          In-store samples
No coupons                   Coupons
Current fat percentage       Reduced fat percentage
No gold sticker              Gold sticker
Current package lettering    New package lettering

In the first phase of the work, the group developed an initial list of about 20 factors for possible inclusion in the experiment. Gradually the list was reduced to its final form consisting of 6 factors (Table A1.1).
Arriving at the final list was not an easy matter because each manager had personal favorites among the list of potential changes. Al Douglas (finance) felt that the current deluxe package only added to the cost without affecting sales. Bill Evans (marketing) strongly disagreed. "Al, our lunch meat line of bologna, ham, and turkey has the finest products on the market, and the classy package enhances our quality image." A heated argument ensued.
Clcma Johnson of Zip Stores, the supermarket chain, suggested simply adding a gold star to the package. "It will catch the shopper's eye, and that is important in a crowded display case."
The advertising manager felt that coupons were an important way to increase sales. Others argued that sales would increase but not enough to offset the discount provided by the coupon.
Eagle had in the past occasionally set up in-store displays where customers were offered free samples of bologna, ham, and turkey. There was general agreement that this led to more sales ("if they try it, they'll like it!"), but these free sample displays were expensive, and it was unclear if the increase in sales was sufficient to justify the cost.
Eagle had recently developed a new version of its cold cuts with a reduced level of fat. Extensive testing had shown that taste and appearance were unchanged, and the firm felt that the lower fat product would appeal to health-conscious customers, and therefore sales would increase. The downside was that the low-fat version had a higher production cost.
Several team members felt that a change in package lettering to a bolder look would increase sales. The proposed new lettering would result in a small increase in packaging costs.
HOW MANY STORES SHOULD BE INCLUDED IN THE TEST?

There was general agreement among the members of the team that the shorter the test period, the better. It was agreed that the test would be run for one week across a random sample of stores. The Zip Stores supermarket chain had approximately 500 stores all located in the Midwest. There was extensive data available on sales of the three Eagle Brand products by store and week. The team eliminated weeks from the database that were not considered average. The not-average weeks included the week of the Super Bowl and weeks with special promotional activities. For average weeks, average weekly sales per store of Eagle cold cuts was $1,200 with a standard deviation of $150.

[Figure A2.1 Main Effects and Interaction Plots for Sales. Panels: main effect of A (cover price); main effect of C (number of copies on newsstand: 1/3rd less, current, 1/3rd more); AC interaction.]

The sales change at the center point (2.1%), where each of the 3 factors is at the midlevel, falls about in line between the low and high levels; for sales, there is no appreciable curvature.
The AC interaction adds further insight to the sales analysis. It shows that the increase in sales at a lower cover price (A-) is much smaller if the number of copies is reduced (C-). Alternatively, the increase in newsstand copies (C+) has a minimal impact if the cover price is high (A+). Furthermore, combining the levels that are optimal individually (low cover price and high number of copies) increases sales by more than what can be expected by summing the two individual main effects. Depending on the team's profitability analysis (which is not discussed here), the publisher should increase the number of copies on the newsstand only if it plans to lower the cover price. Otherwise, cost savings from a reduction in newsstand copies may have little impact on sales.


The AB interaction shows that the two significant main effects are even greater in combination. A high cover price (A+) and a low subscription price (B-) result in more new subscribers than the implied number that is obtained by adding the two individual main effects.
Analysis of curvature leads to important insights. Significant curvature for subscriptions means that, within some range, cover and subscription prices have negligible impact. Yet beyond a certain level, price changes result in a big jump in subscriptions. However, with only one center-point test cell it is not possible to determine which factor (or perhaps both) is causing the curvature. With significant curvature, the next step for the marketing team was to run a new test design with more combinations at different price and subscription levels.
Following this test, the publisher's marketing team ran additional tests to pinpoint the optimal price points. They ended up increasing the cover price and increasing the subscription price, while maintaining the same level of newsstand copies. These price increases ultimately increased profit while maintaining the number of copies sold on the newsstand and through subscriptions.
QUESTIONS

Exercise 2 in Chapter 4

TABLE A3.1 Factors and Assigned Levels

FACTOR LEVEL
No act now insert     Act now insert
No credit card        Credit card
Hard offer            Harder offer
Strong guarantee      Stronger guarantee
No testimonials       Testimonials
No bumper sticker     Bumper sticker
Gutsy                 Ballsy

ing firm that organized the mailings, the seven factors listed in Table A3.1 were identified for the test.
The "act now" insert, a small separate note in color, urged people to act now and to respond/pay today. The front of the insert showed the cover of the next issue and urged the recipients to act immediately to make sure they received this very special issue. The back of the insert, in bullet point style and vibrant language, described the articles that would appear in the next issue. If the act now insert was included, the words act now were also boldly written on the reply card.
The second factor gave people the option of paying by credit card rather than only by personal check.
The factor "hard offer/harder offer" refers to the language in the offer described in the letter from Harris, the publisher. For example, "hard" encourages the person to "get the next issue of Mother Jones," and "harder" says "to get the next issue hot off the press, send your reply and subscribe today."
"Guarantee" allows the person to cancel the subscription at any time during the first year, and receive money back for the issues not yet received, while the "stronger guarantee" means that the full subscription price would be refunded, as long as the subscription is canceled before the end of one year.
"Testimonials" refer to an insert in which typical subscribers and notable people make positive comments about the magazine.
"Gutsy/ballsy" refers to a single word that is printed on the outer envelope. Previously, Mother Jones had effectively used the word "ballsy," but had received some complaints about this language. Harris was interested in finding out whether softening the language to "gutsy" would produce similar results.
THE MAILING PROCESS

The mailing process consisted of two stages: printing and insertion. Printing produced the addressed outer envelope with either "gutsy" or "ballsy" printed on it. The testimonials were printed on a single sheet of paper, and the reply card included information on the price, whether a credit card could be used, the type of guarantee, and the phrase "act now" included or not.
A firm specializing in mailings was responsible for carrying out the logistics of the experiment. The firm's production line consisted of automated insertion equipment that


test cell," oversimplified the sample size issue and often led to a weak test with no significant results. In this case, with only 35,000 names and an average response rate of 1%, an effect would have to change the response by about 20% (from 1.0% to 1.2%) to have a 50:50 chance of being found significant.
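The figure of about 20% can be checked with a rough normal-approximation calculation. The Python sketch below is an illustration under assumptions that are not stated in the case: the names are split evenly between the two levels of a factor (so each level is seen by half of the names), the test is two-sided at the 5% level, and the variance is computed at the base response rate. A true difference equal to the significance threshold is detected about half of the time, which is the 50:50 chance mentioned above.

```python
from math import sqrt
from statistics import NormalDist

names = 35_000
base_rate = 0.010

# Assumption: each level of a 2-level factor is mailed to half of the names.
group = names // 2

# Standard error of the difference between two response rates near the base rate.
se = sqrt(base_rate * (1 - base_rate) * (1 / group + 1 / group))

# Assumption: two-sided test at the 5% level.
z = NormalDist().inv_cdf(0.975)

threshold = z * se                          # smallest observed difference declared significant
relative_threshold = threshold / base_rate  # as a fraction of the base response rate


def power(true_difference):
    """Approximate chance of a significant result (upper tail only)."""
    return 1 - NormalDist().cdf(z - true_difference / se)


print(f"threshold: {threshold:.5f} ({relative_threshold:.1%} of the base rate)")
print(f"power at the threshold: {power(threshold):.2f}")
print(f"power for 1.0% versus 1.2%: {power(0.002):.2f}")
```

Under these assumptions the threshold is a little over 0.2 percentage points, or roughly one fifth of the 1% base rate.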
The consultant offered encouragement. With the right multifactor test design and a focus on bold changes, they could create a strong test with useful results. The consultant summarized the requirements:

Use one experimental design to test numerous variables, maintaining the same test power no matter how many variables were tested.
Use all available names, but design the test so differences among segments can be quantified.
Take advantage of the flexibility of e-mail by using a higher-resolution test design with more test cells yet less confounding among the effects.
TEST FACTORS

After brainstorming ideas and trimming the list down to the boldest ideas, the marketing team identified 13 variables and selected two different versions of each variable to test. These 13 factors could be tested simultaneously in a 16-run design, but for reasons outlined below, the consultant selected a 32-run fractional factorial design instead.
The 32-run design requires greater effort for the marketing team to construct 32 different e-mails, but it has important statistical advantages. Where a 16-run design is only of resolution III (with main effects fully confounded with 2-factor interactions), the 32-run design is of resolution IV (with main effects confounded with 3-factor interactions, but independent of 2-factor interactions). Since higher-order interactions are unlikely, this design reduces the potential confounding error and also helps identify key 2-factor interactions.
The three customer segments also had to be considered. Including a 3-level factor in the test would have led to an unbalanced design, in which each factor level would not have appeared in the same number of runs. Instead, the three segments were defined as a factor with 4 levels, with segment 1 (the largest segment) taking up 2 levels. Just as a 2-level factor requires one column in the test design, a 4-level factor requires three columns. The A, B, and AB interaction columns in Table A5.1 were used to define the three segments.
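The idea of coding a 4-level factor with two 2-level columns can be illustrated with a short Python sketch. The assignment of sign combinations to segments in `segment_of` is hypothetical (the actual assignment is the one given in Table A5.1), and the sketch assumes that A and B are two of the five base columns that generate the 32 runs. It shows that each (A, B) combination appears in 8 of the 32 runs, so segment 1, which takes two of the four levels, appears in 16 runs.

```python
from collections import Counter
from itertools import product

# Hypothetical assignment of the four (A, B) sign combinations to segments;
# segment 1 takes two of the four levels. Table A5.1 holds the actual assignment.
segment_of = {(-1, -1): 2, (1, -1): 1, (-1, 1): 1, (1, 1): 3}

# Assumption: five base columns generate the 32 runs, and A and B are the first two.
base_runs = list(product((-1, 1), repeat=5))

a_column = [run[0] for run in base_runs]
b_column = [run[1] for run in base_runs]
ab_column = [a * b for a, b in zip(a_column, b_column)]  # third column used by the factor

segments = [segment_of[(a, b)] for a, b in zip(a_column, b_column)]
counts = Counter(segments)

print(dict(counts))
# The three columns are mutually orthogonal (all dot products are zero).
print(sum(a * b for a, b in zip(a_column, b_column)),
      sum(a * ab for a, ab in zip(a_column, ab_column)),
      sum(b * ab for b, ab in zip(b_column, ab_column)))
```

The A, B, and AB columns together carry the three degrees of freedom of a 4-level factor, which is why none of them can be reused for another factor.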
After creating the test design, one of the 13 factors was eliminated. The team planned to test a search box at the top of the e-mail message, but this was too difficult to execute and column E was left empty. The remaining 12 factors plus the 4-level segment factor are listed in Table A5.2.
TABLE A5.1 Three Customer Segments Treated as a 4-Level Factor


Available
( nmh111.111nn...

r\

1\H

Segment

Name...,

Scgmcnl 2

11,SXh

Scgmcnl I

I 151lX

Scgmenl 1

8,96()

TABLE A5.2 Factors and Their Levels


1.ictor

~cgmenl

'-.q.~nll'nt

) l .uni rnl

< I 111k to ortlin" ,,11alog

:--;u

IJ
I
I
<i

Wlnte

ll.1ckgrou11d color
' l'rnpty)
I Jn1gn ul e-mail

/\ I >"rnunt offer

Srrnple
None
Current
No
15% off

I I rl'e gift

:\one

\I l'rnduets pictured
N "Valued eusto1r1l'r" copy
( J l "'" sell cup~
I' sub1eet l111e

lew
Current

P,irtner pro1notions

// Navigation bar on side

I speual offer starburst

( ' urrent

"hclust\'l' e rn.111 olkr

( + )

~'"" !Jea

i...trongl'r brand 111lJgL'

Olfers lrom two partner companies


Add!tronal buttom
"Spcual e marl ofkr" starburst
~o d1...,coun1

I''"' atrd

I re'
.\trill'

jlL'tlcil "'I

Slrt)11gcr

Ne\' u1in
"'tpn.1.il nlkr

f11r<Hll llhltlllll''"

A and B: Segment
The marketing team had defined three key customer segments, based on behavioral variables and the timing of recent purchases:
Segment 1 consisted of customers who had made a purchase online or in a store within the last 3 months. Segment 2 had made a purchase within the last 6 months. Segment 3 had made a purchase within the last 6-12 months.
C: Link to Online Catalog

The e-mail included a "Shop our catalog online" button towards the bottom of the e-mail. The team felt that a link to the Web site would encourage customers to browse the available products.
D: Background Color

All e-mails were sent with dark text on white background. The creative director thought that a blue background might help the e-mail stand out.

F: Design of E-Mail
E-mails used a basic font with a small company logo at the top. The team wanted to test a stronger brand image, with a larger logo, more stylized font, and greater use of the company's brand colors in the e-mail.
G: Partner Promotions
With brand-name products, the marketing team believed that promoting several brands could help convince customers to make a purchase. They decided to promote two specific brands in two bright boxes under "Offers from our partners" at the bottom of the e-mail.
H: Navigation Bar on Side
E-mails currently went out with a sidebar similar to the navigation bar on the company Web site, but with a shorter list of links. They didn't want to test an e-mail without any sidebar, so instead they decided to test the current navigation bar versus one with more choices.


J: Special-Offer Starburst
Since e-mails were sent to a "select group of customers," they wanted to play up the exclusivity with an eye-catching red star at the upper right stating "Special e-mail offer."
K: Discount Offer
The Internet director had gone back and forth between offering a special e-mail discount or not. He thought the discount helped, but had never quantified whether it pulled in enough sales to justify the lower margin.
L: Free Gift
They had not offered a free gift with online orders before, but wondered whether it was worth a try, as other companies were doing it. They selected an attractive, but low-cost, pen-and-pencil set that they could offer for free. At first, they wanted to offer it only for orders of $50 or more, but chose instead to be bold and offer it with every order.
M: Products Pictured
Every e-mail focused on a selection of products, with pictures and prices, but they never knew how many were best. Some people on the marketing team thought that a simple offer with just a few products would get people to respond faster. Others thought that a larger selection would give more people something of interest. They decided to test a few products versus many products. For the test, every picture was the same size, so e-mails featuring "many products" had additional rows of product pictures.
N: "Valued Customer" Copy
Their standard e-mail copy stated, "As a valued [company] customer, we would like to offer you these Internet-only specials." They tested this against a copy with a stronger message, adding a second sentence about how only their best customers get these special offers.
O: Cross-Sell Copy
The second copy change was designed to sell more products. An additional sentence was added to encourage people to order a variety of office supplies at once to lower shipping costs.
P: Subject Line
The Internet director had been testing different e-mail subject lines. Currently, "Exclusive e-mail offer from [company]" was the winner. Since he knew the subject line was important, he wanted to test another version, "Special offer for our best customers."
TEST DESIGN

The consultant developed a 2^(15-10) fractional factorial test design, based on a 32-run, 5-factor, full factorial design in factors A-E, with factors F-P assigned to 10 of the 26 interaction columns using the design generators:

F = ABC, G = ABD, H = ABE, J = ACD, K = ACE, L = ADE,
M = BCD, N = BCE, O = BDE, P = CDE

Minitab statistical software was used to generate the design columns. The test matrix for the 15 design columns (the 13 factors plus the 2 factors, A and B, that specify the three segments) is shown in Table A5.3. Minitab also provides the alias structure for this fractional factorial design. Ignoring 3- and higher-order interactions results in a design that can estimate the 15 main effects of factors A through P, 15 effects each containing seven 2-factor interactions, and one effect containing only 3-factor interactions.

TABLE A5.3
Test Design and Test Results
[Table: for each of the 32 test cells, the +/- levels of the 15 design columns A through P, the number of names, the number of orders, and the response rate.]
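The construction described in the Test Design section can be checked with a short script. The following is a minimal sketch, not the Minitab procedure the consultant used: it builds the 32 runs from the five base factors, derives the ten generated columns from the generators listed above, and groups the 2-factor interactions that share a sign column. The function names `build_design` and `alias_groups` are illustrative choices, not part of any library.

```python
from itertools import combinations, product

BASE = "ABCDE"  # factors of the underlying 32-run full factorial
GENERATORS = {   # generated factor -> defining interaction, as listed in the text
    "F": "ABC", "G": "ABD", "H": "ABE", "J": "ACD", "K": "ACE",
    "L": "ADE", "M": "BCD", "N": "BCE", "O": "BDE", "P": "CDE",
}

def build_design():
    """Return 32 runs, each a dict mapping a factor letter to -1 or +1."""
    runs = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for factor, word in GENERATORS.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[factor] = value
        runs.append(run)
    return runs

def alias_groups(runs):
    """Group 2-factor interactions whose sign columns coincide (up to sign)."""
    factors = sorted(runs[0])
    groups = {}
    for a, b in combinations(factors, 2):
        column = tuple(run[a] * run[b] for run in runs)
        # Normalize the sign so that a column and its negative share one key.
        key = column if column[0] == 1 else tuple(-v for v in column)
        groups.setdefault(key, []).append(a + b)
    return list(groups.values())

design = build_design()
groups = alias_groups(design)
aj_string = next(g for g in groups if "AJ" in g)
print(len(design), "runs;", len(groups), "interaction strings")
print("String containing AJ:", " + ".join(aj_string))
```

Under these generators the script reports 15 strings of seven 2-factor interactions each, which agrees with the alias structure described in the text.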
Table A5.3 lists the sample size and response data for each test cell. Since each customer segment was randomly assigned to certain test cells based on the +/- levels in columns A and B, the numbers of customers contacted in each test cell are not the same, and the test is not completely balanced. In addition, after names were assigned to test cells, the final purge/merge (where addresses are double-checked and invalid e-mail addresses are removed) dropped some names from the test. In the end, only 34,060 names were used. Each version of the e-mail therefore went to a different number of customers.


EXECUTING THE TEST

The creative team (made up of the creative director and a single person who designed every e-mail) was somewhat tentative about the test. The thought of creating so many different e-mails for one drop was daunting. They also didn't know if all required combinations would work from an artistic standpoint.
The consultant worked to minimize their concerns and lighten their workload. First, he helped the team define clear, independent factors that could work together in any combination. Then he sat down with the team to review every required test cell, changing factor definitions to make all test cells essentially simple cut-and-paste combinations. Finally, he worked with the creative team as they developed each version; he checked everything to ensure compliance and consistency and solved any problems as they arose.
Overall, the creative work added two days to the marketing schedule. The team was surprised how smoothly things went once all factors and combinations were clearly defined.
TEST RESULTS

The test dropped on Tuesday, and initial results were analyzed after one week. Since the
team wanted to increase the number of orders, the primary metric was the response rate.
Average order size was also analyzed to help assess profitability, but this particular analysis
is not shown here.
The main effects and 2-factor interactions are shown in the two Pareto charts in Figures A5.1 and A5.2. The effects are calculated by applying the +/- signs to the response rates in the last column of Table A5.3, and dividing the resulting linear combination by 16. Alternatively, the effects can be obtained by regressing the response rate on the design and interaction columns; the only difference in the results is that the size of the effects is cut in half. As discussed below, the significance of the effects is best determined through logistic regression. The output of the standard regression and the logistic regression are summarized in Table A5.4. The Minitab statistical software is used for both regressions.

Figure A5.1 Test Results: Main Effects Only
[Pareto chart of the main effects; horizontal axis: effect in percentage points, 0.0 to 0.7. Significant effects (above the line): K: Discount offer -0.599; G: Partner promotions -0.291; L: Free gift +0.264; A: Segment -0.228.]

Figure A5.2 Test Results: Main Effects and Interactions
[Pareto chart of the main effects and 2-factor interaction strings; horizontal axis: effect in percentage points, 0.0 to 0.7. Significant effects (above the line): K: Discount offer -0.599; G: Partner promotions -0.291; L: Free gift +0.264; A: Segment -0.228; AJ (the string that includes KL) +0.171.]
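The relationship between an effect and the corresponding regression coefficient can be illustrated on a small example. The sketch below uses a hypothetical 8-run 2^3 factorial with made-up response rates (the case itself has 32 runs, so the divisor there is 16); the factor names X, Y, Z and the data are assumptions for illustration only.

```python
from itertools import product

def effect(signs, responses):
    """Effect = signed sum of the responses divided by half the number of runs."""
    half = len(responses) / 2
    return sum(s * y for s, y in zip(signs, responses)) / half

def regression_coefficient(signs, responses):
    """Least-squares slope for a +/-1 column in an orthogonal design."""
    return sum(s * y for s, y in zip(signs, responses)) / len(responses)

# Hypothetical 2^3 full factorial; response rates (in percent) are made up.
runs = list(product((-1, 1), repeat=3))            # columns X, Y, Z
rates = [0.6, 0.9, 0.7, 1.0, 1.1, 1.5, 1.2, 1.8]

x_signs = [run[0] for run in runs]                 # main-effect column for X
xy_signs = [run[0] * run[1] for run in runs]       # interaction column for XY

x_effect = effect(x_signs, rates)
x_coef = regression_coefficient(x_signs, rates)
xy_effect = effect(xy_signs, rates)
print("X effect:", x_effect, " X coefficient:", x_coef, " XY effect:", xy_effect)
```

The effect is the difference between the average response at the high level and the average at the low level, while the regression coefficient measures the change per unit of the -1/+1 coding, which is why it is half as large.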
The standard regression with the response rate as the dependent variable has several drawbacks: (1) It ignores the different sample sizes. It treats each response rate as equally precise and analyzes the response rates the same no matter how many e-mails were sent. (2) It uses unweighted averages of the response rates in the estimation of the effects, instead of more appropriate weighted averages that adjust for the unequal precision. (3) The standard errors of the estimates are obtained by pooling smaller insignificant effects into the experimental error term, a somewhat arbitrary decision that can overstate the number of significant terms. Logistic regression represents a better approach for analyzing proportions that originate from samples of different sizes. The number of positive responses among the sampled cases in each run is modeled as a binomial random variable, with a success probability that depends on the design variables. Since success probabilities are always between zero and one, logistic regression models the logarithm of the odds (the ratio of the probabilities of success and failure) as a linear function of the design variables. For a detailed discussion of logistic regression and on how to interpret the coefficients in logistic regression, we refer the reader to Chapter 11 in Abraham and Ledolter (2006). Here we use the logistic regression merely to assess the significance of the regression coefficients.

TABLE A5.4
Standard Regression and Logistic Regression Output of Significant Factors

STANDARD REGRESSION OF RESPONSE RATE ON FACTORS A, B, G, K, L, AB, AND KL

Estimated Effects and Coefficients for Response Rate

Term        Effect     Coef      SE Coef    T        P
Constant               0.9263    0.02823    32.81    0.000
A           -0.2275   -0.1138    0.02823    -4.03    0.000
B            0.0561    0.0280    0.02823     0.99    0.330
G           -0.2913   -0.1457    0.02823    -5.16    0.000
K           -0.5993   -0.2996    0.02823   -10.61    0.000
L            0.2639    0.1319    0.02823     4.67    0.000
AB          -0.0397   -0.0199    0.02823    -0.70    0.488
KL           0.1711    0.0855    0.02823     3.03    0.006

S = 0.159694   R-Sq = 88.68%   R-Sq(adj) = 85.38%

BINARY LOGISTIC REGRESSION OF NUMBER OF ORDERS ON FACTORS A, B, G, K, L, AB, AND KL

Link Function: Logit

Response Information
Variable   Value     Count
Orders     Success     312
           Failure   33748
Names      Total     34060

Logistic Regression Table
                                                        Odds     95% CI
Predictor   Coef        SE Coef     Z        P          Ratio    Lower   Upper
Constant    -4.781570   0.0643679   -74.28   0.000
A           -0.127779   0.0590138    -2.17   0.030      0.88     0.78    0.99
B            0.023847   0.0590880     0.40   0.687      1.02     0.91    1.15
G           -0.163981   0.0577553    -2.84   0.005      0.85     0.76    0.95
K           -0.369910   0.0617801    -5.99   0.000      0.69     0.61    0.78
L            0.196689   0.0618118     3.18   0.001      1.22     1.08    1.37
AB          -0.023241   0.0590546    -0.39   0.694      0.98     0.87    1.10
KL           0.157110   0.0618326     2.54   0.011      1.17     1.04    1.32

Log-Likelihood = -1744.333
Test that all slopes are zero: G = 60.822, DF = 7, P-Value = 0.000

Goodness-of-Fit Tests
Method                   Chi-Square   DF   P
Pearson                  12.0296      24   0.980
Deviance                 15.3150      24   0.911
Hosmer-Lemeshow           2.0619       7   0.956
Brown:
  General Alternative     1.7748       2   0.412
  Symmetric Alternative   1.7468       1   0.186
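Because the logistic model is linear in the log odds, each odds ratio in Table A5.4 is the exponential of the corresponding coefficient. The sketch below recomputes the odds ratios and approximate 95% intervals for three terms from the coefficients and standard errors in the table. It assumes a normal-approximation interval with multiplier 1.96, which reproduces the tabulated values to two decimals; the helper name `odds_ratio` is illustrative.

```python
import math

# (coefficient, standard error) copied from the logistic regression in Table A5.4
LOGIT_TERMS = {
    "A": (-0.127779, 0.0590138),
    "K": (-0.369910, 0.0617801),
    "L": (0.196689, 0.0618118),
}

def odds_ratio(coef, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for one logit coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )

results = {term: odds_ratio(coef, se) for term, (coef, se) in LOGIT_TERMS.items()}
for term, (ratio, lower, upper) in results.items():
    print(f"{term}: odds ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio below 1 (as for K, where the plus level means no discount) indicates that moving the factor from its minus to its plus level lowers the odds of an order.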


SIGNIFICANT EFFECTS

The average response rate in this test was 0.916%, with just 0-21 orders for each test cell and a total of only 312 orders. This was a small sample size in an unbalanced design with low response rates, and yet the subsequent results were convincing. Significant effects include the following:
K: Discount Offer
The elimination of the 15% discount resulted in a reduction of 0.599 percentage points in the response rate. The team calculated that the loss of margin from selling the product cheaper is more than covered by the increase in the number of orders.
G: Partner Promotions
The two partner offers in the e-mail reduced the response rate by 0.291 percentage points, contrary to what they had expected. The team theorized that the additional offers may have confused the message and given customers too many disjointed offers to choose from.

L: Free Gift
The free pen-and-pencil set increased the response rate by 0.264 percentage points. Analyzing profitability, the cost of the gift was easily covered by the increase in orders.

A: Segment
The significance of at least one of the three components responsible for the segment effect (A, B, AB) indicated that the three segments responded differently. The response rates for the three segments are summarized in Table A5.5.
The differences among the three response rates are small and not particularly significant. The 95% confidence intervals in Table A5.5 overlap each other. This finding could suggest that the blocking with respect to the three segments may not have been needed. Nevertheless, the significance of the factor A in the earlier analysis and the summary in Table A5.5 raises the question of why half of segment 1 (A+ B-) had a response rate lower than any other group, while the other half of segment 1 (A- B+) had the highest response rate of all. After some investigation, the problem could be traced to a simple error in the execution of the experiment. The top half of segment 1 (i.e., the best customers) had been placed in the A- B+ test cells, while the bottom half were placed in the A+ B- test cells. This not only pointed out the risk of a nonrandom assignment of names but also showed that the segmentation model needed some refining; perhaps the best and worst recent buyers should be in different customer segments.
KL Interaction
The final significant effect was the KL 2-factor interaction, with an effect of 0.1711. Before explaining how this interaction affects results, it is worthwhile to take a step back and see where it came from.
Analysis of the data shows 31 independent effects: 15 main effects, 15 strings of 2-factor interactions, and one string of 3-factor interactions. In Minitab, the default is to label the interactions with the first of all confounded interactions. For example, the labeling of the significant interaction in Figure A5.2 starts with AJ because A is the factor that is listed first.


noring interactions of order 3 or higher)? Briefly explain why you have selected this
design.
2. Would it be beneficial to replicate the design? Why? How would replication help you
determine which effects are significant? If you choose not to replicate, how would
you decide which effects were significant?

TABLE A5.5
Response Rates for the Three Customer Segments

Combination   Segment              Available Names   Average Response
1             Segment 1 (A+ B-)                      0.789%
2             Segment 1 (A- B+)                      1.090%
1 and 2       Segment 1 (all)      11,586            0.975%
3             Segment 2            13,508            0.940%
4             Segment 3             8,966            0.803%

NOTE: The display shows averages and 95% confidence intervals.
[Display: average response rate (vertical axis, about 0.7% to 1.2%) with 95% confidence intervals for Segment 1, Segment 2, and Segment 3.]

However, this effect could actually be the result of one or more of the seven confounded 2-factor interactions. The list of interactions is given in Minitab, or the interactions can be calculated using the design generators that have been listed earlier in the Test Design section. The seven interactions mixed together in the AJ column are AJ + BM + CD + EP + FG + KL + NO.

The regression results in Table A5.4 show that this column has a significant effect, but do not identify which interaction(s) is most likely. Here is where marketing knowledge and statistical principles come together to help pinpoint the most likely interaction effects. The following two principles can help:

1. Sparsity of effects: Few, if any, interactions are usually significant.
2. Heredity of effects: Large main effects tend to produce interactions.

Even without any understanding of the test factors, these principles imply that few of the seven interactions are likely, and the best-guess interactions are those related to the largest main effects. So, from effect heredity, the most likely interaction candidates would involve factors K, G, L, and A. Therefore AJ is possible (J is not significant but large), FG might be possible (though F is a smaller effect), and KL looks promising (since both main effects are significant).
Limiting the number of interactions to three, the marketing team and consultant could assess which interactions make sense. Though interactions may be completely surprising, often they result from related factors: factors located close together (like two elements on a direct mail envelope) or conceptually related (like price and offer variables).
In this case, the choices are

1. AJ  The starburst (J) has a different impact depending on the customer segment (A).
2. FG  The e-mail design (F) and partner promotions (G) work together to impact the response.
3. KL  The 15% discount (K) and the free gift (L) have different impacts depending on how the other factor is set.

From this one test, there is no way to prove for sure which interaction is present, but KL was selected as the most likely interaction. Both K and L are large significant effects, both are offer-related variables, and an interaction between the two can be logically explained.
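The heredity argument above can be written as a simple screening step. The sketch below is only an illustration of the reasoning, not a formal statistical test: it counts, for each interaction in the aliased string, how many of its two parent factors had a significant main effect in this case (K, G, L, and A). The scoring rule and the function name `heredity_score` are assumptions made for this example.

```python
# Main effects found significant in the case, and the aliased interaction string.
SIGNIFICANT_MAIN_EFFECTS = {"K", "G", "L", "A"}
ALIASED_INTERACTIONS = ["AJ", "BM", "CD", "EP", "FG", "KL", "NO"]

def heredity_score(interaction, significant):
    """Count how many parent factors of the interaction are significant."""
    return sum(1 for factor in interaction if factor in significant)

scores = {
    pair: heredity_score(pair, SIGNIFICANT_MAIN_EFFECTS)
    for pair in ALIASED_INTERACTIONS
}
# Python's sort is stable, so ties keep their original order in the string.
ranked = sorted(ALIASED_INTERACTIONS, key=lambda pair: scores[pair], reverse=True)

for pair in ranked:
    print(pair, scores[pair])
```

KL is the only candidate with two significant parents, while AJ and FG each have one. A count like this only narrows the list; the final choice in the case also rested on marketing judgment about which factors are conceptually related.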
The interaction diagram in Figure A5.3 supports the main effects: the 15% discount (K-, both points on the left) is always better, and the free gift (L+, the top line) increases response over no free gift. However, the 2-factor interaction shows that both together (the 15% discount and the free gift) increase the response less than what can be expected by adding the individual main effects.
The interaction can be understood by comparing both points on the left versus both points on the right. On the right (with no discount, K+), offering the free pen-and-pencil set gives a large jump in the response versus offering no free gift: the response more than doubles from 0.41% to 0.84%. In contrast, the points on the left show that, with the 15% discount (K-), the free gift increases response only slightly (from 1.18% to 1.27%).
Overall, this interaction shows that the 15% discount is great, the free gift is good, but both together are overkill: the free gift adds little to the benefit of the discount offer. These data helped the company more accurately quantify their return on investment (ROI) on every combination of offers. Also, this gave the marketing team deeper insight into customer behavior, showing that one strong incentive is valuable, but additional incentives are probably unnecessary. With these results, the Internet director decided to offer a discount more often, but sometimes switch to a free gift, depending on the e-mail campaign and the profitability of the customer segment.

Figure A5.3 Interaction Diagram for Factors K and L
[Interaction plot titled "KL Interaction": response rate (vertical axis, 0.4% to 1.4%) versus discount offer (K-: 15% off; K+: no discount), with separate lines for L-: no free gift and L+: free pen-and-pencil set.]
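The four points of the interaction diagram can be recovered from the standard regression coefficients in Table A5.4. The sketch below is a worked check under one assumption: all other factors are held at their average, so only the constant, K, L, and KL terms enter the prediction. The function name `predicted_rate` is illustrative.

```python
# Coefficients (in percentage points) from the standard regression in Table A5.4.
COEF = {"constant": 0.9263, "K": -0.2996, "L": 0.1319, "KL": 0.0855}

def predicted_rate(k, l):
    """Predicted response rate (%) for coded levels k, l in {-1, +1}."""
    return COEF["constant"] + COEF["K"] * k + COEF["L"] * l + COEF["KL"] * k * l

# K = -1 is the 15% discount, K = +1 is no discount;
# L = -1 is no free gift, L = +1 is the free pen-and-pencil set.
cells = {
    ("discount", "no gift"): predicted_rate(-1, -1),
    ("discount", "gift"): predicted_rate(-1, +1),
    ("no discount", "no gift"): predicted_rate(+1, -1),
    ("no discount", "gift"): predicted_rate(+1, +1),
}
gift_gain_with_discount = cells[("discount", "gift")] - cells[("discount", "no gift")]
gift_gain_without_discount = (
    cells[("no discount", "gift")] - cells[("no discount", "no gift")]
)

for label, rate in cells.items():
    print(label, f"{rate:.2f}%")
print(f"Gift gain with discount: {gift_gain_with_discount:.2f} points")
print(f"Gift gain without discount: {gift_gain_without_discount:.2f} points")
```

The gift adds far less when the discount is already offered, which is the pattern the text describes as overkill.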
CONCLUSIONS

The Internet director was amazed by the depth and value of the results of this one test in one drop with just 34,060 names. He learned in one week what would take 6 months using standard techniques of testing one variable at a time. With these results he decided to do the following:
Consistently offer the 15% discount (testing different discounts in future campaigns).
Avoid the partner promotions that hurt response.
Use the special-offer starburst (J+), even though it was not quite significant.
Offer a free gift every few e-mail campaigns to keep the offer fresh and sometimes offer it along with the discount to the highest-value customer segments.
Improve his segmentation model, adding more variables and splitting apart recent high-value and low-value buyers.
He implemented this strategy in the next campaign. The response jumped to 1.54%, which was somewhat higher than the prediction and much better than the previous performance. The Internet director continued testing offers along with bolder creative changes, eventually achieving response rates consistently between 3% and 5% while adding more names in every drop.
After these results, the marketing team began testing changes in their catalog, retail stores, and regional advertising, continually squeezing greater profit from every marketing dollar. They found few major breakthroughs, but continually uncovered a number of small changes that added up to a big bottom-line impact.
QUESTIONS

CASE 6

TABLE A6.2
Estimated Effects
[Table: factors and interaction strings with their estimated effects.]

experimental design, we found a few early papers all in the marketing literature, and we found no paper that employed a Plackett-Burman design, which was used in our study. Curhan (1974a) used a 2-level fractional factorial design to test the effects of price, advertising, display space, and display location on sales of fresh fruits and vegetables in supermarkets, while Barclay (1969) used a factorial design to evaluate the effect on profitability of raising the prices of two retail products manufactured by the Quaker Oats Company. Holland and Cravens (1973) presented the essential features of fractional factorial designs and illustrated them with a hypothetical example concerning the effect of advertising and other factors on the sales of candy bars. Wilkinson, Mason, and Paksoy (1982) described a factorial experiment for assessing the impact of price, promotion, and display on the sales of selected items at Piggly Wiggly grocery stores. In addition, marketing researchers have used small experimental designs in survey and conjoint analysis applications (e.g., see Ettenson and Wagner, 1986; Jaffe, Jamieson, and Berger, 1992; Srivastava and Lurie, 2004).
Our purpose in this case study is both to report on a successful retail marketing application of experimental design and to highlight the opportunities that exist for operations management researchers and practitioners to apply these methods to service problems. In general, experimental design in service operations can be used to test the effects on service quality and effectiveness of changes in staffing, training levels, procedures, and service system design. Particular examples in marketing include optimizing the design of Web sites, increasing the effectiveness of direct mail distribution channels for magazines, credit cards, and other products, and various in-store experiments to evaluate changes in factors such as package design, price, and point-of-sale displays.
The traditional approach to experimentation in manufacturing, as well as in retailing and other service areas, is to test one factor at a time while holding the remaining factors at fixed levels. In contrast, in multivariable experimental designs such as factorial, fractional factorial, and Plackett-Burman designs, all factors are tested simultaneously. Because of the orthogonality property of these designs, it is possible to obtain independent estimates of important effects (main effects and interactions), while greatly reducing the required sample size.
In the retail area, which is the focus of this case study, firms typically make very large investments in testing. However, few companies use sophisticated state-of-the-art experimental design techniques for in-store tests, choosing instead to test one variable at a time. Supermarkets offer especially attractive opportunities for experimentation, because of their low profit margins and highly competitive environments.
In designing an in-store experiment, there are many issues that need to be addressed. How many and which factors will be included? How many levels will be tested for each of the factors? What alternative designs should be considered, and which design should be selected? With respect to sample size, how many stores should be included in the experiment, and how should they be chosen? Over how many days or weeks should the test be run to obtain statistically valid and significant results? How should the results be analyzed?
This case study both provides insights into the design issues that are important to decision makers and presents the details and results of an actual application. The product tested was a popular magazine with a very large readership. For proprietary reasons, we do not identify the company or the magazine, and minor changes were made to the data presented. In spite of these modifications, the factors tested and results are essentially unchanged.


Retail testing is ideally suited for the use of experimental design techniques, offering decision makers the opportunity to test numerous variables at a relatively low cost. Dozens of elements can be tested simultaneously with the same sample size as a test of one variable alone.
This case study describes a magazine supermarket test of 10 in-store variables using a 24-run Plackett-Burman experimental design. All 10 factors were tested simultaneously over a 2-week period, with only a fraction of the sample size required for one-variable tests. Results quantified the main effect of each factor and allowed for the analysis of 2-factor interactions.
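A 24-run Plackett-Burman design can be generated by cyclically shifting one row of 23 signs and appending a row of minus signs. The sketch below is one standard way to obtain such a design, not necessarily the matrix used in the study: it builds the first row from the quadratic residues modulo 23 and then checks that every column is balanced and that all pairs of columns are orthogonal. Assigning the 10 factors to the first 10 columns is an assumption made here for illustration; the function names are likewise illustrative.

```python
N_RUNS = 24
N_FACTORS = 10  # the study tested 10 factors; the column choice here is hypothetical

def first_row(q):
    """Signs for the generating row: +1 at position 0 and at quadratic residues mod q."""
    residues = {(x * x) % q for x in range(1, q)}
    return [1 if (k == 0 or k in residues) else -1 for k in range(q)]

def plackett_burman(n_runs):
    """Cyclic construction; valid when n_runs - 1 is a prime congruent to 3 mod 4."""
    q = n_runs - 1
    row = first_row(q)
    design = [[row[(j - i) % q] for j in range(q)] for i in range(q)]
    design.append([-1] * q)  # final run with every column at its low level
    return design

def columns_orthogonal(design):
    """True if every column sums to zero and every pair of columns is orthogonal."""
    n_cols = len(design[0])
    cols = [[run[j] for run in design] for j in range(n_cols)]
    balanced = all(sum(col) == 0 for col in cols)
    orthogonal = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    )
    return balanced and orthogonal

full_design = plackett_burman(N_RUNS)
test_matrix = [run[:N_FACTORS] for run in full_design]  # 24 runs x 10 factors
print("".join("+" if s == 1 else "-" for s in first_row(N_RUNS - 1)))
print("Orthogonal:", columns_orthogonal(full_design))
```

Each factor is at its high level in 12 runs and at its low level in the other 12, so every main effect is estimated from all 24 runs. The 13 columns left unassigned in this sketch carry no factor.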
TEST FACTORS
The supermarket is the final stage of the grocery supply chain. Typically, firms give a great deal of attention to supply-chain management issues that include forecasting, inventory management in stores and warehouses, and transportation and logistics management. Within the supermarket, there is a range of management issues that affect quality and productivity. Some may be addressed with the help of mathematical models, for example, the use of queuing models to schedule the front-end checkout area or computer models for deciding how to allocate shelf space to products.
Designing and implementing in-store experiments offer opportunities to innovate, improve service quality, and increase profits. There are many variables that might be tested, including changes in staffing, training, product location, displays, promotions, and the supermarket environment (store temperature, type of background music, attractiveness, etc.). With respect to the environment, in a supermarket experiment, Milliman (1982) found that slow-tempo music compared to fast-tempo music decreased the pace of in-store customer flow and increased daily gross sales.
The focus of this paper is on in-store changes that would increase single-copy magazine sales. The magazine publisher instigated the study, but the variables tested were of general interest to the management of the supermarket chain as well.
Single-copy magazine purchases are often an impulse buy. The cover price paid at a newsstand is usually much higher than the per-copy subscription price; so, loyal customers have a strong incentive to purchase a subscription for its low price and in-home delivery. Publishers invest extensive time and effort on each magazine cover, using experience, focus groups, and one-variable tests to find the right pictures, words, colors, and layouts to attract those impulse buyers who spend just a few seconds selecting a magazine.
In this particular experiment, the magazine itself was not changed. Instead, the project focused on the location, number, and arrangement of magazine racks as well as in-store advertising. Copies of the magazine were primarily displayed near the checkout area. The operations team was particularly interested in the effect on sales of adding additional locations throughout the store. Management of the supermarket chain was also interested in evaluating the effectiveness of these additional sites. These added locations had been unused areas, and because the displays required relatively little space, the magazine was an especially attractive product to test in these additional locations.
The effect of in-store product location on sales has been studied by a number of authors. Drèze, Hoch, and Purk (1994) used a basic test-control experimental approach to assess the sales impact of in-store shelf space management. Changing the location of products among various shelf positions, they found that rearranging products in complementary groups and placing certain products at eye level could increase sales. Placing fabric softener between liquid and powder detergents and moving toothbrushes from a top shelf to a shelf at eye level both increased category sales. They also found that shelf position was more important than the amount of shelf space allocated for a particular product. In earlier research, several authors studied shelf space elasticities, including Brown and Tucker (1961) and Curhan (1974b), while Bultez and Naert (1988) and Bultez et al. (1989) studied space allocation using an attraction model to estimate brand interactions.

TABLE A8.1
Factors and Their Levels

Factor                                        (-) Low Level   (+) High Level
A  Rack on cooler in produce aisle            No              Yes
B  Location on checkout aisle                 End cap         Over the belt
C  Number of pockets on main racks            Current         More
D  Rack by snack foods                        No              Yes
E  Advertise on grocery dividers              No              Yes
F  Distribution of magazines in the store     Random          Even
G  Oversized card insert                      No              Yes, in 20% of copies
H  Clip-on rack advertisement                 No              Yes
I  Discount on multiple copies                No              Yes
J  On-shelf advertisement                     No              Yes
In this case, the team wanted to test as many factors as possible, which made sense because most of the cost of the experiment relates to the number of stores included in the test rather than the number of factors. After brainstorming a wide range of new ideas, the team identified the 10 factors in Table A8.1. For each factor, they selected two levels: the low or minus level and the high or plus level. A number of factors related to the number and location of pockets and racks. A pocket is one slot in a magazine rack, holding a few copies of the same magazine. A rack is the physical display with a few or many pockets. The main magazine rack in one aisle of the supermarket may take up all of the shelf space for 30 feet down the aisle and hold 150 different magazines. A small countertop rack may have just two pockets holding one magazine.
A: Rack on Cooler in Produce Aisle
The team wanted to attract customers as they entered the store. Most supermarkets are designed so that customers begin shopping in the section where produce and other fresh foods are displayed. The team had a new, small rack created with just two pockets. The rack was designed to fit on top of a refrigerated case located in the center of the produce aisle. The display was easy to install and took up little floor space. The team anticipated that a magazine display early in the shopping route would increase the likelihood of purchase.
B: Rack Location on Checkout Aisle
Two different magazine racks were available at the checkout aisles: the end-cap racks that customers see as they approach checkout and the over-the-belt racks above the moving grocery belt, usually with smaller-sized magazines. The team had in the past tried both locations, but never tested one against the other.


C: Number of Pockets on Main Racks


The number and location of pockets on the magazine racks are like shelf position for
packaged goods. The publisher already had a number of pockets, but wanted to know if additional pockets would increase incremental sales enough to justify the cost.
D: Rack by Snack Foods
Following the same idea as with factor A, the team felt that an additional rack at the end of
the shopping trip might encourage more people to buy a copy. The team developed a similar
small rack to place on the shoulder-height snack food shelves. They felt this might also entice
buyers on brief shopping excursions who go directly to the beer/snack food aisles.
E: Advertise on Grocery Dividers
With ever-growing alternatives for in-store advertising, from coupon dispensers to floor
graphics and public-address announcements, several low-cost options are available for
publishers. The operations team brainstormed ideas for something new and narrowed the
list to three factors (E, H, and J). One new idea was to advertise on the grocery dividers,
the plastic sticks used to separate groceries at the checkout aisle. The team agreed to use the
same basic plastic stick, but place short and simple magazine advertisements on each of
the four sides. Each test store received new grocery dividers in every checkout aisle where
the magazine was sold.
F: Distribution of Magazines Within Each Store
The magazine was normally placed in a number of pockets and racks in the checkout
area. The team usually let natural market variation take its course. Some pockets would
empty out completely, while others would remain nearly full. Since customers cannot buy
magazines from empty pockets, the team thought that a more even distribution of copies
would increase sales. For all F+ stores, they would pay people to go around to each store
every week and even out the distribution of copies, pulling some copies out of full pockets and placing them in empty pockets.
G: Oversized Card Insert
The one test factor that related to the magazine itself was a card insert similar to subscription cards in many magazines. The insert was made taller than the magazine, so its
promotional message extended above the top of the magazine. This approach let the team
promote the magazine without paying the supermarket for in-store advertising. The card
was added to about 20% of the copies used in the test stores, so it would stand out among
all copies on the newsstand. This factor was implemented by the distributor who made special deliveries of magazines with oversized inserts to all G+ stores.
H: Clip-on Rack Advertisement
Another new strategy for in-store advertising was the addition of a small, plastic, clip-on
sign with a promotional message about the magazine. These clip-ons were designed for the
wire racks at checkout and were added to about 50% of all of the pockets around the checkout aisles.
I: Discount on Multiple Copies
The team wanted to test the effect of a special promotion. They printed stickers for the
front of each pocket, promoting a discount on the second copy purchased at the same time.


Cash registers were programmed to register the discount when two or more copies were
purchased.
J: On-Shelf Advertisement

The final in-store advertising factor the team tested was an on-shelf "billboard." These
small signs in plastic frames were attached to the edge of shelves so they stick out into the
aisle. These on-shelf signs were placed in a few of the non-magazine supermarket aisles.
TEST DESIGN

With 10 factors, there are several alternative designs that can be considered. One possibility
is a 32-run $2^{10-5}$ fractional factorial design of resolution IV. In this design, main effects
are confounded with 3- and 4-factor interactions, whereas pairs of 2-factor interactions are
confounded with each other, except for one contrast in which four 2-factor interactions are
confounded. For a discussion of design resolution and confounding patterns, see Box,
Hunter, and Hunter (2005). Assuming that the 3- and 4-factor interactions are negligible,
this design provides clear estimates of all main effects. In addition, with proper labeling of
the factors, it may be possible to anticipate which 2-factor interactions are likely to be negligible and thereby estimate 2-factor interactions as well. A second alternative is a 16-run
$2^{10-6}$ fractional factorial design. But this design is resolution III, with main effects confounded with 2-factor interactions.
A third alternative is to choose a Plackett-Burman design (Plackett and Burman, 1946).
Plackett-Burman designs are a class of orthogonal designs for factors with two levels, with
the number of runs N a multiple of 4 (i.e., 4, 8, 12, 16, 20, and so on). If N is a power of 2
(i.e., 4, 8, 16, 32, 64, ...), these designs coincide with the fractional factorial designs. The
orthogonal Plackett-Burman designs with N = 12, N = 20, N = 24 runs are important in
practice because they result in uncorrelated estimates of main effects of a large number of
factors in very few runs. For 2-level (fractional) factorials the run size must be 4, 8, 16, 32,
and so forth. This leaves large gaps in the run sizes. In our case, with 10 factors, a minimum
of 16 runs would be needed, while the next highest run size would be 32 runs, as noted
above.
Orthogonality of the design implies that the main effect of one factor can be calculated independently of the main effects of all others. The main effect of a factor is the difference between the response averages at the high (plus) and low (minus) levels of that factor. Plackett-Burman designs have fairly complex confounding schemes. In contrast to the fractional
factorial designs where main effects and interactions are either not confounded or "fully
aliased," Plackett-Burman designs leave main and interaction effects "partially aliased."
This means that the absolute values of the alias coefficients are strictly less than one. The literature refers to designs that lead to partial aliasing as nonregular designs; see Wu and
Hamada (2000).
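The orthogonality property described above can be checked numerically. The following is a minimal sketch, not the authors' procedure: it builds a 12-run Plackett-Burman design by the usual cyclic construction, taking as generator row the factor settings quoted for test cell 1 in this case. The row order of the printed design and the names `GENERATOR`, `plackett_burman_12` and `column` are assumptions made for illustration.

```python
from itertools import combinations

# Generator row for factors A..K (+1 = plus level, -1 = minus level).
# Assumption: this is the row quoted for test cell 1 in the case text.
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return a 12 x 11 list of +1/-1: 11 cyclic shifts plus an all-minus row."""
    n = len(GENERATOR)
    rows = [GENERATOR[n - s:] + GENERATOR[:n - s] for s in range(n)]
    rows.append([-1] * n)
    return rows


def column(design, j):
    """Extract column j of the design as a list."""
    return [row[j] for row in design]


design = plackett_burman_12()

# Balance: every factor is at its plus level in exactly half of the runs.
balanced = all(sum(column(design, j)) == 0 for j in range(11))

# Orthogonality: the dot product of any two distinct columns is zero,
# so each main effect is estimated independently of the others.
orthogonal = all(
    sum(a * b for a, b in zip(column(design, i), column(design, j))) == 0
    for i, j in combinations(range(11), 2)
)

print(balanced, orthogonal)
```

Because the check uses only column dot products, the same two tests apply unchanged to any other two-level design matrix coded as +1/-1.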
The authors selected the 12-run reflected Plackett-Burman design in Table A8.2, consisting of a total of 24 runs. This design was chosen to increase resolution while minimizing
the number of treatment combinations, or "test cells." The 12-run reflected test design can
include up to 11 factors. With only 10 factors, the 11th column, K, was simply left empty.
Note that although factor columns may be left empty, test cells (rows in the matrix) may not
be eliminated. In an empty column, the resulting effect is simply a measure of experimental error and/or interactions. The removal of test cells, on the other hand, creates a
nonorthogonal test design, destroying the independence of the main effects.

TABLE A8.2 The Reflected Plackett-Burman Design in 24 Runs
[Table body not recoverable from the source: 24 test cells (rows) by factor columns A through K, each entry + or -.]

In a resolution III Plackett-Burman design, each main effect is confounded with all
2-factor interactions that do not include the main effect, but it is unconfounded with 2-factor
interactions that include it. Plackett-Burman designs are nonregular designs with confounding (alias) coefficients strictly less than one in absolute value. In the 12-run Plackett-Burman
design, for example, the alias coefficients are either +1/3 or -1/3.
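The +1/3 and -1/3 alias coefficients can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the 12-run design is rebuilt from the test cell 1 settings by cyclic shifts (the printed row order may differ), and the alias coefficient of a 2-factor interaction on a main effect is computed as the average, over the runs, of the product of the three columns involved. The helper name `alias_coefficient` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]  # assumed first row
N_FACTORS = len(GENERATOR)

# 12-run design: 11 cyclic shifts of the generator plus an all-minus row.
design = [GENERATOR[N_FACTORS - s:] + GENERATOR[:N_FACTORS - s]
          for s in range(N_FACTORS)]
design.append([-1] * N_FACTORS)
n_runs = len(design)


def alias_coefficient(main, pair):
    """Coefficient with which the interaction `pair` biases main effect `main`."""
    i, j = pair
    total = sum(row[main] * row[i] * row[j] for row in design)
    return Fraction(total, n_runs)


# Interactions that do NOT involve the main effect: partial aliasing.
partial = {
    alias_coefficient(m, pair)
    for m in range(N_FACTORS)
    for pair in combinations(range(N_FACTORS), 2)
    if m not in pair
}

# Interactions that DO involve the main effect: no aliasing.
clear = {
    alias_coefficient(m, pair)
    for m in range(N_FACTORS)
    for pair in combinations(range(N_FACTORS), 2)
    if m in pair
}

print(sorted(partial), clear)
```

Exact fractions are used instead of floats so that the two possible values show up as thirds rather than rounded decimals.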

A complete foldover (or "reflection") of a Plackett-Burman design, such as the one used
in Table A8.2, leads to a resolution IV design where main effects are unconfounded with all
2-factor interactions. The term reflection is used because 12 additional test cells are run with
every plus and minus switched, somewhat like holding a mirror up to the original design.
For example, test cell 1 is: A+, B+, C-, D+, E+, F+, G-, H-, I-, J+, K-. For the first
reflected test cell, 13, all signs are reversed to become: A-, B-, C+, D-, E-, F-, G+, H+,
I+, J-, K+. Though main effects are independent of all 2-factor interactions, each 2-factor interaction is confounded with many other 2-factor interactions. A reflected Plackett-Burman design provides more accurate estimates of the main effects of a large number of
factors, but it creates challenges in trying to identify significant 2-factor interactions.
Reflected designs can show the presence of 2-factor interactions, but it is difficult to
quantify individual interactions. A significant difference between effects calculated from the
12 original Plackett-Burman runs and effects from the 12 reflected runs is due to one or
more interactions, because the interactions switch signs from the original design to the reflection. However, the group of interactions confounded within each column cannot be
separated mathematically. Experience and general statistical principles, like effect heredity,
can lead to the subjective selection of likely interactions, but selective analyses can only offer clues to potential interactions. If important interactions seem to be present, the best
course of action is to run a higher-resolution follow-up experiment where all interactions
can be clearly quantified.
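The effect of the reflection can be illustrated with a short sketch. It assumes the original 12 runs are the cyclic shifts of the test cell 1 settings plus an all-minus row (the printed row order may differ); the helper `mean_product` is a hypothetical name. The code checks that, over all 24 runs, every main-effect column is uncorrelated with every 2-factor interaction column, while one 2-factor interaction (AB) remains partially aliased with another (CD).

```python
from fractions import Fraction
from itertools import combinations

GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]  # assumed first row
K = len(GENERATOR)

# Original 12 runs (cyclic shifts plus an all-minus row).
original = [GENERATOR[K - s:] + GENERATOR[:K - s] for s in range(K)]
original.append([-1] * K)

# Reflection: the same 12 runs with every sign switched (test cells 13-24).
reflected = [[-x for x in row] for row in original]
foldover = original + reflected


def mean_product(runs, cols):
    """Average over the runs of the product of the listed columns."""
    total = 0
    for row in runs:
        prod = 1
        for c in cols:
            prod *= row[c]
        total += prod
    return Fraction(total, len(runs))


# Main effect vs. any 2-factor interaction: zero in the 24-run design.
main_vs_2fi = {mean_product(foldover, trio) for trio in combinations(range(K), 3)}

# One 2-factor interaction vs. another (AB vs. CD): still partially aliased.
ab_vs_cd = mean_product(foldover, (0, 1, 2, 3))

print(main_vs_2fi, ab_vs_cd)
```

The triple products cancel because each reflected run contributes the negative of its original run, whereas products of four columns keep their sign and therefore do not cancel.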
Because of the time and cost of producing many test cells, reflected designs are seldom
used in direct mail, print advertising, or even Internet applications. But for retail testing, additional test cells add little, if any, further cost. Each store needs to be set up and monitored
individually, so more stores require more effort, but the number of unique test cells does
not make a difference. The statistical benefits far outweigh the cost of implementation. The
only constraint is the number of test units available (i.e., the number of stores that can be
used for the test).
In this case, a larger 32-run fractional factorial design with less confounding would have
been preferable. But the company chose to limit the number of stores used in the test, so the
larger design was not possible.
DEFINITION OF KEY METRICS AND SELECTION OF TEST UNITS

The key metric for this test was unit sales. The team wanted to uncover any factors that increased the number of magazine copies sold throughout the supermarkets. After analyzing
sales data, the team could then calculate profitability based on sales and the cost of each new
pocket, rack, and advertisement.
Unit sales were easily and reliably measured using scanner data from each store in the
test. However, unit sales were not directly comparable among stores because each store had
a different historical sales level. For example, a large supermarket may sell 100 copies per
week, while a small store sells only 50. These store-to-store differences would likely overshadow any differences due to the test factors. Therefore, sales data during the test were
standardized based on the historical sales volume of each store. The actual key metric was
the percent change in sales relative to the historical baseline: 100(actual units sold - baseline units)/(baseline units).
Calculating the baseline sales level for each store can be complicated and potentially a
large source of error. If stores vary widely in sales levels, then they should not be grouped
together in the same test, because our confidence in a 10% change in sales is much different
for a store that sells 10 magazines one week and 11 the next, as compared to a store selling
100 magazines one week and 110 the next.
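A small sketch of the key metric makes the last point concrete. The function name `percent_change` is an illustrative choice; the formula is the one stated in the text.

```python
def percent_change(actual_units, baseline_units):
    """Percent change in unit sales relative to the historical baseline."""
    if baseline_units <= 0:
        raise ValueError("baseline_units must be positive")
    return 100.0 * (actual_units - baseline_units) / baseline_units


# The two stores from the example above report the same 10% change,
# although the small store's change rests on a single extra copy.
small_store = percent_change(11, 10)
large_store = percent_change(110, 100)
print(small_store, large_store)
```

The metric removes differences in store size but not differences in reliability, which is why stores with very different sales levels should not share a test.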


Initially, the authors suggested a minimum of 96 stores for the test. With a resolution IV
32-run fractional factorial design, this would have given three stores (three replicates) per test cell. However, analyzing test costs, management set the limit at 50 stores, all from a
single supermarket chain. At this point, not wanting to risk having just a single store in some
test cells, the authors changed the test to the 12-run reflected design (described earlier) with
just two replicates in each of the 24 test cells.
With 24 unique combinations in the 12-run reflected test design, at least 24 stores must
be selected as test units for the experiment. Two or three times the minimum number of
stores is often better for three reasons:
1. Larger sample size. More stores offer greater sales volume per week, so the test can
be completed more quickly.
2. Variability analysis. Within-cell variation can be used as a measure of experimental
error or stores can be combined together to reduce total variability.
3. Identification of outliers. With three or more stores per test cell, a store with surprisingly high or low sales can be identified, scrutinized, and, if appropriate, eliminated from the analysis of test results.
Selection of a Retail Partner
The first step is the selection of a retail chain that is used for the test. In this case, the publisher selected a grocery chain known for its excellent cooperation with previous test campaigns. Many of the chain's supermarkets were located in close proximity, had strong magazine sales, and had a fairly standard store layout. There were strong reasons to expect that
the test results would transfer to other chains.
After approving all test factors, the retail partner agreed to run the test and share scanner sales data for all stores during the course of the test. The chain's management team was
supportive and agreed to arrange meetings with store managers so that the team could explain and manage the execution of the test.
Analysis of Available Stores and Selection/ Matching of Test Units
The grocery chain had nearly 100 stores from which the team could select the final 48.
The first step in this selection was to analyze the past sales of all stores and eliminate outliers. New stores, highly seasonal locations, and stores with dramatic rates of growth (or decline) were eliminated first. Then stores with low sales volumes were removed.
Control charts (individuals, X, and moving range, MR, charts) of weekly sales data were
created for all remaining stores, plotting unit sales per week and adding control limits to
quantify variability and identify special causes (see Montgomery, 201)!, for a discussion of
control charts for individual measurements).
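As an illustration of how such limits are typically obtained, the sketch below computes individuals and moving range chart limits with the standard constants for moving ranges of span 2 (2.66 and 3.267). The weekly sales figures are hypothetical and are not data from the case; the function name `individuals_chart_limits` is likewise an assumption.

```python
def individuals_chart_limits(weekly_sales):
    """Center line and 3-sigma limits for individuals (X) and moving range (MR) charts."""
    if len(weekly_sales) < 2:
        raise ValueError("need at least two weekly observations")
    moving_ranges = [abs(b - a) for a, b in zip(weekly_sales, weekly_sales[1:])]
    x_bar = sum(weekly_sales) / len(weekly_sales)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": x_bar,
        # 2.66 = 3 / d2 with d2 = 1.128 for moving ranges of span 2.
        "x_lcl": x_bar - 2.66 * mr_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "mr_bar": mr_bar,
        # 3.267 = D4 for subgroups of size 2; the MR chart has no lower limit.
        "mr_ucl": 3.267 * mr_bar,
    }


# Hypothetical weekly unit sales for one store (not data from the case).
sales = [62, 58, 65, 60, 61, 63, 59, 62, 95, 60, 64, 61]
limits = individuals_chart_limits(sales)

# Weeks whose sales fall outside the limits are candidates for special causes.
special_cause_weeks = [
    week for week, units in enumerate(sales, start=1)
    if not limits["x_lcl"] <= units <= limits["x_ucl"]
]
print(limits, special_cause_weeks)
```

A flagged week is only a prompt to look for an explanation, such as a holiday or a special issue, before deciding whether to exclude it from the baseline.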
The authors selected 48 stores with high sales volumes, low variability, and stable sales
over time. The next step was simply to match smaller stores with larger stores so that the average sales volume per test cell would be relatively constant.
The final baseline sales numbers were calculated after stores were matched and placed in
each test cell. Each pair of stores was considered one test unit, and 24 new control charts
were created. All pairs were similar in showing minimal sales growth over the previous few


weeks, with average sales consistent with the long-term average over the last few months. A
couple of special causes (identifiable sources of variation) affected previous weeks. A
holiday 6 weeks before caused a large jump in sales, and a special issue of the magazine before that also caused a shift in sales. Therefore, average sales over the previous 5 weeks were
selected as a baseline for the test.
The average sales level over the 5 previous weeks for each two-store test unit was selected because it gave a valid, easily understood baseline for comparison. More complex options
could have been used instead of the 5-week average. For example, a regression model based
on past performance (including seasonality and growth rates) could have been used to
predict future sales. Covariates could be added to the model based on information about
competitor pricing, promotions, and special offers. With sufficient and accurate data, a
regression model may work well. However, historical results do not always predict future
performance, and numerous predictor variables can potentially create additional sources of
error. Also, in this case competitive data were not available, and recent sales were fairly consistent among all test stores. Therefore, the 5-week average sales level gave a clear and simple
method for standardizing all test units without undue complexity or potential error.
Minimizing and Measuring Experimental Error
Since two stores were combined into one test unit, store-to-store differences were not
used as a measure of experimental error. To get greater consistency among test units, large
stores were matched with small stores, potentially creating higher within-test cell variation.
Therefore, week-to-week variation of each pair of stores over time was used to calculate experimental error for the test. With the same combination of factors run in the same stores
over a number of weeks, the weekly difference in sales paralleled the natural market variation. Each additional week provided an additional replicate for each test cell.
Sample Size
The power of the test was determined by the overall sample size. This number depended
on the number of stores, plus how long the test was to run. Once the number of stores
was set, the only way to obtain more data and increase power was to run the test for a longer
period of time. Sample size is an important issue in planning the testing schedule. Company
executives were concerned about the cost of testing, while the team wanted to run the test
long enough to identify small effects.
Sample size calculations require a reliable estimate of the variance of the key metric, in
this case the percent change in sales. This variance must come from the test units as defined
for the test. In this case, the important number was the variance in weekly sales for each pair
of stores used for each test cell. An estimate of the variance was obtained by pooling the information from the control charts of all 24 pairs of stores. An average of 125 copies of the
magazine was sold in each pair of stores every week, with a standard deviation of about
12 copies, or 10%.
The authors recommended running the test for at least 5 weeks, assuming the standard
deviation during the test would remain at 10%. With a total sample size of 120 test units (24
pairs of stores x 5 weeks) and standard deviation of 10%, the team would have an 80%
chance of detecting any factors that impacted sales by 5% or more.
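The text does not say how the 80% figure was computed. One plausible reconstruction, offered here only as a sketch, uses a normal approximation and a two-sided test at the 5% level, both of which are assumptions, as is the function name `power_two_level_effect`. Under those assumptions the calculation gives a power in the high 70s, roughly in line with the quoted 80%.

```python
from math import sqrt
from statistics import NormalDist


def power_two_level_effect(effect, sigma, n_units, alpha=0.05):
    """Approximate power of a two-sided z-test for one main effect.

    The effect is the difference between the mean response of the n_units/2
    test units at the plus level and the n_units/2 units at the minus level.
    """
    std_normal = NormalDist()
    se_effect = sqrt(4.0 * sigma ** 2 / n_units)  # sd of (mean_plus - mean_minus)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / se_effect
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# 24 pairs of stores x 5 weeks = 120 test units, sigma = 10%, effect = 5%.
power = power_two_level_effect(effect=5.0, sigma=10.0, n_units=24 * 5)
print(round(power, 2))
```

Every factor uses all 120 test units, half at each level, which is why a single multifactor test can give each of the 10 factors the same power as a dedicated one-variable test of the same size.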


The overall sample size in factorial-type experiments where factors are changed simultaneously has a different meaning from the sample size in the experiments that test one variable at a time. With one-variable tests, each individual comparison represents a separate statistical test requiring a certain sample size. A test of one factor alone would require the same
120 test units recommended for this 10-factor Plackett-Burman design, or a total of 50 weeks
for a series of 10 one-variable tests within the same 48 stores.
As the launch date approached, company executives felt the need to speed up the project
and reduce costs, so they limited the test to just 4 weeks. Then, just before the test began,
further delays reduced the run length to only 2 weeks.
TEST RESULTS

The 12-run reflected Plackett-Burman test matrix and the resulting percent changes for
weeks 1 and 2 are shown in Table A8.3.
The main effects are obtained by applying the plus and minus signs in the design columns to the averages in the last column of Table A8.3, and dividing the resulting sum by 12
(the number of plus signs). Alternatively, one can regress the response (the averages in the
last column of Table A8.3) on the design vectors. The only difference with the regression is
the definition of the effects, which are cut in half when using regression.
We treat the changes in weeks 1 and 2 as independent replications and calculate for each
run an estimate of the variance of individual measurements. For example, for the first run,
the variance estimate is $[(12.5 - 17.9)^2 + (23.3 - 17.9)^2]/1 = 58.32$. We pool the 24
variances to obtain an overall estimate $s^2 = 93.75$. The variance of each run average (average for weeks 1 and 2) that goes into the main effects calculation is given by $s^2/2$. The variance of an effect is $\mathrm{var}(\text{effect}) = 2(s^2/2)/12 = s^2/12 = 93.75/12 = 7.81$, and the standard
error is $\sqrt{7.81} = 2.79$. Effects that are larger than 1.96 times the standard error (5.47) are
considered significant. The effects are displayed graphically in Figure A8.1.
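The arithmetic in the preceding paragraph can be retraced step by step. The sketch below uses only numbers quoted in the text (the two weekly values of the first run and the pooled variance of 93.75); the variable names and the helper `main_effect` are illustrative assumptions.

```python
from math import sqrt
from statistics import variance

# Week 1 and week 2 percent changes for the first run, as quoted in the text.
run1 = [12.5, 23.3]
run1_variance = variance(run1)  # sample variance with 1 degree of freedom

pooled_variance = 93.75   # s^2, pooled over the 24 runs (value from the text)
weeks_per_run = 2         # each run average is a mean of two weekly values
runs_per_level = 12       # 12 runs at plus and 12 at minus for every factor

# var(effect) = var(mean of plus runs) + var(mean of minus runs)
var_run_average = pooled_variance / weeks_per_run
var_effect = 2 * var_run_average / runs_per_level
se_effect = sqrt(var_effect)
significance_cutoff = 1.96 * se_effect


def main_effect(signs, run_averages):
    """Signed sum of the run averages divided by the number of plus signs."""
    n_plus = sum(1 for s in signs if s > 0)
    return sum(s * y for s, y in zip(signs, run_averages)) / n_plus


print(run1_variance, var_effect, se_effect, significance_cutoff)
```

Applied to one column of signs and the 24 run averages, `main_effect` returns the difference between the average response at the plus level and the average at the minus level, which is the quantity compared against the cutoff.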
Three effects are statistically significant:

A+: Rack on Cooler in Produce Aisle


The display on top of the refrigerated case in the produce section increased sales by
10.8%. This identified the one most profitable new location to sell the magazine and supported the idea that attracting the customer early increases sales. This was a major change,
placing magazines far from their usual locations.

F-: Not Adjusting the Number of Magazines in the Pockets


Sales dropped 10.6% when workers adjusted the number of magazines among the pockets. This result was completely surprising, but saved a great deal of money on unnecessary
effort. After seeing these results, the team thought that empty pockets might create the perception of greater demand; perhaps customers are thinking, "if everyone else is buying
this issue of the magazine, then it must be worth reading." Another explanation was that an
uneven distribution, with more copies in some pockets and just one or two in others, might be more eye-catching, adding "texture" to the numerous rows and columns of magazines. Of course, there was also the possibility that these results were due to chance. But
even if the apparent negative effect was random, it was clear that there was no benefit to evening out the distribution of magazines across pockets.

D+: Rack by Snack Foods


The second new display that worked well was a magazine rack at the other end of the
store next to the snack foods and beer, increasing sales by 5.5%. Once again, a small rack in
a completely new location was beneficial.
The average sales increase during the test was 8.4%. Adding these three significant effects
(calculated as the overall average plus one-half of each effect) resulted in a sales increase of
21.8% as compared to the 5-week baseline. Profitability analysis (not shown here) showed
that these three remained valuable when taking into account the cost of the two additional
racks.
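The prediction rule in parentheses can be retraced with the published figures. The sketch below assumes the three effects are 10.8, 10.6 and 5.5 percentage points (the last value is read from a damaged figure in the source) and that each factor is set at its better level. With these rounded inputs the result is about 21.85, which agrees with the reported 21.8% up to rounding.

```python
# Values quoted in the text (percent change relative to the 5-week baseline).
overall_average = 8.4
significant_effects = {
    "A (rack on cooler in produce aisle)": +10.8,
    "F (evening out copies across pockets)": -10.6,
    "D (rack by snack foods)": +5.5,
}

# Each factor sat at its plus level in half of the runs, so moving it to its
# better level for good adds half of the absolute effect to the average.
predicted_increase = overall_average + sum(
    abs(effect) / 2 for effect in significant_effects.values()
)
print(round(predicted_increase, 2))
```

Only half of each effect is added because the overall average already reflects the better level of every factor in half of the test cells.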
Confounded 2-factor interactions were also analyzed by comparing the effects of the
original 12 runs with the effects calculated from the reflected 12 runs. No significant difference in effects was found, so interaction columns were not analyzed further.
The nonsignificant effects were also very valuable. With such a brief test, it can be risky
to assume that nonsignificant effects have no impact on sales, but these results can signal
where the company can achieve significant savings. All of the in-store advertising had no
impact, so the publisher avoided continuing with the on-shelf, clip-on, and grocery divider
ads. The location on the checkout aisle was not statistically significant, but because the estimated effect was negative, the team decided to keep the end-cap displays. The second-copy
discount and oversized card insert had no impact as well and were eliminated. Surprisingly,
more pockets on the main racks had no impact on sales. The team not only avoided the additional cost of more pockets, but also realized that the incremental benefit of additional
pockets has some point of diminishing returns.

FINAL COMMENTS
In 48 stores over 2 weeks, the team learned more than they could have learned from months
of testing one variable at a time. Two new rack locations were the significant winners among
the four factors that related to the number of pockets and location of racks. The team
avoided unnecessary operating costs after all five in-store advertising factors showed no
effect. Finally, the common perception that redistributing copies was a worthwhile investment proved to be a significant misconception.
The focus of this case study is on increasing magazine sales in a retail setting, but the
methods we have presented and discussed apply to retail products in general. In testing such
products, decision makers are interested in a range of factors, including price, package design, location, and advertising. The experimental design methodology described here can be
used to test specific options, for example, one package design versus another, or it can focus on providing more general insights about the effectiveness of factors such as advertising and product location. More generally, the experimental design approach has applicability to a wide range of problems involving service operations and marketing programs.
These statistical tools offer an efficient methodology for future studies aimed at improving
the quality and effectiveness of service systems.
QUESTIONS

Exercise 1 in Chapter 6


illustrated them with a hypothetical example concerning the effect of advertising and other factors on the sales of candy bars. Wilkinson, Mason, and Paksoy (1982) described a factorial
experiment for assessing the impact of price, promotion, and display on the sales of selected
items at Piggly Wiggly grocery stores.
Although the market testing literature is sparse on the use of experimental design models with many factors, 1- or 2-factor experiments have been common. For example, Lodish
et al. (1995a) analyzed the results of 389 television advertising experiments to determine the
effect of advertising on sales. Their data set included three types of tests: comparing two
different versions of advertising copy, comparing two different levels of exposure, and testing copy and exposure simultaneously using a factorial design. In a related paper, Lodish
et al. (1995b) examined the carryover effect of television-advertising exposure by tracking
sales for an additional two years beyond the original one-year test period.
Factorial and fractional factorial designs are well known and have been widely used in
behavioral marketing experiments in laboratory settings (see, e.g., Jaffe, Jamieson, and
Berger, 1992; Srivastava and Lurie, 2004; and Ettenson and Wagner, 1986) as well as in conjoint analysis applications. Green, Krieger, and Wind (2001) described a credit-card study
that illustrates how fractional factorial designs may be used in conjoint analysis. Their design consisted of 12 attributes relating to potential credit card services, each having two to
six levels. For example, annual price (six alternatives), retail purchase insurance (no, yes),
rental car insurance (no, yes), and airport club admission (no admission, $5 fee per visit,
$2 fee per visit). Using a fractional factorial design, 64 profiles were created out of a total
of 186,624 possible attribute-level combinations. The 64 profiles were partitioned into
"blocks" of eight profiles each, with all profiles in a given block being presented to each respondent. For each profile of credit card services, the respondent was asked to indicate the
likelihood of purchase on a 0-100 point scale. This blocked fractional design provided independent (uncorrelated) estimates of main effects.
Green, Carroll, and Carmone (1978) provided an excellent overview and discussion of
the key elements in fractional factorial designs, while Green and Srinivasan (1978, 1990)
and Green, Krieger, and Wind (2001) provided notable reviews of the extensive literature
on conjoint analysis. Bradlow (2005) discussed current issues in conjoint analysis and the
need for future research; Wittink and Cattin (1989) and Wittink, Vriens, and Burhenne
(1994) documented the widespread commercial use of conjoint models. Although Green,
Carroll, and Carmone (1978) briefly discussed Plackett-Burman designs, we found no papers that used these designs in conjoint and discrete choice models.
Our Plackett-Burman design is a main effects model that, as we will show, may provide
evidence of likely 2-factor interactions under some circumstances. The fractional designs
used in conjoint analysis are typically main effects models as well, confounding main effects
and 2-factor interactions. Carmone and Green (1981) showed how selected 2-factor interactions can be included in fractional main effects designs. Plackett-Burman and fractional
factorial models are orthogonal designs, which means that effects are estimated independently and with minimum variance. Orthogonal designs may be prohibitively large in situations with many factors, including some at more than two levels, and in cases where interactions are important. For these circumstances, nonorthogonal designs are available and
may be generated using statistical software. Kuhfeld, Tobias, and Garratt (1994) discussed
such nonorthogonal designs and their use in conjoint and discrete choice studies.


Our review of the literature shows that fractional designs and related orthogonal designs
have been used extensively in conjoint and discrete choice studies. As we have noted, there
have also been a few papers on market tests involving relatively few factors that use factorial
or fractional factorial designs. However, it has been our experience that until recently the
great majority of market testing practitioners relied on the traditional approach of testing
one factor at a time. In this case we show the benefits of statistical methods that simultaneously test many factors and also demonstrate the usefulness of Plackett-Burman designs, an
important class of experimental design models.
THE EXPERIMENT

The Factors
The firm's marketing group regularly mailed out credit-card offers and wanted to find
new ways of increasing the effectiveness of its direct mail program. The 19 factors shown in
Table A9.1 were thought to influence a customer's decision to sign up for the advertised
product. Factors A-E were approaches aimed at getting more people to look inside the envelope, while the remaining factors related to the offer inside. Factor G (sticker) refers to the
peel-off sticker at the top of the letter to be applied by the customer to the order form. The
firm's marketing staff believed that a sticker increases involvement and is likely to increase
the number of orders. Factor N (product selection) refers to the number of different credit
card images that a customer could choose from, while the term "buckslip" (factors Q and R)
describes a small separate sheet of paper that highlights product information.
A Plackett-Burman Design for 19 Factors
\\'1th '>O manv f".ill<lr,, we chose a 2-leYcl design. Bv doing so, we uiuld keep the numlwr
of n1n' rclati1-ch 101' and ,t\'01d more complic.1tcd and possihlv nonorthogonal tJL''1gns.
J"wo-lcvcl screen111g designs <1rc common 111 the field of experimental design; sec Box,
TABLE A9.1
The 19 Test Factors and Their Low and High Levels

Factor                                 (-) Control         (+) New Idea
A  Envelope teaser                     General offer       Product-specific offer
B  Return address                      Blind               Add company name
C  "Official" ink stamp on envelope    Yes                 No
D  Postage                             Preprinted          Stamp
E  Additional graphic on envelope      Yes                 No
F  Price graphic on letter             Small               Large
G  Sticker                             Yes                 No
H  Personalize letter copy             No                  Yes
I  Copy message                        Targeted            Generic
J  Letter headline                     Headline 1          Headline 2
K  List of benefits                    Standard layout     Creative layout
L  Postscript on letter                Control version     New postscript
M  Signature                           Manager             Senior executive
N  Product selection                   Many                Few
O  Value of free gift                  High                Low
P  Reply envelope                      Control             New style
Q  Information on buckslip             Product info        Free gift info
R  Second buckslip                     No                  Yes
S  Interest rate                       Low                 High
TABLE A9.2
Response Rates in the 20-Run Plackett-Burman Design
[Design matrix with test cells 1-20 as rows and sign columns for factors A-S, followed by the Orders and Response Rate columns; the individual entries are not legible in this copy.]

Hunter, and Hunter (2005). Our philosophy in testing many factors, each at two levels, was to identify which factors were active; that is, which factors had a significant effect on the response. Once these active factors were identified, it would be possible (if needed) to test each of them at more than two levels while still maintaining an orthogonal design.
With 19 factors, we created the 20-run Plackett-Burman main effects design shown in Table A9.2. Plackett-Burman designs are orthogonal designs for factors that have two levels each, with the number of runs N given by a multiple of 4 (see Plackett and Burman, 1946). For 2-level fractional factorials, the run size N must be a power of 2, leaving large gaps in the run sizes. For example, a minimum of 32 runs is required in a fractional factorial design involving 19 factors. The Plackett-Burman design, on the other hand, can study 19 factors in just 20 runs. This is why these designs are useful in situations where the number of runs is critical.
TABLE A9.3
A Fractional Factorial Design with Generators E = AB, F = AC, G = AD, H = BC, I = BD, J = CD, K = ABC, L = ABD, M = ACD, N = BCD, and O = ABCD
[Design matrix with runs 1-16 as rows and sign columns for factors A-O; columns E-O are the row-by-row products given by the generators.]

In a Plackett-Burman design each pair of factors (columns) is orthogonal, which by definition means that each of the four factor-level combinations [(- -), (- +), (+ -), (+ +)] appears in the same number of runs. In the 20-run design (Table A9.2), for every pair of columns, each of the four combinations appears five times. As a consequence of orthogonality, the main effect of one factor can be calculated independently of the main effect of all others. Plackett and Burman showed that the complete design can be generated from the first row of +'s and -'s. In Table A9.2, the last entry in row 1 (-) is placed in the first position of row 2. The other entries in row 1 fill in the remainder of row 2, by each moving one position to the right. The third row is generated from the second row using the same method, and the process continues until the next to the last row is filled in. A row of -'s is then added to complete the design.
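The cyclic construction just described can be sketched in a few lines of Python. This is only an illustration: the generator row `FIRST_ROW` below is the one commonly tabulated for the 20-run Plackett-Burman design and is an assumption here, since the assignment of columns to factors in Table A9.2 may differ. The helper names `plackett_burman_20` and `is_orthogonal` are hypothetical.

```python
from itertools import combinations

# Generator row commonly tabulated for the 20-run Plackett-Burman design
# (assumed here; the columns of Table A9.2 may be labeled differently).
FIRST_ROW = "++--++++-+-+----++-"


def plackett_burman_20(first_row=FIRST_ROW):
    """Build the 20 x 19 design by cyclic shifts plus a final row of minus signs."""
    row = [1 if ch == "+" else -1 for ch in first_row]
    design = []
    for _ in range(len(row)):              # 19 cyclically shifted rows
        design.append(row)
        row = [row[-1]] + row[:-1]         # last entry moves to the first position
    design.append([-1] * len(first_row))   # closing row of minus signs
    return design


def is_orthogonal(design):
    """True if every pair of columns has a zero inner product."""
    columns = list(zip(*design))
    return all(sum(a * b for a, b in zip(u, v)) == 0
               for u, v in combinations(columns, 2))


design = plackett_burman_20()
print(len(design), "runs x", len(design[0]), "factors")
print("orthogonal:", is_orthogonal(design))
```

Because every column is balanced (ten plus signs and ten minus signs), a zero inner product for a pair of columns is equivalent to each of the four level combinations appearing five times.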
In what follows, we will assume that 3-factor and higher-order interactions are negligible and can therefore be ignored. The main effect of a factor is the difference between the response averages at the high (plus) and low (minus) levels of that factor. Both fractional factorial designs and Plackett-Burman designs are orthogonal, but the natures of their confounding patterns differ. Consider a fractional factorial design in which main effects are confounded with 2-factor interactions; for example, the saturated design for 15 factors in 16 runs shown in Table A9.3. The design matrix is constructed by first writing columns of signs for a full factorial design in four factors (columns A-D). The signs for the remaining columns are determined from 11 generators that use all interaction columns in the full factorial design. For example, consider the generator K = ABC. Multiplying the signs in columns A, B, and C, row by row, results in the column of signs for factor K. There are 15 main effects and 105 2-factor interactions (15!/(2! 13!)). Each interaction belongs to a single set of seven 2-factor interactions, and each main effect is confounded with one of these sets. For example, we find that A is confounded with BE, CF, DG, HK, IL, JM, and NO. The factor A does not appear as a letter in any of the seven interactions, and no two interactions include the same factor. The column of signs for factor A is identical to the column of signs for each of the 2-factor interactions that are confounded with the main effect of A. Hence there is perfect correlation ($\rho = 1$) between the column of signs of A and the column of signs for each of its confounded 2-factor interactions. For example, multiplying the signs in columns B and E row by row to obtain a column representing the BE interaction results in a column of signs that is identical to the column of signs for factor A. Because of this perfect correlation, estimating the main effect by taking the difference between the response averages at the high (plus) and low (minus) levels of a particular factor actually gives an estimate of the main effect of that factor plus the sum of the seven 2-factor interactions that are confounded with that main effect. If all of these interactions are negligible, then the result will be a clear estimate of the main effect. If one or more of the interactions are significantly different from zero, the estimate of the main effect will be biased. The books by Berger and Maurer (2002), Box, Hunter, and Hunter (1978), and Ledolter and Burrill (1999) discuss fractional factorial designs, confounding, and the analysis of experimental results.
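The construction of the saturated 16-run design and the confounding pattern of factor A can be checked with a short Python sketch. The generators are those given in the caption of Table A9.3; the run order of the full factorial and the helper name `column_product` are assumptions made for illustration (the confounding pattern does not depend on the run order).

```python
from itertools import combinations, product

# Generators from the caption of Table A9.3.
GENERATORS = {"E": "AB", "F": "AC", "G": "AD", "H": "BC", "I": "BD", "J": "CD",
              "K": "ABC", "L": "ABD", "M": "ACD", "N": "BCD", "O": "ABCD"}


def column_product(columns, word):
    """Row-by-row product of the sign columns named by the letters in word."""
    result = [1] * 16
    for letter in word:
        result = [r * s for r, s in zip(result, columns[letter])]
    return result


# Full factorial in A-D (run order assumed; it does not affect the aliasing).
runs = list(product([-1, 1], repeat=4))
columns = {name: [run[i] for run in runs] for i, name in enumerate("ABCD")}
for factor, word in GENERATORS.items():
    columns[factor] = column_product(columns, word)

# Two-factor interactions whose sign column coincides with the column of A.
aliases_of_A = [x + y for x, y in combinations(sorted(columns), 2)
                if "A" not in (x, y)
                and column_product(columns, x + y) == columns["A"]]
print(aliases_of_A)
```

Replacing `"A"` by any other factor letter lists the seven interactions confounded with that factor's main effect.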
Plackett-Burman designs have more complex confounding patterns. Each main effect is confounded with all 2-factor interactions except those that involve that main effect. In our 19-factor design in Table A9.2, the main effect for each factor is confounded with all 2-factor interactions involving the other 18 factors for a total of 153 interactions (18!/(2! 16!)). But in contrast to the fractional factorial design shown in Table A9.3, the column of signs for each main effect is not identical to the column of signs for each of its confounded 2-factor interactions. Although not identical and thus not perfectly correlated, these columns of signs are correlated. That is, the correlation between the signs in a main effect column and the signs in each 2-factor interaction column that is confounded with that main effect is strictly less than 1 in absolute value ($|\rho| < 1$). As a consequence, it can be shown (see Chapter 6) that estimating the main effect of a particular factor by taking the difference between the response averages at the high (plus) and low (minus) levels for that factor actually provides an estimate of the main effect plus the weighted sum of the 2-factor interactions that are confounded with that main effect.
The weight associated with each 2-factor interaction is the correlation between that 2-factor interaction and the main effect; see Barrentine (1996) for a discussion of the structure of confounding patterns in Plackett-Burman designs. Enumerating all correlations among factor columns and interaction columns reveals that, for the 20-run Plackett-Burman design in Table A9.2, the weights (correlations) are either -0.2, +0.2, or -0.6. Of the 153 interactions confounded with each main effect, 144 have weights of -0.2 or +0.2 while 9 have weights of -0.6. A particular 2-factor interaction will appear in the confounding pattern of 17 main effects. For 16 of these main effects, the weight associated with this interaction will be -0.2 or +0.2, while for a single main effect, the weight associated with this interaction will be -0.6. For example, consider the main effect of factor R and the SG interaction. We use +1 and -1 to represent the column signs and multiply the entries in columns S and G to obtain the entries in column SG. Writing each column as a row to save space and listing the run numbers above the entries, we obtain
Run:        1    2    3   ...   20
Column R:   [entries of +1 and -1, not legible in this copy]
Column SG:  [entries of +1 and -1, not legible in this copy]

Both columns have 10 plus signs and 10 minus signs, and the entries in each column add to zero. Furthermore, the sum of the squares of the entries in each column is 20, the number of runs N. The columns are correlated. In 4 of the 20 runs the signs match, while in 16 runs the signs are opposite. The correlation between these two mean-zero columns (call them x and z) is given by
\[ \rho = \frac{\sum xz}{\sqrt{\sum x^2 \sum z^2}} = \frac{-12}{20} = -0.6 \]

For simplicity, suppose a single 2-factor interaction confounded with a particular main effect is important. A total of 17 main effects will be confounded with that interaction. For each of these main effects, taking the difference between the response averages at the high (plus) and low (minus) levels for that factor provides an estimate of the main effect plus a times the magnitude of the confounded 2-factor interaction. As noted previously, for 16 main effects the fraction a will be -0.2 or +0.2, and the bias in our estimate of each main effect will be relatively small: plus or minus 0.2 times the magnitude of the interaction. For a single main effect, a will be -0.6 and the bias will be -0.6 times the magnitude of the interaction.
Given the complex confounding patterns of Plackett-Burman designs, it may seem at first glance that they would not provide any useful information about 2-factor interactions. In fact, traditionally they have been used as main effects designs. More recently, however, Plackett-Burman designs have received much greater attention from researchers because of what Box, Hunter, and Hunter (2005) call "their remarkable projective properties." In analyzing the results of our experiment in the remainder of this case, we will discuss these projective properties and show how they can be used in certain circumstances to estimate one or more 2-factor interactions from the results of a Plackett-Burman experiment.
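The weights in the confounding pattern can be tabulated directly from a design matrix. The following Python sketch does this for one main effect column of a 20-run Plackett-Burman design; it assumes the commonly tabulated generator row (`FIRST_ROW`), so the column that plays the role of a given factor need not match Table A9.2, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

FIRST_ROW = "++--++++-+-+----++-"   # assumed standard 20-run generator row


def plackett_burman_20():
    row = [1 if ch == "+" else -1 for ch in FIRST_ROW]
    rows = []
    for _ in range(19):
        rows.append(row)
        row = [row[-1]] + row[:-1]   # cyclic shift to the right
    return rows + [[-1] * 19]


columns = list(zip(*plackett_burman_20()))
n_runs = 20


def weight(main, pair):
    """Correlation between a main-effect column and a 2-factor interaction column."""
    i, j = pair
    total = sum(columns[main][r] * columns[i][r] * columns[j][r]
                for r in range(n_runs))
    return total / n_runs   # both columns have mean zero and sum of squares N


# Weights for the first factor against all interactions of the other 18 factors.
others = [k for k in range(19) if k != 0]
weights = Counter(weight(0, pair) for pair in combinations(others, 2))
print(sorted(weights.items()))
```

The printed tally shows how many of the 153 interactions carry each weight, and it can be compared with the counts quoted in the text.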

The Results
The focus of the experiment was on increasing response rate: the fraction of people who respond to the offer. A large mailing list of potential customers was available for the test. The overall sample size (the number of people to receive test mailings) was determined according to statistical and marketing considerations. The chief marketing executive wanted to limit the number of names in order to minimize the cost of test mailings that performed worse than the control (especially when testing a higher interest rate) and also to reduce postage costs. Of the 500,000 packages that were mailed, 400,000 names received the control mailing (that was run in parallel to the test) while 100,000 were used for the test itself. Therefore, each of the 20 test cells in Table A9.2 was sent to 5,000 people, resulting in the response rates listed in the last column.
For each factor in the experiment, 50,000 people received a mailing with the factor at the plus level and 50,000 people received a mailing with the factor at the minus level. Each main effect is obtained by comparing average responses from these two independent samples of 50,000 each. Because the design is orthogonal, the same 100,000 people are used to obtain independent estimates of each main effect.
The marketing team regularly used 25,000-50,000 names for each split-run test, so the sample size of 100,000 for the designed experiment was not much different from what had been done in the past. Power calculations using the Minitab software convinced the authors that this sample size was large enough to detect meaningful differences. Determining the statistical significance of each main effect is equivalent to the standard statistical test for comparing two independent sample proportions, in this case of size 50,000 each. The firm estimated an average response rate of 1% and wanted to be quite confident of detecting a change of 0.2 percentage points (either an increase from 1% to 1.2% or a decrease from 1% to 0.8%). At the 5% significance level with a sample of 100,000, the detection probability (statistical power)

Figure A9.1 Main Effects Estimates: Plackett-Burman Design
[Horizontal bar chart of the 19 estimated main effects, in percentage points (axis from 0.0 to 1.0), ordered by absolute size from S: Interest rate, G: Sticker, and R: Second buckslip at the top to D: Postage and C: Official stamp at the bottom.]

was found to be 0.86 for a change to 1.2% and 0.92 for a change to 0.8%. Thus, with a sample size of 100,000, the authors and marketing team were confident of being able to detect very small yet economically meaningful differences.
Testing all factors simultaneously has large sample-size advantages compared to testing each of the 19 factors one at a time. Suppose we kept the total sample size at 100,000. Then a sample of 5,263 persons would be used for each of the 19 tests of one factor at a time. Because the control package was already being sent to 400,000 people, the group of 5,263 people would receive a mailing of the control with one factor changed. The two sample proportions would then be compared to determine the effect of changing that one factor. Using Minitab, we calculated the power of such a test using the assumptions just described: a 5% significance level and an average response rate of 1%. The statistical power (detection probability) is 0.32 for a change to 1.2% and 0.29 for a change to 0.8%, compared to 0.86 and 0.92 (respectively) for the Plackett-Burman design calculated previously. Again using Minitab, we found that to obtain the same statistical power as the Plackett-Burman design would require a sample size of about 25,000 for each of the 19 tests of one factor at a time; this would yield a total sample size of 475,000 people, an increase of 375,000 persons.
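The power figures above can be approximated without Minitab. The sketch below uses the usual normal approximation for a two-sided test of two independent proportions, with the pooled standard error setting the rejection cutoff; whether Minitab uses exactly this formula is an assumption, and the function name `power_two_proportions` is hypothetical.

```python
from math import erf, sqrt

Z_975 = 1.959964  # two-sided 5% critical value of the standard normal


def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def power_two_proportions(p1, n1, p2, n2, z_crit=Z_975):
    """Approximate power of the two-sided z-test for two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # sets the cutoff
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)     # spread under the alternative
    diff = abs(p2 - p1)
    upper = 1 - normal_cdf((z_crit * se_null - diff) / se_alt)
    lower = normal_cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


# Plackett-Burman: 50,000 names at each level of a factor.
pb_up = power_two_proportions(0.01, 50_000, 0.012, 50_000)
pb_down = power_two_proportions(0.01, 50_000, 0.008, 50_000)
# One factor at a time: 400,000 control names versus 5,263 test names.
ofat_up = power_two_proportions(0.01, 400_000, 0.012, 5_263)
ofat_down = power_two_proportions(0.01, 400_000, 0.008, 5_263)
print(round(pb_up, 2), round(pb_down, 2), round(ofat_up, 2), round(ofat_down, 2))
```

Calling `power_two_proportions(0.01, 400_000, 0.012, 25_000)` gives a rough check of the statement that about 25,000 names per one-factor test are needed to approach the power of the designed experiment.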

Initial Analysis of the Results

The estimated effects, which are differences between average responses at the plus and minus levels of the factor columns, are shown in Figure A9.1. In the figure, effects are ordered from the largest (at top) to the smallest (at bottom) in terms of their absolute values. The sign of each effect shows which level is better: for positive effects, the "+" level gives the higher response; for negative effects, the "-" level gives the higher response.
Significance of the effects was determined by comparing the estimated effects with their standard errors. The result of each experimental run is the proportion of customers who respond to the offer. Each proportion is an average of n = 5,000 individual binary responses; its standard deviation is given by $\sigma = \sqrt{\pi(1 - \pi)/n}$, where $\pi$ is the underlying true proportion. Each estimated effect is the difference of two averages of N/2 = 10 such proportions. Hence its standard deviation is
\[ \mathrm{StdDev(effect)} = \sqrt{\frac{2}{N}\,\frac{\pi(1-\pi)}{n} + \frac{2}{N}\,\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{4}{N}}\,\sqrt{\frac{\pi(1-\pi)}{n}} \]
Replacing the unknown proportion $\pi$ by the overall success proportion (averaged over all runs and samples), p = (#Purchases)/(nN) = 1,298/100,000 = 0.01298, leads to the standard error of an estimated effect,
\[ \mathrm{StdError(effect)} = \sqrt{\frac{4}{20}}\,\sqrt{\frac{(0.01298)(0.98702)}{5{,}000}} = 0.00072 \]
The standard error is 0.072 if effects are expressed in percentage terms. Significance (at the 5% level) is determined by comparing the estimated effect with 1.96 times its standard error, $\pm 1.96(0.072) = \pm 0.141$. The dashed line in Figure A9.1 separates significant and insignificant effects.
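The standard error calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the quantities stated in the text (20 test cells, 5,000 names per cell, 1,298 orders); the function name `effect_standard_error` is hypothetical.

```python
from math import sqrt

N_RUNS = 20          # test cells in the Plackett-Burman design
CELL_SIZE = 5_000    # names mailed per test cell
ORDERS = 1_298       # total orders across the 100,000 test mailings


def effect_standard_error(p, n_runs, cell_size):
    """Standard error of an effect: difference of two averages of n_runs/2 proportions."""
    return sqrt(4 / n_runs) * sqrt(p * (1 - p) / cell_size)


p_hat = ORDERS / (N_RUNS * CELL_SIZE)
se_pct = 100 * effect_standard_error(p_hat, N_RUNS, CELL_SIZE)   # percentage points
cutoff = 1.96 * se_pct
print(f"p = {p_hat:.5f}, SE = {se_pct:.3f}, cutoff = {cutoff:.3f}")
```

The cutoff computed from the unrounded standard error is slightly below 0.141; the value in the text is obtained by multiplying 1.96 by the rounded standard error of 0.072.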
The following five factors had a significant effect on the response rate.

S- or Low Interest Rate
Increasing the credit-card interest rate reduces the response by 0.864 percentage points. In addition, it was very clear based on the firm's financial models that the gain from the higher rate would be much less than the loss due to the decrease in the number of customers.

G- or Sticker
The sticker (G-) increases the response by 0.556 percentage points, resulting in a gain much greater than the cost of the sticker.

R- or No Second Buckslip
A main effect interpretation shows that adding another buckslip reduces the number of buyers by 0.304 percentage points. One explanation offered for this surprising result was that the buckslip added unnecessary information and obscured the simple "buy now" offer. A more compelling explanation, which we discuss in the next section, is that the significant effect is due not to the main effect of factor R but rather to an interaction between two other factors.

I+ or Generic Copy Message
The targeted message (I-) emphasized that a person could choose a credit card design that reflected his or her interests, while the generic message (I+) focused on the value of the offer. The creative team was certain that appealing to a person's interests would increase the response, but they were wrong. The generic message increased the response by 0.296 percentage points.

J- or Letter Headline #1
The result showed that all "good" headlines were not equal. The best wording increased the response by 0.192 percentage points.

The response rate from the 400,000 control mailings was 2.1%, while the average response for the test was 1.298%. The predicted response rate for the implied best strategy, starting with the overall average and adding half of each significant effect, amounted to 2.40%. This represented a 15% predicted increase over the response rate of the "control."
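The predicted response for the implied best strategy is simple arithmetic, sketched here in Python with the five effect sizes quoted above. The variable names are illustrative only.

```python
# Absolute sizes of the five significant effects, in percentage points.
significant_effects = {"S": 0.864, "G": 0.556, "R": 0.304, "I": 0.296, "J": 0.192}

overall_average = 1.298   # average response rate over the 20 test cells (%)

# Setting each factor at its better level moves the response from the overall
# average by half of that factor's effect.
predicted = overall_average + sum(e / 2 for e in significant_effects.values())
print(f"predicted response for the best strategy = {predicted:.2f}%")
```

Half of each effect is added because the overall average sits midway between the average responses at the two levels of every factor.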
Further Analysis of the Results
The confounding of main effects and interactions introduces some uncertainty into our interpretation of the results. A straightforward approach for obtaining unconfounded main effects is a "foldover" of the original Plackett-Burman design. In such a foldover design, the 20-run Plackett-Burman design would be augmented by an additional 20 runs in which the signs of each of the 19 design columns are switched. The combination of a Plackett-Burman design and its complete foldover creates a design in which main effects are no longer confounded with 2-factor interactions. In our experiment, a foldover was not carried out (with 40 runs it would have greatly increased the operational complexity of the mailing), and we cannot be certain which combinations of main effects and interactions are responsible for the significant estimates in Figure A9.1.
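The effect of a complete foldover can be illustrated numerically. The Python sketch below builds a 20-run Plackett-Burman design from the commonly tabulated generator row (an assumption; Table A9.2 may label its columns differently), appends the sign-reversed runs, and compares the largest sum of products between any main effect column and any 2-factor interaction column before and after. The helper name `max_alias_sum` is hypothetical.

```python
from itertools import combinations

FIRST_ROW = "++--++++-+-+----++-"   # assumed standard 20-run generator row


def plackett_burman_20():
    row = [1 if ch == "+" else -1 for ch in FIRST_ROW]
    rows = []
    for _ in range(19):
        rows.append(row)
        row = [row[-1]] + row[:-1]
    return rows + [[-1] * 19]


def max_alias_sum(design):
    """Largest |sum of x_i * x_j * x_k| over all triples of distinct columns."""
    cols = list(zip(*design))
    return max(abs(sum(a * b * c for a, b, c in zip(cols[i], cols[j], cols[k])))
               for i, j, k in combinations(range(len(cols)), 3))


original = plackett_burman_20()
foldover = original + [[-x for x in run] for run in original]   # 40 runs
print(max_alias_sum(original), max_alias_sum(foldover))
```

Switching all signs reverses the sign of every triple product, so in the combined 40 runs each main effect column becomes uncorrelated with every 2-factor interaction column.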
The use of our Plackett-Burman design is supported by empirical experimental design principles. Effect sparsity (Box and Meyer, 1986) means that the number of important effects is typically small; hierarchical ordering means that important interactions are usually fewer in number, and smaller in magnitude, than main effects (Wu and Hamada, 2000). In addition, on the basis of effect heredity (Hamada and Wu, 1992), the principle that significant interactions are likely to involve factors with significant main effects, it is possible in some circumstances to identify likely 2-factor interactions.
Factors S (interest rate) and G (presence of a sticker) are by far the largest effects in Figure A9.1. The correlation between the main effect of R (second buckslip) and the SG interaction is -0.6. Hence, a significant SG interaction would bias the estimate of the main effect of R by -0.6 times the value of the interaction. This suggests that it may not be the main effect of factor R that is important, but the 2-factor interaction between S and G. This interpretation is supported by the principle of effect heredity, since the main effects of S and G are the most important factors. As one might expect, at the high interest rate the effect of having a sticker is small (a change from 0.776% to 0.956%, as implied by the results in Table A9.2); at the low interest rate, however, the effect of having the sticker is much larger
(a change from 1.264% to 2.196%). The sticker is most effective when the customer receives a more attractive offer.
Box and Tyssedal (1996) showed that the 20-run Plackett-Burman design produces, for any three factors, a complete factorial arrangement with some combinations replicated. The design is said to have "projectivity" 3. In contrast, fractional factorial designs that confound main effects with 2-factor interactions, such as the one shown in Table A9.3, fail to produce a complete factorial for some sets of three factors and hence only have projectivity 2. We use this projectivity idea to provide more evidence that the apparent main effect of R (second buckslip) is actually a consequence of the bias created by the SG interaction.

Consider the three factors S, G, and R. Of the 20 runs in Table A9.2, there is at least one run at each of the eight factor-level combinations of these three factors. In specifying each combination, we let the first sign indicate the level of S, the second sign the level of G, and the last sign the level of R. There are four runs at each of the four combinations (- - -), (+ + -), (+ - +), (- + +) and one run at each of the remaining four combinations. Because we have at least one response at each combination, we have a full factorial arrangement in factors S, G, and R (ignoring the other factors). Because the number of runs at each combination is not the same, we must use regression to estimate the effects. Doing so, we find that the three significant effects are S, G, and SG, confirming that it is the SG interaction and not the main effect of R that is significant.
Table A9.4(a) shows the results when regressing the response rate on the main and interaction effects of the three factors S, G, and R. The standard errors of the estimated regression coefficients use the pooled variance from the eight factor-level combinations, assuming that the other factors have no effect on the response. The t-ratios and the probability
TABLE A9.4
Regression Results for Models Relating the Response Rate to Factors S (Interest Rate), G (Sticker), R (Second Buckslip), I (Copy Message), and J (Letter Headline)

(a) REGRESSION OF RESPONSE RATE ON S, G, R, AND THEIR INTERACTIONS
Rate = 1.325 - (0.386)S - (0.320)G - (0.061)R + (0.151)SG - (0.070)SR + (0.076)GR + (0.045)SGR;  R^2 = 0.902

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.325       0.066      20.07     0.000
S             -0.386       0.066      -5.85     0.000
G             -0.320       0.066      -4.85     0.000
R             -0.061       0.066      -0.93     0.372
SG             0.151       0.066       2.29     0.041
SR            -0.070       0.066      -1.06     0.310
GR             0.076       0.066       1.16     0.271
SGR            0.045       0.066       0.68     0.508

(b) REGRESSION OF RESPONSE RATE ON S, G, AND SG
Rate = 1.298 - (0.432)S - (0.278)G + (0.188)SG;  R^2 = 0.872

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.298       0.052      24.75     0.000
S             -0.432       0.052      -8.24     0.000
G             -0.278       0.052      -5.30     0.000
SG             0.188       0.052       3.58     0.002

(c) REGRESSION OF RESPONSE RATE ON S, G, SG, I, AND J
Rate = 1.298 - (0.432)S - (0.278)G + (0.151)SG + (0.118)I - (0.066)J;  R^2 = 0.921

Predictor   Coefficient   StdError   t-ratio   P-value
Constant       1.298       0.044      29.46     0.000
S             -0.432       0.044      -9.80     0.000
G             -0.278       0.044      -6.31     0.000
SG             0.151       0.046       3.29     0.005
I              0.118       0.045       2.62     0.020
J             -0.066       0.045      -1.46     0.166

values of the regression coefficients listed in this table indicate that S, G, and SG are significant whereas all other effects (including the main effect of factor R) are insignificant. Table A9.4(b) lists the results of the regression on the significant effects S, G, and SG. The regression explains 87.2% of the variability in the response rate.
Cheng (1995) showed that in the 20-run Plackett-Burman design, for any four factors, estimates of the four main effects and the six 2-factor interactions involving these four factors can be obtained when their higher-order (3- and 4-factor) interactions are assumed to be negligible. Having eliminated factor R, we apply Cheng's finding and consider a model that includes the four factors that were significant in our initial main effects analysis: S, G, I, and J, together with their six 2-factor interactions. The result of this regression shows that all 2-factor interactions except SG are insignificant, leading to a model with the four main effects and the SG interaction. The fitting results for the model with S, G, SG, and the two main effects of I and J are shown in Table A9.4(c). These five effects explain 92.1% of the variation, a rather modest improvement over the 87.2% that is explained by S, G, and SG. It is clear that factors S (interest rate) and G (sticker) and their interaction SG are the main drivers of the response rate.
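To see what the fitted model in Table A9.4(b) implies for the sticker at each interest rate, the regression equation can be evaluated at the four combinations of S and G. The Python sketch below assumes the usual -1/+1 coding of the factor levels and uses the coefficients as listed in the table; the function name `fitted_rate` is hypothetical.

```python
# Coefficients of the model in Table A9.4(b), with factors coded -1 / +1.
CONSTANT, B_S, B_G, B_SG = 1.298, -0.432, -0.278, 0.188


def fitted_rate(s, g):
    """Fitted response rate (%) at coded levels s (interest rate) and g (sticker)."""
    return CONSTANT + B_S * s + B_G * g + B_SG * s * g


for s, rate_label in [(+1, "high rate"), (-1, "low rate")]:
    no_sticker = fitted_rate(s, +1)   # G+ is the new idea: no sticker
    sticker = fitted_rate(s, -1)      # G- is the control: sticker
    print(f"{rate_label}: {no_sticker:.3f}% -> {sticker:.3f}% "
          f"(sticker effect {sticker - no_sticker:+.3f})")
```

Because S, G, and SG are mutually orthogonal columns in the 20-run design, these fitted values coincide with the average response rates of the five runs at each combination of S and G.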
A FOLLOW-UP EXPERIMENT

Full Factorial Design in Four Factors


In light of the positive Plackett-Burman test results, the chief marketing executive wanted to continue testing. Since the long-term interest rate was such an important factor in the first test, he decided to focus on a smaller test of just interest rates and fees. In the first test, the introductory interest rate was fixed. Now, he wanted to test changes in both introductory and long-term rates as well as the effects of adding an account-opening fee and lowering the annual fee. The four factors are shown in Table A9.5. Although the account-opening fee was likely to reduce response, one manager thought the fee would give an impression of exclusivity that would mitigate the magnitude of the response decline. The team also wanted once again to test the effect of a small increase in the long-term interest rate. At the same time they wanted to test the effect of two alternative initial interest rates, both lower than the long-term rate.
Each of the factors affected the cost to the customer, so it was expected that 2-factor interactions might well exist. In order to study these interactions along with all main effects, the authors recommended a full-factorial design. The marketing team used columns A-D of the test matrix in Table A9.6 to create the 16 mail packages. The +/- combinations in the 11 interaction (product) columns are used solely for the statistical analysis of the results. All pairs of columns in Table A9.6 are orthogonal. All 15 effects (4 main effects and 11 interactions) can be analyzed independently, and none of these effects are confounded.

TABLE A9.5
Factors and Their Low and High Levels in the Follow-up Experiment

Factor                         (-) Control    (+) New Idea
A  Annual fee                  Current        Lower
B  Account-opening fee         No             Yes
C  Initial interest rate       Current        Lower
D  Long-term interest rate     Low            High

TABLE A9.6
Results of the Follow-up Experiment
[The 11 interaction columns AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, and ABCD are the row-by-row products of the corresponding factor columns.]

Test Cell    A   B   C   D    Orders   Response Rate
    1        -   -   -   -     184        2.45%
    2        +   -   -   -     252        3.36%
    3        -   +   -   -     162        2.16%
    4        +   +   -   -     172        2.29%
    5        -   -   +   -     187        2.49%
    6        +   -   +   -     254        3.39%
    7        -   +   +   -     174        2.32%
    8        +   +   +   -     183        2.44%
    9        -   -   -   +     138        1.84%
   10        +   -   -   +     168        2.24%
   11        -   +   -   +     127        1.69%
   12        +   +   -   +     140        1.87%
   13        -   -   +   +     172        2.29%
   14        +   -   +   +     219        2.92%
   15        -   +   +   +     153        2.04%
   16        +   +   +   +     152        2.03%
The Results
Each of the N = 16 test cells was mailed to n = 7,500 potential customers. A total of 2,837 customers, or 100(2,837)/[(16)(7,500)] = 2.364%, responded to the offer and placed an order. Main and interaction effects were calculated by applying the plus and minus signs to the response column and dividing the weighted sum by N/2 = 8. The results are shown in Figure A9.2. Standard errors of the effects (expressed in percentage changes) are obtained by substituting p = 0.02364 into
\[ \mathrm{StdError(effect)} = 100\,\sqrt{\frac{4}{N}}\,\sqrt{\frac{p(1-p)}{n}} = 0.0877 \]
Effects outside $\pm 1.96(0.0877) = \pm 0.172$ are statistically significant at the 5% level.
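The effect calculation for the full factorial can be sketched in Python as follows. The order counts are those of Table A9.6 as reconstructed from its response rates (each count divided by 7,500 names reproduces the listed rate), and the runs are assumed to be in standard order with factor A changing fastest; both points, like the variable names, are assumptions of this sketch.

```python
from itertools import combinations, product
from math import prod, sqrt

CELL_SIZE = 7_500
# Orders per test cell in standard order (A changes fastest, then B, C, D).
ORDERS = [184, 252, 162, 172, 187, 254, 174, 183,
          138, 168, 127, 140, 172, 219, 153, 152]

factors = "ABCD"
# Each run is a tuple of coded levels (A, B, C, D).
runs = [levels[::-1] for levels in product([-1, 1], repeat=4)]
rates = [100 * orders / CELL_SIZE for orders in ORDERS]

# Apply the signs of each main effect or product column to the response rates
# and divide the signed sum by N/2.
effects = {}
for size in range(1, 5):
    for idx in combinations(range(4), size):
        name = "".join(factors[i] for i in idx)
        signs = [prod(run[i] for i in idx) for run in runs]
        effects[name] = sum(s * y for s, y in zip(signs, rates)) / (len(runs) / 2)

p_hat = sum(ORDERS) / (len(runs) * CELL_SIZE)
std_error = 100 * sqrt(4 / len(runs)) * sqrt(p_hat * (1 - p_hat) / CELL_SIZE)
cutoff = 1.96 * std_error

for name, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    flag = "significant" if abs(value) > cutoff else ""
    print(f"{name:5s}{value:+.3f} {flag}")
```

The listing is sorted by absolute size, which mirrors the ordering used in the effect plots of this case.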


As shown in Figure A9.2, all four main effects as well as one or two (the AB and the CD) interactions are significant. Note that the CD interaction is just slightly smaller than 1.96 times the standard error.

B- or No Account-Opening Fee
Although one manager had thought that charging an initial fee would give the impression of exclusivity, this fee had the largest negative effect, reducing the response rate by 0.518 percentage points.

Figure A9.2 Main and Interaction Effects: Follow-up Experiment
[Horizontal bar chart of the 15 estimated effects, in percentage points, ordered by absolute size. B: Account-opening fee (-0.518), D: Long-term interest rate (-0.498), A: Annual fee (+0.405), AB (-0.302), and C: Initial interest rate (+0.252) lie above the significance line; CD (+0.158) and the remaining interactions lie below it.]

D- or Low Long-Term Interest Rate
Another attempt to slightly increase the interest rate showed, once again, that the long-term interest rate had to stay low. Raising the interest rate reduced response on average by 0.498 percentage points.

A+ or Lower Annual Fee
The annual fee was not charged until the end of the first year, but the fee was stated in the mailing. It was not surprising that, as with the other charges, a lower fee was better; here it increased the response by 0.405 percentage points.

C+ or Lower Initial Interest Rate
Reducing the introductory interest rate increased response by 0.252 percentage points.

The main effects are quite strong. However, the significant interactions (AB and CD) imply that one needs to look at the effects of A and B and of C and D jointly. The diagrams in Figure A9.3 show the nature of the interactions. The AB interaction supports both main effects, but provides additional important insights. With an account-opening fee (B+), the lower annual fee results in only a small increase in response from 2.05% to 2.16%, but with no account-opening fee (B-), a lower annual fee results in a large increase in response from 2.27% to 2.98%. The estimated response of 2.98% is highest for the combination A+B-, the lower annual fee and no account-opening fee. The AB interaction expresses that A+ and B- together increase the response rate beyond what can be expected by either of the two factors separately. This may result from positive synergies or may be due to the negative impact of the account-opening fee, which for some customers may cause an immediate rejection of the offer. The nature of this 2-factor interaction provides extremely valuable information. Using its financial models, the company found that the increase in response resulting from no account-opening fee and a lower annual fee (A+B-) was much greater than the loss in revenue that would result from eliminating these fees.

Figure A9.3 Interaction Plots: Follow-up Experiment (AB interaction: annual fee, A-: current vs. A+: lower, plotted with and without the account-opening fee; CD interaction: initial interest rate, C-: current vs. C+: lower, plotted for D-: low and D+: high long-term rate)

The CD interaction shows that when the long-term rate is low (D-), the effect of a lower initial rate is small and not statistically significant (a change in response from 2.57% to 2.66%). It is clear that offering the lower initial rate would not be profitable if the lower long-term rate were also offered. However, if the long-term rate is high (D+), then the lower initial rate has a large impact, with the response changing from 1.91% to 2.32%. The interaction shows that, for persons receiving both lower rates, the increase in response is considerably less than the sum of the two main effects. This customer behavior is consistent with the concave value function used by Thaler (1985) and based on the earlier work of Kahneman and Tversky (1979). In contrast to the main effects that suggest both interest rates should be low, these results followed by additional analysis using the company's financial models showed that a lower long-term rate coupled with the current (higher) initial rate was the most profitable.
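
The arithmetic behind the two interaction plots can be sketched in a few lines. This is only an illustration built from the rounded cell means quoted in the text; the helper names (`simple_effects`, `summarize`) are ours, and the sign convention (interaction = half the difference between the simple effect at the "+" level and at the "-" level of the second factor) is the usual two-level convention, assumed here rather than taken from the case.

```python
# Cell means (response rate in %) quoted in the text.
# Keys: (annual fee, account-opening fee) and (initial rate, long-term rate).
ab = {("A-", "B+"): 2.05, ("A+", "B+"): 2.16,
      ("A-", "B-"): 2.27, ("A+", "B-"): 2.98}
cd = {("C-", "D-"): 2.57, ("C+", "D-"): 2.66,
      ("C-", "D+"): 1.91, ("C+", "D+"): 2.32}


def simple_effects(cells, first, second):
    """Effect of moving `first` from - to + at each level of `second`."""
    return {lvl: cells[(first + "+", lvl)] - cells[(first + "-", lvl)]
            for lvl in (second + "-", second + "+")}


def summarize(cells, first, second):
    eff = simple_effects(cells, first, second)
    at_minus, at_plus = eff[second + "-"], eff[second + "+"]
    main = (at_minus + at_plus) / 2         # average of the two simple effects
    interaction = (at_plus - at_minus) / 2  # half their difference
    return round(main, 3), round(interaction, 3)


ab_main, ab_int = summarize(ab, "A", "B")
cd_main, cd_int = summarize(cd, "C", "D")
print("A main effect:", ab_main, " AB interaction:", ab_int)
print("C main effect:", cd_main, " CD interaction:", cd_int)
```

The main effects recovered this way (0.41 for A and 0.25 for C) agree, up to rounding of the cell means, with the 0.405 and 0.252 percentage points reported above.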
FINAL COMMENTS

After these two mailings, one with a 19-factor Plackett-Burman screening test and the other with the 4-factor full-factorial follow-up test, the marketing team learned more than they had ever before when using the simple technique of testing one variable at a time. The specific findings of these experiments led to immediate and substantial improvements: increased response rates, lower costs, and higher profits. But the longer-term benefits have been even more substantial. This study introduced the company to the use of formal experimental design methods. Since then the firm has continued to experiment, increasing the speed and profitability of its testing programs and becoming a leader in the application of these tools to direct marketing. Testing has given the company the ability to quickly prove what sells and to greatly improve its performance in the highly competitive financial services marketplace.

Although the focus of this case has been on direct marketing, the potential applications of experimental design approaches are widespread. Website design, on-line advertising, telemarketing, catalog design, and retail tests are fertile areas for multivariable experiments. As marketing applications of large fractional factorial and Plackett-Burman designs are more widely disseminated, the real-world use of these powerful techniques should become more commonplace.
QUESTIONS

Exercise 2 in Chapter 6

ment: a medium-size store with a loyal, varied, and stable clientele. Four products were studied: Camay soap (bath size), White House apple juice (32 oz), Mahatma rice (1 lb), and Piggly Wiggly frozen pie shells. Sales of these products were stable without trends, and they exhibited limited seasonality.
A complete factorial experiment was carried out. With three price levels, three display options, and two advertising options, the design called for 18 factor-level combinations (treatments). Since the design was replicated once, 36 weeks were needed. Furthermore, each week was preceded and followed by a base week (which is a week where all four products are priced at regular price, displayed at normal shelf position, and not advertised). For such a time arrangement and because holiday weeks were not used, the experiment spanned roughly 80 consecutive weeks. The response was the number of units sold between Wednesday noon and Sunday 9 p.m. of each experimental week.
Trend and seasonality were not considered serious factors because products were stable with minimal seasonality. Furthermore, prior studies showed that the customer flow varies little throughout time.
The precise schedule of the treatment weeks, and a detailed discussion of the necessary preparations and logistic problems that are associated with running such an experiment are given in Wilkinson et al. (1982).
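
The treatment count described above can be checked by enumerating the factorial directly. The sketch below is illustrative only: the level labels are our reading of the row and column headings of the cell-average tables in this case, and the variable names are hypothetical.

```python
from itertools import product

# Level labels as read from the cell-average tables (an assumption).
prices = ["regular", "reduced", "cost"]
displays = ["regular", "expanded", "special"]
advertising = ["no", "yes"]

# Every combination of one price, one display, and one advertising level.
treatments = list(product(prices, displays, advertising))

runs_per_treatment = 2  # each treatment is run in two experimental weeks
experimental_weeks = len(treatments) * runs_per_treatment

print(len(treatments), "treatments,", experimental_weeks, "experimental weeks")
```

Interleaving the 36 experimental weeks with base weeks and skipping holiday weeks is what stretches the study to roughly 80 calendar weeks.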
THE DATA AND THE ANALYSIS

Here, we consider Mahatma rice and White House apple juice. The ANOVA tables (for the model with the three main effects, three 2-factor interactions, and the 3-factor interaction) are shown below. We also list the cell averages for the various factor-level combinations.
ANOVA for White House Apple Juice

ANOVA for Mahatma Rice

~u111

SuurLt'

Ma111 ,llects

J\1ttl!l t:f!eL l:-J


4,376
7,430
900

Prill'

llispla)'
Adwrtis1ng
2-JaL!llr interactions

l'riLc x Display
PricL' X Advert

14~

.\,98 \

2(>1

1,070
tS,395

428

x Advert
~

..j

Ad,en

208

J fouur 111ll'raLl1u11
p x /) '< ;\
l:rror
'fotal

1,440
9,861

U1>play

99

l:rro1
TotJI

2,X4_l

IJ1>pla'
AJvl'rtt-.111g

PriLL'

3-L.lLl11r intcrnction

Price

2 LiLtur 111tcr1.1\.t1u11..,
!'rice x Displc1v

1,068
107

Llisf>l.1y X Advert
I''< J)

ol Squar,s

624

Average Unit Sales for Treatments: Mahatma Rice


NO ADV f/U l!>I Ne;

Regular
Pricl'
Regular display
lxpa11ded display
S,u.11 d1slay

Reduced
Pricl'

A!JVl:HTISJNl1

Cost

Regular

Reduced

Cost

Pr1Le

l 1 r1Ll'

Pr11.1..

!'!ILL'

j/ .5
51.5
7 \.0

28.0

J'i.5

32.5

21.0
38.0

42.0
6ll.O

46.U

22.J
)2 ;

76.:J

>\.()

~.l.lJ

:;=i.,
IUl.ll

Average Unit Sales for Treatments: White House Apple Juice


'\,'()

Rcg11l,ir

ADV FR rl'ilN(;

l'rrcc

Rcd11tcd
Price

c:nst
Price

Regular
Price

Rcd11ced
Price

Cnq
I' rice

11.'i
11.0
61.0

26.'i
22. 'i
38.0

41i.O
44.'i
'ii .5

.lB.O
.\l 'i

1Zcg11IM dl'J1l,1,

I 'Jn

2 '5

I 'l'"'Hkd dl'pi.11

'.fi.O

lh ()

~f1l'li,il

17.[J

6:i.O

,Jr,pl,lV

ADVl'RTJS!NG

7k.O

COMMENT

In Section 7.2, we discussed the analysis of a general 2-factor factorial experiment. Here we face a 3-factor factorial experiment. However, extending the ANOVA table to this situation is straightforward. Now we have a total of $abcn$ responses,

$$y_{ijkl}, \quad i = 1, 2, \ldots, a \ (\text{factor } A); \quad j = 1, 2, \ldots, b \ (\text{factor } B); \quad k = 1, 2, \ldots, c \ (\text{factor } C); \quad l = 1, 2, \ldots, n \ (\text{replications}).$$

In our case, A = Price, B = Display, C = Advertising, and $a = 3$, $b = 3$, $c = 2$, $n = 2$. For each of the $abc$ factor-level combinations (groups), we obtain the sum of the squared deviations from the group mean. The sum of these sums of squares across the $abc$ groups is the error sum of squares, and it has $abc(n - 1)$ degrees of freedom. The other sum of squares entries in the ANOVA table are for (i) main effects of A, B, and C with $a - 1$, $b - 1$, and $c - 1$ degrees of freedom; (ii) 2-factor interactions AB, AC, and BC with $(a - 1)(b - 1)$, $(a - 1)(c - 1)$, and $(b - 1)(c - 1)$ degrees of freedom; and (iii) the 3-factor interaction ABC with $(a - 1)(b - 1)(c - 1)$ degrees of freedom. The significance of each effect is tested through an F-ratio that compares the mean square of the effect to the mean square error. The degrees of freedom in the numerator and denominator become the degrees of freedom of the F-distribution that is used to test the significance. Remember to always test the highest-order interaction first, because lower-order interactions and main effects make sense only if this interaction is negligible.
Standard statistical software will give you the ANOVA table. For example, you can use the Minitab command "Stat > ANOVA > General Linear Model."
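
As a quick check on the bookkeeping, the degrees of freedom listed above can be tabulated for the Piggly Wiggly layout. This is a minimal sketch; the function name `three_factor_df` is ours and not part of any statistical package.

```python
def three_factor_df(a, b, c, n):
    """Degrees of freedom for a replicated 3-factor factorial ANOVA."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "C": c - 1,
        "AB": (a - 1) * (b - 1),
        "AC": (a - 1) * (c - 1),
        "BC": (b - 1) * (c - 1),
        "ABC": (a - 1) * (b - 1) * (c - 1),
        "Error": a * b * c * (n - 1),
    }
    # The component degrees of freedom must add up to abcn - 1.
    df["Total"] = a * b * c * n - 1
    return df


# Price (3 levels), Display (3 levels), Advertising (2 levels), 2 replications.
df = three_factor_df(a=3, b=3, c=2, n=2)
for source, value in df.items():
    print(f"{source:6s}{value:3d}")
```

With 36 observations, 17 degrees of freedom go to the seven effects and 18 to error, so each F-ratio in the case is referred to an F-distribution with 18 denominator degrees of freedom.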
QUESTIONS

Exercise 2 in Chapter 7

TABLE A11.1
Test and Control Markets

             Test Market        Control Market
Northeast    Binghamton, NY     Utica-Rome, NY
Midwest      Rockford, IL       Fort Wayne, IN
Southwest    Albuquerque, NM    El Paso, TX
Southeast    Chattanooga, TN    Montgomery, AL

TABLE A11.2
Test Design
r ! ..,-1 MAH!..: r1

R111g
rirrn:

h11m1on

l11h -:
Oct 72
Nn\' Ian 7'
leh Apr 71

LON 1H'11

)(ock
rord

Albuqucrque

Chat tanongd

\l.11

fl

("

1\11g

fl

/)

/J

/J

/l

l!ticaRome

rort
\Vavne

VIAHKFT

,,

A
A
A
A

1'vlo11tgomcry

I I Paso
A
A
A
A

A
A
A

A
A

TABLE A11.3

Results
TEST MARKETS

Time l'erincl
Mny Jul
Aug- Oct
Nov Jan
I-ch-Apr

1972
I972
1971
1973

llingh<lrnlon

Rockrord

Alhuqucrq11e

ChattancH1ga

7,:1110 (A)

11,258 (B)
13,147 ([))

11,800 (C)
11,852 IA)

7,77(:, (I))
8,501 ((")

13, 153 (A)


13,880 ((")

ll,450(f))
12,089 ( R)

7,557

7,JM (B)
8,0'19 (C)
9,010 (/))

r:ONTROI

Time Period
Mav Jul
1\ug Oct
Nov-Jan
I-ch-Apr

1972
1972
1973
1973

l r11c.1-Rome

7,900 (fl)

IA J

MAllKFIS (AOVERTISIN<i I FVFI A)

f-ort Wayne

El Paso

Montgomery

7,166

10,970

11,706

7.411

7,489
7,679
8,536

12,718

11,495
11.753
12,008

8,2-,0

12,902
13,826
--- --------

7,853

7,768

ANALYSIS OF THE DATA


Part 1 (Test Markets)

The 16 test-market responses in the Latin square allow for the estimation of the main effects of the three factors: location, time, and advertising. We are most interested in the effect of advertising, after adjusting the analysis for possible location and time effects. The ANOVA shown below indicates that there is no strong evidence for an advertising effect. The F-statistic for testing the significance of an advertising effect is 639,139/330,445 = 1.93; its probability value P(F(3, 6) > 1.93) = 0.225 is larger than the significance level 0.05. Time is also not significant (F = 2.42, with probability value 0.164). The only significant factor is location, with Rockford and Albuquerque having considerably higher cheese sales.

ANOVA

Source  DF        SS        MS      F      P
Advert   3   1917416    639139   1.93  0.225
Cities   3  79308210  26436070  80.00  0.000
Time     3   2398201    799400   2.42  0.164
Error    6   1982671    330445
Total   15  85606498

Advertising
1 (0 cents)        9981  287.4
2 (3 cents)        9653  287.4
3 (6 cents)       10558  287.4
4 (9 cents)       10346  287.4
Cities
1 (Binghamton)     7946  287.4
2 (Rockford)      12859  287.4
3 (Albuquerque)   11798  287.4
4 (Chattanooga)    7934  287.4
Time
1 (May-Jul 72)     9549  287.4
2 (Aug-Oct 72)    10216  287.4
3 (Nov-Jan 73)    10138  287.4
4 (Feb-Apr 73)    10634  287.4
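
The entries of this table are tied together by simple arithmetic, which the following sketch retraces from the printed sums of squares. It is an illustration, not the output of any package; the reading of 287.4 as the standard error of a level mean (each mean averaging four observations) is our interpretation of the second column of the listing.

```python
from math import sqrt

# Sums of squares and degrees of freedom copied from the Latin square ANOVA.
ss = {"Advert": 1917416, "Cities": 79308210, "Time": 2398201, "Error": 1982671}
df = {"Advert": 3, "Cities": 3, "Time": 3, "Error": 6}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# Each F-ratio compares an effect mean square with the error mean square.
f_ratio = {source: ms[source] / ms["Error"] for source in ss if source != "Error"}

# Assumed: every level mean averages 4 of the 16 observations.
se_mean = sqrt(ms["Error"] / 4)

for source, f in f_ratio.items():
    print(f"{source:7s} MS = {ms[source]:11.0f}   F = {f:6.2f}")
print(f"Standard error of a level mean: {se_mean:.1f}")
```

The probability values in the table would then come from the F-distribution with 3 and 6 degrees of freedom, which is left to statistical software here.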
Part 2 (Control Markets)

The 16 control market responses (all under zero cent advertising) originate from a factorial experiment with two factors: location and time. The ANOVA table allows us to test whether there are time and location effects. There is some indication for a time effect, but the evidence is weak (probability value 0.075); the location effect is very significant, with Fort Wayne and El Paso having considerably higher cheese sales. The weak time effect and the significant location effect confirm the findings of Part 1.
Results for Control Markets (Factorial Design)

Source  DF        SS        MS      F      P
Cities   3  78938086  26312695  85.41  0.000
Time     3   2985501    995167   3.23  0.075
Error    9   2772578    308064
Total   15  84696166

Cities
1 (Utica-Rome)     7718  277.5
2 (Fort Wayne)    12604  277.5
3 (El Paso)       11741  277.5
4 (Montgomery)     7828  277.5
Time
1 (May-Jul 72)     9321  277.5
2 (Aug-Oct 72)     9988  277.5
3 (Nov-Jan 73)    10047  277.5
4 (Feb-Apr 73)    10535  277.5

Part 3 (Combining Test and Control Markets)

We combine the observations from the test and control markets and fit a regression model that includes variables for the four levels of advertising, the eight different locations, and the four time periods. The ANOVA table for this model is shown below. This is no longer an orthogonal design, because the different factor-level combinations do not have the same number of runs; for example, there are no observations for control cities and advertising at levels B-D. A consequence of a nonorthogonal design is that sequential and adjusted sums of squares are no longer the same. We are interested in adjusted sums of squares because they tell us about the regression contribution of each factor, on top of all other factors that are part of the analysis. We find that there is not a huge benefit to increased advertising. The test statistic F = 2.40 is not significant at the 0.05 level. There is evidence for a time effect, with sales increasing linearly with time. The effect of location is quite strong, with higher sales for Rockford, Albuquerque, Fort Wayne, and El Paso. The main effects plots in Figure A11.1 illustrate these relationships graphically.
Results for Test and Control Markets (Combined Analysis)
ANOVA

Source  DF     Seq SS     Adj SS    Adj MS      F      P
Advert   3    2126193    1917416    639139   2.40  0.101
Cities   7  158246501  158246501  22606643  84.94  0.000
Time     3    5348522    5348522   1782841   6.70  0.003
Error   18    4790430    4790430    266135
Total   31  170511645

Advertising
1 (0 cents)        9977  144.2
2 (3 cents)        9649  295.5
3 (6 cents)       10554  295.5
4 (9 cents)       10342  295.5
Cities
1 (Binghamton)     7946  257.9
2 (Rockford)      12860  257.9
3 (Albuquerque)   11798  257.9
4 (Chattanooga)    7933  257.9
5 (Utica-Rome)     7871  341.2
6 (Fort Wayne)    12758  341.2
7 (El Paso)       11894  341.2
8 (Montgomery)     7982  341.2
Time
1 (May-Jul 72)     9511  213.9
2 (Aug-Oct 72)    10179  213.9
3 (Nov-Jan 73)    10169  213.9
4 (Feb-Apr 73)    10661  213.9

QUESTIONS

Exercise J in Chapter 7

TABLE A12.1
The 16 Runs of the $2^2 4^1$ Design, and Its Half Fraction


FACTOR

Message

Promotion

RuJC Included
in Fraction?

Price

-I

Yes

-I

Yes

-J
-I

Yes
Ye..,

-!
-J
I

Yes

TABLE A12.2
Construction of the Half Fraction of the Mixed $2^2 4^1$ Design


fJro1notH>n

12

J\

123

2.l

-I
-[

-I

11.1 11

>1

-I

-I

k'.:

-1

-1

v.

-I

-I

y~ ~

l!.'llJ

I
-I

Y,

11.U I

-I

-I
I

f{l'-.pon_..,(.'

Price

-I
-I

y"
y,

-I

y,

U.lN
~

IJ.l \

O.IJ11
~

IJ.lll

11.U.

beled 13 and 23 could have been used for two additional 2-level factors, resulting in an orthogonal fraction of a 5-factor $2^4 4^1$ factorial design.
The design for the factors Message, Promotion, and Price in Table A12.2 is balanced and orthogonal. The design is balanced as the factor levels of each factor occur in the same number of runs. It is orthogonal as the factor-level combinations for each pair of factors appear in the same number of runs. Because of orthogonality, the main effects of the three factors can be estimated independently of each other. The effect of message can be obtained by averaging over the other two factors; the same is true for the effects of promotion and price.
The orthogonal half-fraction was carried out and the results (proportions of sampled individuals responding to the offer) are shown in the last column of Table A12.2. The estimated main effects are

Message: Ave(Message at -1) = 0.095; Ave(Message at +1) = 0.155;
Main effect of Message = 0.155 - 0.095 = 0.06

Promotion: Ave(Promotion at -1) = 0.075; Ave(Promotion at +1) = 0.175;
Main effect of Promotion = 0.175 - 0.075 = 0.10

Price: Ave(Price at 1) = 0.27; Ave(Price at 2) = 0.11;
Ave(Price at 3) = 0.08; Ave(Price at 4) = 0.04
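
These averages, and the balance and orthogonality claims, can be retraced with a short script. The sketch below rests on an assumption: the run layout is not legible in our copy of the tables, so we take the eight responses in the order y1, ..., y8 of Table A12.3 and assign factor levels in standard order, with the price level set by the Message x Promotion column and a third two-level column. That layout is used because it reproduces every average quoted above; the names `runs`, `level_means`, and `is_orthogonal` are ours.

```python
from collections import Counter
from itertools import combinations

# (message, promotion, price level, response); layout assumed, see lead-in.
runs = [
    (-1, -1, 1, 0.14), (+1, -1, 2, 0.09), (-1, +1, 2, 0.13), (+1, +1, 1, 0.40),
    (-1, -1, 4, 0.01), (+1, -1, 3, 0.06), (-1, +1, 3, 0.10), (+1, +1, 4, 0.07),
]


def level_means(col):
    """Average response at each level of the factor stored in column `col`."""
    totals, counts = Counter(), Counter()
    for run in runs:
        totals[run[col]] += run[3]
        counts[run[col]] += 1
    return {lvl: round(totals[lvl] / counts[lvl], 3) for lvl in sorted(counts)}


def is_orthogonal():
    """True if, for every pair of factors, all level pairs occur equally often."""
    for i, j in combinations(range(3), 2):
        pair_counts = Counter((run[i], run[j]) for run in runs)
        n_pairs = len({r[i] for r in runs}) * len({r[j] for r in runs})
        if len(pair_counts) != n_pairs or len(set(pair_counts.values())) != 1:
            return False
    return True


message, promotion, price = level_means(0), level_means(1), level_means(2)
print("Message:", message, "effect", round(message[1] - message[-1], 3))
print("Promotion:", promotion, "effect", round(promotion[1] - promotion[-1], 3))
print("Price:", price)
print("Orthogonal:", is_orthogonal())
```

Under this layout each message level and each promotion level meets every price level exactly once, which is what allows the simple averages to serve as main-effect estimates.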


The main effects of Message, Promotion, and Price are confounded by two-factor interactions. The interaction between the two 2-level factors, Message and Promotion, does not affect the main effects of Message and Promotion, but it does confound the main effect of Price.
Price is a continuous factor with 4 levels ($150, $160, $170, $180), and it makes sense to partition its effect into three orthogonal components: a linear, a quadratic, and a cubic component. We can express the main effects of the three factors through a regression of the response on the six columns listed in Table A12.3.
The second and third columns reflect the levels of Message and Promotion; the next three columns represent the linear, quadratic, and cubic components of price (see Appendix 7.1).
The regression output is shown in Table A12.4. The regression coefficients for Message and Promotion are one-half of the main effects that have been listed previously. The standard errors of the regression estimates in Table A12.4 pool the effects of the two unused columns (13 and 23) in Table A12.2. The type of promotion and the price matter most. There is a strong linear component to the price effect. This pattern was seen earlier in the averages, which decreased with increasing price. Of course, these results come from a very small study, and they should be confirmed by additional experiments.

TABLE A12.3
Regression Formulation of the Half Fraction of the $2^2 4^1$ Design

REGRESSOR COLUMNS
Constant  Message  Promotion  Price(lin)  Price(qua)  Price(cubic)  Response
    1       -1        -1         -3           1           -1        y1 = 0.14
    1        1        -1         -1          -1            3        y2 = 0.09
    1       -1         1         -1          -1            3        y3 = 0.13
    1        1         1         -3           1           -1        y4 = 0.40
    1       -1        -1          3           1            1        y5 = 0.01
    1        1        -1          1          -1           -3        y6 = 0.06
    1       -1         1          1          -1           -3        y7 = 0.10
    1        1         1          3           1            1        y8 = 0.07

TABLE A12.4
Regression Output of the Main Effects Model with Orthogonal Trend Components

The regression equation is
Response = 0.125 + 0.0300 Message + 0.0500 Promotion - 0.0360 Pricelin
           + 0.0300 PriceQua - 0.0070 PriceCub

Predictor      Coef   SE Coef      T      P
Constant    0.12500   0.02500   5.00  0.038
Message     0.03000   0.02500   1.20  0.353
Promotion   0.05000   0.02500   2.00  0.184
Pricelin   -0.03600   0.01118  -3.22  0.084
PriceQua    0.03000   0.02500   1.20  0.353
PriceCub   -0.00700   0.01118  -0.63  0.595

S = 0.0707107   R-Sq = 89.8%   R-Sq(adj) = 64.4%

Analysis of Variance
Source          DF        SS        MS     F      P
Regression       5  0.088200  0.017640  3.53  0.235
Residual Error   2  0.010000  0.005000
Total            7  0.098200
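
Because the six regressor columns are mutually orthogonal, each least-squares coefficient can be computed on its own as the inner product of the column with the response divided by the column's sum of squares. The sketch below uses that shortcut to retrace the coefficients and the residual standard deviation reported in Table A12.4. The run layout is an assumption on our part (standard order, chosen because it reproduces the reported main-effect averages), and the dictionary names are hypothetical.

```python
from math import sqrt

# Assumed run layout for the eight runs of the half fraction (see lead-in).
message = [-1, 1, -1, 1, -1, 1, -1, 1]
promotion = [-1, -1, 1, 1, -1, -1, 1, 1]
price = [1, 2, 2, 1, 4, 3, 3, 4]  # price level 1..4
y = [0.14, 0.09, 0.13, 0.40, 0.01, 0.06, 0.10, 0.07]

# Orthogonal polynomial contrasts for four equally spaced levels.
LIN = {1: -3, 2: -1, 3: 1, 4: 3}
QUA = {1: 1, 2: -1, 3: -1, 4: 1}
CUB = {1: -1, 2: 3, 3: -3, 4: 1}

columns = {
    "Constant": [1] * 8,
    "Message": message,
    "Promotion": promotion,
    "PriceLin": [LIN[p] for p in price],
    "PriceQua": [QUA[p] for p in price],
    "PriceCub": [CUB[p] for p in price],
}

# With orthogonal columns: coefficient = sum(x * y) / sum(x * x).
coef = {name: sum(x * yi for x, yi in zip(col, y)) / sum(x * x for x in col)
        for name, col in columns.items()}

fitted = [sum(coef[name] * col[i] for name, col in columns.items())
          for i in range(len(y))]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
s = sqrt(rss / (len(y) - len(columns)))  # 8 runs, 6 parameters: 2 df

for name, value in coef.items():
    print(f"{name:10s}{value:9.4f}")
print(f"s = {s:.7f}")
```

Note that the product of the Message and Promotion columns equals the quadratic price column in this layout, which is one way to see why their interaction confounds the main effect of Price.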
EXAMPLE 2

In a second example, Almquist and Wyner discussed the launch of a creative arts and activities Internet portal for Crayola, the maker of colored markers and crayons. The goal was to design a letter marketing campaign that attracts target customers to the site and converts browsers into buyers. In their letter to potential customers, Crayola varied several levels of the following 5 factors: (1) two different subject lines; (2) three salutations; (3) two calls to action; (4) three promotions; and (5) two different closings. A full $2^3 3^2$ factorial design that includes all possible factor-level combinations requires 72 different letters. Constructing and sending each one of 72 letters to a reasonably large sample of potential customers and monitoring their performance is a challenging task. It would be preferable to reduce the number of different letters by considering suitably chosen fractions. The discussion of fractional experiments in Chapter 5 has shown that while fractions of factorial experiments confound effects, they can provide much useful information about the importance of the studied factors.
It is straightforward to construct an orthogonal half-fraction of 36 runs by combining the 9 runs in the full $3^2$ with a half fraction $2^{3-1}$ for the 2-level factors. Such a design confounds the main and interaction effects of the 2-level factors, but it does not confound the main effects of the 3-level factors. However, 36 runs may still be too many, and one may want to look for designs with fewer runs. Almquist and Wyner mention running a 16-run design, but they do not specify how they selected these runs. A balanced and orthogonal design in 16 runs is not possible. The 16 runs cannot be divided evenly among the 3 levels; hence, the design cannot be balanced. Furthermore, there is no arrangement that achieves the same number of runs at all factor-level combinations of each pair of factors; hence the design cannot be orthogonal.
The design software JMP was used to obtain the 16-run D-optimal design in Table A12.5. A D-optimal design minimizes the determinant of the covariance matrix of the main-effects estimates; it maximizes the precision of the parameter estimates. Note that this design is quite close to being an orthogonal design.
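
The run-count argument in this example comes down to divisibility, which a few lines make explicit. This is a sketch of necessary conditions only (a run count that passes the checks is not thereby guaranteed to admit a balanced, orthogonal design); the factor labels and function names are ours.

```python
from itertools import combinations
from math import prod

# Number of levels for each of the five letter factors.
levels = {
    "subject line": 2,
    "salutation": 3,
    "call to action": 2,
    "promotion": 3,
    "closing": 2,
}

full_factorial_runs = prod(levels.values())


def balance_possible(n_runs):
    """Balance needs n_runs to split evenly over the levels of every factor."""
    return all(n_runs % k == 0 for k in levels.values())


def orthogonality_possible(n_runs):
    """Orthogonality needs n_runs to split evenly over every pair's level pairs."""
    return all(n_runs % (levels[f] * levels[g]) == 0
               for f, g in combinations(levels, 2))


print("Full factorial:", full_factorial_runs, "letters")
for n in (16, 36, 72):
    print(n, "runs: balance possible =", balance_possible(n),
          ", orthogonality possible =", orthogonality_possible(n))
```

Sixteen runs fail both checks because 16 is divisible neither by 3 nor by 6 or 9, while 36 and 72 pass; this is why a criterion such as D-optimality is used to pick the 16 runs.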
QUESTIONS

REFERENCES

CHAPTER

Fisher, Ronald A.: The Design of Experiments. Edinburgh: Oliver & Boyd, 1935 (and various later editions).
Lenth, R. V.: "Quick and easy analysis of unreplicated factorials." Technometrics, Vol. 31 (1989), 469-473.
Montgomery, D. C.: Introduction to Statistical Quality Control (3rd ed.). New York: Wiley, 1996.
Yin, G. Z., and Jillie, D. W.: "Orthogonal design for process optimization and its application in plasma etching." Solid State Technology (May 1987), 127-132.
CHAPTER

Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Eibl, S., Kess, U., and Pukelsheim, F.: "Achieving a target value for a manufacturing process: A case study," Journal of Quality Technology, Vol. 24 (1992), 22-26.
Ledolter, J., and Swersey, A.: "Dorian Shainin's variables search procedure: A critical assessment," Journal of Quality Technology, Vol. 29 (1997), 237-247.
CHAPTER

Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Box, G. E. P., and Tyssedal, J.: "Projective properties of certain orthogonal arrays." Biometrika, Vol. 83 (1996), 950-955.
Cheng, C. S.: "Some projection properties of orthogonal arrays." Annals of Statistics, Vol. 23 (1995), 1223-1233.
Draper, N. R.: "Plackett and Burman designs." Encyclopedia of Statistical Sciences. New York: Wiley, 1985, 754-758.
Draper, N. R., and Smith, H.: Applied Regression Analysis (2nd ed.). New York: Wiley, 1981.
Margolin, B. H.: "Orthogonal main-effect $2^n 3^m$ designs and two-factor interaction aliasing." Technometrics, Vol. 10 (1968), 559-573.
Plackett, R. L., and Burman, J. P.: "The design of optimum multifactorial experiments." Biometrika, Vol. 33 (1946), 305-325.
CHAPTER

Box, G. E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters: Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
John, P. W. M.: Statistical Methods in Engineering and Quality Assurance. New York: Wiley, 1990.
Montgomery, D. C.: Design and Analysis of Experiments (6th ed.). New York: Wiley, 2005.
CHAPTER

Box, G. E. P., and Draper, N. R.: Empirical Model Building and Response Surfaces. New York: Wiley, 1987.
John, P. W. M.: Statistical Design and Analysis of Experiments. New York: Macmillan, 1971.
Kuhfeld, W. F., and Tobias, R. D.: "Large factorial designs for product engineering and marketing research applications." Technometrics, Vol. 47 (2005), 132-141.
Kuhfeld, W. F., Tobias, R. D., and Garratt, M.: "Efficient experimental design with marketing research applications." Journal of Marketing Research, Vol. 31 (1994), 545-557.
Meyer, R. K., and Nachtsheim, C. J.: "The coordinate-exchange algorithm for constructing exact optimal experimental designs." Technometrics, Vol. 37 (1995), 60-69.

APPENDIX

Case 4
Abraham, B., and Ledolter, J.: Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006.
Case 8
Barclay, W. D.: "Factorial design in a pricing experiment." Journal of Marketing Research, Vol. 6 (1969), 427-429.
Bisgaard, S.: "Industrial use of statistically designed experiments: Case study references and some historical anecdotes." Quality Engineering, Vol. 4 (1992), 547-562.
Box, G. E. P., Hunter, W. G., and Hunter, J. S.: Statistics for Experimenters. New York: Wiley, 1978 (2nd ed., 2005).
Brown, W., and Tucker, W. T.: "The marketing center: Vanishing shelf space." Atlanta Economic Review, Vol. 46 (1961), 9-13.
Bultez, A., Gijsbrechts, E., Naert, P., and Vanden Abeele, P.: "Asymmetric cannibalism in retail assortments." Journal of Retailing, Vol. 65 (1989), 153-192.
Bultez, A., and Naert, P.: "S.H.A.R.P.: Shelf allocation for retailer's profit." Marketing Science, Vol. 7 (1988), 211-231.
Cherfi, Z., Bechard, B., and Boudaoud, N.: "Case study: Color control in the automotive industry." Quality Engineering, Vol. 15 (2002), 161-170.
Curhan, R. C.: "The effects of merchandising and temporary promotional activities on the sales of fresh fruits and vegetables in supermarkets." Journal of Marketing Research, Vol. 11 (1974a), 286-294.
Curhan, R. C.: "The relationship between shelf space and unit sales." Journal of Marketing Research, Vol. 9 (1974b), 406-412.
Dreze, X., Hoch, S. J., and Purk, M. E.: "Shelf management and space elasticity." Journal of Retailing, Vol. 70 (1994), 301-326.
Ettenson, R., and Wagner, J.: "Retail buyers' saleability judgments: A comparison of information use across three levels of experience." Journal of Retailing, Vol. 62 (1986), 41-63.
Fisher, R. A.: The Design of Experiments (8th ed.). New York: Hafner, 1966.
Holland, C. W., and Cravens, D. W.: "Fractional factorial designs in marketing research." Journal of Marketing Research, Vol. 10 (1973), 270-276.
Jaffe, L. J., Jamieson, L. F., and Berger, P. D.: "Impact of comprehension, positioning, and segmentation on advertising response." Journal of Advertising Research, Vol. 32 (1992), 24-33.
Lin, T., and Chananda, B.: "Quality improvement of an injection-molded product using design of experiments: A case study." Quality Engineering, Vol. 16 (2003), 99-104.
Milliman, R. E.: "Using background music to affect the behavior of supermarket shoppers." Journal of Marketing, Vol. 46 (1982), 86-91.
Montgomery, D. C.: Introduction to Statistical Process Control (5th ed.). New York: Wiley, 2004.
Plackett, R. L., and Burman, J. P.: "The design of optimum multifactorial experiments." Biometrika, Vol. 33 (1946), 305-325.
Schaub, D. A., and Montgomery, D. C.: "Using experimental design to optimize the stereolithography process." Quality Engineering, Vol. 9 (1997), 575-585.
Srivastava, J., and Lurie, N.: "Price-matching guarantees as signals of low store prices: Survey and experimental evidence." Journal of Retailing, Vol. 80 (2004), 117-128.
Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the impact of short-term supermarket strategy variables." Journal of Marketing Research, Vol. 19 (1982), 72-86.
Wu, C. F. J., and Hamada, M.: Experiments: Planning, Analysis, and Parameter Design Optimization. New York: Wiley, 2000.
Young, J. C.: "Blocking, replication, and randomization - the key to effective experimentation: A case study." Quality Engineering, Vol. 9 (1996), 269-277.

Case 9
Barclay, W. D.: "Factorial design in a pricing experiment." Journal of Marketing Research, Vol. 6 (1969), 427-429.
Barrentine, L. B.: "Illustration of confounding in Plackett-Burman designs." Quality Engineering, Vol. 9 (1996), 11-20.
Berger, P. D., and Maurer, R. E.: Experimental Design with Applications in Management, Engineering, and the Sciences. Belmont, CA: Duxbury Press, 2002.
Box, G. E. P., Hunter, W. G., and Hunter, J. S.: Statistics for Experimenters. New York: Wiley, 1978 (2nd ed., 2005).
Box, G. E. P., and Meyer, R. D.: "Dispersion effects from fractional designs." Technometrics, Vol. 28 (1986), 19-27.
Box, G. E. P., and Tyssedal, J.: "Projective properties of certain orthogonal arrays." Biometrika, Vol. 83 (1996), 950-955.
Bradlow, E. T.: "Current issues and a wish list for conjoint analysis." Applied Stochastic Models in Business and Industry, Vol. 21 (2005), 319-323.
Caples, J.: Tested Advertising Methods (4th ed.). Englewood Cliffs, NJ: Prentice-Hall, 1974.
Carmone, F. J., and Green, P. E.: "Model misspecification in multiattribute parameter estimation." Journal of Marketing Research, Vol. 18 (February 1981), 87-93.
Cheng, C. S.: "Some projection properties of orthogonal arrays." Annals of Statistics, Vol. 23 (1995), 1223-1233.
Curhan, R. C.: "The effects of merchandising and temporary promotional activities on the sales of fresh fruits and vegetables in supermarkets." Journal of Marketing Research, Vol. 11 (August 1974), 286-294.
Ettenson, R., and Wagner, J.: "Retail buyers' saleability judgments: A comparison of information use across three levels of experience." Journal of Retailing, Vol. 62 (Spring 1986), 41-63.
Green, P. E., Carroll, J. D., and Carmone, F. J.: "Some new types of fractional factorial designs for marketing experiments." Research in Marketing, J. N. Sheth (ed.), Vol. 1 (1978), 99-122.
Green, P. E., Krieger, A. M., and Wind, Y.: "Thirty years of conjoint analysis: Reflections and prospects." Interfaces, Vol. 31, Issue 3 (May/June 2001), S56-S73.
Green, P. E., and Srinivasan, V.: "Conjoint analysis in consumer research: Issues and outlook." Journal of Consumer Research, Vol. 5 (September 1978), 103-123.
Green, P. E., and Srinivasan, V.: "Conjoint analysis in marketing: New developments with implications for research and practice." Journal of Marketing, Vol. 54, No. 1 (1990), 3-19.
Hamada, M., and Wu, C. F. J.: "Analysis of designed experiments with complex aliasing." Journal of Quality Technology, Vol. 24 (1992), 130-137.
Holland, C. W., and Cravens, D. W.: "Fractional factorial designs in marketing research." Journal of Marketing Research, Vol. 10 (1973), 270-276.
Hopkins, C. C.: Scientific Advertising. New York: Lord & Thomas, 1923. (Reprinted by NTC Business Books, Chicago, 1966.)
Jaffe, L. J., Jamieson, L. F., and Berger, P. D.: "Impact of comprehension, positioning, and segmentation on advertising response." Journal of Advertising Research, Vol. 32 (1992), 24-33.
Kahneman, D., and Tversky, A.: "Prospect theory: An analysis of decision under risk." Econometrica, Vol. 47, No. 2 (1979), 263-292.
Kuhfeld, W. F., Tobias, R. D., and Garratt, M.: "Efficient experimental design with marketing research applications." Journal of Marketing Research, Vol. 31 (November 1994), 545-557.
Ledolter, J., and Burrill, C. W.: Statistical Quality Control: Strategies and Tools for Continual Improvement. New York: Wiley, 1999.
Lodish, L. M., Abraham, M. M., Livelsberger, J., Lubetkin, B., Richardson, B., and Stevens, M. E.: "A summary of fifty-five in-market experimental estimates of the long-term effect of TV advertising." Marketing Science, Vol. 14, No. 3, part 2 of 2 (1995a), G133-G140.
Lodish, L. M., Abraham, M. M., Livelsberger, J., Lubetkin, B., Richardson, B., and Stevens, M. E.: "How TV advertising works: A meta-analysis of 389 real world split cable TV advertising experiments." Journal of Marketing Research, Vol. 32, No. 2 (1995b), 125-139.
Ogilvy, D.: Ogilvy on Advertising. New York: Random House, 1983.
Plackett, R. L., and Burman, J. P.: "The design of optimum multifactorial experiments." Biometrika, Vol. 33 (1946), 305-325.
Srivastava, J., and Lurie, N.: "Price-matching guarantees as signals of low store prices: Survey and experimental evidence." Journal of Retailing, Vol. 80 (2004), 117-128.
Stone, B., and Jacobs, R.: Successful Direct Marketing Methods (7th ed.). New York: McGraw-Hill, 2001.
Thaler, R.: "Mental accounting and consumer choice." Marketing Science, Vol. 4, No. 3 (1985), 199-214.
Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the impact of short-term supermarket strategy variables." Journal of Marketing Research, Vol. 19 (1982), 72-86.
Wittink, D. R., and Cattin, P.: "Commercial use of conjoint analysis: An update." Journal of Marketing, Vol. 53, No. 3 (1989), 91-96.
Wittink, D. R., Vriens, M., and Burhenne, W.: "Commercial use of conjoint in Europe: Results and critical reflections." International Journal of Research in Marketing, Vol. 11, No. 1 (1994), 41-52.
Wu, C. F. J., and Hamada, M.: Experiments: Planning, Analysis, and Parameter Design Optimization. New York: Wiley, 2000.


Case 10
Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the impact of short-term supermarket strategy variables." Journal of Marketing Research, Vol. 19 (1982), 72-86.
Case 11
Clarke, D. G.: Marketing Analysis and Decision Making. Redwood City, CA: The Scientific Press, 1987.
Case 12
Almquist, E., and Wyner, G.: "Boost your marketing ROI with experimental design." Harvard Business Review (October 2001), 135-141.


PREFACE

analyzing real data sets, and designing and carrying out their own experiments. Each chapter includes many exercises ranging from straightforward "drill-type" problems to more
challenging ones that test tools and concepts. The 13 cases involving real-world applications
are a key and unique part of the book. Some of these cases describe how experiments were
conducted and give readers the opportunity to analyze and interpret the results, while
others are written so that students can develop their own designs and compare their approaches to what was actually done.
Each chapter ends with an important section of notes titled "Nobody Asked Us, But ... "
Including these notes allowed us to focus on the basic concepts in the main text and then to
elaborate on them at the end of the chapter. Doing so gives the reader the opportunity to
first learn the basics without being bogged down with too many details and then through
the notes to build upon these core concepts. The title of this section comes from a well-known column of the same title written by the late New York sportswriter Jimmy Cannon.
The development of statistical computer software has made it easier to design experiments and analyze and interpret the results. We have not tied the book to a specific computer program, but discuss computer output from several packages, in particular, Minitab
and JMP.
Instructors can use the book in a number of ways. The entire book can be covered in a
full semester course on experimental design that would include most of the cases in the case
study appendix, as well as an experimental design project that would combine methodology with real-world practice. The book can also be used for a section on experimental design in a course on quality management. To do so, the instructor would assign Chapters 1,
4, and 5, along with several of the cases. In addition, the instructor might assign selected sections of Chapter 2, which reviews basic statistical concepts, and Chapter 3.
Many people contributed to this book. We begin by acknowledging several people who
greatly influenced our thinking and learning. We gratefully acknowledge George Box, Norman Draper, and the late Bill Hunter, who taught courses on design of experiments and statistical modeling when one of the authors (JL) was a graduate student at the University of
Wisconsin-Madison. The very lively "Monday Night Beer Seminars" in George Box's basement had a profound impact, as these discussions showed the importance of well-designed
experiments for learning and also provided a strategy for implementing these methods in
real-world settings. We also pay tribute to the late Sebastian B. Littauer, a distinguished professor at Columbia University and a recipient of the American Society for Quality's Shewhart Medal, who was a mentor to one of us (AJS). He was an expert in statistical methods
who influenced many by his extraordinary teaching of statistical quality control not only as
a set of problem-solving tools but as the conceptual foundation of a quality management
philosophy.
At Stanford University Press, a number of people made important contributions. We are
especially grateful to Martha Cooley, our editor, for her encouragement, insights, and
suggestions, and to Jared Smith who carefully checked and organized the manuscript in
preparation for its production. We are grateful to the production services team at Newgen-Austin, including Andy Sieverman, who oversaw the production process from start to finish, and Teresa Berensfeld, whose excellent copy editing improved the presentation.

CASE 5
OFFICE SUPPLIES E-MAIL TEST


Gordon H. Bell

INTRODUCTION

Marketing know-how does not always translate well from one channel to another, as one
office-supplies retailer came to realize. With consistent growth and solid profit from their
retail stores, one industry leader decided to expand into direct marketing channels, mailing
out catalogs and sending e-mails to direct small-business customers to their Web site and
stores. Two of their biggest challenges were building solid mail and e-mail lists of prospective customers and translating the in-store experience onto a two-dimensional page. A year
after starting these new programs, the marketing vice president wanted to speed the learning curve with a more disciplined approach.
Talking with other executives, he decided to bring in an outside consultant to strengthen
their marketing testing efforts. Both the catalog and Internet programs had room for improvement, but the flexibility and low cost of e-mail (versus printing multiple catalogs) became the deciding issue on where they would first apply scientific testing techniques.
With fast response, low costs, and flexible production, e-mail was a great place to start
testing. In addition, what worked in e-mail could then be tested in the catalog and even in
retail stores. However, in this early stage of their business-to-business e-mail program, the
Internet marketing director had very few e-mail addresses he could use. Retail sales associates had begun asking for e-mail addresses, and online orders were growing, but at this
point the marketing team had only about 35,000 names. Moreover, these names included
three distinct customer segments, each with different buying behavior. With so few names,
the Internet director had tested one or two new ideas in each monthly drop, but had a difficult
time trying to get a statistically significant read on his results.
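
The Internet director's difficulty can be made concrete with a rough sample-size calculation. The sketch below is not taken from the case. It uses the standard normal-approximation formula for comparing two proportions, and the response rates (1.0% for the current e-mail and 1.3% for a new idea), the 5% significance level, and the 80% power are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def names_per_group(p_control, p_test, alpha=0.05, power=0.80):
    """Approximate number of names needed in each group to detect the
    difference between two response rates with a two-sided test
    (normal approximation; all inputs are assumptions, not case data)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value tied to the desired power
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_test) ** 2
    return math.ceil(n)


# Hypothetical response rates: 1.0% for the control, 1.3% for the new idea.
n = names_per_group(0.010, 0.013)
print("Names needed per group:", n)
print("Names needed for one split-run comparison:", 2 * n)
```

Under these assumed rates, a single split-run comparison calls for roughly 20,000 names in each group, which is more than the 35,000 names available in total. Smaller differences in response would require even more names.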
PLANNING THE TEST

The consultant agreed with the Internet director that sample size could be a problem. He
explained that there was no magic shortcut, no way to reduce the natural variation in the
marketplace, so it was necessary to overcome variability with bold factors and a sufficiently large sample size. He explained how simple rules of thumb, like "100 orders in each

To eliminate this variable, he designed his test panel with no hole within 3 inches of the panel edge.
The expectation was that under the current conditions about 2% or 3% of the holes on
a given test panel would have broken tents. The cost per test panel including labor and inspection was estimated to be about $20.


We wish to thank several people who helped us develop the cases and examples in this
book. Mark Wachen, CEO of Optimost, provided the data for the Phone Hog case in Section 8.2 and shared with us his modeling insights. Optimost (www.optimost.com) is a technology and services company specializing in comprehensive real-time testing and conversion rate marketing. We also thank Phil Nadel, CEO of Gulfstream Internet (the parent
company of Phone Hog), for carefully reviewing the case and allowing us to use it. Jay Harris, publisher of Mother Jones, was instrumental in the development of the Mother Jones (A)
and (B) cases, providing access to his organization and contributing many helpful ideas as
the experiment at Mother Jones was designed and carried out. Alexander Dean, president of
David Brooks Company, was very generous with his time and expertise. The broken pots
example that we introduce in Chapter 4 and discuss further in Chapter 5 was written based
on many discussions with Alex and describes a simplified version of the production process
his company uses in the making of clay pots.
We thank Elsevier Publishing Company for allowing us to include the article [Bell, G. H.,
Ledolter, J., and Swersey, A. J.: "Experimental Design on the Front Lines of Marketing:
Testing New Ideas to Increase Direct Mail Sales," International Journal of Research in Marketing, Vol. 23 (2006)] as Case 9 of the case study appendix.


We are also grateful to Ronald Snee, Soren Bisgaard, and Barry O'Neil for their helpful
suggestions.
There are several people who deserve special mention. We are pleased to acknowledge
the contributions of Jullie Chon who over the course of a summer produced a comprehensive and very useful review of the experimental design literature. It was a pleasure working
with her. We are also extremely grateful to Berton Gunter, a leader in teaching, applying,
and writing on experimental design. Bert provided a detailed, excellent review of an earlier
version of the manuscript. His input was invaluable and we have incorporated many of his
suggestions.
Ken McLeod was the original editor of our book at Stanford University Press and was instrumental in the birth of this project. We will always remember and be grateful for his enthusiasm for the project, intelligence in understanding what we were trying to do, the constant encouragement he gave us, and his personal warmth. We are saddened that he did not
live to see the book completed and are indebted to him for his contributions.
In acknowledging the contributors to this book, we have saved for last our deep gratitude
to the most important contributor, Gordon Bell, president of the consulting firm LucidView. Gordon made major contributions to our book by providing us with cases and chapter examples that are based on his expertise and extensive experience helping firms apply
experimental design methods. He contributed Case 2 (Magazine Price Test) and Case 5
(Office Supplies E-mail Test). A simplified version of that case is used in Section 5.8. Gordon also co-authored (with the authors of this book) two other cases: Case 8 (Experiments
in Retail Operations: Design Issues and Application), and Case 9 (Experimental Design on
the Front Lines of Marketing: Testing New Ideas to Increase Direct Mail Sales). Cases 8 and
9 are based on Gordon's exceptional consulting work. Parts of Case 9 are also used as a case
example in Sections 4.5 and 6.3. One of the greatest benefits to us in writing this book has
been the interactions, both professional and personal, that we have had with Gordon.

CASE 1
EAGLE BRANDS

INTRODUCTION

Bill Evans, Director of Marketing at Eagle Brands, was worried. Eagle, a national producer
of packaged sandwich meats, was facing increased competition and declining market share.
Looking over the latest quarterly supermarket sales numbers, Evans observed that the situation was not improving. He realized that drastic action was needed to turn things around.
Evans had recently read an article in the Wall Street Journal about a statistical approach
to product testing and was intrigued by the idea of using it to try out some new marketing
initiatives. The article called the approach multivariable testing (MVT), and its proponents
claimed it could be used to devise an efficient in-store test of multiple variables that might
influence sales and profits. Evans had in the past led a project to test market a new package
design, and although the experiment provided very useful results, it had been a major undertaking. He was concerned that testing a number of variables in one experiment might be
prohibitively expensive to carry out.
The Journal article mentioned QualTest, a management consulting firm specializing in
applications of MVT. Evans contacted the firm and arranged to have QualTest give a presentation
at Eagle Brands. Evans assembled a group of 10 key people, including the heads of
sales, finance, and accounting. Steve Gardner, a senior QualTest consultant, made the presentation,
explaining the approach and illustrating it with several case examples of successful
experiments for QualTest clients.
DESIGNING THE EXPERIMENT

The response to the QualTest presentation was a positive one, and Eagle Brands hired the
firm to help them design and evaluate an in-store marketing experiment. QualTest consultants
led by Steve Gardner began a series of meetings with Eagle managers.
One of Eagle's major customers was Zip Stores, a national supermarket chain, and the
plan was to select a group of the chain's stores to participate in the test. Bill Evans realized
that input from the chain was important to the success of the experiment, and a merchandising manager from Zip agreed to join the team.

INTRODUCTION

1.1 THE THEME OF THE BOOK

This book is about the power of statistical experiments. In the increasingly competitive
global economy, firms are constantly under pressure to reduce costs, increase productivity,
and improve quality. Testing or experimentation in the business world is commonplace,
and the usual approach is to change one factor at a time while holding other factors constant. To some, this approach seems logical, simple, and therefore appealing. But as we will
show, it is highly inefficient, and it may fail to identify important factors and lead to wrong
conclusions. The better method is to test all factors simultaneously. Doing so not only reduces the costs of experimenting but, as we will demonstrate, also provides the experimenter with more and better information.
Elementary courses in statistics that cover topics such as probability, hypothesis testing,
confidence intervals, and regression analysis often appear abstract; and although they are illustrated with numerous examples, they typically seem far removed from practical issues.
In this book we use and build on basic statistical concepts to explore approaches for solving
real-world problems. Although our focus is on practice, it is important to keep in mind that
statistics is a science, and science is based on theory. While computer software has made the
implementation of statistical methods much easier, there is a danger in relying on a cookbook approach in which the user fails to understand the underlying concepts. In contrast,
this book's presentation combines theory and practice, and focuses on strengthening the
reader's understanding of fundamental statistical ideas.
Our goal in writing this book is to share our passion for the subject and to provide students, practitioners, and managers with a set of highly relevant, interesting, and valuable
tools. In the past, in the area of experimental design, nearly all the attention was focused on
manufacturing rather than services. In contrast, most of the applications and examples in
this book will involve marketing and service operations. In the next section, we give a brief
introduction to some of the cases that are included.


DESIGN OPTIMALITY

This appendix is written for readers with substantial statistics background. We study the
effects of k design factors, $x_1, x_2, \ldots, x_k$, on the response y and estimate the k + 1 coefficients
in the main-effects (first-order) model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

The factors may be the price in a marketing study, or the temperature and the concentration
of an input factor in an engineering problem. A total of $N \ge k + 1$ experimental runs
are needed to estimate the k + 1 coefficients. The $N \times (k + 1)$ design (regression) matrix X
consists of a column of ones and k columns of factor levels that need to be selected at the
design stage. At issue is the optimal selection of the elements in X.
Our interest is in the precise estimation of the regression coefficients
$\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$. Least squares theory (see the appendix on least squares regression) implies that the variance of the estimate $\hat{\beta}$ is given by

$$V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$

where $\sigma^2$ is the variance of the error term $\varepsilon$.
Several optimality criteria have been proposed in the design literature.

A-optimality. We look for a design that leads to the smallest average variance of the
resulting estimates. The design that allows us to estimate the parameters with the
smallest average error must minimize the trace of $(X'X)^{-1}$. This criterion is called
A-optimality.
D-optimality. Alternatively, we look for a design that minimizes the volume of the joint
confidence region of the parameters. It can be shown that the volume is proportional
to the square root of the determinant of $(X'X)^{-1}$. Hence, we want to select
the levels of the factors in the design matrix X such that the determinant of $(X'X)^{-1}$
is minimized. This criterion is called D-optimality.
The two design criteria are similar, involving two closely related functions of the reciprocals
of the eigenvalues of $X'X$. A-optimality minimizes their sum, whereas D-optimality minimizes the sum of their logarithms.
Before one can apply these criteria to determine an optimal design, one must specify the
permissible experimental region of the design factors. Also, one needs to remember that
the coefficients $\beta_i$ (i = 1, 2, ..., k) are affected by the choice of the scale for $x_i$. If $x_i$ denotes
the price measured in dollars, we can change $\beta_i$ by a factor of 100 by measuring the price in
cents. It is often desirable to scale the factors uniformly. We have done so in the 2-level designs
in Chapters 4-6 by adopting the scaling -1 and +1, implying uniformity across the k factors.

One can show that an orthogonal $N \times (k + 1)$ design matrix X guarantees both
A- and D-optimality of the main-effects design. For a proof, see John (1971, p. 194). It is in
situations where orthogonal designs cannot be found that A- and D-optimality become important
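The two criteria can be compared numerically on a small case. The sketch below is only an illustration under stated assumptions: the helper names (`gram`, `inverse_and_det`, `a_and_d_criteria`) and the "unbalanced" 4-run design are hypothetical and not taken from the book, and $\sigma^2$ is set aside because it multiplies both criteria by the same constant. It computes the trace and the determinant of $(X'X)^{-1}$ exactly, using fractions, for the orthogonal $2^2$ factorial and for a nonorthogonal design with the same number of runs.

```python
from fractions import Fraction
from itertools import product


def gram(X):
    """Return X'X for a design matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[Fraction(sum(a * b for a, b in zip(ci, cj))) for cj in cols] for ci in cols]


def inverse_and_det(M):
    """Gauss-Jordan elimination with exact fractions: returns (inverse, determinant)."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    det = Fraction(1)
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if A[r][c] != 0)
        if pivot_row != c:
            A[c], A[pivot_row] = A[pivot_row], A[c]
            det = -det
        pivot = A[c][c]
        det *= pivot
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [v - factor * w for v, w in zip(A[r], A[c])]
    return [row[n:] for row in A], det


def a_and_d_criteria(X):
    """Return (trace, determinant) of (X'X)^-1; smaller is better for both."""
    inv, det = inverse_and_det(gram(X))
    return sum(inv[i][i] for i in range(len(inv))), 1 / det


# Orthogonal 2^2 factorial; columns are the intercept, x1, and x2.
orthogonal = [[1, x1, x2] for x1, x2 in product([-1, 1], repeat=2)]
# Hypothetical nonorthogonal alternative with the same 4-run budget:
# the corner (+, +) is dropped and the corner (-, +) is run twice.
unbalanced = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, -1, 1]]

print(a_and_d_criteria(orthogonal))
print(a_and_d_criteria(unbalanced))
```

In this small case the orthogonal design has both the smaller trace and the smaller determinant, which agrees with the statement that orthogonality guarantees A- and D-optimality for the main-effects model.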

[Figure 8.2, continued. Image not reproduced; visible panel labels: Rotating Flash Image, No Image, Form Centered.]

Figure 8.2 Continued

TABLE 8.1
Areas (the Factors) on PhoneHog's Web Page and Number of Levels and Levels (in parentheses) Used in the Experiment

Area
A-bottom (main headline)
A-top (main headline)
B (sub headline)
C (main copy)
D (form)
E (privacy copy)
F (submit button)
G (how it works section)
H (main image on right side)
I (footer)

[The columns listing the number of levels for each area and the levels used in the experiment are not legible.]

NOTE: Level 1 corresponds to the currently used baseline.

[Figure 2.6. Images not reproduced. Panels: pie chart of 2004 donations (no/yes); bar chart of 2004 donations (no/yes); bar chart of the proportion of donors by class (1957, 1967, 1977, 1987, 1997).]

Figure 2.6 Bar and Pie Charts of a Categorical Variable

NONORTHOGONAL DESIGNS AND COMPUTER


SOFTWARE FOR DESIGN CONSTRUCTION
AND DATA ANALYSIS

8.1 INTRODUCTION

The designs that we have considered up to this point are orthogonal. The 2-level factorial,
fractional factorial, and Plackett-Burman designs in Chapters 4 through 6 are orthogonal;
they share the property that each factor-level combination of any 2 factors is studied with
the same number of runs. Also, our analysis of the general 2-factor factorial experiment in
Section 7.2 assumes that the numbers of runs at the ab factor-level combinations are the
same, making it an orthogonal design. The same is true for the $3^2$ and $3^3$ factorial designs
in Chapter 7.
Orthogonality of the design simplifies the analysis of the resulting data considerably.
Main effects and interactions can be estimated by averaging over all other factors. The estimates are independent, and the sum of squares that is explained jointly by the studied factors can be partitioned into individual, unconditioned sums of squares that ignore all other
factors. For example, the total sum of squares in the 2-factor factorial experiment in Section 7.2 can be partitioned into the individual, unconditioned sums of squares for factor A,
factor B, the interaction, and the error component.
Such an additive unconditioned decomposition is no longer possible if the design is not
orthogonal. Orthogonality, for example, is no longer present if observations are missing
from an orthogonal design. More importantly, an orthogonal design may simply not be
available in situations that involve many factors with different numbers of factor levels.
Consider, for example, 7 factors with 2 factors at 2 levels, 1 factor at 3 levels, 3 factors at
4 levels, and 1 factor at 5 levels. A full factorial with (2)(2)(3)(4)(4)(4)(5) = 3,840 runs is
certainly orthogonal. Also, a few special orthogonal fractions in fewer than 3,840 runs are
possible, but the number of runs of these orthogonal fractions is still quite large. It is simply not possible to find an orthogonal fraction with a moderate number of runs. Other design criteria need to be adopted if one wants to select a good design that is able to study the
main effects of these 7 factors in, say, N = 30 runs. In this chapter, we discuss useful guiding principles for constructing such nonorthogonal designs. Design concepts such as
D- and A-optimality become useful, and we discuss them in Appendix 8.1.
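The run count and the balance property described above can be checked directly. The following sketch is illustrative only; the function name `is_orthogonal` and the coding of levels as 0, 1, 2, ... are assumptions made for the example. It enumerates the full factorial for the 7 factors, confirms that every pair of factors has each level combination equally often, and shows that removing a single run destroys that balance.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

LEVELS = [2, 2, 3, 4, 4, 4, 5]  # number of levels of the 7 factors in the example


def is_orthogonal(design):
    """True if, for every pair of factors, all level combinations occur equally often."""
    k = len(design[0])
    for i, j in combinations(range(k), 2):
        counts = Counter((run[i], run[j]) for run in design)
        n_i = len({run[i] for run in design})
        n_j = len({run[j] for run in design})
        if len(counts) != n_i * n_j or len(set(counts.values())) != 1:
            return False
    return True


full_factorial = list(product(*(range(n) for n in LEVELS)))
print(len(full_factorial), prod(LEVELS))    # number of runs in the full factorial
print(is_orthogonal(full_factorial))        # balanced for every pair of factors
print(is_orthogonal(full_factorial[:-1]))   # one missing run breaks the balance
```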

[Figure 2.7. Images not reproduced. Panels: dotplot of 2004 donations; histogram of 2004 donations (< $2,000); histograms of 2004 donations (< $2,000) by class, with panel variable class (1957, 1967, 1977, 1987, 1997).]

Figure 2.7 Dot Diagrams, Histograms, Box Plots, and Scatter Plots of Continuous Variables

TABLE OF ORTHOGONAL POLYNOMIALS

For factors with two levels:

           Linear
Level 1      -1
Level 2       1

For factors with three levels:

           Linear   Quadratic
Level 1      -1         1
Level 2       0        -2
Level 3       1         1

For factors with four levels:

           Linear   Quadratic   Cubic
Level 1      -3         1        -1
Level 2      -1        -1         3
Level 3       1        -1        -3
Level 4       3         1         1

For factors with five levels:

           Linear   Quadratic   Cubic   Quartic
Level 1      -2         2        -1        1
Level 2      -1        -1         2       -4
Level 3       0        -2         0        6
Level 4       1        -1        -2       -4
Level 5       2         2         1        1
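A short check of what "orthogonal" means for such a table may be useful. The sketch below uses the standard orthogonal polynomial coefficients for equally spaced levels; it is an assumption that these agree with the entries of the printed table. The code verifies that each column sums to zero and that the columns for a given number of levels are mutually orthogonal, and then applies the 3-level contrasts to a made-up set of level averages.

```python
from itertools import combinations

# Standard orthogonal polynomial coefficients for equally spaced levels
# (assumed here to agree with the printed table).
CONTRASTS = {
    2: {"linear": [-1, 1]},
    3: {"linear": [-1, 0, 1], "quadratic": [1, -2, 1]},
    4: {"linear": [-3, -1, 1, 3], "quadratic": [1, -1, -1, 1], "cubic": [-1, 3, -3, 1]},
    5: {
        "linear": [-2, -1, 0, 1, 2],
        "quadratic": [2, -1, -2, -1, 2],
        "cubic": [-1, 2, 0, -2, 1],
        "quartic": [1, -4, 6, -4, 1],
    },
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(columns):
    """Each column sums to zero and every pair of columns has zero inner product."""
    cols = list(columns.values())
    sums_to_zero = all(sum(c) == 0 for c in cols)
    mutually_orthogonal = all(dot(u, v) == 0 for u, v in combinations(cols, 2))
    return sums_to_zero and mutually_orthogonal


def contrast(means, coefficients):
    """Value of a contrast applied to the averages at the factor levels."""
    return dot(means, coefficients)


example_means = [10, 12, 14]  # hypothetical averages at three equally spaced levels
print(all(is_orthogonal_set(c) for c in CONTRASTS.values()))
print(contrast(example_means, CONTRASTS[3]["linear"]))     # purely linear trend
print(contrast(example_means, CONTRASTS[3]["quadratic"]))  # no curvature
```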

EXERCISES

Exercise 1

Consider the data in Section 7.3.

(a) Use an available computer program to obtain the ANOVA table in Table 7.3. Obtain the interaction plot in Figure 7.1. Obtain the main effects plots of factor A and
factor B, and comment on whether or not these plots are useful.
(b) For this rather small data set, calculate the nine cell averages, the three averages for
factor A and the three averages for factor B. Use the expressions in Table 7.1 to calculate the sums of squares and convince yourself that the results coincide with the
ones given in Table 7.3.
(c) Discuss in detail your experimental procedure. How would you carry out the baking experiment if you had to use your home oven, and the rating procedure if you

TESTING DIFFERENCES AMONG SEVERAL MEANS:


COMPLETELY RANDOMIZED AND RANDOMIZED
COMPLETE BLOCK EXPERIMENTS

3.1

INTRODUCTION

In Section 2.5 we used sample information to test whether the means of two populations are
equal. In this chapter, we extend the discussion to the comparison of more than two means.
We discuss two designs for making this comparison: the completely randomized experiment
and the randomized complete block experiment.
Internet experiments that present one of several advertising messages at random to users
of search engines are examples of completely randomized experiments. There the k advertising messages, which may differ with respect to advertising text, background color, and
font size, are offered to distinct Internet users at random; each user responds to one and
only one advertising message. The response in such studies is the sales volume generated
from each advertising message, or the "hit ratio" (the proportion of those who access a particular Web site in response to the message).
Consider another example. A firm wants to test three different in-store promotions for
a major product, and identifies a group of 15 stores of similar size to participate in the experiment. Each store will test one and only one of the promotions for a certain period of
time (say three weeks). The promotions are randomly assigned to the stores, with five different stores per promotion. In the language of experimental design, the three promotions
are called treatments, and the 15 stores are called the experimental units. Since the treatments are assigned to the experimental units at random, we call this a completely randomized experiment.
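The random assignment in the store example can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name, the store and promotion labels, and the seed are all hypothetical.

```python
import random
from collections import Counter


def completely_randomized(units, treatments, seed=None):
    """Assign treatments to units at random, with equal replication per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)  # the shuffle is the randomization step
    return dict(zip(units, labels))


stores = [f"store_{i:02d}" for i in range(1, 16)]           # 15 experimental units
promotions = ["promotion_1", "promotion_2", "promotion_3"]  # 3 treatments
assignment = completely_randomized(stores, promotions, seed=2024)

print(Counter(assignment.values()))  # five stores per promotion
```

Fixing the seed only makes the example reproducible; in an actual study the assignment would be generated once and recorded.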
An alternative design for comparing the three promotions is the randomized complete
block experiment. Suppose the firm believes the 15 stores are not homogeneous and that
possible store effects could introduce additional noise that would make it difficult to recognize differences among the treatments. Hence it may be better to observe each store under
all three in-store promotions. We could divide the study period into three one-week periods and, for every store, assign each of the three promotions to a different week. In this
design, each of the 15 stores acts as a block. Within each block, treatments are assigned
to the three one-week periods at random. The design is called a complete block design
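For the blocked version of the store example, the randomization happens separately within each store. The sketch below is illustrative; the function name, labels, and seed are hypothetical, and position w in each store's list is taken to mean week w + 1.

```python
import random


def randomized_complete_block(blocks, treatments, seed=None):
    """Within each block, assign every treatment once, in random order."""
    rng = random.Random(seed)
    schedule = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)       # independent randomization inside each block
        schedule[block] = order  # order[w] is the treatment used in week w + 1
    return schedule


stores = [f"store_{i:02d}" for i in range(1, 16)]           # 15 blocks
promotions = ["promotion_1", "promotion_2", "promotion_3"]  # 3 treatments
schedule = randomized_complete_block(stores, promotions, seed=7)

for store in stores[:3]:
    print(store, schedule[store])
```

Every store sees all three promotions, so store-to-store differences cancel out of the treatment comparisons.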

CONSTRUCTION OF PLACKETT-BURMAN DESIGNS IN N = 12, 20, AND 24 RUNS

Table 6.1 lists the Plackett-Burman design for N = 12 runs. The design matrix was constructed as follows. Starting from the first row (which is listed below in the row under
N = 12), you cyclically rearrange the symbols. That is, the sequence of plus and minus signs
in row 1 gets pushed to the right by one space to form row 2 and the minus sign in the far-right
position of row 1 gets moved to the far-left position in row 2. The plus sign in the far-right
position in the second row gets moved to the far-left position in row 3, and so on. The
cyclical rearrangement of rows continues until row 11. The 12th row is a row of all minus
signs. Alternatively, you can cycle through the 11 rows by pushing the sequence of plus and
minus signs to the left and moving the entry in the far-left position of a row to the far-right
position of the subsequent row. This only changes the order of the runs.
Similarly, for N = 20. You create the first 19 runs by cyclically rearranging the symbols
in the row shown below; the 20th row is a row of all minus signs. The resulting design matrix is shown in Table 6.4. The same procedure is used for generating the design matrix for
N = 24 runs. The initial rows needed to construct Plackett-Burman designs with N ≥ 28
runs can be found in the original Plackett and Burman (1946) reference and in advanced
books on design of experiments.
This procedure results in the standard order of a Plackett-Burman design. As with all experiments, one should randomize the assignments of the experimental units to each run, or
if experiments are carried out in time order, one should randomize the order of the runs.
N     First row (columns 1 through N - 1)

12    + + - + + + - - - + -
20    + + - - + + + + - + - + - - - - + + -
24    + + + + + - + - + + - - + + - - + - + - - - -
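The cyclic construction described above can be sketched as follows. The generating rows in `GENERATORS` are the commonly tabulated Plackett-Burman first rows; treating them as identical to the rows printed in the book is an assumption, and the function names are hypothetical. The code shifts the first row to the right one position at a time, appends the row of minus signs, and checks that the resulting columns are balanced and mutually orthogonal.

```python
from itertools import combinations

# Commonly tabulated generating rows (assumed to match the printed rows).
GENERATORS = {
    12: "++-+++---+-",
    20: "++--++++-+-+----++-",
    24: "+++++-+-++--++--+-+----",
}


def plackett_burman(n_runs):
    """Build the N x (N - 1) design by cyclic right shifts plus a final row of minus signs."""
    first = [1 if s == "+" else -1 for s in GENERATORS[n_runs]]
    m = len(first)  # m = n_runs - 1 columns
    rows = [[first[(j - i) % m] for j in range(m)] for i in range(m)]
    rows.append([-1] * m)
    return rows


def columns_are_orthogonal(design):
    """Each column has as many plus as minus signs, and all column pairs are orthogonal."""
    cols = list(zip(*design))
    balanced = all(sum(c) == 0 for c in cols)
    uncorrelated = all(
        sum(a * b for a, b in zip(u, v)) == 0 for u, v in combinations(cols, 2)
    )
    return balanced and uncorrelated


def as_signs(row):
    return "".join("+" if v == 1 else "-" for v in row)


design12 = plackett_burman(12)
print(as_signs(design12[0]))  # row 1: the generating row
print(as_signs(design12[1]))  # row 2: far-right minus sign moved to the far left
print([columns_are_orthogonal(plackett_burman(n)) for n in (12, 20, 24)])
```

Shifting to the left instead of the right would produce the same set of runs in a different order, as the text notes.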

Generators of the 16-run designs:

$2_{V}^{5-1}$:      E = ABCD
$2_{IV}^{6-2}$:     E = ABC, F = BCD
$2_{IV}^{7-3}$:     E = ABC, F = BCD, G = ACD
$2_{IV}^{8-4}$:     E = ABC, F = BCD, G = ACD, H = ABD
$2_{III}^{9-5}$:    E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD
$2_{III}^{15-11}$:  E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD, N = BC, O = BD, P = CD

[The confounding-pattern entries listed under each column are not legible.]

NOTE: The expressions below the columns of plus and minus signs from the previous page specify the confounding patterns of the estimated effects. I, which denotes
the column of plus signs, is not used as a factor. Interactions of order three or higher are assumed zero.


PLACKETT-BURMAN DESIGNS

6.1 INTRODUCTION

In Chapter 5 we focused on 2-level fractional factorial designs. As we have seen, in those designs the number of runs N is a power of 2 (N = 4, 8, 16, 32, etc.). In this chapter, we discuss another important class of fractional designs called Plackett-Burman designs.
In a classic 1946 paper in the journal Biometrika, Plackett and Burman showed how to
construct 2-level orthogonal designs when the number of runs N is a multiple of 4 (N = 4,
8, 12, 16, 20, 24, and so on). If the run size is a power of 2 (for example, N = 8, 16, 32, ...),
these designs are identical to the fractional factorial designs that we studied in Chapter 5.
The 2-level fractional factorial designs leave large gaps in the run sizes of the available designs. For example, 7 factors can be studied in 8 runs with a $2_{III}^{7-4}$ design, but if the number
of factors is between 8 and 15, 16 runs are needed; and 32 runs are needed for 16 to 31 factors. The Plackett-Burman designs for N = 12, 20, and 24 fill in these gaps.
Suppose that we wish to estimate the main effects of 8 factors and want to achieve this
through a design with as few runs as possible. We could use the $2_{IV}^{8-4}$ fractional factorial in
Table 5.7. This design does not confound main effects with 2-factor interactions, but it requires 16 runs. A Plackett-Burman design with the smaller run size N = 12 is an option if
economy of run size is important.
Plackett-Burman designs have resolution III, confounding main effects with 2-factor
interactions. Traditionally, they have been used to estimate main effects under the assumption that 2-factor interactions are largely negligible or small in magnitude. More recently,
researchers have begun to explore the so-called projective properties of Plackett-Burman
designs and have shown that in some circumstances they can be effectively used to identify
2-factor interactions. In Section 6.3.3, in our discussion of the results of a case study,
we make use of this important property of Plackett-Burman designs.
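The gaps in run size can be tabulated with a small helper. This sketch covers only the minimum run size for estimating k main effects (N must be at least k + 1); the function names are hypothetical, and it assumes that a Plackett-Burman design is available for each multiple of 4 in the range shown, which holds for the run sizes discussed in this chapter.

```python
def fractional_factorial_runs(k):
    """Smallest power of 2 (at least 4) with N >= k + 1."""
    n = 4
    while n < k + 1:
        n *= 2
    return n


def plackett_burman_runs(k):
    """Smallest multiple of 4 with N >= k + 1."""
    return 4 * ((k + 1 + 3) // 4)


print("factors  2-level fraction  Plackett-Burman")
for k in (7, 8, 11, 15, 16, 19, 23):
    print(f"{k:7d}  {fractional_factorial_runs(k):16d}  {plackett_burman_runs(k):15d}")
```

For 8 to 11 factors the saving is 4 runs (12 instead of 16), and for 16 to 23 factors it is 8 to 12 runs (20 or 24 instead of 32).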


TABLE 5.5
8-Run Fractional Factorial Designs: Generators, Confounding Patterns, and Resolution

Resolution IV design:
4 factors: $2_{IV}^{4-1}$     D = ABC

Resolution III designs:
5 factors: $2_{III}^{5-2}$    D = AB, E = AC
6 factors: $2_{III}^{6-3}$    D = AB, E = AC, F = BC
7 factors: $2_{III}^{7-4}$    D = AB, E = AC, F = BC, G = ABC

[Columns of plus and minus signs for runs 1 through 8 not reproduced.]

Confounding patterns of the estimated effects, by calculation column:

Column   2^(4-1), IV    2^(5-2), III    2^(6-3), III    2^(7-4), III
A        A              A + BD + CE     A + BD + CE     A + BD + CE + FG
B        B              B + AD          B + AD + CF     B + AD + CF + EG
AB       AB + CD        D + AB          D + AB + EF     D + AB + CG + EF
C        C              C + AE          C + AE + BF     C + AE + BF + DG
AC       AC + BD        E + AC          E + AC + DF     E + AC + BG + DF
BC       BC + AD        BC + DE         F + BC + DE     F + AG + BC + DE
ABC      D              BE + CD         AF + BE + CD    G + AF + BE + CD

NOTE: The darker-shaded area represents the runs of the $2^3$ factorial building block design. The lighter-shaded area represents the calculation columns that are available
for generating the levels of additional factors. The expressions below the columns of plus and minus signs specify the confounding patterns of the estimated effects. For example, in the $2_{III}^{6-3}$ design, the linear contrast in column B estimates B + AD + CF. Interactions of order 3 or higher are assumed to be zero throughout the table.
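Confounding patterns such as those in Table 5.5 follow mechanically from the generators. The sketch below is one way to derive them, under the usual rule that any letter multiplied by itself cancels; the function names are hypothetical. It builds the full defining relation from the generators and lists, for a given effect, its aliases up to 2-factor interactions.

```python
from itertools import combinations


def multiply(w1, w2):
    """Product of two effect words; repeated letters cancel (A times A equals I)."""
    return "".join(sorted(set(w1) ^ set(w2)))


def defining_relation(generators):
    """generators maps a new factor to its generating word, e.g. {"D": "AB"} means I = ABD."""
    base = [multiply(factor, word) for factor, word in generators.items()]
    words = set()
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            w = ""
            for b in combo:
                w = multiply(w, b)
            words.add(w)
    return words


def aliases(effect, generators, max_order=2):
    """Effects confounded with `effect`, keeping only words of at most max_order letters."""
    found = {multiply(effect, w) for w in defining_relation(generators)}
    return sorted((a for a in found if len(a) <= max_order), key=lambda a: (len(a), a))


six_factor = {"D": "AB", "E": "AC", "F": "BC"}
seven_factor = {"D": "AB", "E": "AC", "F": "BC", "G": "ABC"}

print(aliases("B", six_factor))     # B is confounded with AD and CF
print(aliases("G", seven_factor))   # G is confounded with AF, BE, and CD
print(aliases("AB", {"D": "ABC"}))  # in the resolution IV design, AB is confounded with CD
```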

TABLE A7.1
Design and Test Results

Factors: A = Lamination Roll Thickness, B = Lamination Exit Temperature, C = Spray Pressure, D = Break Point, E = Hold Time
Columns: levels of factors A through E for each run, Response, Average, Standard Deviation

[Table entries are not legible.]

TABLE A8.3
Test Results

[Design columns of plus and minus signs for the factors are not legible.]

PERCENT CHANGE IN SALES (2 STORES/TEST CELL)

Test Cell   Week 1   Week 2   Average
 1           12.5     23.3     17.90
 2           29.1     26.0     27.55
 3            5.8     -4.2      0.80
 4           16.8     21.8     19.30
 5            4.9      3.6      4.25
 6           17.8     12.8     15.30
 7           -3.7      0.0     -1.85
 8           11.0     -8.8      1.10
 9           11.0      1.5      6.25
10           -1.3     33.1     15.90
11           -6.0    -11.9     -8.95
12           -5.3     -8.7     -7.00
13           17.5      8.4     12.95
14           14.6     -3.9      5.35
15            7.7     -6.4      0.65
16           28.9     11.4     20.15
17           17.1      6.7     11.90
18            6.6      5.1      5.85
19           16.8     23.2     20.00
20           25.0     35.0     30.00
21           12.5      8.4     10.45
22            8.9     18.3     13.60
23           14.1     17.1     15.60
24           -3.2     -2.4     -2.80

[Figure A8.1. Image not reproduced. Bar chart of the estimated main effects, with the significant effects shown above a reference line. Factor labels legible in the chart: display in produce, packaging, cross-promote with salsa, display rack by beer, shelf position, ad in store circular, discount, ad on grocery divider, on-shelf advertisement, add to natural food aisle. Horizontal axis: effect as percent change in sales.]

Figure A8.1 Estimated Main Effects

CASE

EXPERIMENTAL DESIGN ON THE FRONT LINES


OF MARKETING: TESTING NEW IDEAS
TO INCREASE DIRECT MAIL SALES
Gordon H. Bell, Johannes Ledolter, and Arthur J. Swersey

This case is reprinted from the International Journal of Research in Marketing, Vol. 23
(2006), pp. 309-319 with permission of the Elsevier Publishing Company.
INTRODUCTION

"Test everything" has been a rallying cry in the marketing and advertising industry
throughout the 20th century. Industry experts like Hopkins (1923), Caples (1974), Ogilvy
(1983), and Stone and Jacobs (2001) have stressed the importance of testing new ideas in
the marketplace. But as statisticians developed and refined sophisticated experimental design techniques, most marketers held firm to the approach of changing one factor at a time,
often called "split-run testing" (also referred to as A/B splits, test-control, or champion-challenger testing). Only in the last few years have marketing leaders begun to embrace
advanced techniques for real-world testing.
The financial industry, including insurance, investment, credit card, and banking
firms, was among the first to use experimental design techniques for marketing testing.
The project described here is from a leading Fortune 500 financial products and services
firm. The company name and proprietary details have been removed, but the test strategy,
designs, results, and insights are accurate. Tests were run within two direct-mail campaigns
that focused on increasing the number and profitability of new customers. The initial experiment, a Plackett-Burman screening design of 19 factors in 20 runs, was followed by a
4-factor 16-run full-factorial experiment.
Although factorial, fractional factorial, and related methods of experimental design have
been widely applied to manufacturing problems, there have been few applications to direct
mail, Internet, retail, and other market-testing programs, and we found no papers that apply Plackett-Burman designs to these problems. For in-market testing, in an early paper
Curhan (1974) used a fractional factorial design to examine the effects of price, advertising,
display space, and display location on the sales of fresh fruits and vegetables in a supermarket, while Barclay (1969) used a factorial design to evaluate the effect on profitability of raising the prices of two retail products manufactured by the Quaker Oats Company. Holland
and Cravens (1973) presented the essential features of fractional factorial designs and illus-

REGRESSION APPROACH APPLIED TO THE ANALYSIS OF 2-LEVEL FACTORIAL EXPERIMENTS, AND THE FORTUNATE
CONSEQUENCES OF ORTHOGONALITY

In Section 4.3 we defined and calculated main and interaction effects, and we showed that
they are linear combinations of the responses, with weights coming from the design vectors
and the calculation columns (obtained by multiplying elements of the design vectors). An
alternative way of obtaining main and interaction effects is to write down a regression model
for the response and to obtain the estimates of the regression coefficients. Denote the vector of responses as $y$, the k design vectors consisting of ±1 values as $x_1, x_2, \ldots, x_k$, and the
calculation columns as $x_{12}, x_{13}, \ldots, x_{k-1,k}$ (each a product of two design columns), $x_{123}, \ldots$
(each a product of three design columns), ..., all the way to $x_{12 \cdots k}$ (the product of all k
design columns). Including the column of ones, $x_0$, there are $2^k$ columns, and each
column is of length $2^k$.
Examples of these vectors are given in Tables 4.4 and 4.9. There we list the vector of responses, as well as the design and calculation columns. The only difference is that the columns are denoted by factor labels (for example, A, B, ..., ABCD in
Table 4.9), instead of $x_1, x_2, \ldots, x_{12 \cdots k}$. Also, Tables 4.4 and 4.9 do not list the column of ones.

The regression model can be written as

$$y = \beta_0 x_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \beta_{12} x_{12} + \cdots + \beta_{k-1,k} x_{k-1,k} + \beta_{123} x_{123} + \cdots + \beta_{12 \cdots k} x_{12 \cdots k}$$

We regress the vector of responses $y$ on $2^k$ regressor vectors $1, x_1, x_2, \ldots, x_k,$
$x_{12}, \ldots, x_{k-1,k}, \ldots, x_{12 \cdots k}$. There is no error term in this regression as this is a fully
saturated model, with the same number of coefficients as number of observations. Of
course, not all effects need to be included. For example, the model in Section 4.6.2 considered only main effects of B, C, and D, and the 2-factor interaction BC. Also, one may be interested in just main effects and 2-factor interactions. In this case, one would regress $y$ on
$1 + k + (k - 1)k/2$ vectors $1, x_1, x_2, \ldots, x_k, x_{12}, x_{13}, \ldots, x_{k-1,k}$. The noise component in
this model would reflect interactions of order three and higher.
Computer software can be employed to carry out the regression. The responses and the
design vectors are entered as columns into a spreadsheet; calculation columns are formed by
multiplying various subsets of the design vectors; and the regression command is executed.
The vector of regression coefficients $\beta$ consists of elements $\beta_0$ (the constant),
$\beta_1, \beta_2, \ldots, \beta_k$ (main effects), $\beta_{12}, \beta_{13}, \ldots, \beta_{k-1,k}$ (2-factor interactions), $\beta_{123}, \ldots$ (3-factor
interactions), $\beta_{1234}, \ldots$ (4-factor interactions), ..., and $\beta_{12 \cdots k}$ (k-factor interaction). $X$ is the
$2^k \times 2^k$ matrix containing the regressor vectors. Regression theory in Appendix 4.4 shows
that the least squares estimate of the regression coefficients $\beta$ is given by

$$\hat{\beta} = (X'X)^{-1} X'y$$

where $X'$ is the transpose of the matrix $X$, and $(X'X)^{-1}$ is the inverse of the $2^k \times 2^k$
matrix $X'X$.
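The consequence of orthogonality for this regression can be seen in a small numerical sketch. The response values and function names below are hypothetical. For a full 2-level factorial with ±1 coding, the columns of $X$ are mutually orthogonal, so $X'X = N I$ with $N = 2^k$ and the least squares estimate reduces to $\hat{\beta} = X'y / N$; no matrix inversion is needed. With this coding each coefficient is one half of the corresponding effect when the effect is defined as the average at the high level minus the average at the low level.

```python
from itertools import combinations, product
from math import prod


def model_matrix(k):
    """Saturated model matrix of the 2^k factorial: ones, design vectors, all products."""
    runs = list(product([-1, 1], repeat=k))
    subsets = [s for r in range(k + 1) for s in combinations(range(k), r)]
    labels = ["".join(str(i + 1) for i in s) or "0" for s in subsets]
    X = [[prod(run[i] for i in s) for s in subsets] for run in runs]
    return labels, X


def fit_saturated(X, y):
    """Least squares for orthogonal +/-1 columns: beta = X'y / N, since X'X = N I."""
    n = len(X)
    cols = list(zip(*X))
    for a, b in combinations(range(len(cols)), 2):
        # Orthogonality check: every pair of columns has zero inner product.
        assert sum(u * v for u, v in zip(cols[a], cols[b])) == 0
    return [sum(c * yi for c, yi in zip(col, y)) / n for col in cols]


labels, X = model_matrix(2)
# Hypothetical responses, in the run order (-,-), (-,+), (+,-), (+,+) for (x1, x2).
y = [60, 72, 54, 68]
beta = fit_saturated(X, y)
for label, b in zip(labels, beta):
    print(f"beta_{label} = {b}")
```

Here the coefficient of $x_1$ is -2.5, which is half of the main effect (61 - 66 = -5) obtained by comparing the average responses at the high and low levels of factor 1.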

LENTH'S APPROACH FOR DETERMINING STATISTICAL
SIGNIFICANCE IN UNREPLICATED FACTORIAL DESIGNS

Lenth, in his paper "Quick and Easy Analysis of Unreplicated Factorials," discusses another
useful strategy for assessing the significance of effects in unreplicated experiments. His procedure is based on the following simple formula for the standard error of an estimated effect. If none of the factors are active, the standard deviation of the m estimated effects
$f_1, f_2, \ldots, f_m$ serves as the standard error of the estimated effects. However, if some effects are
active, this estimate is too large, as it not only incorporates random variability but also the
effects of active factors. Hence, one needs to omit from the calculation of the standard deviation the estimates of all active factors. The normal probability plot discussed previously
does this informally when determining the best-fitting straight line from just the estimates
in the linear portion of the middle part of the graph, not from the estimates on the extreme
left and right side that do not appear to fit the line through the middle.
Lenth (1989) uses the fact that the median of the absolute values of the estimated nonactive effects, suitably normalized, provides an estimate of the standard deviation, and he
calculates

$$s = (1.5)\,\text{Median}\left(|f_1|, |f_2|, \ldots, |f_m|\right)$$

The factor 1.5 in the normalization arises from the relationship between the standard deviation and the median of the absolute value of a mean zero normal random variable. In the
next step, Lenth (1989) omits from this calculation all estimates with absolute values larger
than 2.5s, and he calculates a revised standard deviation

$$\text{PSE} = (1.5)\,\underset{|f_j| < 2.5s}{\text{Median}}\left(|f_1|, |f_2|, \ldots, |f_m|\right)$$

He calls this the pseudo standard error (PSE) and uses it in the calculation of the confidence intervals for the effects. The 95% confidence interval for an effect, estimated
effect ± (t)(PSE), uses the 97.5th percentile of a t-distribution with m/3 degrees of freedom.
For a standard error that is estimated with reasonable confidence and that comes from
many observations, one would use the 97.5th percentile of the standard normal distribution, or simply t = 2. However, in the unreplicated situation the PSE comes from very few
observations, and Lenth (1989) found through simulations that the t-distribution with m/3
degrees of freedom works best. For m = 7 effects, t = 3.76; for m = 15, t = 2.57; and for
m = 31, t = 2.22. Lenth recommends displaying the estimated effects on a bar chart, and
adding to this chart the margin of error, ±(t)(PSE). If an estimate exceeds these limits, then
it is likely that this particular factor is active (i.e., significant). We should point out that
Lenth suggests even larger margins of errors by incorporating simultaneous adjustments for
the multiple comparisons (see Appendix 4.2 for a discussion of multiple comparisons).
Minitab displays Lenth's PSE in the context of its normal probability plot (see Figure 4.6).
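The two-step calculation of the PSE can be sketched as follows. The effect estimates are made up for illustration, the function names are hypothetical, and the t multipliers are simply the three values quoted in the text (for other values of m one would look up the 97.5th percentile of the t-distribution with m/3 degrees of freedom).

```python
from statistics import median

# t multipliers quoted in the text for m = 7, 15, and 31 estimated effects.
T_MULTIPLIER = {7: 3.76, 15: 2.57, 31: 2.22}


def lenth_pse(effects):
    """Pseudo standard error: 1.5 x median |effect| after trimming at 2.5 s."""
    abs_effects = [abs(f) for f in effects]
    s = 1.5 * median(abs_effects)                     # initial estimate
    trimmed = [a for a in abs_effects if a < 2.5 * s]  # drop apparently active effects
    return 1.5 * median(trimmed)


def active_effects(effects):
    """Return the effects whose absolute value exceeds the margin of error t x PSE."""
    t = T_MULTIPLIER[len(effects)]
    margin = t * lenth_pse(list(effects.values()))
    return {name: f for name, f in effects.items() if abs(f) > margin}, margin


# Hypothetical estimated effects from an unreplicated 2^3 factorial (m = 7).
estimates = {"A": 10.2, "B": -0.8, "C": 0.5, "AB": 1.1, "AC": -0.4, "BC": 0.9, "ABC": -0.6}

pse = lenth_pse(list(estimates.values()))
active, margin = active_effects(estimates)
print(round(pse, 3), round(margin, 3), sorted(active))
```

In this made-up example the large estimate for A is removed in the trimming step, so it does not inflate the PSE, and A is the only effect that falls outside the margin of error.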

INEFFICIENCY OF THE APPROACH THAT CHANGES ONE FACTOR AT A TIME

All too often the effects of k factors are studied by carrying out successive experiments in
which the levels of each factor are changed one at a time. Such experiments start with the
standard settings of the k factors, then change the levels of the one factor that is considered
the most influential. The responses at the low and high settings of this one factor are compared while keeping all other factors fixed, and the level at which the response is best is
locked in for the next stage. The factor that is considered second most important is varied
next. Again, responses at the low and high levels of this factor are compared, and the best
level of this factor gets locked in for all subsequent runs. Then on to the third factor, and so
on, until the last factor is reached.
Compared to the factorial (multifactor) experiments where the levels of all factors are
changed together, the approach of changing one factor at a time is inefficient for several
reasons:

It requires more runs to achieve the same precision for the effects estimates.
It may miss the optimum altogether.
It cannot estimate interactions.
It does not provide general conclusions about factor effects, given that the estimates
depend on specific levels of the remaining factors.
We illustrate these shortcomings in the context of the $2^2$ factorial design. In the $2^2$ factorial

vvitl1011t tL'pliL<1l1ons,

WL'

conduct a tot.ti of I rutb. I he 111.1111 cikLI.> ut both f,1ctors atl' t'stt

rn.ited bv rnmpariug two observatiom .tl the low ,rnd the !ugh ln els ol L'<IL h factor. A"-.ume
that tht approach uf ch;rng1 ng u11e fa Lt or di a time begins with ( x 1
+ ) and vanes
thL Jc,,ds ot faLlor l first. To obta111 the same prl'Llsto11 for thl' t'st1111,1lL's, une must stdrt \\ ith

1 ,,\,

four observations, 2 runs each at(-', x , - ' ) and (x 1


1 , .x,
+ ). l.ock1ng 111 the
be't 1,1el for factor I (assumt' th,lt it is \ 1
), onl' procel'ds to the lll'\t L"CHllp,mson wltcrL'
one 1.iries the kvels uf fatlor 2 ,rnd >tudies the respoml' at (.\
, x,
) ,111d

(x,
, x2 = +).The 2 runs at (x 1 =
, 'i - +)have alreadr been obtained in the first
sll'p; hut 2 more runs at (x
, x2
- ) are required. l'h1' kads to ,1 total of 6 runs, as
rnmp<tred to the 4 1uns in the 2z f~tLtorial Lksign. This shows th,1t the dl'l'rllad1 ofch,1ngi11g
one foctor at a time requires more runs to obt,un estim,1tes with thl' s.1111e prechion.
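The run count in this comparison can be sketched in a few lines of Python. The function names are hypothetical, and the extension from 2 factors to k factors is an assumption that follows the same reasoning as the text: each factorial main effect compares two averages of $2^{k-1}$ runs, so one-factor-at-a-time testing would need that many runs at the starting setting and at each of the k settings reached by changing one factor.

```python
def factorial_runs(k: int) -> int:
    """Runs in an unreplicated two-level full factorial with k factors."""
    return 2 ** k


def ofat_runs(k: int) -> int:
    """Runs one-factor-at-a-time testing needs for the same precision.

    Assumption (generalizing the 2-factor argument in the text): each of the
    k + 1 settings visited (the start plus one change per factor) must be run
    2**(k - 1) times so that every comparison rests on averages of the same
    size as in the factorial design.
    """
    return (k + 1) * 2 ** (k - 1)


for k in (2, 3, 4):
    print(k, factorial_runs(k), ofat_runs(k))
```

For k = 2 this reproduces the 6 runs versus 4 runs discussed above; the gap widens as factors are added.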
Also note that the approach of changing one factor at a time may miss the optimum.
Consider the situation with the following four factor-level combinations and their responses that are supposed to be maximized:
t- ): y = 80
- ): y = 90
): y = 70
- ): y = 110
CASE 10
PIGGLY WIGGLY

Wilkinson et al. [Wilkinson, J. B., Mason, J. B., and Paksoy, C. H.: "Assessing the Impact of
Short-Term Supermarket Strategy Variables," Journal of Marketing Research, Vol. 19 (1982),
72-86] described the results of an experiment that assesses the impact of price, promotion,
and display on the sales of several grocery items.
THE EXPERIMENT

They considered three price levels:

Regular price, which is the recommended retail price to customers as listed in the regional warehouse price manual
Cost price, which is the cost to the supermarket
Reduced price, which is the price halfway between the recommended retail price and
the cost to the supermarket
They studied three display choices:
Normal display space as determined at the beginning of the experiment on the stock
manager's recommendation
Expanded display, which amounts to twice the normal display area
Special display, which is normal display, plus some type of additional display alternative such as special display at another location in the store
They considered two levels of advertising:
Advertising, which means including the product and its price in the supermarket
chain's Wednesday advertisement
No advertising
Detailed operational definitions of the factor levels are given in Wilkinson, Mason, and
Paksoy (1982).
The authors used a single, carefully selected Piggly Wiggly supermarket for their experi-

TABLE 4.9
Results of the $2^4$ Factorial Experiment
(Design and interaction columns A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD omitted.)

Test Cell    Orders    Response Rate
 1           184       2.45%
 2           252       3.36%
 3           162       2.16%
 4           172       2.29%
 5           187       2.49%
 6           254       3.39%
 7           174       2.32%
 8           183       2.44%
 9           138       1.84%
10           168       2.24%
11           127       1.69%
12           140       1.87%
13           172       2.29%
14           219       2.92%
15           153       2.04%
16           152       2.03%

CASE

12

ALMQUIST & WYNER

Eric Almquist and Gordon Wyner ["Boost Your Marketing ROI with Experimental Design," Harvard Business Review (October 2001), 135-141] make a convincing argument
why experimental design can speed up the learning curve of marketing research. Direct
marketers have used simple techniques such as split mailings to compare consumer reactions to different prices or promotional offers. However, such traditional testing techniques
that change one factor at a time become prohibitively expensive if more than just a couple
of advertising techniques need to be evaluated. Changing factors simultaneously and
changing the factors according to a well-constructed experimental plan is the key to efficiently learning which of many factors have an influence. Almquist and Wyner discussed
two examples.
EXAMPLE 1

The first example describes how a company called BizWare tests the sales response to a campaign that varies three factors: price at four levels ($150, $160, $170, and $180), two different
messages (one emphasizing speed, the other power), and two promotion strategies (one
involving a free trial period, the other a free gift). With 3 factors, two at 2 levels and one at
4 levels, the $2^2 \times 4$ factorial involves the 16 experiments listed in Table A12.1. The last column in this table indicates the orthogonal half-fraction that is suggested by the design software JMP as an 8-run screening design.
A fraction of a mixed-level design with one factor at 4 levels is easy to generate. One
writes down an 8-run design in seven 2-level factors (see Table A12.2 given below) and uses
two columns and their interaction to assign the 4 levels of the 4-level factor. This procedure
generates an orthogonal design with one factor at 4 levels, and up to 4 factors at 2 levels each.
Here, we let the first two columns represent the levels of the two 2-level factors. The columns 3, 12, and their product (3)(12) = 123 are used to determine the levels of the 4-level
factor. These are the columns in boldface. Level 1 of the 4-level factor is associated with
(-1, 1, -1); level 2 with (-1, -1, 1); level 3 with (1, -1, -1); and level 4 with (1, 1, 1).
This leads to the 8 runs that are indicated in Table A12.1. Note that the unused columns la1
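The construction of the 8-run mixed-level fraction can be traced with a short Python sketch. The function name is hypothetical; the mapping from the sign triples of columns (3, 12, 123) to the four levels is the one stated in the text. Which of the two 2-level columns carries the message and which the promotion is not specified here, so the columns are simply called c1 and c2.

```python
from collections import Counter
from itertools import product

# Sign triple for columns (3, 12, 123) -> level of the 4-level factor,
# exactly as listed in the text.
LEVELS = {(-1, 1, -1): 1, (-1, -1, 1): 2, (1, -1, -1): 3, (1, 1, 1): 4}


def mixed_level_design():
    """Return 8 runs as (c1, c2, level) from a full 2^3 design in columns 1, 2, 3."""
    runs = []
    for c1, c2, c3 in product((-1, 1), repeat=3):
        c12 = c1 * c2      # interaction column 12
        c123 = c3 * c12    # product of columns 3 and 12
        runs.append((c1, c2, LEVELS[(c3, c12, c123)]))
    return runs


design = mixed_level_design()
level_counts = Counter(level for _, _, level in design)
for run in design:
    print(run)
```

Each level of the 4-level factor occurs in two runs, and within each level both 2-level columns take the values -1 and +1 once, which is the balance an orthogonal main-effects design needs.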

TABLE A12.5
The 16-Run D-Optimal Design for the Crayola Marketing Campaign
Columns: Subject, Action, Closing, Salutation, Promotion (coded levels -1, 0, 1)

TABLE A13.1
Description of the 45 Creatives and the Resulting VISITS, CLICKS, ACTIONS, CTR = CLICKS/VISITS,
and AR = ACTIONS/VISITS
Columns: Creative (run); A-bottom; A-top; B through I; Visitors; Clicks; Actions; CTR (%); AR (%)

NOTE: The numbers under areas A-bottom, A-top, and B through I refer to the available levels in each of the 10
test areas. Level 1 represents the baseline level.

REFERENCES

CHAPTER 1

Box, George E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters:
Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
Deming, W. Edwards: Out of the Crisis. Cambridge, MA: MIT Press, 1982.
Fisher, Ronald A.: The Design of Experiments. Edinburgh: Oliver & Boyd, 1935 (and various later editions).
Fisher Box, Joan: R. A. Fisher, The Life of a Scientist. New York: Wiley, 1978.
Pande, Peter S., Neuman, Robert P., and Cavanagh, Roland R.: The Six Sigma Way: How
GE, Motorola, and Other Top Companies Are Honing Their Performance. New York:
McGraw-Hill, 2000.
Salsburg, David: The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman, 2001.
CHAPTER 2

Clarke, D. G.: Marketing Analysis and Decision Making. Redwood City, CA: The Scientific
Press, 1987.
Hald, Anders: A History of Probability and Statistics and Their Applications Before 1750.
New York: Wiley, 1990.
Shewhart, W. A.: Economic Control of Quality of Manufactured Product. New York: Van
Nostrand, 1931.
Stigler, Stephen M.: Statistics on the Table. Cambridge, MA: Harvard University Press, 1999.
Welch, B. L.: "The significance of the difference between two means when the population
variances are unequal." Biometrika, Vol. 29 (1937), 350-362.
CHAPTER

Box, George E. P., Hunter, William G., and Hunter, J. Stuart: Statistics for Experimenters:
Design, Innovation, and Discovery. New York: Wiley, 1978 (2nd ed., 2005).
Clarke, D. G.: Marketing Analysis and Decision Making. Redwood City, CA: The Scientific
Press, 1987.
Fisher, R. A.: Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1925.

TWO-LEVEL FACTORIAL EXPERIMENTS

4.1 INTRODUCTION

In this chapter, we begin focusing on the heart of this book. In Chapter 3, we were concerned with comparing the effectiveness of several treatments, each a version of a single factor. In one example, the factor was a product display, and we tested three different displays
based on a comparison of weekly store sales. Here, we extend that discussion, focusing on
experiments with multiple factors.
Experimental design methods have roots in agriculture, and we use an example from that field to introduce the material we will cover. Suppose we are experimenting with ways to
improve the yield of corn, and we identify three factors that seem important: type of fertilizer, variety of seed, and type of pesticide. We decide to test two fertilizer formulations,
two kinds of seed, and two different pesticides. As we discussed in Chapter 1, the traditional
method for testing multiple factors is to test one factor at a time. But Fisher (1935) showed
that a factorial design that tests all factors simultaneously is a much better approach. Using
Fisher's method, we test the 2 X 2 X 2 = 8 possible combinations of fertilizers, seeds, and
pesticides. We divide our experimental field into 32 equal-sized plots and randomly assign
each of the eight combinations to four plots. For each of the 32 plots, we measure the number of bushels of corn produced. This factorial arrangement would allow us to compare the
two fertilizers, the two kinds of seeds, and the two pesticides. It would also allow us to uncover any interactions between factors. For example, it may turn out that seed variety 1 is
better than variety 2 when fertilizer 1 is used for both, but that the opposite is true when fertilizer 2 is used with both seeds.
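The random assignment of the eight treatment combinations to the 32 plots can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name, the coding of levels as 1 and 2, and the fixed seed (used only to make the layout reproducible) are all assumptions.

```python
import itertools
import random
from collections import Counter


def assign_plots(n_plots: int = 32, seed: int = 0):
    """Randomly assign each (fertilizer, seed, pesticide) combination to plots.

    Levels are coded 1 and 2 for each factor; every combination is replicated
    n_plots / 8 times and the order is shuffled before plots are numbered.
    """
    combos = list(itertools.product((1, 2), repeat=3))  # the 2 x 2 x 2 = 8 combinations
    replicates = n_plots // len(combos)
    treatments = combos * replicates
    random.Random(seed).shuffle(treatments)
    return {plot: combo for plot, combo in enumerate(treatments, start=1)}


layout = assign_plots()
counts = Counter(layout.values())
for plot in sorted(layout)[:5]:
    print(plot, layout[plot])
```

Each of the eight combinations ends up on exactly four plots, and only the order in which they are laid out is random.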
Consider another example. An advertising agency is designing an online ad. It identifies
three factors to test, with the response being the fraction of ad viewers who sign up for the
advertised service. One factor is the ad copy: a traditional version or a more modern one.
The second factor is the font, a traditional font or a fancier one, while the third factor is
the background color, white or blue. In a factorial design, one of the eight possible ads
would randomly be sent to each viewer. The question is what is important here: Is it the
copy, the font, or the background color that matters? Do the factors interact in the sense that


CASE 13
PHONEHOG

This case continues our discussion in Section 8.2. PhoneHog recorded the number of distinct visitors to the PhoneHog site (VISITS), the number of times visitors click on the subsequent page to obtain additional information (CLICKS), and the number of actions of actually completing the subscription agreement (ACTIONS). The click-through rate, CTR = CLICKS/VISITS, and the action rate, AR = ACTIONS/VISITS, measure the success of the
creatives. The results are shown in Table A13.1.
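The two success measures are simple ratios, and a small Python sketch makes the definitions concrete. The function name is hypothetical, and the counts used below (2,150 visits, 420 clicks, 212 actions) are illustrative inputs rather than a claim about any particular creative.

```python
def click_and_action_rates(visits: int, clicks: int, actions: int):
    """Return (CTR, AR) in percent, with CTR = CLICKS/VISITS and AR = ACTIONS/VISITS."""
    ctr = 100.0 * clicks / visits
    ar = 100.0 * actions / visits
    return ctr, ar


# Illustrative counts for one creative.
ctr, ar = click_and_action_rates(visits=2150, clicks=420, actions=212)
print(f"CTR = {ctr:.2f}%, AR = {ar:.2f}%")
```

Note that both rates use VISITS as the denominator, so AR is not conditional on a click.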
QUESTIONS

Exercise 2 in Chapter 8

Figure A11.1
Plot of Main Effects: main-effects plots (fitted means) for sales, with panels for Advertising, Cities, and Time


CASE 11
UNITED DAIRY INDUSTRIES

This case is adapted from D. G. Clarke [Marketing Analysis and Decision Making, The Scientific Press (1987)].
Researchers at the United Dairy Industry Association (UDIA) were evaluating the results
of a recent field experiment that tested the impact of varying levels of advertising on the
sales of cheese. The principal objective of the study was to measure the retail sales response
(pounds of cheese sold) to varying levels of advertising. Eight markets were selected for the
experiment, two from each of the four geographic regions: Northeast, Midwest, Southwest, and Southeast. Two markets with similar monthly sales patterns were selected from
each geographic region in a way that minimized overlap of local television and newspaper
coverage. Within each geographic region, the two markets were designated as test or control market on a random basis.
Executives determined the levels of advertising to be tested in the experiment. It was
believed that the levels should be distinct enough to generate measurable differences in the
results. They decided to test the impact of four levels of advertising: 0 cents (level A),
3 cents (B), 6 cents (C), and 9 cents (D), all expressed on a per-capita basis. The 6 cents per
capita level represents a national campaign costing approximately 12 million dollars (in
1973). The principal medium for advertising was television, with point-of-purchase display
materials in stores and newspaper ads playing a secondary role. Each of the four levels of advertising was implemented within each test market during one of four 3-month periods between May 1972 and April 1973. The sequence in which the advertising levels were tested
was selected so that each advertising level was tested in only one test market during any one
time period. Such an arrangement is referred to as a Latin square design. You can check that
each letter in Table A11.2 (A, B, C, D) appears only once in each column and each row.
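The row-and-column check described for the Latin square can be automated. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the 4 x 4 arrangement shown is a generic cyclic Latin square used only as an example, not the actual layout of Table A11.2.

```python
def is_latin_square(square) -> bool:
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


# Hypothetical arrangement: rows are time periods, columns are test markets,
# letters are the advertising levels A, B, C, D.
example = ["ABCD", "BCDA", "CDAB", "DABC"]
print(is_latin_square(example))
```

With such an arrangement, every advertising level is seen once in every market and once in every period, so market and period differences do not bias the comparison of levels.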
Within each market, UDIA executives obtained the cooperation of approximately 30 supermarkets in obtaining quarterly audits of cheese sales. Average cheese sales (in pounds)
per store in each test market across the four 3-month periods between May 1972 and April
1973 are listed in Table A11.3.

MULTIPLE COMPARISONS

The procedure for determining which effects are significant involves multiple comparisons
of main effects. In the $2^4$ factorial experiment, for example, we assess the significance of 15
effects. In m = 15 comparisons, it would not be unreasonable to see one effect outside the
critical value (margin of error), $\pm(1.96)\,\text{standard error(effect)}$, just by chance even though
there are no active factors. To guard against the error of declaring a factor significant incorrectly, we can apply multiple comparison procedures that increase the critical value. A simultaneous 5% margin of error is obtained by replacing the 97.5th percentile (0.025 upper
tail probability) with the percentile of order $(1 + 0.95^{1/m})/2$. This simultaneous margin of
error uses the fact that estimates of the effects are independent. For example, for m = 7
comparisons, the percentile of order $(1 + 0.95^{1/7})/2 = 0.9963$ from the standard normal
distribution is 2.68. This is larger than 1.96, the factor used in the critical value without a
multiple comparison adjustment.
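The adjusted percentile is easy to compute with Python's standard library. This is a minimal sketch for the normal-distribution case only; the function name is hypothetical, and the t-distribution version used with replicated designs would need a statistics package that provides t quantiles.

```python
from statistics import NormalDist


def simultaneous_critical_value(m: int, confidence: float = 0.95):
    """Return (percentile order, normal critical value) for m independent comparisons.

    The order (1 + confidence**(1/m)) / 2 replaces the usual 0.975 so that all m
    intervals hold simultaneously with the stated confidence.
    """
    order = (1 + confidence ** (1 / m)) / 2
    return order, NormalDist().inv_cdf(order)


for m in (1, 7, 15):
    order, z = simultaneous_critical_value(m)
    print(f"m = {m:2d}: order = {order:.4f}, critical value = {z:.2f}")
```

With m = 1 the function falls back to the familiar 1.96, and the critical value grows slowly as the number of comparisons increases.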
We can apply the multiple comparisons procedure to the results of the replicated
3-factor cracked pots example. In Table 4.6, the confidence intervals and significance tests
for the effects are based on a t-value of 2.306, which corresponds to the 97.5th percentile of a
t distribution with 8 degrees of freedom. Applying the multiple comparison method with
m = 7, the appropriate t value is 3.56, the 99.63th percentile of the t distribution with 8 degrees of freedom. The confidence interval for each effect becomes wider: Estimated effect
$\pm$ 3.56(1.44), or Estimated effect $\pm$ 5.13. Any effect with absolute value greater than 5.13
is statistically significant. The main effects are still significant, but the significance of the RC
interaction (estimated effect of 5.5) becomes borderline.
Adjustments for multiple comparisons guard against the error of judging too many factors as important. One could argue against the use of such adjustments on the ground that
it is usually not a serious mistake to consider an insignificant effect as significant. Most experimentation is a sequential activity, and not a one-shot affair. Including borderline significant factors at a subsequent stage certainly involves more work as extra factors need to
be carried along. However, one can learn at the next stage that such factors are not needed,
and not much harm is done by not ruling them out immediately. On the other hand, disposing of a factor too quickly may pose a more serious risk.


A BRIEF PRIMER ON REGRESSION

Note that regression software is readily available and all that is needed in practice is an understanding of how to interpret the program output. While a detailed knowledge of regression is not necessary for applying the design approach put forward in this chapter, it will
help you understand certain issues in Chapters 7 and 8. Also, if you have had prior exposure
to regression, the material in the following two appendixes will give you a brief, concise summary of the main results.
Consider the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$. Ignoring the noise $\varepsilon$,
this model represents a straight line if plotted on an x-y graph. The noise component introduces random scatter around the model line $\beta_0 + \beta_1 x$. Assume that the noise component has mean zero and variance $\sigma^2$.
Assume that there are n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, which are
graphed on a scatter plot. Figure 2.7 in Section 2.3 is an example of such a plot. The objective is to determine the line that fits the data best. Least squares estimation selects the estimates for $\beta_0$ and $\beta_1$, which we denote by $\hat{\beta}_0$ and $\hat{\beta}_1$, by minimizing the sum of the squared
vertical distances $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$. The estimates can be calculated quite easily. Expressions for the estimates can be written down in vector/matrix format,

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = (X'X)^{-1} X'y. \qquad (4A.1)$$

The n X 1 column vector y consists of the response (y) observations. The n X 2 matrix X
consists of two n X 1 columns: a vector of ones, denoted by 1, and the vector x containing
the values of the regressor (x) variable. That is,

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$$

The matrix $X'$ is the transpose of the matrix X, $(X'X)$ is
the matrix product of $X'$ and X, and $(X'X)^{-1}$ is the inverse of $(X'X)$. The matrix expression
for the estimates in equation (4A.1) is very convenient as it also works for more general
models.
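For the straight-line case, the matrix expression in equation (4A.1) reduces to a 2 X 2 system that can be solved by hand. The Python sketch below spells this out without any external library; the function name and the small data set are hypothetical and serve only to show the mechanics.

```python
def least_squares_line(x, y):
    """Solve (X'X) b = X'y for the model y = b0 + b1 * x.

    Here X'X = [[n, sum(x)], [sum(x), sum(x^2)]] and X'y = [sum(y), sum(x*y)];
    the 2 x 2 inverse is written out explicitly.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx          # determinant of X'X
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = least_squares_line(x, y)
fitted = [b0 + b1 * v for v in x]
residuals = [obs - fit for obs, fit in zip(y, fitted)]
print(b0, b1, residuals)
```

In practice one would rely on regression software; the point of the sketch is only that the estimates come from the sums that make up $X'X$ and $X'y$.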
The fitted values from the regression fit, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, are obtained by replacing the
regression coefficients in the model equation by their estimates. The residuals are the
differences between the observations and the fitted values, $y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.
An estimate of the variance $\sigma^2$ is obtained from

TWO-LEVEL FRACTIONAL FACTORIAL DESIGNS

5.1 INTRODUCTION
In Chapter 4, we discussed 2-level factorial designs. These designs are very useful when
there are relatively few factors. But as k, the number of factors, increases, the required number of runs in a $2^k$ factorial design grows rapidly, with each additional factor doubling the
number of runs. With 4 factors, there are $2^4 = 16$ runs; with 5 factors, $2^5 = 32$ runs; with
6 factors, $2^6 = 64$ runs; and so forth. With 10 factors, there would be $2^{10} = 1,024$ runs! Obviously, an experiment with that many runs would be out of the question. If full factorial designs were the only choice for the experimenter, experimental design tools would have limited value. But as we will see in this chapter, fractional designs in which the experimenter
performs only a fraction of the number of runs required in a full factorial design offer an
extremely powerful approach to experimentation.
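The growth in run counts, and the savings from running only a fraction, can be tabulated with a few lines of Python. The function names are hypothetical; the notation $2^{k-p}$ for a fraction that uses $1/2^p$ of the full design is standard, but its use here is an illustration rather than part of the text above.

```python
def full_factorial_runs(k: int) -> int:
    """Number of runs in a 2^k full factorial design."""
    return 2 ** k


def fractional_runs(k: int, p: int) -> int:
    """Number of runs in a 2^(k-p) fractional factorial design."""
    return 2 ** (k - p)


run_table = {k: full_factorial_runs(k) for k in (4, 5, 6, 10)}
print(run_table)
# A 16-run fraction for 5 factors uses half of the 32 full-factorial runs.
print(fractional_runs(5, 1), "of", full_factorial_runs(5))
```

Each additional factor doubles the full-factorial count, while each additional generator (each increase in p) halves the fraction.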

5.2 SPILLING THE BEANS: A FRACTIONAL DESIGN FOR 5 FACTORS IN 16 RUNS
A company supplies freshly roasted coffee to restaurants and gourmet food stores. In a recent blind taste test, the company's Kenya AA coffee was judged inferior to the same variety
of coffee produced by a competitor. In light of this disappointing outcome, the firm's chief
coffee roaster, with the help of a statistical consultant, decides to conduct an experiment
aimed at improving the taste of the Kenya AA. (This example is based on an actual study.
For simplicity, some minor details have been changed, but the essential elements of the real
experiment, including the conclusions, have not been altered.)
The chief roaster has identified 5 factors likely to be important. The factors are the initial temperature of the roasting machine when the green (unroasted) beans are put into it
(factor 1); the temperature of the flame (factor 2), which determines how quickly the beans
are roasted; the color of the beans when they are removed from the roaster (factor 3); the
supplier of the green beans (factor 4); and the roasting machine (factor 5). Two small
(')~pound) roasting machines are used in the experiment. The operating ranges for flame
temperature are the same for both machines.

CASE 8
EXPERIMENTS IN RETAIL OPERATIONS:
DESIGN ISSUES AND APPLICATION
Gordon H. Bell, Johannes Ledolter, and Arthur J. Swersey

INTRODUCTION
Experimental design methods have long been recognized as an integral part of production
and operations management in general and quality management in particular. With its origins in the pioneering work of Sir Ronald Fisher, who published The Design of Experiments
in 1935, experimental design methods have been widely applied to manufacturing problems, with numerous case studies and examples appearing in the literature.
In the early 1980s, largely in response to competition from Japan, U.S. firms took a renewed interest in these statistical methods, with the Big Three automobile makers at the
forefront of these activities. Experimental design was emphasized throughout that decade
as an important aspect of statistical process control (SPC) and total quality management
(TQM) activities. More recently, Six Sigma programs have gained widespread attention,
with experimental design being a prominent part of that methodology. Principles of lean
production have been combined with Six Sigma, resulting in an approach that simultaneously focuses on both these methodologies. Over time, the focus of Six Sigma and other
quality activities has shifted from its original focus on manufacturing to a broader scope that
includes health care and other service areas. Similarly, concepts of lean production have more
recently been applied to service operations. For example, patient-focused care in hospitals is
designed to increase quality of care by decentralizing many ancillary services and bringing the
caregivers to the patient. This approach is very similar to just-in-time production/cell manufacturing in a factory setting.
Bisgaard (1992) provides a notable historical review of experimental design case studies
that includes what he calls "a partial and unsystematic list of articles ... showing engineering and manufacturing applications of experimental design." This list comprises more than
130 case studies. More recent case studies applying experimental design methods to manufacturing problems are discussed by Lin and Chanada (2003), Cherfi, Bechard, and
Boudaoud (2002), Schaub and Montgomery (1997), and Young (1996).
In contrast to work on manufacturing problems, applications of experimental design to
service problems, including marketing and retail operations, have been limited, with examples rarely appearing in the academic literature. In searching for service applications of

CASE

PEAK ELECTRONICS:
THE BROKEN TENT PROBLEM (PART B)

Peak ran a replicated two-level fractional factorial design to solve the problem discussed in Case 4,
with each run being a single test panel. The response variable is the number of broken tents.
The order of the 32 runs was randomized, with the run sequence shown in parentheses. The
results are shown in Table A7.1.
QUESTIONS
1. Analyze the results. Estimate the effects, and obtain their significance by comparing
the estimates to their standard error.

2. Which are the significant effects? What are the best settings? How well did Lou
Pagentine predict which variables would be significant? What is the regression prediction equation for the number of broken tents on a panel? Estimate by how much
Peak would reduce the number of broken tents by using the best settings (as compared to the current settings).

CASE

MOTHER JONES (PART B)

A random sample of 40,000 persons participated in the test described in Case 3, with the letters mailed on March 15, 2000. The $2^{7-3}$ design shown below (Table A6.1) was used, resulting in 16 different experimental runs. Each run consisted of a particular combination of factor settings, with each combination sent to 2,500 persons. The response variable shown in
Table A6.1 is the net response rate (in %), which is the percentage of people who subscribed
and paid (either by cash or credit card). The estimated effects are shown in Table A6.2.
QUESTIONS
1. Analyze the results of the experiment. Which effects are statistically significant at the
5% level? At the 10% level?

2. What settings for the factors would you recommend?

3. What is the regression prediction equation and the predicted response if significant
factors are set at their best levels?

TABLE A6.1
$2^{7-3}$ Design with Generators E = ABC, F = BCD, and G = ACD, and Responses
Columns: Run; A through G; Response (%)


TABLE 5.7
16-Run Fractional Factorial Designs: Generators, Confounding Patterns, and Resolution

Resolution V design:
 5 factors, $2^{5-1}_{V}$:      E = ABCD

Resolution IV designs:
 6 factors, $2^{6-2}_{IV}$:     E = ABC, F = BCD
 7 factors, $2^{7-3}_{IV}$:     E = ABC, F = BCD, G = ACD
 8 factors, $2^{8-4}_{IV}$:     E = ABC, F = BCD, G = ACD, H = ABD

Resolution III designs:
 9 factors, $2^{9-5}_{III}$:    E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD
10 factors, $2^{10-6}_{III}$:   E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB
11 factors, $2^{11-7}_{III}$:   E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC
12 factors, $2^{12-8}_{III}$:   E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD
13 factors, $2^{13-9}_{III}$:   E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD, N = BC
14 factors, $2^{14-10}_{III}$:  E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD, N = BC, O = BD
15 factors, $2^{15-11}_{III}$:  E = ABC, F = BCD, G = ACD, H = ABD, J = ABCD, K = AB, L = AC, M = AD, N = BC, O = BD, P = CD

Columns of the accompanying 16-run table of signs: Run, A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD
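A 16-run fraction can be generated directly from a set of generators like those listed in Table 5.7. The sketch below is one way to do it in Python, under stated assumptions: the function name is hypothetical, levels are coded -1 and +1, and the 8-factor resolution IV generators are used as the worked example.

```python
from itertools import combinations, product
from math import prod


def fractional_design(base: str, generators: dict):
    """Build a two-level fractional factorial from base factors and generators.

    base       -- letters of the factors that form the full factorial, e.g. "ABCD"
    generators -- added factor -> word in base letters, e.g. {"E": "ABC"}
    Each run is a dict mapping factor letter to -1 or +1.
    """
    runs = []
    for signs in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, signs))
        for name, word in generators.items():
            # The added factor's column is the product of the named base columns.
            run[name] = prod(run[letter] for letter in word)
        runs.append(run)
    return runs


design = fractional_design("ABCD", {"E": "ABC", "F": "BCD", "G": "ACD", "H": "ABD"})
factors = list(design[0])
# Main-effect columns are orthogonal if every pair of columns has inner product zero.
orthogonal = all(
    sum(run[f] * run[g] for run in design) == 0 for f, g in combinations(factors, 2)
)
print(len(design), orthogonal)
```

The same function reproduces the other rows of the table by changing the generator dictionary; confounding patterns are not computed here.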

SAMPLE SIZE DETERMINATION IN A COMPARATIVE EXPERIMENT

In many comparative studies we evaluate the success of a new strategy or method through
the resulting change in a proportion. For example, we may have two different advertising
strategies (1 and 2) and may be interested in whether or not strategy 2 increases the proportion of people who buy a certain product. Under the null hypothesis $H_0: \pi_1 = \pi_2 = \pi$,
the distribution of the difference of the two sample proportions $p_2 - p_1$ is normal with
mean 0 and variance $2\pi(1 - \pi)/n$, where n is the size of the first (and second) sample.
For a test with significance level $\alpha$, we reject the null hypothesis in favor of the one-sided alternative $H_1: \pi_2 - \pi_1 = \delta > 0$ whenever $p_2 - p_1 > z_{1-\alpha}\sqrt{2\pi(1 - \pi)/n}$; $z_{1-\alpha}$ is the
$100(1 - \alpha)$ percentile of the standard normal distribution.
We are looking for a test with power $1 - \beta$, which implies probability $\beta$ of falsely accepting the null hypothesis if the alternative ($\pi_2 - \pi_1 = \delta$) is actually true. This requirement implies the equality

$$z_{1-\alpha}\sqrt{\frac{2\pi(1 - \pi)}{n}} = \delta - z_{1-\beta}\sqrt{\frac{\pi(1 - \pi)}{n} + \frac{(\pi + \delta)(1 - \pi - \delta)}{n}}.$$

The above equation can be solved for the sample size n, leading to

$$n = \frac{\left[z_{1-\alpha}\sqrt{2\pi(1 - \pi)} + z_{1-\beta}\sqrt{\pi(1 - \pi) + (\pi + \delta)(1 - \pi - \delta)}\right]^2}{\delta^2}.$$

Setting $\delta = 0$ in the numerator leads to the approximation

$$n = \frac{2\pi(1 - \pi)\left[z_{1-\alpha} + z_{1-\beta}\right]^2}{\delta^2}.$$

Example. Consider the planning value $\pi = 0.03$ for the common success proportion, and
assume that it is important to detect an increase of one-half of a percent ($\delta = 0.005$). For
$\alpha = \beta = 0.05$ and $z_{0.95} = 1.645$, we must sample

$$n = \frac{2(0.03)(0.97)\left[1.645 + 1.645\right]^2}{(0.005)^2} \approx 25{,}200$$

in each group, for a total of 50,400 people for the two groups combined. Statistical computer software such as Minitab and JMP includes routines for such sample size calculations.
Minitab, for example, returns this sample size when asking for the power/sample size in the
two proportions case.
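The two sample size formulas can be evaluated with Python's standard library. This is a minimal sketch under the assumptions of the appendix (equal group sizes, one-sided test, normal approximation); the function name and the `approximate` switch are hypothetical, and dedicated software may use slightly different conventions.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_two_proportions(pi, delta, alpha=0.05, beta=0.05, approximate=False):
    """Per-group sample size to detect an increase delta over proportion pi.

    approximate=True sets delta = 0 in the numerator, which gives
    n = 2*pi*(1 - pi)*(z_{1-alpha} + z_{1-beta})**2 / delta**2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    null_sd = sqrt(2 * pi * (1 - pi))
    if approximate:
        alt_sd = null_sd
    else:
        alt_sd = sqrt(pi * (1 - pi) + (pi + delta) * (1 - pi - delta))
    return (z_alpha * null_sd + z_beta * alt_sd) ** 2 / delta ** 2


n_approx = sample_size_two_proportions(0.03, 0.005, approximate=True)
n_exact = sample_size_two_proportions(0.03, 0.005)
print(round(n_approx), round(n_exact))
```

The approximate value should land close to the 25,200 per group quoted in the example (small differences come from rounding $z_{0.95}$ to 1.645), and the formula without the approximation asks for a somewhat larger sample because the variance under the alternative is larger.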

Comment. The result in Appendix 2.1 is pertinent to the design of comparative experiments that attempt to estimate the difference between two unknown success proportions. It
shows how to select the two sample sizes such that a certain specified difference ($\delta$) in the

A REVIEW OF BASIC STATISTICAL CONCEPTS

2.1 INTRODUCTION

This chapter reviews basic concepts that we use in the remainder of this book. Section 2.2 reviews discrete and continuous probability distributions including two important special
cases, the binomial and normal distributions. In Section 2.3, we focus on the graphical display and numerical summary of information. Topics covered include bar and pie charts for
categorical data; dot diagrams; histograms and scatter plots for continuous data; and summary
measures including the mean, median, standard deviation, and correlation coefficient.
In Section 2.4, we discuss sampling and random sampling, and in Section 2.5, we review the
basics of statistical inference. We discuss confidence intervals and hypothesis tests for a single
mean and a single proportion, and determine the sample size that is required for estimates to
achieve a given level of precision. We also address the comparison of two populations, using
data from the completely randomized as well as the randomized block experiment. A case
study on the effectiveness of two advertising strategies completes the chapter.
2.2 PROBABILITY DISTRIBUTIONS

The world is uncertain, and measurements on products and processes vary. Probability distributions
describe the variability among the measurements.
Random variables are variables whose outcomes are uncertain. For example, the purchasing
response of a customer who receives a catalog or an e-mail offer can be "yes" or
"no" or, in coded form, 1 for "yes" and 0 for "no." Similarly, the soldering quality of a circuit board, expressed in terms of the number of flaws, is a random variable. The board may
have zero flaws, exactly one error, two errors, and so on.
Random variables with a discrete number of possible outcomes (in the first example, 0
and 1; in the second example, 0, 1, 2, ...) are called discrete random variables. We use discrete probability distributions to describe the uncertainty. Later in this chapter, we discuss
the binomial distribution, the most important discrete distribution.
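
As a small illustration of a discrete probability distribution, the sketch below tabulates binomial probabilities for the number of "yes" responses among a group of customers. The group size of 10 and the response probability of 0.03 are hypothetical values chosen for the illustration, and the function name `binomial_pmf` is introduced here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setting: 10 customers, each responding "yes" (coded 1)
# with probability 0.03 and "no" (coded 0) otherwise.
n_customers, p_yes = 10, 0.03
pmf = [binomial_pmf(k, n_customers, p_yes) for k in range(n_customers + 1)]

for k, prob in enumerate(pmf[:4]):
    print(f"P({k} purchases) = {prob:.4f}")
```

The probabilities over all possible outcomes 0, 1, ..., 10 add up to one, as they must for any discrete distribution.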
Variables such as the length or the width of a product, the amount spent on purchases,
the commuting time to work, the gas mileage of a car, or the yield of a process are continuous,


PREFACE

We thank him for contributing so much to this book and look forward to continuing our
collaborations with him in the future.
We also thank the many students who took our classes at the University of Iowa, the
Vienna University of Economics and Business Administration, and Yale University. We
treasure the interactions we have had with our students and value all we have gained from
them. Finally, we could not have completed this book without the encouragement of our
families and closest friends. Writing a book is inevitably more time consuming than anticipated, and we will always be thankful for the patience and support we received from those
nearest to us.
We welcome comments from readers. Our e-mail addresses are johannes-ledolter@uiowa.edu
and arthur.swersey@yale.edu. Throughout the book we have tried to convey our
passion for the subject of experimental design and to share with readers our strongly felt beliefs in the power of these methods and their practical value. The success of this book will
depend in large part on the experiments carried out in the future by those who read it.
Johannes Ledolter
Arthur J. Swersey


PREFACE

Our interest in writing this book began about 10 years ago when in our own work we started
to explore the applications of experimental design methods to problems outside of manufacturing. We recognized, as others had before, that these powerful approaches were valuable
tools for marketing problems. We also discovered that beyond marketing applications there
were other important questions outside of manufacturing for which experimental design
methods could be usefully applied. For example, in education much research has been directed at determining the relationship between student learning as measured by standardized tests and class size. Large-scale tests have been carried out, but researchers have missed
the opportunity to use experimental design methods that would allow the experimenter to
simultaneously and efficiently test other variables such as textbook, use of computers, level
of parental involvement, and amount of homework. We also observed that existing books
on experimental design focused almost exclusively on industrial applications. Recognizing
this, we have written a book that aims to fill this large gap in the literature by emphasizing
marketing, service operations, and general business problems.
We have written this book for both academic and practitioner audiences. It can be used
effectively in MBA courses in quality management and marketing research and in undergraduate and graduate engineering courses in design of experiments. It is also well suited for
self study by quality professionals, management consultants, and other practitioners.
We assume that readers have had a basic undergraduate course in statistics or an introductory statistics course at the MBA level. Chapter 2 provides a review of the basic statistical
concepts that we use throughout the book. In subsequent chapters, material that is more
mathematically advanced (review of regression using basic matrix algebra) is included in
appendixes. We have included this material for the sake of mathematical rigor and completeness and to give those with more mathematical backgrounds the opportunity to delve
deeper into the basic methodology.
In teaching statistical methods we have found that students learn best if they see the relevance of the material, learn clearly how to apply the tools, and understand the underlying
statistical concepts. People learn about design of experiments best by solving exercises,


TABLE 6.5
The Estimated Main Effects

Factor      Estimate
Average        1.298
A                ...
B                ...
C                ...
D                ...
E                ...
F                ...
G             -0.556
H                ...
I              0.296
J             -0.192
K              0.088
L              0.116
M                ...
N                ...
O              0.092
P                ...
Q              0.080
R             -0.304
S             -0.864

Note: Effects that exceed twice the standard error, ±0.14, are indicated in boldface.

Figure 6.1
Graphical Display of the Estimated Main Effects

[Bar chart of the estimated main effects, in percentage points, ordered by magnitude. Significant effects (above the line): S: Interest rate; G: Sticker; R: Second buckslip; I: Copy message; J: Letter headline. Remaining effects: F: Price graphic; L: Letter postscript; H: Personalization; P: Reply envelope; O: Value of free gift; E: Additional graphic; K: List of benefits; Q: Info on buckslip; B: Return address; M: Signature; A: Envelope teaser; N: Product selection; D: Postage; C: Official stamp.]

PROPERTIES OF PLACKETT-BURMAN DESIGNS:
CONFOUNDING PATTERNS AND PROJECTIVITY

Sections 6.2 and 6.3 describe the rather complicated confounding patterns of Plackett-Burman designs. In this appendix we discuss these patterns in more detail, and we show how
they are derived. We ignore interactions of order 3 and higher, and focus on the confounding of main effects and 2-factor interactions. Furthermore, we discuss the projectivity properties of Plackett-Burman designs, which make these designs useful for factor screening.

Result
Consider an orthogonal design with k factors at 2 levels each, such as the fractional factorial or the Plackett-Burman design. The confounding coefficient between the main effect
of factor i and the 2-factor interaction among factors j and r is given by the correlation coefficient between the design vector $x_i$ and the interaction (calculation) column $x_{jr}$. Let us
denote this correlation coefficient as $\rho_{i(jr)}$.

Proof
General regression results about the bias of regression estimates when fitting an incorrect model are used to show this result. This approach was employed by Margolin (1968) in
his analysis of the confounding patterns in Plackett-Burman designs.
We are fitting the main-effects model:

$$y = X\beta + \varepsilon \qquad \text{(A6.1)}$$

where $X = [x_1, x_2, \ldots, x_k]$ is the orthogonal design matrix and $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is
the vector of main effects. Table 6.1 lists the design matrix of the Plackett-Burman design in
N = 12 runs. The design matrix for N = 20 is shown in Table 6.4.
Assume that 2-factor interactions are present and that the true model is given by

$$y = X\beta + X_*\beta_* + \varepsilon \qquad \text{(A6.2)}$$

where $X_* = [x_{12}, x_{13}, \ldots, x_{23}, \ldots, x_{k-1,k}]$ is the design matrix consisting of 2-factor interaction
(calculation) columns, and $\beta_* = (\beta_{12}, \beta_{13}, \ldots, \beta_{23}, \ldots, \beta_{k-1,k})'$ is the vector of 2-factor interactions. General regression results (see Draper and Smith, 1981, p. 117; Abraham and
Ledolter, 2006, p. 208) imply that the calculated main effects, obtained by taking (one-half
of) the difference of the response averages at the plus and minus levels of the factors, are
estimates of

$$\beta + (X'X)^{-1}X'X_*\beta_* \qquad \text{(A6.3)}$$

For orthogonal designs (such as the fractional factorial and Plackett-Burman designs) the
matrix $X'X$ is diagonal with diagonal elements given by N, the number of runs. Furthermore, the columns of $X$ and $X_*$ sum to zero, and their squares add up to N. Hence, the
matrix $(X'X)^{-1}X'X_*$ is the matrix of correlations between design columns $x_i$ and calculation columns $x_{jr}$.
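
The confounding coefficients can be computed directly for a small design. The sketch below builds a 12-run Plackett-Burman design by cyclically shifting the commonly tabulated generator row (+ + - + + + - - - + -) and appending a row of minus signs. This construction is an assumption here, and the row order may differ from the book's Table 6.1. The helper names `plackett_burman_12` and `confounding_coefficient` are introduced for illustration only.

```python
from itertools import combinations

# Commonly tabulated generator row for the 12-run Plackett-Burman design.
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return the 12 x 11 design as a list of rows with entries +1 / -1."""
    k = len(GENERATOR)
    rows = [[GENERATOR[(j - i) % k] for j in range(k)] for i in range(k)]
    rows.append([-1] * k)  # final run with all factors at the minus level
    return rows


def confounding_coefficient(design, i, j, r):
    """Correlation between design column x_i and the calculation column x_j * x_r.

    For an orthogonal two-level design, both columns sum to zero and have
    squared length N, so the correlation is their inner product divided by N.
    """
    n_runs = len(design)
    return sum(row[i] * row[j] * row[r] for row in design) / n_runs


design = plackett_burman_12()
n_factors = len(design[0])

coefficients = {
    (i, j, r): confounding_coefficient(design, i, j, r)
    for i in range(n_factors)
    for j, r in combinations(range(n_factors), 2)
}

distinct = sorted({round(value, 4) for value in coefficients.values()})
print("distinct confounding coefficients:", distinct)
```

In this design, a main effect is not confounded with the interactions that involve its own factor, and it is partially confounded with every other 2-factor interaction, with coefficients of magnitude 1/3.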

EXPERIMENTS WITH FACTORS AT THREE OR MORE LEVELS


7.1 INTRODUCTION

Until now we have discussed experiments with factors at just two different levels. We coded
the two levels as "low" and "high," or "-1" and "+1," or simply "-" and "+." A factor may
describe two catalysts in a chemical reaction, two ways of displaying information in an ad
copy, two cover prices of a magazine, or two different budgets for an advertising campaign.
With just two levels, the assignment of the "-" and "+" levels is arbitrary.
In some applications a factor may have three (or more) levels. There may be
three catalysts, three methods, and three prices. It is common to code the three levels as
-1, 0, and +1. The factor may be categorical, with no particular order among the categories. In this case, the assignment of the categories to the coded levels is arbitrary as
any one of the three categories can be associated with a certain level. This will not be the
case if the factor is continuous. The cover price of a magazine may have been set at one,
two, or three dollars. Or, the temperature of a chemical reaction may have been studied
at 1,500, 2,000, and 3,000 degrees. For continuous factors, the assignment of the actual levels to the coded ones, -1, 0, and +1, carries additional meaning. In addition to studying
whether or not the mean responses at the three levels are the same, one can explore the
functional relationship between the mean response and the continuous factor. We probably would not expect that the cover price has a linear effect on sales. Linearity of the sales
response to changes in price may actually be the hypothesis that needs to be confirmed or
refuted from the data. Whereas a linear function of price can be fitted (perfectly) to sales responses at two different price levels, we need at least three price levels to fit a quadratic function. With just two levels for price, it is impossible to check whether a linear relationship is
appropriate.
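
The point about two versus three levels can be made concrete with a small calculation. The sketch below uses hypothetical mean sales at the coded price levels -1, 0, and +1 (the numbers are invented for illustration). It computes the linear and quadratic coefficients of the parabola through the three means; a nonzero quadratic coefficient signals that a straight line does not describe the relationship.

```python
# Hypothetical mean sales at coded price levels -1, 0, +1 (illustration only).
mean_sales = {-1: 80.0, 0: 70.0, +1: 40.0}

y_low, y_mid, y_high = mean_sales[-1], mean_sales[0], mean_sales[+1]

# Fitting y = c + b*x + a*x^2 exactly through the three points gives
#   b = (y_high - y_low) / 2          (linear coefficient)
#   a = (y_low - 2*y_mid + y_high) / 2  (quadratic coefficient)
#   c = y_mid
linear_coef = (y_high - y_low) / 2
quadratic_coef = (y_low - 2 * y_mid + y_high) / 2
intercept = y_mid


def fitted(x):
    """Value of the fitted quadratic at coded level x."""
    return intercept + linear_coef * x + quadratic_coef * x**2


# With only the two outer levels, the line through them reproduces both
# observed means exactly, so curvature cannot be detected.
line_through_ends = lambda x: (y_low + y_high) / 2 + linear_coef * x
gap_at_center = y_mid - line_through_ends(0)

print("linear coefficient:", linear_coef)
print("quadratic coefficient:", quadratic_coef)
print("center mean minus two-level line:", gap_at_center)
```

The linear and quadratic terms use the contrast weights (-1, 0, +1) and (+1, -2, +1), which are the usual codes for a three-level factor with equally spaced levels.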
Section 7.2 discusses the general factorial experiment with two factors; the first factor A
is studied at a different levels while the second factor B is studied at b levels. A complete factorial experiment requires runs at all ab factor-level combinations. We show how to estimate and test the main effects of the two factors, and we discuss how to assess the interaction effect. An example is given in Section 7.3. Section 7.4 discusses additional useful

Figure 7.2
Main Effects and Interaction Plots: Sales of Apple Juice

[Two panels: "Interaction plot of sales" and "Main effects plot of sales," each showing mean sales by Price and Display.]

TABLE 7.6
ANOVA Table: Sales of Apple Juice

Source           DF        SS        MS        F        p
Display           2    4636.1    2318.0    19.32    0.001
  D(lin)          1    4385.4    4385.4    36.55    0.000
  D(qua)          1     250.7     250.7     2.09    0.182
Price             2    2624.8    1312.4    10.94    0.004
  P(lin)          1    2411.2    2411.2    20.10    0.002
  P(qua)          1     213.6     213.6     1.78    0.215
Interaction       4     130.1      32.5     0.27    0.889
  DP(lin×lin)     1      85.8      85.8     0.72    0.420
  DP(lin×qua)     1       9.9       9.9     0.08    0.781
  DP(qua×lin)     1      29.0      29.0     0.24    0.634
  DP(qua×qua)     1       5.3       5.3     0.04    0.838
Error             9    1079.7    119.97
Total            17    8470.7

Note: Significant effects are shown in boldface.
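
The arithmetic behind the table can be retraced from the degrees of freedom and sums of squares alone. The following sketch recomputes the mean squares and F-ratios for the three main lines of Table 7.6. The variable names are introduced here, and the p-values are not recomputed because they require the F distribution.

```python
# (source, degrees of freedom, sum of squares) taken from Table 7.6
effects = [
    ("Display", 2, 4636.1),
    ("Price", 2, 2624.8),
    ("Interaction", 4, 130.1),
]
error_df, error_ss = 9, 1079.7
total_ss = 8470.7

# Mean square error: error sum of squares divided by its degrees of freedom.
mean_square_error = error_ss / error_df

f_ratios = {}
for source, df, ss in effects:
    mean_square = ss / df                          # MS = SS / DF
    f_ratios[source] = mean_square / mean_square_error  # F = MS / MSE
    print(f"{source:12s} MS = {mean_square:8.1f}  F = {f_ratios[source]:6.2f}")

# The effect and error sums of squares add up to the total sum of squares.
explained_plus_error = sum(ss for _, _, ss in effects) + error_ss
print(f"sum of components = {explained_plus_error:.1f}, total = {total_ss:.1f}")
```

The same division by the mean square error, with one degree of freedom each, gives the F-ratios of the linear and quadratic components.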

TABLE 7.10
Regression Formulation of the Mixed $2^2 3^1$ Factorial Experiment, with Linear and Quadratic Main and Interaction Effects of the 3-Level Factor

[Table body not recoverable. Columns: design factors A, B, and C; regressor columns for the main effect of C, C(lin) and C(qua); interaction AB; interaction AC, AC(lin) and AC(qua); interaction BC, BC(lin) and BC(qua); interaction ABC, ABC(lin) and ABC(qua); Response. The last row gives the sum of squares for each regressor column.]

TABLE 8.2
Description of the Creatives and the Resulting VISITORS, CLICKS, and CTR = CLICKS/VISITORS (Click-Through Rates)

[Table body not recoverable. Columns: creative (run); factor levels for test areas A-bottom, A-top, and B through I; Visitors; Clicks; CTR (%).]

Note: The numbers under areas A-bottom, A-top, B through I refer to the available levels
in each of the 10 test areas listed in Table 8.1. Level 1 represents the baseline level.

APPENDIX
CASE STUDIES

This appendix contains thirteen case studies. The following table shows for each case the
sections of the book that contain relevant material.

Case No.   Title
 1         Eagle Brands
 2         Magazine Price Test
 3         Mother Jones (Part A)
 4         Peak Electronics (Part A)
 5         Office Supplies Email Test
 6         Mother Jones (Part B)
 7         Peak Electronics (Part B)
 8         Experiments in Retail Operations
 9         Experimental Design on the Front Lines of Marketing
10         Piggly Wiggly
11         United Dairy Industries
12         Almquist & Wyner
13         PhoneHog

Relevant Chapters (column entries in the order extracted; the alignment with the rows above is not recoverable): 4 and 5; 4; 4 and 5; 4 and 5; 5; 5; 6; 7; 6 and 7; 8; 8

ACKNOWLEDGMENT

Gordon H. Bell (President, LucidView, 80 Rolling Links Blvd., Oak Ridge, TN 37830, USA,
(865) 693-1222, (865) 220-8410 (fax), gbell@lucidview.com) contributed to Cases 2, 5, 8,
and 9. We are very grateful to him for allowing us to include these case studies in our text.

CASE

MAGAZINE PRICE TEST

Gordon H. Bell

INTRODUCTION

The publishing industry has seen a continual decline in magazine sales over the last few years.
More magazine titles, free content on the Internet, and lower readership have all led to industry difficulties. Publishers push subscriptions through direct mail and online sales, and
they try to advance single-copy newsstand sales in supermarkets and other retail outlets.
One leading publisher wanted to find new ways to increase profitability by testing new
price points. However, profit depends not only on the magazine cover price but also on the
number of new subscribers and the cost of unsold copies. So, the publisher focused on three
factors to test:
Cover price. This is the price paid for a single newsstand copy; it is considerably higher
than the per-copy subscription price.
Subscription price. This is the price shown on the subscription card inside each magazine. While publishers like the higher per-copy profit from single-copy sales, they
want to get as many long-term subscribers as possible since a larger subscriber base
increases their advertising revenues.
Number of copies on the newsstand. The publisher looks at the balance between having
enough copies available for every customer, yet minimizing the number of leftover
magazines. The marketing team wanted to test if a larger (or smaller) excess of copies might increase sales. With a few magazine racks in each store, they wondered
if more copies in every rack would lead to higher sales, or if fewer copies in each
rack (even an empty rack or two) might encourage customers to "buy now while
supplies last."
PRICE TEST

With the high cost of printing different covers and subscription cards, the publisher wanted
to minimize the number of test cells. But the marketing director also expected to see some
interactions among studied factors and "curvature" in the relationship between sales and
price.
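
To see why the number of test cells is a concern, it helps to count the cells of a complete factorial. The sketch below assumes, purely for illustration, that each of the three factors is tested at three levels so that curvature can be assessed; the case text does not state the levels, and the labels used here are hypothetical.

```python
from itertools import product

# Hypothetical level labels; the case does not specify the actual settings.
factor_levels = {
    "cover_price": ["low", "middle", "high"],
    "subscription_price": ["low", "middle", "high"],
    "copies_on_newsstand": ["fewer", "current", "more"],
}

# A complete factorial uses every combination of factor levels as a test cell.
test_cells = list(product(*factor_levels.values()))

print("number of test cells in the complete factorial:", len(test_cells))
for cell in test_cells[:3]:
    print(dict(zip(factor_levels, cell)))
```

With three factors at three levels each, the complete factorial already requires 27 different test cells, which is the kind of cost the publisher wanted to limit.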

CASE

PEAK ELECTRONICS:
THE BROKEN TENT PROBLEM (PART A)

In the late fall of 1991, Peak management started to get concerned with "broken tents," the
number one cause of rework. A tent is a piece of photoresist (or film) that covers a hole that
is not to be plated with copper. If the tent breaks before copper plating, then the hole is
plated and the panel needs to be reworked by scraping the copper from the hole.
THE PROCESS

The production process consists of many steps. Steps 4 and 5 are relevant for this discussion. In step 4 (laminate photoresist), a thin photosensitive film or resist is laminated
(bonded) onto a copper panel. The film is applied by rolling a sheet of film onto the panel
and putting the panel between two rollers with the heat and pressure causing the film to adhere to the panel. In step 5, a film negative showing the circuitry is placed over the photoresist-covered panel, and the panel is exposed to ultraviolet light. The circuitry on the negative is opaque and blocks the UV light. The rest of the resist on the panel is polymerized
(hardened) by the UV light. In the developer, the panel moves on a conveyer through an
8-foot-long chamber. A developing solution is sprayed onto the panel and the resist, which
is not polymerized (hardened), is washed away, exposing the circuitry. At the end of this
stage, each hole that is not to be plated should be tented (covered by film). However, a tent
may be broken at this point.
HERCULES, INC.

A quality improvement team at Peak, using a fishbone diagram, identified the photoresist
as a likely contributor to the broken tent problem. Peak had been using Dupont 4215 resist.
Scientists at Hercules, Inc., a competitor of Dupont, suggested that their film could significantly reduce broken tents. Peak ran a test with the new resist, using 40 lots of 36 panels
each, interspersing 10 lots that used the current Dupont 4215 resist. They found no statistically significant differences in the number of broken tents per panel between the Dupont
and Hercules resists.
Hercules was obviously unhappy with these results and suggested that Peak run a designed experiment that might improve the process. The Hercules representative and Lou
