
A Regression Model for Market Segmentation Studies

Author(s): Albert R. Wildt and John M. McCann


Source: Journal of Marketing Research, Vol. 17, No. 3 (Aug., 1980), pp. 335-340
Published by: American Marketing Association
Stable URL: http://www.jstor.org/stable/3150531 .
Accessed: 25/11/2014 08:24

This content downloaded from 193.226.34.227 on Tue, 25 Nov 2014 08:24:26 AM


All use subject to JSTOR Terms and Conditions

ALBERT R. WILDT and JOHN M. McCANN*


A model is developed for incorporating the long-standing marketing research
effort of explaining variation in observed consumption behavior into the more
recent notion that such behavior has an inherent component of randomness.
The model recognizes that consumption behavior on any given occasion will
fluctuate around some mean consumption level and that this mean consumption
level will vary across the population in a systematic manner which is related
to the characteristics of the consuming unit. However, it is not assumed that
the relationship is exact, but that it contains an element of randomness.
The properties of the model are examined and it is compared with ordinary
least squares models of the same phenomenon.

A Regression Model for Market Segmentation Studies

*Albert R. Wildt is Associate Professor, Department of Marketing and Distribution, The University of Georgia. John M. McCann is Associate Professor, Graduate School of Business Administration, Duke University.

Journal of Marketing Research, Vol. XVII (August 1980), 335-40

One of the oldest and most popular types of marketing research activity is the explanation of the variance in some measure of consumption among a population of consuming units. It is usually done by relating such behavior to one or more managerially relevant characteristics of the consuming unit. Typical of such research are the attempts to use demographic and socioeconomic variables to explain variations in household consumption (purchase) of food items (Frank 1968). A common feature of many of these analyses is the use of the normal regression model of the form

(1)  $X = Z\beta + u$

where $X$ is an $N \times 1$ vector of observations on the dependent variable, $Z$ is an $N \times p$ matrix of observations on $p$ predictor variables which may include an intercept, $\beta$ is a $p \times 1$ vector of coefficients, and $u$ is an $N \times 1$ vector of disturbances. In almost all applications the disturbances are assumed to be independently and normally distributed with constant variance, and ordinary least squares (OLS) is used in estimating the unknown coefficients. Under these assumptions, OLS has the desirable properties of unbiasedness, minimum variance, linearity, and maximum likelihood (Johnston 1972, p. 126).

However, the appropriateness of using OLS in analyzing consumption data has been questioned. Empirical work by Ehrenberg (1972) suggests that typical measures of consumption and purchase, such as number of items purchased in a given time period, are not normally distributed but are better represented by a Poisson process. Morrison (1973) indicates that a strict Poisson process may not be appropriate because the mean of the process may not always be equal to the variance, and proposes that the process may be better represented by a distribution whose variance is proportional to the mean. In addition, Morrison argues that the usual $R^2$ statistic obtained from such an OLS regression is not an appropriate statistic to be used in evaluating the results of the usual segmentation study. This argument centers on the notion that, though a model may accurately predict a consumer's average purchase rate, it may do a very poor job of predicting the exact number of purchases for that consumer in any given time period. Beckwith and Sasieni (1976) develop these ideas further, but do not offer a feasible alternative to OLS for the researcher interested in relating consumption behavior to predictor variables, such as consumer characteristics. Wildt (1976) suggests a regression approach and the decomposition of the error variance which permits one to assess the results of a segmentation analysis.

We report on the development of a regression model which considers the intraconsuming-unit variance in purchase quantity to be proportional to the mean purchase rate, and which distinguishes between average purchase rate and the number of purchases in a given time period. The model can be contrasted to the work of Ehrenberg (1972) and Bass (1974) who assume an individual's purchase behavior follows a Poisson distribution, with parameter $\lambda$, and who recognize that this distribution may differ from individual to individual over the population but make no attempt to explain differences in $\lambda$. They further assume that the distribution of $\lambda$ over purchasers is gamma, which coupled with the Poisson distribution results in a negative binomial distribution for the purchase behavior. Our approach is to try to explain variation in mean purchase rate over the population in terms of managerially relevant consumer characteristics.
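The step from a Poisson purchase process with gamma-distributed rates to a negative binomial distribution can be written out as a short worked derivation. The shape $r$ and scale $\theta$ of the gamma distribution are notation assumed here for illustration; they do not appear in the article.

```latex
% Assumed notation: x | \lambda ~ Poisson(\lambda), \lambda ~ Gamma(shape r, scale \theta)
\begin{aligned}
P(x) &= \int_0^\infty \frac{e^{-\lambda}\lambda^{x}}{x!}\,
        \frac{\lambda^{r-1}e^{-\lambda/\theta}}{\Gamma(r)\,\theta^{r}}\,d\lambda
      = \frac{\Gamma(x+r)}{x!\,\Gamma(r)}
        \left(\frac{\theta}{1+\theta}\right)^{x}
        \left(\frac{1}{1+\theta}\right)^{r},
        \qquad x = 0, 1, 2, \ldots \\
E(x) &= E(\lambda) = r\theta, \\
\operatorname{Var}(x) &= E(\lambda) + \operatorname{Var}(\lambda)
      = r\theta + r\theta^{2} = r\theta\,(1+\theta).
\end{aligned}
```

Under these assumptions the variance across purchasers exceeds the mean by the factor $1+\theta$, but nothing in the mixture relates an individual's $\lambda$ to observable characteristics, which is the gap the regression approach addresses.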
MODEL OF THE PURCHASE PROCESS
Let $x_{ij}$ be the number of units (or equivalent units) purchased by an individual purchasing unit $i$ during time period $j$, where the values of $x_{ij}$ from successive periods are uncorrelated. With little loss in generality we assume the additive form, so that

(2)  $x_{ij} = \lambda_i + v_{ij}, \quad i = 1, \ldots, N; \; j = 1, \ldots, m$

where, under conditions of a stationary process, $\lambda_i = E_j(x_{ij})$ is the mean purchase rate of purchasing unit $i$ and $v_{ij}$ is a random disturbance, independently distributed with $E_j(v_{ij}) = 0$ and $E_j(v_{ij}^2) = \sigma_i^2 = k\lambda_i$, where $k$ is some positive constant. The variance of the disturbance follows from empirical evidence (Ehrenberg 1972) which indicates the distribution of $x_{ij}$ given $\lambda_i$ is Poisson or Poisson-type. Further, it is reasonable to assume that the mean purchase rate is related to the characteristics of the purchasing unit and we propose the linear model

(3)  $\lambda_i = z_i'\beta + \epsilon_i$

where $z_i$ is a $p \times 1$ vector of predictor variables, $\beta$ is a $p \times 1$ vector of (unknown) parameters, and $\epsilon_i$ is a disturbance term, independently distributed with mean zero and constant variance, $\sigma_\epsilon^2$, and is independent of $z_i$. Combining equations 2 and 3 yields

(4)  $x_{ij} = z_i'\beta + \epsilon_i + v_{ij} = z_i'\beta + u_{ij}$

Because $\epsilon_i$ is constant over all values of $j$ and $E_i E_j(v_{ij}) = 0$, it is reasonable to assume that $v_{ij}$ is independent of $\epsilon_i$ and $z_i$.

The preceding formulation differs considerably from the traditional linear regression model used in past studies. Specifically, this formulation recognizes two components of the disturbance term. The first component involves the difference between the true mean purchase rate and the estimated (predicted) mean purchase rate; the second involves the difference between the true mean purchase rate and the number of units purchased in a given time period. The inclusion of the first component of the disturbance term, the random difference between the predicted and true mean purchase rates, also differs from typical Poisson regression models which assume the Poisson parameter, $\lambda_i$, to be a deterministic function of $z_i$. By considering these aspects of the process in the model specification, we arrive at a model in which the error term is heteroskedastic over purchasing units and correlated within purchasing units.

ESTIMATION
The estimation of the model is influenced by two considerations: (1) the functional form of the distribution of $x_{ij}$ given $\lambda_i$ is unspecified (only the mean and variance are assumed) and (2) the parameters $\sigma_\epsilon^2$, $k$, and $\lambda_i$, $i = 1, \ldots, N$, are unknown. In this section we discuss the estimation of $\Sigma$, the $Nm \times Nm$ variance-covariance matrix of the $u_{ij}$ in equation 4, and a modified Aitken's procedure for estimating $\beta$. The approach taken here is similar to that suggested by Wallace and Hussain (1969) in the context of combining cross-section with time-series data.

If the $u_{ij}$ were observable, they could be considered in the context of a one-way, random effects, analysis of variance model with $N$ levels and $m$ observations per level, and best quadratic unbiased estimators of $\sigma_i^2 = k\lambda_i$ and $\sigma_\epsilon^2$ could be obtained (Graybill 1961, Ch. 16). Further, if the $\lambda_i$ were also observable, the least squares estimator of $k$ could be obtained by considering the relationship

$\sum_{j=1}^{m} (u_{ij} - \bar{u}_{i\cdot})^2 / (m - 1) = k\lambda_i + e_i, \quad i = 1, \ldots, N$

where:

$\bar{u}_{i\cdot} = \sum_{j=1}^{m} u_{ij} / m$

and $e_i$ is a random error with mean zero. However, the $u_{ij}$ and $\lambda_i$ are not observable. Therefore, the estimates $\hat{u}_{ij}$, the observed residuals from using ordinary least squares on equation 4, and $\hat\lambda_i = \bar{x}_{i\cdot}$, the sample mean of $x_{ij}$ for unit $i$, are substituted for $u_{ij}$ and $\lambda_i$ to obtain:

$\hat\sigma_\epsilon^2 = \sum_{i=1}^{N} \frac{\bar{\hat{u}}_{i\cdot}^{\,2}}{N - 1} - \sum_{i=1}^{N} \sum_{j=1}^{m} \frac{(\hat{u}_{ij} - \bar{\hat{u}}_{i\cdot})^2}{mN(m - 1)}$

and

$\hat{k} = \frac{\sum_{i=1}^{N} \hat\lambda_i \left[ \sum_{j=1}^{m} (\hat{u}_{ij} - \bar{\hat{u}}_{i\cdot})^2 / (m - 1) \right]}{\sum_{i=1}^{N} \hat\lambda_i^2}$

where $\bar{\hat{u}}_{i\cdot}$ is the mean of the $\hat{u}_{ij}$ for unit $i$.
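The two variance-component estimators can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: it assumes a balanced panel, takes $\hat\sigma_\epsilon^2$ to be the one-way random effects estimator (among mean square minus within mean square, divided by $m$), and takes $\hat{k}$ to be the least squares slope through the origin of the within-unit residual variances on $\hat\lambda_i$. The function name `variance_components` is hypothetical.

```python
def variance_components(u_hat, lam_hat):
    """Return (sigma_eps2_hat, k_hat) from an N x m table of OLS residuals.

    u_hat   : list of N lists, each holding the m residuals of one purchasing unit
    lam_hat : list of N estimated mean purchase rates (the unit sample means)
    """
    N, m = len(u_hat), len(u_hat[0])
    u_bar = [sum(row) / m for row in u_hat]
    # Within-unit sums of squared deviations about each unit's mean residual
    ss_within = [sum((u - ub) ** 2 for u in row) for row, ub in zip(u_hat, u_bar)]
    # Among-unit component minus the within-unit contribution to the unit means
    sigma_eps2 = (sum(ub ** 2 for ub in u_bar) / (N - 1)
                  - sum(ss_within) / (m * N * (m - 1)))
    # Slope through the origin of the within-unit variances on lam_hat
    s2 = [ss / (m - 1) for ss in ss_within]
    k_hat = sum(l * s for l, s in zip(lam_hat, s2)) / sum(l * l for l in lam_hat)
    return sigma_eps2, k_hat


# Tiny hand-checkable example: N = 2 units, m = 2 periods
print(variance_components([[1.0, -1.0], [2.0, 0.0]], [1.0, 2.0]))
```

Nothing in this sketch prevents the estimate of $\sigma_\epsilon^2$ from being negative in small samples; how to handle that case is left open here.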


The elements of $\Sigma$ are then estimated as follows.

$\hat\sigma_{u_{ij}u_{kq}} = \delta_{ik}(\hat\sigma_\epsilon^2 + \delta_{jq}\hat{k}\hat\lambda_i)$

where $\delta_{ik} = 1$ if $i = k$ and 0 otherwise, and $\delta_{jq}$ is similarly defined. The modified Aitken estimator,¹ similar to that suggested by Zellner (1962), is

$\hat\beta = (Z'\hat\Sigma^{-1}Z)^{-1} Z'\hat\Sigma^{-1}X.$

Further efficiency may be gained by using the residuals obtained from this second stage estimate to recompute $\hat\Sigma$ and obtain a third stage estimate of $\beta$. This procedure may be iterated until convergence. The estimated variance-covariance of this estimator is given by

$\widehat{\mathrm{Var}}(\hat\beta) = (Z'\hat\Sigma^{-1}Z)^{-1}.$

This estimation procedure allows the decomposition of the sum of squares error into its two components:

$SS_A = \sum_{i=1}^{N} \sum_{j=1}^{m} (\hat\lambda_i - z_i'\hat\beta)^2 = m \sum_{i=1}^{N} (\hat\lambda_i - z_i'\hat\beta)^2,$

the sum of squares of the estimated mean purchase rates $\hat\lambda_i$ about their predicted values $z_i'\hat\beta$, and

$SS_W = \sum_{i=1}^{N} \sum_{j=1}^{m} (x_{ij} - \hat\lambda_i)^2,$

the sum of squares of the observed values about the estimated mean purchase rates.

The proportion of the variance of the mean purchase rates explained by the predictor variables is

$R_\lambda^2 = SS_R/(SS_T - SS_W) = SS_R/(SS_R + SS_A)$

where $SS_R$ and $SS_T$ are the sum of squares regression and total, respectively. Adjusting for degrees of freedom² yields

$\bar{R}_\lambda^2 = 1 - [SS_A/(N - p)] / [SS_T/(Nm - 1) - SS_W/N(m - 1)].$

The statistical significance of $R_\lambda^2$ is tested with the F-ratio

$F = [SS_R/(p - 1)] / [SS_A/(N - p)]$

and the hypothesis that $\sigma_\epsilon^2$ equals zero is tested with the F-ratio

$F = [SS_A/(N - p)] / [SS_W/N(m - 1)].$

¹Computational difficulties arise if any household included in the analysis has zero mean purchase rate for the observed time period. In this case $\hat\Sigma$ is singular. This can be corrected by substituting a small positive value, say 0.01, for $\hat\lambda_i$ for those households having $\bar{x}_{i\cdot}$ of zero.

²This adjustment is similar to that used with the conventional $R^2$ statistic, $\bar{R}^2 = 1 - s_E^2/s_T^2$, where $s_E^2$ and $s_T^2$ are unbiased estimators. In the present case, the variance being explained is the total variance, $s_T^2$, less the unexplainable within-household variance, $s_W^2$; the variance unexplained by the model is $s_A^2$, the error variance less the unexplainable within-household variance. Therefore, $\bar{R}_\lambda^2 = 1 - s_A^2/(s_T^2 - s_W^2)$, where $s_A^2 = SS_A/(N - p)$, $s_T^2 = SS_T/(Nm - 1)$, and $s_W^2 = SS_W/N(m - 1)$.

ILLUSTRATION
Data
The data are for a low-price, frequently purchased consumer good and consist of 36 monthly observations for the three-year period 1964-1966 taken from the household purchase panel operated by Market Research Corporation of America (MRCA). (See McCann 1974 for a more detailed description of the data.) Data were made available for only those panel members who purchased the product category at least once during the time period considered. As with most panels, some members dropped out of the panel (or were added to the panel) during the study period and some panel members occasionally failed to submit a diary. For these data it was impossible to distinguish between zero purchases and missing data. Hence, the available data were screened for those households who were in the panel for the full three-year period and who submitted diaries every week. The screening resulted in a usable sample of 110 households. In addition to the purchase behavior of the households, measures were taken on several household characteristics.

Analysis
Models relating to the problem under discussion typically consider a single criterion variable, which is some measure of purchase or consumption, and a set of predictor variables, often consisting of consumer characteristics. In this section we consider three different models. Model I is an ordinary least squares model relating the number of equivalent units of the product purchased per month to a set of predictor variables consisting of education level of head of household, household size, and household income. Table 1 describes these variables in more detail. Model II, the iterative generalized least squares model previously described, uses the same criterion and predictor variables as Model I. Model III is an ordinary least squares model using the same predictor variables as the other models, but with the average number of equivalent units of the product purchased per month as the criterion variable. Though our purpose is to illustrate the estimation of Model II, the other two models provide a useful comparison. The results of all three models are presented in Table 2.

Table 1
DEFINITION OF PREDICTOR VARIABLES

A. Dummy variables
   1. Education of head of household
      Education 1    1 = 9-12 years            0 = other
      Education 2    1 = 13 or more years      0 = other
      (excluded class = 0-8 years)
   2. Household size
      HH Size 1      1 = 3 members             0 = other
      HH Size 2      1 = 4 or 5 members        0 = other
      HH Size 3      1 = 6 or more members     0 = other
      (excluded class = 1 or 2 members)
B. Continuous variable
      Household income (thousands of dollars)
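The iterative generalized least squares fit used for Model II can be sketched as follows for the special case considered in the illustration: a balanced panel with predictors that do not change over time. In that case each block of $\Sigma$ is $\sigma_\epsilon^2$ times a matrix of ones plus $k\lambda_i$ times the identity, so the Aitken estimator reduces to weighted least squares on the unit means with weights $1/(\sigma_\epsilon^2 + k\lambda_i/m)$. That reduction, the truncation of a negative $\hat\sigma_\epsilon^2$ at zero, and all function names are assumptions of this sketch rather than details taken from the article; the 0.01 floor follows footnote 1.

```python
def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def wls(Z, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - z_i'b)^2."""
    p = len(Z[0])
    A = [[sum(wi * z[r] * z[s] for z, wi in zip(Z, w)) for s in range(p)]
         for r in range(p)]
    c = [sum(wi * z[r] * yi for z, yi, wi in zip(Z, y, w)) for r in range(p)]
    return solve(A, c)


def igls(Z, x, max_iter=50, tol=1e-10, floor=0.01):
    """Iterate the estimation steps for a balanced panel.

    Z : N rows of p time-constant predictor values (intercept included)
    x : N rows of m purchase quantities
    Returns (beta_hat, sigma_eps2_hat, k_hat).
    """
    N, m = len(x), len(x[0])
    x_bar = [sum(row) / m for row in x]
    lam = [max(v, floor) for v in x_bar]          # footnote 1: avoid zero rates
    # z_i is constant over j, so u_ij - u_bar_i equals x_ij - x_bar_i
    s2 = [sum((v - xb) ** 2 for v in row) / (m - 1) for row, xb in zip(x, x_bar)]
    k_hat = sum(l * s for l, s in zip(lam, s2)) / sum(l * l for l in lam)
    beta = wls(Z, x_bar, [1.0] * N)               # first stage: OLS
    sigma_eps2 = 0.0
    for _ in range(max_iter):
        u_bar = [xb - sum(b * z for b, z in zip(beta, zi))
                 for xb, zi in zip(x_bar, Z)]
        sigma_eps2 = sum(u * u for u in u_bar) / (N - 1) - sum(s2) / (m * N)
        sigma_eps2 = max(sigma_eps2, 0.0)         # assumption: truncate at zero
        weights = [1.0 / (sigma_eps2 + k_hat * l / m) for l in lam]
        new_beta = wls(Z, x_bar, weights)
        done = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta, sigma_eps2, k_hat


# Three hypothetical households, two periods each, one predictor plus intercept
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
x = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
print(igls(Z, x))
```

In this toy example the household means lie exactly on a line, so the first-stage OLS fit (which corresponds to Models I and III) and the weighted fit (Model II) give the same coefficients; with real data the weights shift the estimates toward households whose means are measured more precisely.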
The results of pooling all of the data (36 observations on 110 households) and using ordinary least squares (OLS) to estimate the unknown parameters are shown in the first panel of Table 2. The $R^2$ value is relatively low (0.14) and all predictor variables have coefficient estimates with absolute values greater than twice their estimated standard errors.
The results of the iterative generalized least squares regression are shown in the second panel of Table 2. These results indicate that approximately 23% of the total variation in consumption (and 27% of the error variance) is attributable to within-household variation, i.e., variation of individual purchases about the mean household purchase rates. The predictor variables should not be expected to predict this variation. Hence, the more appropriate measure of explained variance is $R_\lambda^2$ or the adjusted $\bar{R}_\lambda^2$, which in this case are 0.186 and 0.181, respectively, compared with the OLS $R^2$ of 0.143 and adjusted $\bar{R}^2$ of 0.142. The relationship between the coefficient estimates and their estimated standard errors has changed considerably from Model I. Only two estimated coefficients have absolute values at least twice their estimated standard errors whereas all of the Model I OLS estimates meet this criterion. These results are somewhat typical when a GLS procedure is used to estimate a regression equation that has a complicated error structure. Because OLS ignores this error structure, it tends to underestimate the variances of the coefficients and hence overstates coefficient "significance."
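The percentages and ratios quoted for Model II follow directly from the sums of squares in the analysis of variance panel of Table 2. The short script below redoes that arithmetic as a check; only the variable names are introduced here.

```python
# Sums of squares and degrees of freedom for Model II, as printed in Table 2
ss_r, df_r = 6690.62, 6        # regression
ss_a, df_a = 29221.65, 103     # among households, N - p
ss_w, df_w = 10747.42, 3850    # within households, N(m - 1)

ss_e = ss_a + ss_w             # total error sum of squares
ss_t = ss_r + ss_e             # total sum of squares

r2_conventional = ss_r / ss_t          # share of all variation explained
r2_lambda = ss_r / (ss_r + ss_a)       # share of mean-rate variation explained
within_share_total = ss_w / ss_t       # within-household share of total variation
within_share_error = ss_w / ss_e       # within-household share of error variation
f_among = (ss_a / df_a) / (ss_w / df_w)                 # MSA / MSW
f_traditional = (ss_r / df_r) / (ss_e / (df_a + df_w))  # MSR / MSE

print(round(r2_conventional, 3), round(r2_lambda, 3))
print(round(within_share_total, 2), round(within_share_error, 2))
print(round(f_among, 2), round(f_traditional, 2))
```

The gap between the two explained-variance measures comes entirely from dropping the within-household sum of squares from the denominator.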
Models I and II consider the number of equivalent units purchased by a household per monthly time period as the dependent variable. That is, the models use 36 monthly observations for each household. More typical of research concerned with the modeling and estimation of consumer purchase rate is the use of the mean purchase rate, computed over all time periods for which data are available, as the dependent variable. In such applications there is one observation for each household. The results of this analysis are given in the third panel of Table 2. A comparison of Models II and III indicates almost identical estimates of $\beta$ and only slightly lower estimated standard errors of $\hat\beta$ for the iterative generalized least squares procedure (IGLS). But given that the data cover 36 observation periods and the among-household variation is relatively large, large differences between the results of the IGLS procedure and OLS on mean purchase rate are not to be expected. The IGLS procedure does provide additional information about the partitioning of the error variance.

Table 2
ANALYSIS RESULTS FOR THREE MODELS

                     Model I                    Model II                    Model III
Estimation method    Ordinary least squares     Iterative generalized       Ordinary least squares
                                                least squares
Dependent variable   No. equivalent units       No. equivalent units        Average no. units
                     purchased per period       purchased per period        purchased per period

Parameter estimates
(for each model: estimate $\hat\beta_i$, estimated standard error $s_{\hat\beta}$, and ratio $\hat\beta_i/s_{\hat\beta}$)

                     Model I                    Model II                    Model III
Variable        Est.    S.E.    Ratio      Est.    S.E.    Ratio      Est.    S.E.    Ratio
Constant        2.11    0.10    20.86      2.09    0.52     4.02      2.11    0.53     3.94
Education 1     0.88    0.14     7.68      0.86    0.59     1.46      0.88    0.60     1.45
Education 2     0.57    0.16     3.60      0.56    0.81     0.69      0.57    0.84     0.68
HH size 1       0.81    0.15     5.28      0.81    0.79     1.03      0.81    0.81     1.00
HH size 2       0.93    0.15     6.11      0.93    0.78     1.19      0.93    0.81     1.15
HH size 3       5.36    0.22    24.41      5.32    1.14     4.68      5.36    1.16     4.61
Income         -0.08    0.01    -6.47     -0.08    0.06    -1.25     -0.08    0.07    -1.22

Summary statistics
                                           Model I     Model II    Model III
Multiple R                                  .379        .379        .432
$R^2$                                       .143        .143        .186
Adjusted $R^2$                              .142                    .138
Est. std. error                            3.180                   2.807
$R_\lambda^2$                                           .186
Adjusted $R_\lambda^2$                                  .181
Estimated $k$                                          0.974
Est. variance of mean purchase
  rate ($\hat\sigma_\epsilon^2$)                       7.312
No. of iterations                                      2

Analysis of variance
                Model I                        Model II                       Model III
Source          Sum of squares  d.f.  F-ratio  Sum of squares  d.f.  F-ratio  Sum of squares  d.f.  F-ratio
Regression         6693.86         6  110.35      6690.62         6    3.98a      185.94         6    3.93
Error             39965.83      3953             39969.07      3953               811.62       103
  Among                                          29221.65       103  101.63b
  Within                                         10747.42      3850

aF-ratio for testing the significance of $z_i$ in explaining the mean purchase rate, $\lambda_i$. The value of the traditional F-ratio, F = MSR/MSE, is 110.29.
bComputed as F = MSA/MSW and used in testing the hypothesis: $\sigma_\epsilon^2 = 0$.
DISCUSSION
At this point it is useful to contrast the three models.
Model I is identical to Model II except for the
specification on the error term. Model III differs from
Model I in the definition (and level of aggregation)
of the dependent variable and is the model typically
employed in studies relating consumption behavior
to characteristics of the consuming units.
Both OLS models, Models I and III, yield the same estimated coefficients. This will be true so long as the household characteristics remain constant over the observation period and the same number of observations is made on each purchasing unit. However, the estimated standard errors of the estimated coefficients are smaller for Model I, the individual level model. This difference is a result of the model specifications and will always occur. The relationship between the estimated standard errors of Models I and III is given by

$s_{\hat\beta,\mathrm{I}} = \left( \hat\sigma_{\mathrm{I}} / (\hat\sigma_{\mathrm{III}} \sqrt{m}) \right) s_{\hat\beta,\mathrm{III}}$

where $\hat\sigma_{\mathrm{I}}$ and $\hat\sigma_{\mathrm{III}}$ are the estimated standard errors for Models I and III, respectively. $R^2$ will be higher and $\hat\sigma$ lower for Model III because intrahousehold error variance is not considered. However, the adjusted $R^2$ will be higher for Model I because of the large value for error degrees of freedom. Model III, the OLS model with mean consumption as the dependent variable, is the most commonly employed model in this context and should be preferred to Model I because it departs less from the OLS assumption of constant error variance; for Model III $\sigma_i^2 = \sigma_\epsilon^2 + k\lambda_i/m$, whereas for Model I $\sigma_i^2 = \sigma_\epsilon^2 + k\lambda_i$.
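The link between the two sets of OLS standard errors can be checked against Table 2. With time-constant predictors and $m$ observations per household, the pooled cross-product matrix is $m$ times the household-level one, so each Model I standard error should equal the Model III value multiplied by $\hat\sigma_{\mathrm{I}}/(\hat\sigma_{\mathrm{III}}\sqrt{m})$. The snippet below applies that factor to three of the printed Model III standard errors; the variable names are illustrative and the inputs are rounded table values, so agreement is only to two decimals.

```python
import math

m = 36                       # monthly observations per household
sigma_model_1 = 3.180        # est. std. error of regression, Model I (Table 2)
sigma_model_3 = 2.807        # est. std. error of regression, Model III (Table 2)

factor = sigma_model_1 / (math.sqrt(m) * sigma_model_3)

# Model III coefficient standard errors as printed in Table 2
se_model_3 = {"Constant": 0.53, "HH size 3": 1.16, "Income": 0.07}
se_model_1_implied = {name: round(factor * se, 2) for name, se in se_model_3.items()}

print(round(factor, 4))
print(se_model_1_implied)
```

The implied values can be compared with the Model I column of Table 2, which lists 0.10, 0.22, and 0.01 for these three coefficients.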
In comparing results of Models II and III, the first thing one notes is the similarity in estimated coefficients and standard errors. However, if $m$, the number of time periods, is large, this similarity is to be expected. Consider equation 4 and observe what happens as $x_{ij}$ is averaged for purchasing unit $i$. If one assumes no change in the characteristics of the purchasing units, the model becomes

$\bar{x}_{i\cdot} = z_i'\beta + \bar{u}_{i\cdot}$

and

$E(\bar{u}_{i\cdot}^2) = E_i E_j[(\epsilon_i + \bar{v}_{i\cdot})^2] = \sigma_\epsilon^2 + E_i[k\lambda_i/m].$

But as $m \to \infty$, $k\lambda_i/m \to 0$, and $E(\bar{u}_{i\cdot}^2) \to \sigma_\epsilon^2$. Therefore, as $m \to \infty$, the OLS estimator of Model III will approach BLUE and as a property of the Aitken's estimator it is a minimum variance unbiased linear estimator of $\beta$. In fact, the estimators for Models II and III will, for all practical purposes, be identical for very large $m$. For the example data used, these two analyses were repeated using only the first six purchase occasions. The results are not reported here, but the difference in estimated coefficients ranged up to approximately 9% and the estimated standard errors of IGLS coefficients were about 2.5 to 3% lower than the Model III OLS values. Even with only six observations per household, however, the among-household variation was very large. One other comparison worth noting is that the $R^2$ of Model III will be higher than that of Model II but identical to the $R_\lambda^2$ of Model II. However, the adjusted $R^2$'s are different, $\bar{R}_\lambda^2$ being the more representative statistic for describing the variation in mean purchase rates explained by the model.
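How quickly the within-household term $k\lambda_i/m$ fades as $m$ grows can be illustrated numerically. The sketch below uses the estimates of $k$ and $\sigma_\epsilon^2$ reported in Table 2 together with a purely hypothetical mean purchase rate of 3 units per period; that rate, and the function name `within_share`, are assumptions made for illustration and are not taken from the article.

```python
k_hat = 0.974          # estimated k, Table 2
sigma_eps2 = 7.312     # estimated variance of mean purchase rate, Table 2
lam = 3.0              # hypothetical mean purchase rate (not from the article)


def within_share(m):
    """Share of the variance of the averaged disturbance due to the k*lam/m term."""
    within = k_hat * lam / m
    return within / (sigma_eps2 + within)


for m in (1, 6, 36, 360):
    print(m, round(within_share(m), 4))
```

Under these assumed values the heteroskedastic term is a small fraction of the disturbance variance once a few dozen periods are averaged, which is consistent with the near-identical Model II and Model III results.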
After all things are considered, Model II, the IGLS model, offers a more realistic representation of the process under investigation and is more consistent with available empirical evidence. With the functional form and independent variables correctly specified, the estimated coefficients of all three models are unbiased. However, the modified Aitken's estimator of Model II has the smallest true variance of the three and the estimated variances of the coefficients for it are unbiased, whereas they are biased in the case of Models I and III. This fact alone should be sufficient grounds for the acceptance of the IGLS model, Model II. However, as has been shown, if the number of purchase occasions becomes large, the unexplained variation in Model III attributable to within-purchasing-unit variation becomes small and Model III provides a reasonably good estimate of $\beta$. To extend this reasoning further, in any case where the difference between $\sigma_\epsilon^2$ and $E(\bar{u}_{i\cdot}^2)$ is small, the gain in efficiency of the Aitken's estimator over the OLS estimate of $\beta$ based on Model III may not be great enough to justify the added computational effort. This situation can occur when $m$ is large or when $\sigma_\epsilon^2$, the unexplained variance in mean purchase rate, is large.
SUMMARY
We incorporate the long-standing marketing research effort of explaining variation in observed consumption behavior (by person-specific characteristics) into the more recent notion that such behavior has an inherent component of randomness (at least from the perspective of an observer) and that one should not attempt to explain what is random but should recognize the randomness and explain what is not random. We recognize that consumption behavior on any given occasion will fluctuate around some mean consumption level, that this mean level will vary across the population (as is usually assumed in stochastic models), and that this variation will have a systematic component because mean consumption is probably related to person- or household-specific characteristics. However, we do not assume an exact relationship between mean consumption and consuming unit characteristics, but allow this relationship to be disturbed by unobservable events and characteristics.

Our approach provides estimates of the effects of each of the characteristics and estimates of the relative impact of the two disturbances. This effort, aimed at separating the explanation of individual purchases from the explanation of the mean purchase rate, is intended to show the need for future research on the latter. Such concentration has been called for by Bass (1974) and Beckwith and Sasieni (1976), who imply that because the prediction of specific purchase behavior by OLS is doomed to failure one should concentrate on predicting mean purchase behavior. However, our results also indicate that in situations involving a large number of purchase occasions or when among-household variation is large in relation to within-household variation, OLS estimation employing mean household purchase rate as the dependent variable provides relatively efficient estimates.

REFERENCES
Bass, Frank M. (1974), "The Theory of Stochastic Preference and Brand Switching," Journal of Marketing Research, 11 (February), 1-20.
Beckwith, Neal E. and Maurice W. Sasieni (1976), "Criteria for Market Segmentation Studies," Management Science, 22 (April), 892-903.
Ehrenberg, A. S. C. (1972), Repeat Buying. Amsterdam: North Holland Publishing Co.
Frank, Ronald E. (1968), "Market Segmentation Research: Findings and Implications," in The Application of the Sciences to Marketing Management, Frank M. Bass, Charles W. King, and Edgar A. Pessemier, eds. New York: John Wiley & Sons, Inc., 39-68.
Graybill, Franklin A. (1961), An Introduction to Linear Statistical Models. New York: McGraw-Hill Book Company.
Johnston, J. (1972), Econometric Methods, 2nd ed. New York: McGraw-Hill Book Company.
McCann, John M. (1974), "Market Segment Response to the Marketing Decision Variables," Journal of Marketing Research, 11 (November), 399-412.
Morrison, Donald G. (1973), "Evaluating Market Segmentation Studies: The Properties of $R^2$," Management Science, 19 (July), 1213-21.
Wallace, T. D., and A. Hussain (1969), "The Use of Error Components Models in Combining Cross Section With Time Series Data," Econometrica, 37 (January), 55-68.
Wildt, Albert R. (1976), "On Evaluating Market Segmentation Studies and the Properties of $R^2$," Management Science, 22 (April), 904-8.
Zellner, Arnold (1962), "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," Journal of the American Statistical Association, 57 (June), 348-68.
