
DISCRIMINANT FUNCTION ANALYSIS 2012 Edition



© 2012 by G. David Garson and Statistical Associates Publishing. All rights reserved
worldwide in all media. No permission is granted to any user to copy or post this work in
any format or any media.

The author and publisher of this eBook and accompanying materials make no
representations or warranties with respect to the accuracy, applicability, fitness, or
completeness of the contents of this eBook or accompanying materials. The author and
publisher disclaim any warranties (express or implied) of merchantability or fitness for any
particular purpose. The author and publisher shall in no event be held liable to any party for
any direct, indirect, punitive, special, incidental or other consequential damages arising
directly or indirectly from any use of this material, which is provided “as is” and without
warranties. Further, the author and publisher do not warrant the performance,
effectiveness or applicability of any sites listed or linked to in this eBook or accompanying
materials. All links are for information purposes only and are not warranted for content,
accuracy or any other implied or explicit purpose. This eBook and accompanying materials
are copyrighted by G. David Garson and Statistical Associates Publishing. No part of this
work may be copied, changed in any format, sold, or used in any way under any
circumstances other than reading by the downloading individual.

Contact:

G. David Garson, President


Statistical Associates Publishing
274 Glenn Drive
Asheboro, NC 27205 USA

Email: gdavidgarson@gmail.com
Web: www.statisticalassociates.com


Table of Contents
Overview
Key Terms and Concepts
   Variables
   Discriminant functions
   Pairwise group comparisons
   Output statistics
Examples
SPSS user interface
   The “Statistics” button
   The “Classify” button
   The “Save” button
   The “Bootstrap” button
   The “Method” button
SPSS Statistical output for two-group DA
   The “Analysis Case Processing Summary” table
   The “Group Statistics” table
   The “Tests of Equality of Group Means” table
   The “Pooled Within-Group Matrices” and “Covariance Matrices” tables
   The “Box’s Test of Equality of Covariance Matrices” tables
   The “Eigenvalues” table
   The “Wilks’ Lambda” table
   The “Standardized Canonical Discriminant Function Coefficients” table
   The “Structure Matrix” table
   The “Canonical Discriminant Functions Coefficients” table
   The “Functions at Group Centroids” table
   The “Classification Processing Summary” table
   The “Prior Probabilities for Groups” table
   The “Classification Function Coefficients” table
   The “Casewise Statistics” table
   Separate-groups graphs of canonical discriminant functions
   The “Classification Results” table
SPSS Statistical output for three-group MDA
   Overview and example
   MDA and DA similarities
   The “Eigenvalues” table
   The “Wilks’ Lambda” table
   The “Structure Matrix” table
   The “Territorial Map”
   Combined-groups plot
   Separate-groups plots
SPSS Statistical output for stepwise discriminant analysis
   Overview
   Example
   Stepwise discriminant analysis in SPSS
Assumptions
   Proper specification
   True categorical dependent variables
   Independence
   No lopsided splits
   Adequate sample size
   Interval data
   Variance
   Random error
   Homogeneity of variances (homoscedasticity)
   Homogeneity of covariances/correlations
   Absence of perfect multicollinearity
   Low multicollinearity of the independents
   Linearity
   Additivity
   Multivariate normality
Frequently Asked Questions
   Isn't discriminant analysis the same as cluster analysis?
   When does the discriminant function have no constant term?
   How important is it that the assumptions of homogeneity of variances and of multivariate normal distribution be met?
   In DA, how can you assess the relative importance of the discriminating variables?
      Dummy variables
   In DA, how can you assess the importance of a set of discriminating variables over and above a set of control variables? (What is sequential discriminant analysis?)
   What is the maximum likelihood estimation method in discriminant analysis (logistic discriminant function analysis)?
   What are Fisher's linear discriminant functions?
   I have heard DA is related to MANCOVA. How so?
   How does MDA work?
   How can I tell if MDA worked?
   For any given MDA example, how many discriminant functions will there be, and how can I tell if each is significant?
   What are Mahalanobis distances?
   How are the multiple discriminant scores on a single case interpreted in MDA?
   Likewise in MDA, there are multiple standardized discriminant coefficients - one set for each discriminant function. In dichotomous DA, the ratio of the standardized discriminant coefficients is the ratio of the importance of the independent variables. But how are the multiple sets of standardized coefficients interpreted in MDA?
   Are the multiple discriminant functions the same as factors in principal-components factor analysis?
   What is the syntax for discriminant analysis in SPSS?
Bibliography

Copyright @c 2012 by G. David Garson and Statistical Associates Publishing Page 5

Single User License. Do not copy or post.


DISCRIMINANT FUNCTION ANALYSIS 2012 Edition

Discriminant Function Analysis


Overview
Discriminant function analysis, also known as discriminant analysis or simply DA,
is used to classify cases into the values of a categorical dependent, usually a
dichotomy. If discriminant function analysis is effective for a set of data, the
classification table of correct and incorrect estimates will yield a high percentage
correct. Discriminant function analysis is found in SPSS under
Analyze>Classify>Discriminant. If the specified grouping variable has two
categories, the procedure is considered “discriminant analysis” (DA). If there are
more than two categories the procedure is considered “multiple discriminant
analysis” (MDA).

Multiple discriminant analysis (MDA) is a cousin of multivariate analysis of variance
(MANOVA), sharing many of the same assumptions and tests. MDA is sometimes
also called discriminant factor analysis or canonical discriminant analysis.

While binary and multinomial logistic regression, treated in a separate Statistical
Associates “Blue Book” volume, are often used in place of DA and MDA respectively,
discriminant analysis has greater power than logistic regression when its assumptions
are met: there is less chance of a Type II error (accepting a false null hypothesis).
If the data violate the assumptions of discriminant analysis, outlined below, then
logistic regression may be preferred because it usually involves fewer violations of
assumptions (independent variables needn't be normally distributed, linearly related,
or have equal within-group variances), is robust, handles categorical as well as
continuous variables, and has coefficients which many find easier to interpret. Logistic
regression is preferred when data are not normal in distribution or group sizes are
very unequal.

There are several purposes for DA and/or MDA:

• To classify cases into groups using a discriminant prediction equation.
• To test theory by observing whether cases are classified as predicted.
• To investigate differences between or among groups.
• To determine the most parsimonious way to distinguish among groups.
• To determine the percent of variance in the dependent variable explained by the independents.
• To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.
• To assess the relative importance of the independent variables in classifying the dependent variable.
• To discard variables which are little related to group distinctions.
• To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

Discriminant analysis has two basic steps: (1) an F test (Wilks' lambda) is used to
test if the discriminant model as a whole is significant, and (2) if the F test shows
significance, then the individual independent variables are assessed to see which
differ significantly in mean by group, and these are used to classify the dependent
variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring
linear and homoscedastic relationships and untruncated interval or near-interval
data. Like multiple regression and most statistical procedures, DA also assumes
proper model specification (inclusion of all important independents and exclusion
of causally extraneous but correlated variables). DA also assumes the dependent
variable is a true dichotomy, since data which are forced into dichotomous coding
are truncated, attenuating correlation.

Key Terms and Concepts


Variables
Discriminating variables are the independent variables (predictor variables).
The criterion variable is the dependent variable, also called the grouping variable
in SPSS. It is the object of classification efforts.
Discriminant functions
A discriminant function, also called a canonical root, is a latent variable which is
created as a linear combination of discriminating (independent) variables, such
that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the
x's are discriminating variables, and c is a constant. This is analogous to multiple
regression, but the b's are discriminant coefficients which maximize the distance
between the means of the criterion (dependent) variable. Note that the foregoing
assumes the discriminant function is estimated using ordinary least squares, the
traditional method; maximum likelihood estimation is also possible.

There is one discriminant function for 2-group discriminant analysis, but for
higher order DA, the number of functions (each with its own cut-off value) is the
lesser of (g - 1), where g is the number of categories in the grouping variable, or p,
the number of discriminating (independent) variables. Each discriminant function
is orthogonal to the others. A dimension is simply one of the discriminant
functions when there are more than one, in multiple discriminant analysis.

The first function maximizes the differences between the values of the dependent
variable. The second function is orthogonal to it (uncorrelated with it) and
maximizes the differences between values of the dependent variable, controlling
for the first function. And so on. Though mathematically different, each discriminant
function is a dimension which differentiates a case into categories of the
dependent based on its values on the independents. The first function will be the
most powerful differentiating dimension, but later functions may also represent
additional significant dimensions of differentiation.
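
Stated as a formula, with $g$ groups and $p$ discriminating variables, the number of discriminant functions $k$ is

$$k = \min(g - 1,\ p)$$

For instance, in the three-group MDA example below ($g = 3$ race categories, $p = 6$ predictors), $k = \min(2, 6) = 2$ functions.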

Pairwise group comparisons


Pairwise group comparisons display the distances between group means of the
dependent variable in the multidimensional space formed by the discriminant
functions. Pairwise comparisons are not applicable to two-group DA, where there
is only one function. The pairwise group comparisons table gives an F test of
significance, based on Mahalanobis distances, of the distance of the group means.
This enables the researcher to determine if every group mean is significantly
distant from every other group mean. Also, the magnitude of the F values can be
used to compare distances between groups in multivariate space. In SPSS, select
Analyze>Classify>Discriminant; check "Use stepwise method"; click the Method
button; check "F for pairwise distances."

Output statistics
DA and MDA output a variety of coefficients and tables to be discussed in
conjunction with examples below. Among these are eigenvalues, canonical
correlations, discriminant scores, discriminant coefficients, functions at group
centroids, and various measures of significance.

Examples
The sections which follow discuss two examples, one for DA and one for MDA.

DA example. Using a modified SPSS sample data file, GSS93 subset.sav, voter
participation in the 1992 presidential election (vote92, coded 1=voted, 2=did not
vote) is predicted from sex (sex, coded 1=male, 2=female), age (age in years),
educ (highest year of school completed), rincome91 (respondent’s 1991 income,
coded in 21 ascending income ranges), and self-classified liberalism (polviews,
coded from 1= extremely liberal to 7=extremely conservative).

MDA example. Using the same dataset, MDA is used to try to classify race (race,
coded 1=white, 2=black, 3=other) using the predictor variables educ, rincome91,
polviews, agewed (age when first wed), sibs (number of siblings), and rap (rap
music, coded from 1=like very much to 5=dislike very much).

SPSS user interface


The same user interface is used in SPSS for DA and for MDA, arrived at by
selecting Analyze>Classify>Discriminant. The dependent variable to be classified
or predicted is the grouping variable. After it is entered, the researcher clicks the
“Define Range” button, shown in the figure below, to enter its minimum and
maximum values. As illustrated below, this is 1, 2 for the grouping variable
vote92. Had this been MDA, the range would have been defined to include more
than two adjacent values.
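
For readers who work in syntax rather than the dialogs, the choices described in this and the following sections correspond approximately to the DISCRIMINANT command sketched below. This is a sketch using the example's variable names, not pasted SPSS output; subcommand defaults may vary by SPSS version.

* Two-group DA for the voting example (a sketch; pasted syntax from the
* dialogs may differ slightly by SPSS version).
DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ rincome91 polviews
  /ANALYSIS ALL
  /PRIORS=SIZE
  /CLASSIFY=NONMISSING POOLED.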


The “Statistics” button


The “Statistics” button defaults to no statistical output, but as shown in the
corresponding dialog below, a variety of outputs may be selected. These are
discussed below, in the output section for this example.

The “Classify” button


The “Classify” button allows the researcher to determine the prior probabilities
and the covariance matrix used in computing output, and also allows the
researcher to request various supplementary displays and plots. The defaults are
shown in the figure below.


The nine selections in the classification dialog are discussed below.


1. Prior probabilities. The default “All groups equal” selection means
coefficients are not adjusted for prior knowledge of group sizes. “Compute
from group sizes” means coefficients are adjusted for group size. For the
example data, 71.1% were in the “Voted” group and 28.9% were in the “Did
not vote” group, so classification coefficients would be adjusted to increase
the likelihood of being in the “Voted” group and decrease the likelihood of
being in the “Did not vote” group. The choice of options depends on the
researcher’s assumptions. “All groups equal” assumes any given person is
equally likely to vote as not to vote. “Compute from group sizes” assumes
any given person is more likely to be a voter by a ratio of about 7:3.
Selecting “Compute from group sizes” will usually improve predictions. For
this example, with equal prior probabilities, 66.8% of observations were
classified correctly but 76.1% with group size proportional prior
probabilities. The example below selects “Compute from group sizes.”
2. Casewise results. If checked, this option prints the predicted group,
posterior probability, and discriminant scores for each observation. Usually
the researcher will also limit cases to the first 10 or 20 as otherwise output
can be extremely large. The example below selects this option, limiting
output to 10 cases.
3. The summary table outputs the prediction matrix, sometimes called the
confusion matrix: the number of cases correctly and incorrectly assigned to
each of the groups based on the discriminant analysis. Rows are observed
groups and columns are predicted groups. The example below selects and
discusses this option.
4. Leave-one-out classification, if selected, causes coefficients for a given case
to be calculated using all observations in the dataset except the given one.
By comparing results with and without leave-one-out classification, this
“U-method” discriminant function analysis can be used to test the theory that
group characteristics outweigh the individual’s own characteristics in
determining group membership. Some researchers argue that leave-one-
out classification yields a better estimate of what classification results
would be in the population because it is a form of cross-validation. The
example below does not select this option.
5. Replace missing values with mean causes classification for cases with
missing values to be based on substituting the mean of independent
variables for the missing value rather than dropping cases listwise. Mean
substitution is now a deprecated method of dealing with missing values;
multiple imputation is now the preferred method, discussed in the
separate Statistical Associates “Blue Book” volume on missing values
analysis and data imputation. The example below does not select this
option.
6. Use Covariance Matrix. If “Within-groups” is selected, the pooled
covariance matrix is the basis of calculations, whereas if “Separate-groups”
is selected, then the basis is the covariance matrix of the group to which
the observation belongs. When groups are large and relationships among
independent variables are similar across groups, this selection will yield
similar coefficients and classifications either way. The example accepts the
default “Within groups” selection.
7. Combined-groups plot. This option creates a scatterplot of the first two
discriminant function values based on all observations (pooled groups).
When there is only one function (as there is in DA but not in MDA), SPSS
outputs a histogram rather than a scatterplot.
8. Separate-groups plots. This option outputs for each group a scatterplot of
the first two discriminant function values. When there is only one
significant function, SPSS outputs a histogram rather than a scatterplot.
9. Territorial map. This option outputs a plot of the boundaries used when
classifying observations based on function values. In the map, an asterisk
denotes the group mean for each group. When there is only one
discriminant function there is no output for this option, as for the two-group
discriminant function analysis in the example below. (A syntax sketch
covering these classification options appears after this list.)
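
As promised above, the classification choices in this list map roughly onto DISCRIMINANT subcommands as sketched below. This is a sketch, not pasted output; keyword spellings should be checked against the SPSS syntax reference for your version.

* Classification options for the example (a sketch): /PRIORS=SIZE is item 1,
* /PLOT=CASES(10) is item 2, /STATISTICS=TABLE is item 3, COMBINED and
* SEPARATE plots are items 7-8, and MAP is item 9 (no output in two-group DA).
DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ rincome91 polviews
  /PRIORS=SIZE
  /STATISTICS=TABLE
  /PLOT=CASES(10) COMBINED SEPARATE MAP
  /CLASSIFY=NONMISSING POOLED.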
The “Save” button
The “Save” button makes it possible to save as additional columns in the active
dataset the predicted group memberships (the actual classifications of each case),
the discriminant scores, and the probabilities of group membership. By default, as
shown below, none of these are saved. To permanently save saved variables, the
researcher must select File>Save or Save As. Saved variables can be used in any
number of other statistical procedures. For instance, measures of association can
be computed by the crosstabs procedure in SPSS if the researcher saves the
predicted group membership for all cases and then crosstabulates with any
appropriate categorical variable.

The discriminant score is the value resulting from applying a discriminant function
formula to the data for a given case. A “Z score” is the discriminant score for
standardized data. To get discriminant scores in SPSS, check "Discriminant scores"
in the dialog above. One can also view the discriminant scores by clicking the
Classify button and checking "Casewise results."
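
In syntax, saving these variables and then crosstabulating observed by predicted group might look like the sketch below. The saved-variable names (Dis_1 for predicted group, Dis1_1 for the function 1 score) are typical SPSS defaults but should be confirmed in the dataset's Variable View before running CROSSTABS.

* Save predicted group, discriminant scores, and posterior probabilities,
* then crosstabulate observed group by predicted group (a sketch).
DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ rincome91 polviews
  /SAVE=CLASS SCORES PROBS.
CROSSTABS
  /TABLES=vote92 BY Dis_1
  /CELLS=COUNT
  /STATISTICS=PHI.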
The “Bootstrap” button
The “Bootstrap” button is shown below with the defaults which appear if “Perform
bootstrapping” is selected (not selected is the default, which grays out all selections
in the bootstrap dialog). Bootstrapping cannot be selected at the same time as
requesting saved variables. The example below does not select bootstrapping.


The “Method” button


The “Method” button becomes active only if the researcher sets the method
selection to “Use stepwise method” in the main discriminant function analysis
dialog shown above instead of the default “Enter independents together” (the
enter method). The figure below shows defaults for the method button dialog.
For the example below, the default enter method was accepted. Stepwise
methods are used in the exploratory phase of research and are deprecated for
confirmatory analysis.


If the stepwise method is employed, the selections in the method dialog discussed
below govern how it operates.
1. Method section. By default, the stepwise method uses Wilks’ lambda as the
criterion for entering or removing independent variables in the discriminant
function equation. The variable which minimizes Wilks’ lambda (the one
which lowers lambda the most) is judged the best variable to enter in the
next step. The researcher can override the default and select any of four
alternative criteria: unexplained variance (the best variable is the one
which minimizes the sum of unexplained variation between groups),
Mahalanobis distance (the best variable is the one which maximizes the
distances between groups), smallest F ratio (the best variable is the one
which maximizes the F ratio computed from Mahalanobis distances
between groups), and Rao's V (also known as the Lawley-Hotelling trace or
simply Hotelling’s trace; the best variable maximizes how much V increases,
and the researcher can specify the minimum increase value).
2. Criteria section. By default, entry and removal F values are set, and a
variable is entered in the stepwise model if its F value exceeds the entry
value and is removed if its F value is less than the removal value. If the
entry value is reduced, more variables will qualify to enter the model. If the
removal value is increased, more variables will qualify to be removed from
the model. Alternatively, if “Use probability of F” is selected as a criterion,
similar entry and removal values are set in probability terms, with the
default being .05 for entry and .10 for removal. Using these defaults,
variables are added if the p significance level of F is less than .05 and are
removed if it is greater than .10.
3. Summary of steps is default output for stepwise discriminant function
analysis. At each step, statistics are displayed for all variables.
4. F for pairwise distances is not default output. If selected, a matrix is output
showing the pairwise F ratios for each pair of groups. In DA as opposed to
MDA, there is only one such ratio since there are only two groups.
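
In syntax, the method and criteria choices above correspond roughly to the sketch below. The F-to-enter (3.84) and F-to-remove (2.71) values are the dialog defaults; substitute /PIN and /POUT to use probability-of-F criteria instead.

* Stepwise DA using the Wilks' lambda criterion (a sketch).
* Replace /FIN and /FOUT with /PIN=.05 /POUT=.10 for probability criteria.
DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ rincome91 polviews
  /METHOD=WILKS
  /FIN=3.84
  /FOUT=2.71
  /STATISTICS=TABLE
  /CLASSIFY=NONMISSING POOLED.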

SPSS Statistical output for two-group DA


In this section, statistical tables are discussed in the order output by SPSS.

The “Analysis Case Processing Summary” table


This table, not shown, lists how many cases were missing or had values beyond
the range specified by the “Define Range” button discussed above. It also shows
how many cases had a missing value on one of the independent variables. Finally,
the total number of valid cases is shown. The researcher should inspect this table
to consider if data imputation or dropping predictors with a high proportion of
missing values is called for.
The “Group Statistics” table
This table, also not shown, contains descriptive statistics for each of the
independent variables by group of the dependent variable (here, the voting and
non-voting groups) and for the pooled total, including means, standard
deviations, and valid (listwise) numbers of cases.
The “Tests of Equality of Group Means” table
In the table below, the smaller the Wilks' lambda, the more important the
independent variable to the discriminant function. Wilks' lambda is significant by
the F test for age, education, and income, but not for sex or polviews. The
researcher might consider dropping the non-significant variables from the model,
preferably one at a time starting with the most non-significant predictor.
Coefficients will change and at times may even flip between significance and
non-significance when the model is specified differently, which is why “one at a
time” is the prudent approach. This is not done here for pedagogical reasons, to
avoid redundant discussion, but some researchers would use this table to refine
the specified model until all predictors were significant and only then consider
subsequent tables discussed below.

The smaller the Wilks' lambda for an independent variable, the more that variable
contributes to the discriminant function, so in the table above, education is the
variable contributing the most to classification of voters and non-voters. Lambda
varies from 0 to 1, with 1 meaning all group means are the same and lower
values indicating differences in means across groups. Wilks' lambda is sometimes
called the U statistic.

If at least one independent variable is significant, then the model as a whole is
significant. However, there is an alternative overall test of model significance. The
researcher can obtain an ANOVA table in SPSS by selecting Analyze>Compare
Means>One-Way ANOVA, using discriminant scores from the “Save” button
results (which SPSS will label Dis1_1 or similar) as the dependent variable. The
dependent variable from discriminant analysis becomes the “factor” to enter in
the one-way ANOVA dialog. The ANOVA table provides an F test, where a "Sig."
p value < .05, as in the output shown below, means the model differentiates
discriminant scores between the groups significantly better than chance (than a
model with just the constant).
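
In syntax, this alternative overall test is two steps, sketched below: save the scores, then run a one-way ANOVA with the saved score (named Dis1_1 by default, as noted above; confirm the name in Variable View) as dependent and the grouping variable as factor.

* Alternative overall model test (a sketch): save discriminant scores,
* then run one-way ANOVA on the saved score by the grouping variable.
DISCRIMINANT
  /GROUPS=vote92(1 2)
  /VARIABLES=sex age educ rincome91 polviews
  /SAVE=SCORES.
ONEWAY Dis1_1 BY vote92.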


Note that dummy independent variables are more accurately tested with a Wilks’
lambda difference test than with Wilks' lambda as it appears in the table above.
The researcher may run a model with and without a set of dummies (ex., for
region, with values being East, West, North, and with South left out as the
reference level). The ratio of the lambdas for the two models may be tested: the
Wilks' lambda for the model with the dummies is divided by the Wilks' lambda for
the model without the dummies, and an approximate F value for this ratio may be
computed using calculations reproduced in Tabachnick and Fidell (2001: 491).
SPSS does not directly support this test, which may also be used in any sequential
discriminant analysis, such as where the models are with and without a set of
control variables.
The “Pooled Within-Group Matrices” and “Covariance Matrices” tables
These tables, not illustrated, show the correlation and covariance matrices pooled
across groups (the “Pooled Within-Groups Matrices” table) and the covariance
matrix by group (the “Covariance Matrices” table). If covariances and correlations
vary markedly by group, this may lead the researcher to select the
“Separate-groups” rather than the “Within-groups” option under “Use Covariance
Matrix” in the classification dialog discussed above.
The “Box’s Test of Equality of Covariance Matrices” tables
Box’s M is a statistical test of whether the covariance matrices differ by group. As
such it is a more accurate test than visual inspection of the “Covariance Matrices”
table discussed above. The “Sig.” value in the “Test Results” table illustrated
below should be non-significant in a DA model using the default classification
setting of “Within-groups” classification discussed above.


When sample size is large, even very small differences in covariance matrices may
be found significant by Box's M. Moreover, although DA may be robust even
when the assumption of multivariate normality is violated, Box’s M is very
sensitive to that assumption being met. The Box’s M test is usually ignored if, in
the “Log Determinants” table shown above, the log determinants of the two
groups are similar. If the determinants are markedly dissimilar, the researcher
may opt for quadratic DA (not supported by SPSS) or may check
“Separate-groups” in the “Classification” button dialog discussed above. There is
also the option of running the model on a “Within-groups” and on a
“Separate-groups” covariance basis and seeing if results are substantively similar,
in which case a significant Box’s M would be ignored.
The “Eigenvalues” table
Eigenvalues, also called characteristic roots, reflect the ratio of importance of the
discriminant functions (equations representing dimensions) used to classify
observations. There is one eigenvalue for each discriminant function. For two-group
DA, there is only one discriminant function and one eigenvalue, which accounts
for 100% of the explained variance. Therefore the researcher cannot compare
eigenvalues, making the “Eigenvalues” table of low utility in DA. In MDA, however,
where there are three or more groups, there are multiple discriminant functions,
with the first being the largest, the second the next most important in explanatory
power, and so on; the ratios show relative importance. The eigenvalues assess
relative importance because they reflect the percents of variance explained in the
model, cumulating to 100% for all functions. That is, the ratio of the eigenvalues
indicates the relative discriminating power of the discriminant functions. If the
ratio of two eigenvalues is 1.4, for instance, then the first discriminant function
accounts for 40% more between-group variance in the dependent categories than
does the second discriminant function. This is similar to the interpretation of
eigenvalues in factor analysis, since factors in factor analysis correspond to
discriminant functions in discriminant function analysis.

The “canonical correlation”, Rc or R*, is a measure of the association between the
groups formed by the dependent and the given discriminant function. The
canonical correlation of any discriminant function is displayed in SPSS by default
as a column in the "Eigenvalues" output table, as shown above. There is one
canonical correlation per discriminant function. When Rc is zero, there is no
relation between the groups and the function. When the canonical correlation is
large, there is a high correlation between the discriminant function and the
groups. The squared canonical correlation, Rc2, is the percent of variation in the
dependent discriminated by the set of independents in DA or MDA as expressed
in the given discriminant function. Rc is used to tell how useful each function is
in determining group differences. The canonical correlation of each discriminant
function is also the correlation of that function with the discriminant scores. A
canonical correlation close to 1 means that nearly all the variance in the
discriminant scores can be attributed to group differences explained by the given
function. Note that for two-group DA, the canonical correlation is equivalent to
the Pearson correlation of the discriminant scores with the grouping variable.

The “relative percentage” of a discriminant function equals that function's
eigenvalue divided by the sum of the eigenvalues of all discriminant functions in
the model. Thus it is the percent of discriminating power for the model associated
with a given discriminant function. Relative % is used to tell how many functions
are important. One may find that only the first two or so eigenvalues are of
importance. Note that relative % and Rc do not have to be correlated.
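
For reference, the quantities in the “Eigenvalues” table are linked by two standard identities, where $\lambda_i$ is the eigenvalue of function $i$:

$$R_{c(i)}^2 = \frac{\lambda_i}{1+\lambda_i}, \qquad \text{relative }\%_i = 100 \cdot \frac{\lambda_i}{\sum_j \lambda_j}$$

These identities also make clear why relative % and Rc need not be correlated: relative % depends on all of the eigenvalues, while Rc depends only on the function's own eigenvalue.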

The “Wilks’ Lambda” table


Shown in the figure above, this is model Wilks’ lambda, testing the significance of
the discriminant function. For DA, this is equivalent to testing the significance of
the model as a whole, here shown to be significant. For MDA, if the first
discriminant function is significant, the model as a whole is significant. This use of
model Wilks’ lambda is not to be confused with variable Wilks’ lambda discussed
above with reference to the “Tests of Equality of Group Means” table. A
significant model Wilks’ lambda means the researcher can reject the null
hypothesis that the two or more groups have the same mean discriminant
function scores and can conclude the model is discriminating.
The “Standardized Canonical Discriminant Function Coefficients” table
The standardized discriminant function coefficients in the table below serve the
same purpose as beta weights in multiple regression: they indicate the relative
importance of the independent variables in predicting (in regression) or
classifying (in DA) the dependent variable. Standardized discriminant function
coefficients reflect the semi-partial contribution of each variable to each of the
discriminant functions. The semi-partial contribution is the unique effect size
controlling the independent variables but not the dependent variable for other
independent variables in the equation. They are roughly analogous to beta
weights in OLS regression in that standardized regression coefficients are also
semi-partial coefficients. Standardized discriminant function coefficients should
be used to assess the relative importance of each independent variable's unique
contribution to the discriminant function. Structure coefficients, discussed below,

Copyright @c 2012 by G. David Garson and Statistical Associates Publishing Page 21

Single User License. Do not copy or post.


DISCRIMINANT FUNCTION ANALYSIS 2012 Edition

are preferred if the researcher wishes to impute meaningful labels to the


discriminant functions.
Note that importance is assessed relative to the model being analyzed. Addition
or deletion of variables in the model can change discriminant coefficients
markedly. Also, in MDA, where there are more than two groups of the
dependent, the standardized discriminant coefficients do not tell the researcher
between which groups the variable is most or least discriminating. For this
purpose, group centroids and factor structure are examined.
The table below shows that in the current model, education has the highest
unique contribution to the single discriminant function in this DA example.


The “Structure Matrix” table


Structure coefficients, also called structure correlations or discriminant loadings,
are the correlations between a given independent variable and the discriminant
scores associated with a given discriminant function. The “Structure Matrix” table
is sometimes called the canonical structure matrix or factor structure matrix. In
the figure above, education has the highest correlation with the single
discriminant function (two-group DA always has just one function).
In contrast to standardized canonical discriminant function coefficients discussed
above, structure coefficients are whole (not partial) coefficients, similar to
correlation coefficients. They reflect the uncontrolled association of the
discriminant scores with the criterion variable. That is, the structure coefficients
indicate the simple correlations between the variables and the discriminant
functions. The structure coefficients are used to impute meaningful labels to the
discriminant functions when this is a research goal. Standardized discriminant
function coefficients discussed above are preferred when the research goal is to
assess the importance of each independent variable's unique contribution to the
discriminant function.

Technically, structure coefficients are pooled within-groups correlations between
the independent variables and the standardized canonical discriminant functions.
When the dependent has more than two categories there will be more than one
discriminant function. In that case, there will be multiple columns in the table,
one for each function. The correlations then serve like factor loadings in factor
analysis -- by considering the set of variables that load most heavily on a given
dimension, the researcher may infer a suitable label for that dimension.

Thus the structure coefficients show the order of importance of the discriminating
variables by total correlation, whereas the standardized discriminant coefficients
show the order of importance by unique contribution. The sign of the structure
coefficient also shows the direction of the relationship. For multiple discriminant
analysis, the structure coefficients additionally allow the researcher to see the
relative importance of each independent variable on each dimension.

The “Canonical Discriminant Functions Coefficients” table


This table contains unstandardized discriminant function coefficients which are
used like unstandardized regression (b) coefficients in multiple regression. That is,
they are used to construct the actual discriminant function equation, which can be
used to classify new cases just as unstandardized regression coefficients are used
to construct the prediction equation. In the table shown above, the canonical
discriminant function coefficient for education is .288, and that is its value (slope)
in the discriminant function equation for the first (and in DA, only) function.
Unstandardized discriminant function coefficients represent an intermediate step
in discriminant function analysis and usually are not reported in research findings.
The constant plus the sum of products of the unstandardized coefficients with the
observations yields the discriminant scores. That is, unstandardized discriminant
coefficients are the regression-like b coefficients in the discriminant function, in
the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by
the discriminant function, the b's are discriminant coefficients, the x's are
discriminating variables, and c is a constant. The discriminant function coefficients
are partial coefficients, reflecting the unique contribution of each variable to the
classification of the criterion variable.
If one clicks the Statistics button in the main discriminant analysis dialog and
checks "Unstandardized" under "Function Coefficients," then SPSS output will
include the unstandardized discriminant coefficients.
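
As an illustration of how the equation is applied, the sketch below computes a discriminant score by hand with SPSS COMPUTE. Only the education coefficient (.288) comes from the example output; the constant and the remaining coefficients here are hypothetical placeholders showing the form of the calculation, and real values should be read from the coefficients table.

* Hypothetical illustration of applying the discriminant function equation.
* Only the educ coefficient (.288) is from the example output; the constant
* and other coefficients below are placeholders, not real estimates.
COMPUTE dscore = -4.00 + (.010*sex) + (.015*age) + (.288*educ)
    + (.020*rincome91) + (.005*polviews).
EXECUTE.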
The “Functions at Group Centroids” table
Functions at group centroids are the mean discriminant scores for each of the
dependent variable categories for each of the discriminant functions. In the figure
above, for instance, the mean discriminant score for function 1 (the only function
in DA) is .236. Two-group discriminant analysis has two centroids, one for each
group. In a well-discriminating model, the means should be well apart. The closer
the means, the more errors of classification there likely will be.
Functions at group centroids are used to establish the cutting point for classifying
cases. If the two groups are of equal size, the best cutting point is half way
between the values of the functions at group centroids (that is, the average). If
the groups are unequal, the optimal cutting point is the weighted average of the
two values. Cases which evaluate on the function above the cutting point are
classified as "Did not vote," while those evaluating below the cutting point are
classified as "Voted."
The “Classification Processing Summary” table
This table, not illustrated, reports the number of cases with missing or
out-of-range codes on the dependent variable, and also reports the number of
cases with at least one missing discriminating variable. Both types of cases are
excluded from analysis. The table also reports the remaining cases used in output.
The “Prior Probabilities for Groups” table
This table reminds the researcher of the prior probabilities assumed for purposes
of classification. If prior probabilities were set to “All groups equal” in the
classification dialog discussed above, then for DA, which has two dependent
groups, this table will report both prior probabilities as being .500. If, as in this
example, the prior probability option is set to “Compute from group sizes”, then
the table below is output. The coefficient in the “Prior” column is the
“Unweighted” value for that row divided by the total: it is that group's percent of
the sample. Prior probabilities are used to make classification into the more
numerous group more likely.

The “Classification Function Coefficients” table

The table illustrated below is output when "Fisher's" is checked under "Function
Coefficients" in the "Statistics" option of discriminant analysis discussed above.
Two sets (one for each dependent group in DA) of unstandardized linear
discriminant coefficients are calculated, which can be used to classify cases. This is
the classical method of classification, now little used.


The “Casewise Statistics” table


If “Casewise results” is checked in the classification dialog shown above, a table
like that below is output. For each case, the table lists the actual group; the
predicted group, based on the largest posterior probability; the conditional
probability (the probability of the observed discriminant score given membership
in the predicted group); the posterior probability (the chance the case belongs to
the predicted group, based on the independents); the squared Mahalanobis
distance of the case to the group centroid (large scores indicate outliers); and the
discriminant score for the case. The case is classified based on the discriminant
score in relation to the cutoff (not shown). Misclassified cases are marked with
asterisks. The "Second Highest Group" columns show the posterior probabilities
and Mahalanobis distances for the case had the case been classed based on the
second highest posterior probability. Since there are only two groups in this
example, the "second highest" is equivalent to the other of the two groups. The
researcher sets the number of cases printed in the classification dialog, displayed
earlier above. Mahalanobis distances are discussed further in the FAQ section
below.


Separate-groups graphs of canonical discriminant functions


The graphs below result from checking "Combined-groups" and "Separate-groups"
under "Plots" in the "Classify" options of discriminant analysis, shown
earlier above. In MDA, discriminant function plots, also called canonical plots, can
be created in which the two axes are two of the discriminant functions. In DA,
however, a histogram such as that illustrated below is printed since there is only
one discriminant function. In a well-fitting discriminant function model, the bar
chart will have most cases near the mean, with small tails.

The “Classification Results” table


The “Classification Results” table, also called a classification matrix or a
confusion, assignment, or prediction matrix, is used to assess the performance of
DA. That is, it provides a type of effect size measure. It is simply a table in which
the rows are the observed categories of the dependent and the columns are the
predicted categories of the dependent. When prediction is perfect, all cases will
lie on the diagonal. The percentage of cases on the diagonal is the percentage of
correct classifications, called the hit ratio. For the table below, as shown in table
note a, the hit ratio is 76.1%.

The hit ratio (here, 76.1%) must be compared not to zero or even to 50%, but to
the percent that would have been correctly classified by chance alone.


• Perhaps the most common criterion for “by chance alone” is obtained by
multiplying the prior probabilities times the group sizes, summing for all
groups, and dividing the sum by N. Deriving the numbers from the prior
probabilities table shown earlier above,

((.739*696) + (.261*246))/942 = 61.4%

• An alternative criterion for “by chance alone” is the percentage correct if
classifying all cases in the most numerous category. For this example,
voters are the most numerous category, and classifying all cases as voters
would result in a hit rate of 73.9%.
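
Note that when the priors equal the observed group proportions $p_i$, as here, the first criterion reduces to the familiar proportional chance criterion:

$$C_{pro} = \sum_i p_i^2 = .739^2 + .261^2 \approx .614$$

and the second criterion is the maximum chance criterion, $C_{max} = \max_i(p_i) = .739$, matching the 73.9% figure above.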

SPSS Statistical output for three-group MDA


Overview and example
In this section, only differences in MDA output compared to DA output are
discussed. The example below uses the same dataset, but this time trying to
classify race (race, coded 1=white, 2=black, 3=other) using the predictor variables
educ, rincome91, polviews, agewed (age when first wed), sibs (number of
siblings), and rap (rap music, coded from 1=like very much to 5=dislike very
much). Can races as coded above be distinguished by these six discriminating
variables?
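
A syntax sketch for this MDA run, parallel to the DA sketch given earlier, follows; again this is a sketch, and subcommand defaults should be verified in your SPSS version.

* Three-group MDA for the race example (a sketch).
DISCRIMINANT
  /GROUPS=race(1 3)
  /VARIABLES=educ rincome91 polviews agewed sibs rap
  /ANALYSIS ALL
  /PRIORS=SIZE
  /PLOT=COMBINED SEPARATE MAP
  /CLASSIFY=NONMISSING POOLED.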
MDA and DA similarities
The dialogs for MDA are all the same as for DA except, of course, that on the main
dialog shown at the outset above, the dependent variable is one with three or
more categories (here, race). For the following tables the reader is referred to the
similar DA output discussed above:
• Analysis Case Processing Summary table
• Group Statistics table
• Tests of Equality of Group Means table
• Pooled Within-Groups Matrices table
• Covariance Matrices table
• Log Determinants table
• Test Results table (for Box’s M)
• Standardized Canonical Discriminant Function Coefficients table
• Canonical Discriminant Function Coefficients table
• Functions at Group Centroids table
• Classification Processing Summary table
• Prior Probabilities for Groups table
• Classification Function Coefficients table
• Casewise Statistics table
• Classification Results table
The “Eigenvalues” table
As discussed above, eigenvalues reflect the ratio of importance of the
discriminant functions. Since DA has only 1 function but MDA has more, the
“ratio” aspect is more easily seen in MDA. Here the first discriminant function is
able to account for 95% of the variance accounted for by the model, while the
second function accounts for the other 5%. Note this is not the same as the
variance in race accounted for, since the percentages in this table always sum to
100% across functions. Rather, the eigenvalues show the relative importance of
the discriminant functions.

The “Wilks’ Lambda” table


As discussed above, this table contains model Wilks’ lambda, which tests the
significance of each discriminant function. The first row, “1 through 2”, tests the
significance of both discriminant functions together, equivalent to testing the
significance of the model. The second row is the test of the significance of the
second discriminant function alone, here not significant.

The “Structure Matrix” table


Discussed above, structure coefficients are the correlations between a given
independent variable and the discriminant scores associated with a given
discriminant function. While the MDA table below is essentially similar to that
in DA, the presence of more than one function makes a major use of structure
coefficients more meaningful: imputing labels to the functions. Looking at which
variables load most heavily on which functions, it can be said that function 1
(which explains the bulk of the variance accounted for by the model, as shown by
the eigenvalues) is associated with more educated respondents who dislike rap
music (or the opposite, less educated respondents who like rap music, since each
function represents a dimension with two poles). Discriminant function 2
represents respondents who were younger when first wed (since the agewed loading,
-.540, is negative) and more conservative (since higher values of polviews were
more conservative), or the opposite. However, function 2 was weak and
non-significant.
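Readers replicating structure coefficients outside SPSS can correlate each
predictor with the discriminant scores, as in the minimal Python sketch below.
X (cases by predictors) and y (group codes) are assumed, hypothetical numpy
arrays; note that the sketch uses simple total-sample correlations, whereas
SPSS pools the correlations within groups, so values will differ somewhat.

# Structure coefficients as predictor-by-discriminant-score correlations.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

scores = LinearDiscriminantAnalysis().fit_transform(X, y)  # cases x functions
for j in range(scores.shape[1]):
    r = [np.corrcoef(X[:, k], scores[:, j])[0, 1] for k in range(X.shape[1])]
    print(f"Function {j + 1} loadings:", np.round(r, 3))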


The “Territorial Map”


This graph is unique to MDA since it requires plotting two discriminant functions.
In MDA, territorial maps are discriminant function plots, also called canonical
plots, in which the two axes are two of the discriminant functions. The map is
also called the discriminant function space. For the current three-group MDA
example, there are only two functions. In the map below, the x axis is function 1
and the y axis is function 2. The dimensional meaning of the axes is determined
by looking at the structure coefficients, discussed above.
Circled asterisks within the map locate the centroids of each category being
analyzed (here, three categories of race; the color highlight is not part of SPSS
output). That the centroids are close together suggests the model is not very
discriminating. The vertical line of “21” symbols shows where function 1
discriminates between 1 (white) and 2 (black): cases to the left of the 2's on
function 1 are classed as 2, and the 1's and the territory to their right are
classed as race=1. That the centroids are all in race=1=white territory shows
this model will make many classification errors for blacks and other race. That
“3” symbols are not even represented on the map shows the model does not
discriminate for race=3 (other race). In general, each group has a numeric
symbol: 1, 2, 3, etc. Cases falling within the boundaries formed by the 2's, for
instance, are classified as 2. The individual cases themselves are not shown in
territorial maps under SPSS.


Territorial map areas appear more clearly in the map below, in which different
variables were used to predict the categories of race (colors and labels added).


Combined-groups plot
Instead of the histogram given in DA, an MDA request for a combined-groups plot
(in the “Classify” button dialog) generates a scatterplot such as that shown below.
That the group centroids are close together suggests a weak model which does not
discriminate well. While discriminant function 1 does discriminate somewhat
between blacks (green circles tending to fall on the negative side of function 1)
and whites (purple circles tending to fall on the positive side), there is
considerable overlap. Moreover, the grey circles representing race=3=other seem
randomly placed.
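A combined-groups plot can be approximated outside SPSS by scattering the first
two discriminant scores, colored by group, as in the sketch below (matplotlib,
with the same hypothetical X and y arrays as in the earlier sketches).

# Combined-groups plot: cases in the space of the first two functions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

scores = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
for g in np.unique(y):
    plt.scatter(scores[y == g, 0], scores[y == g, 1], label=f"group {g}")
plt.xlabel("Function 1")
plt.ylabel("Function 2")
plt.legend()
plt.show()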

Separate-groups plots
Similar scatterplots, not shown, can be output for each level of the dependent
variable in MDA (for each race in this example).


SPSS Statistical output for stepwise discriminant analysis


Overview

Stepwise discriminant function analysis, like other stepwise procedures, is
usually restricted to exploratory (as opposed to confirmatory) research. Stepwise
procedures select the independent variable most correlated with the dependent
first and remove the variance it explains; they then select the independent
variable most correlated with the remaining variance in the dependent, and so on,
until selecting an additional independent no longer changes one of a number of
researcher-set criteria by a significant amount. As in multiple regression, there
are both forward (adding variables) and backward (removing variables) stepwise
versions.

In SPSS there are several available criteria for entering or removing new variables
at each step: Wilks’ lambda is the default. Others are unexplained variance,
Mahalanobis’ distance, smallest F ratio, and Rao’s V. The researcher typically sets
the critical significance level by setting the "F to remove" in most statistical
packages. These methods were discussed previously above.
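To show the logic of the default criterion, the sketch below implements a
bare-bones forward selection that adds, at each step, the variable that most
lowers model Wilks' lambda (the ratio of the within-groups to total SSCP
determinants). It again assumes hypothetical numpy arrays X and y, and it
deliberately omits the F-to-enter and F-to-remove significance tests that the
real SPSS procedure applies.

# Forward stepwise selection by Wilks' lambda (a sketch, not SPSS's algorithm).
import numpy as np

def wilks_lambda(X, y, cols):
    Xs = X[:, cols]
    sscp = lambda M: (M - M.mean(axis=0)).T @ (M - M.mean(axis=0))
    W = sum(sscp(Xs[y == g]) for g in np.unique(y))    # within-groups SSCP
    return np.linalg.det(W) / np.linalg.det(sscp(Xs))  # lambda = |W| / |T|

def forward_select(X, y, n_steps):
    chosen = []
    for _ in range(n_steps):
        candidates = [k for k in range(X.shape[1]) if k not in chosen]
        best = min(candidates, key=lambda k: wilks_lambda(X, y, chosen + [k]))
        chosen.append(best)
        print("step", len(chosen), "added variable", best,
              "lambda =", round(wilks_lambda(X, y, chosen), 3))
    return chosen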

Stepwise procedures are sometimes said to eliminate the problem of
multicollinearity, but this is misleading. The stepwise procedure uses an
intelligent criterion to set the order of entry, but it certainly does not
eliminate the problem of multicollinearity. To the extent that independents are
highly intercorrelated, the standard errors of their standardized discriminant
coefficients will be inflated and it will be difficult to assess the relative
importance of the independent variables.

The researcher should keep in mind that the stepwise method capitalizes on
chance associations, so the true significance levels are worse (that is,
numerically higher) than the nominal levels reported. Thus a reported
significance level of .05 may correspond to a true alpha rate of .10 or worse.
For this reason, if stepwise discriminant analysis is employed, use of
cross-validation is recommended. In the split-halves method, the original
dataset is split in two at random; one half is used to develop the discriminant
equation and the other half is used to validate it.
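A minimal split-halves sketch follows, once more assuming hypothetical numpy
arrays X and y: half the cases develop the function, the other half check how
well it classifies unseen cases.

# Split-halves cross-validation of a discriminant model (a sketch).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(seed=42)
idx = rng.permutation(len(y))
half = len(y) // 2
train, test = idx[:half], idx[half:]

lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
print("development-half hit rate:", round(lda.score(X[train], y[train]), 3))
print("validation-half hit rate: ", round(lda.score(X[test], y[test]), 3))
# A validation hit rate far below the development hit rate signals
# capitalization on chance.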

Example
In this section, only differences in stepwise MDA output compared to DA and MDA
output are discussed. The example below uses the same dataset as for MDA above,
trying to classify race (race, coded 1=white, 2=black, 3=other) using the
predictor variables educ, rincome91, polviews, agewed (age when first wed), sibs
(number of siblings), and rap (rap music, coded from 1=like very much to
5=dislike very much). What is the “optimal” model produced by stepwise
discriminant function analysis?
Stepwise discriminant analysis in SPSS
As illustrated below, stepwise discriminant analysis is requested in the main SPSS
discriminant analysis dialog, by checking the “Use stepwise method” radio button.

Nearly all output is identical to that for the MDA example above using the “Enter”
method, except that it is presented in steps. Predictor variables are added to or
removed from the model according to criteria set by the “Method” button,
configured for this example as shown below.


The steps in stepwise analysis are clearly outlined in the “Variables
Entered/Removed” and “Variables in the Analysis” tables shown in output below.
1. At step 0, no variables are in the analysis. At step 1, sibs (number of
brothers and sisters) is entered as the single best classifier of race.
2. At step 2, rap music is added as a second discriminant variable.
3. At step 3, age when first married is added as a third discriminant variable.
The F-significance to remove was set as .10 under the “Method” button
above; the significance for age when first married is .05, below that
threshold, so it is retained.


Stepwise Wilks' lambda appears in the "Variables in the Analysis" table of
stepwise DA output above, after the "Sig. of F to Remove" column. The Step 1
model has no entry, as removing the first variable would remove the only
variable. The Step 2 model has two predictors, each with a Wilks' lambda
representing what the model Wilks' lambda would be if that variable were
dropped, leaving only the other one. The higher the stepwise Wilks' lambda, the
more important the variable in classifying the grouping variable (here, race).
To understand why a fourth variable was not included in the analysis, look at the
“Variables not in the Analysis” table below. As the row for step 3 in this table
shows, all remaining variables had an entry significance higher than the
criterion of .10, so none was entered into the final model.


Stepwise Wilks' lambda also appears in the "Variables Not in the Analysis" table of
stepwise DA output, after the "Sig. of F to Enter" column. Here the criterion is
reversed: the variable with the lowest stepwise Wilks' lambda is the best
candidate to add to the model in the next step. For instance, in the table below,
for step 1 the lowest Wilks’ lambda is the .886 for rap music and that is the
variable added in step 2.
The stepwise method for these data thus employed three variables. The enter
method presented earlier retained all variables entered in the initial discriminant
function analysis dialog (six variables). Since the stepwise method specified a
different model, the coefficients in ensuing tables differ somewhat from those for
the enter method model. That is, even non-significant discriminant variables will
affect the coefficients. This is why, as mentioned earlier, some researchers use
the enter method for confirmatory purposes but drop non-significant predictors
one at a time until all those remaining in the analysis are significant. For stepwise
models, all variables in the final analysis are always significant.


Assumptions
Proper specification
The model is assumed to be properly specified: the discriminant coefficients can
change substantially if variables are added to or subtracted from the model.

True categorical dependent variables


The dependent variable is a true dichotomy in DA. When the range of a true
underlying continuous variable is constrained to form a dichotomy, correlation is
attenuated (biased toward underestimation). One should not dichotomize a
continuous variable simply for the purpose of applying discriminant function
analysis. To a progressively lesser extent, the same considerations apply to
trichotomies and higher in MDA. All cases must belong to a group formed by the
dependent variable. The groups must be mutually exclusive, with every case
belonging to only one group.

Independence
All cases must be independent. Thus one cannot have correlated data (no
before-after, panel, or matched-pairs data, for instance).

No lopsided splits
Group sizes of the dependent should not be extremely different. If this
assumption is violated, logistic regression is preferred. Some authors use 90:10 or
worse as the criterion in DA.

Adequate sample size


There should be at least two cases for each category of the dependent and the
maximum number of independents is sample size minus 2. However, it is
recommended that there be at least four or five times as many cases as
independent variables.


Interval data
The independent variables are interval-level. As with other members of the
regression family, dichotomies, dummy variables, and ordinal variables with at
least 5 categories are commonly used as well.

Variance
No independents should have a zero standard deviation in one or more of the
groups formed by the dependent.

Random error
Errors (residuals) are randomly distributed.

Homogeneity of variances (homoscedasticity)


Within each group formed by the dependent, the variance of each interval
independent should be similar across groups. That is, the independents may (and
will) have different variances one from another, but for the same independent,
the groups formed by the dependent should have similar variances on that
independent. Discriminant analysis is highly sensitive to outliers, and lack of
homogeneity of variances may indicate the presence of outliers in one or more
groups. Lack of homogeneity of variances will mean significance tests are
unreliable, especially if sample size is small and the split of the dependent
variable is very uneven. Lack of homogeneity of variances and presence of
outliers can be evaluated through scatterplots of variables.

Homogeneity of covariances/correlations
Within each group formed by the dependent, the covariance/correlation between
any two predictor variables should be similar to the corresponding
covariance/correlation in other groups. That is, each group has a similar
covariance/correlation matrix as reflected in the log determinants (see "Large
samples" discussion above).


Absence of perfect multicollinearity


If one independent variable is very highly correlated with another, or one is a
function (e.g., the sum) of other independents, then the tolerance value for that
variable will approach 0 and the matrix will not have a unique discriminant
solution. Such a matrix is said to be ill-conditioned. Tolerance is discussed in
the section on regression.

Low multicollinearity of the independents


To the extent that independents are correlated, the standardized discriminant
function coefficients will not reliably assess the relative importance of the
predictor variables. In SPSS, one check on multicollinearity is looking at the
"pooled within-groups correlation matrix," which is output when one checks
"Within-groups correlation" from the Statistics button in the DA dialog. "Pooled"
refers to averaging across groups formed by the dependent. Note that pooled
correlation can be very different from normal (total) correlation when two
variables are less correlated within groups than between groups (ex., race and
illiteracy are little correlated within region, but the total r is high because there
are proportionately more blacks in the South where illiteracy is high). When
assessing the correlation matrix for multicollinearity, a rule of thumb is no r > .90
and not several > .80.
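The pooled matrix can be reproduced as in the sketch below (hypothetical X and y
as before): the within-group deviation SSCP matrices are summed across groups and
then rescaled to correlations.

# Pooled within-groups correlation matrix (a sketch).
import numpy as np

W = sum((X[y == g] - X[y == g].mean(axis=0)).T @
        (X[y == g] - X[y == g].mean(axis=0)) for g in np.unique(y))
d = np.sqrt(np.diag(W))
pooled_r = W / np.outer(d, d)
print(np.round(pooled_r, 2))    # flag any r > .90, or several r > .80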

Linearity
DA and MDA assume linearity (do not take into account exponential terms unless
such transformed variables are added as additional independents).

Additivity
DA and MDA assume additivity (do not take into account interaction terms unless
new crossproduct variables are added as additional independents).

Multivariate normality
For purposes of significance testing, predictor variables are assumed to follow
multivariate normal distributions. That is, each predictor variable has a normal
distribution about fixed values of all the other independents. As a rule of
thumb, discriminant analysis will be robust against violation of this assumption
if the smallest group has more than 20 cases and the number of independents is
fewer than six. When non-normality is caused by outliers rather than skew,
violation of this assumption has more serious consequences, as DA is highly
sensitive to outliers. If this assumption is violated, logistic regression is
preferred.

Frequently Asked Questions


Isn't discriminant analysis the same as cluster analysis?
No. In discriminant analysis the groups (clusters) are determined beforehand and
the object is to determine the linear combination of independent variables which
best discriminates among the groups. In cluster analysis the groups (clusters) are
not predetermined and in fact the object is to determine the best way in which
cases may be clustered into groups.

When does the discriminant function have no constant term?


When the data are standardized or are deviations from the mean.

How important is it that the assumptions of homogeneity of variances and of
multivariate normal distribution be met?
Lachenbruch (1975) indicates that DA is relatively robust even when there are
modest violations of these assumptions. Klecka (1980) points out that
dichotomous variables, which often violate multivariate normality, are not likely
to affect conclusions based on DA.

In DA, how can you assess the relative importance of the discriminating
variables?
The same as in regression: by comparing beta weights, which are the standardized
discriminant coefficients. If they are not output directly by one's statistical
package (SPSS outputs them), one may obtain beta weights by running DA on
standardized scores. That is, betas are standardized discriminant function
coefficients. The ratio of the betas is the relative contribution of each
variable. Note that the betas will change if variables are added to or deleted
from the equation.

Dummy variables

As in regression, dummy variables must be assessed as a group, not on the basis
of individual beta weights. This is done through hierarchical discriminant
analysis, running the analysis first with, then without, the set of dummies. The
difference in the squared canonical correlation indicates the explanatory effect
of the set of dummies.

Alternatively, for interval independents, one can correlate the discriminant
function scores with the independents. The discriminating variables which matter
the most to a particular function will be correlated highest with the DA scores.

In DA, how can you assess the importance of a set of discriminating variables
over and above a set of control variables? (What is sequential discriminant
analysis?)
As in sequential regression, in sequential discriminant analysis the control
variables may be entered as independent variables separately first. In a second
run, the discriminating variables of interest are added. The difference in the
squared canonical correlation indicates the explanatory effect of the
discriminating variables over and above the set of control variables.
Alternatively, one could compare the hit rates in the two classification tables.

What is the maximum likelihood estimation method in discriminant analysis
(logistic discriminant function analysis)?
Using MLE, a discriminant function is a function of the form T = k1X1 + k2X2 +
... + knXn, where X1...Xn are the discriminating variables, k1...kn are the logit
coefficients, and T is a function which classes the case into group 0 or group 1.
If the data are unstandardized, there is also a constant term. The discriminant
function arrives at coefficients which set the highest possible ratio of
between-groups to within-groups variance (similar to the ANOVA F test, except
that in DA the group variable is the dependent rather than the independent). This
method, called logistic discriminant function analysis, is supported by SPSS.

What are Fisher's linear discriminant functions?


The classical method of discriminant classification calculated one set of
discriminant function coefficients for each dependent category, using these to
make the classifications. SPSS still outputs these coefficients if you check the
"Fisher's" box under the Statistics option in discriminant function analysis.
This outputs a table with the groups of the dependent as columns and the
independent variables plus a constant as rows. The coefficients in each column
are used to compute a classification score for that group, and the case is
classified into the group generating the highest score. This method gives the
same results as using the discriminant function scores but is easier to compute.
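Under equal priors, these classification functions can be computed directly from
the group means and the pooled within-groups covariance, as in the minimal
sketch below (hypothetical X and y as before).

# Fisher classification functions, one per group (equal priors assumed).
import numpy as np

groups = np.unique(y)
means = {g: X[y == g].mean(axis=0) for g in groups}
W = sum((X[y == g] - means[g]).T @ (X[y == g] - means[g]) for g in groups)
S_inv = np.linalg.inv(W / (len(y) - len(groups)))  # pooled covariance, inverted

def classify(x):
    # Score for group g: linear coefficients S_inv @ mean_g plus a constant.
    scores = {g: means[g] @ S_inv @ x - 0.5 * means[g] @ S_inv @ means[g]
              for g in groups}
    return max(scores, key=scores.get)

print(classify(X[0]))    # predicted group for the first case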

I have heard DA is related to MANCOVA. How so?


Discriminant analysis can be conceptualized as the inverse of MANCOVA.
MANCOVA can be used to see the effect on multiple dependents of a single
categorical independent, while DA can be used to see the effect on a categorical
dependent of multiple interval independents. The SPSS MANOVA procedure,
which also covers MANCOVA, can be used to generate discriminant functions as
well, though in practical terms this is not the easiest route for the researcher
interested in DA.

How does MDA work?


A first function is computed on which the group means are as different as
possible. A second function is then computed uncorrelated with the first, then a
third function is computed uncorrelated with the first two, and so on, for as many
functions as possible. The maximum number of functions is the lesser of g - 1
(number of dependent groups minus 1) or k (the number of independent
variables).

How can I tell if MDA worked?


SPSS will print out a table of Classification Results, in which the rows are
Actual and the columns are Predicted. The better MDA works, the more cases will
lie on the diagonal. Also, below the table, SPSS will print the percent of cases
correctly classified.


For any given MDA example, how many discriminant functions will
there be, and how can I tell if each is significant?
The answer is min(g-1, p), where g is the number of groups (categories) being
discriminated and p is the number of predictor (independent) variables. The
min() function, of course, means the lesser of the two. SPSS will print Wilks'
lambda and its significance for each function, and this tests the significance
of the discriminant functions.

What are Mahalanobis distances?

High Mahalanobis distances are used to identify cases which are outliers; in a
well-fitting model, all cases have low to moderate Mahalanobis distances. For
instance, the researcher might wish to analyze a new, unknown set of cases in
comparison to an existing set of known cases. Mahalanobis distance is the
distance between a case and the centroid of each group (of the dependent) in
attribute space (n-dimensional space defined by n variables). A case has one
Mahalanobis distance for each group, and it is classified as belonging to the
group for which its Mahalanobis distance is smallest. Thus, the smaller the
Mahalanobis distance, the closer the case is to the group centroid and the more
likely it is to be classed as belonging to that group. Since Mahalanobis
distance is measured in terms of standard deviations from the centroid, a case
which is more than 1.96 Mahalanobis distance units from the centroid has less
than a .05 chance of belonging to the group represented by the centroid; 3 units
would likewise correspond to less than a .01 chance. SPSS reports squared
Mahalanobis distance: click the Classify button and then check "Casewise
results."
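A case's squared Mahalanobis distance to each group centroid can be computed
with the pooled within-groups covariance, as in the sketch below (hypothetical
X and y as before).

# Squared Mahalanobis distance from one case to each group centroid (a sketch).
import numpy as np

groups = np.unique(y)
means = {g: X[y == g].mean(axis=0) for g in groups}
W = sum((X[y == g] - means[g]).T @ (X[y == g] - means[g]) for g in groups)
S_inv = np.linalg.inv(W / (len(y) - len(groups)))

def d_squared(x, g):
    diff = x - means[g]
    return diff @ S_inv @ diff

case = X[0]
print({g: round(float(d_squared(case, g)), 2) for g in groups})
# The case is classed into the group whose centroid is nearest.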

In MDA there will be multiple discriminant functions, so there will be more than
one set of unstandardized discriminant coefficients, and for each case a
discriminant score can be obtained for each of the multiple functions. In
dichotomous discriminant analysis, the discriminant score is used to classify
the case as 0 or 1 on the dependent variable. But how are the multiple
discriminant scores on a single case interpreted in MDA?
Take the case of three discriminant functions with three corresponding
discriminant scores per case. The three scores for a case indicate the location
of that case in three-dimensional discriminant space. Each axis represents one
of the discriminant functions, roughly analogous to factor axes in factor
analysis. That is, each axis represents a dimension of meaning whose label is
attributed based on inference from the structure coefficients.

One can also locate the group centroid for each group of the dependent in
discriminant space in the same manner.

In the case of two discriminant functions, cases or group centroids may be plotted
on a two-dimensional scatterplot of discriminant space (a canonical plot). Even
when there are more than two functions, interpretation of the eigenvalues may
reveal that only the first two functions are important and worthy of plotting.

Likewise in MDA, there are multiple standardized discriminant coefficients - one
set for each discriminant function. In dichotomous DA, the ratio of the
standardized discriminant coefficients is the ratio of the importance of the
independent variables. But how are the multiple sets of standardized
coefficients interpreted in MDA?
In MDA the standardized discriminant coefficients indicate the relative
importance of the independent variables in determining the location of cases
along the dimension of discriminant space represented by that function.

Are the multiple discriminant functions the same as factors in


principal-components factor analysis?
No. There are conceptual similarities, but they are mathematically different in
what they are maximizing. MDA maximizes the differences between the groups
formed by the dependent; PCA maximizes the variance in all the variables
accounted for by each factor.

What is the syntax for discriminant analysis in SPSS?


DISCRIMINANT GROUPS=varname(min,max) /VARIABLES=varlist
[/SELECT=varname(value)]
[/ANALYSIS=varlist[(level)] [varlist...]]
[/OUTFILE MODEL('file')]
[/METHOD={DIRECT**}] [/TOLERANCE={0.001}]


{WILKS } { n }
{MAHAL }
{MAXMINF }
{MINRESID}
{RAO }
[/MAXSTEPS={n}]
[/FIN={3.84**}] [/FOUT={2.71**}] [/PIN={n}]
{ n } { n }
[/POUT={n}] [/VIN={0**}]
{ n }
[/FUNCTIONS={g-1,100.0,1.0**}] [/PRIORS={EQUAL** }]
{n1 , n2 , n3 } {SIZE }
{value list}
[/SAVE=[CLASS[=varname]] [PROBS[=rootname]]
[SCORES[=rootname]]]
[/ANALYSIS=...]
[/MISSING={EXCLUDE**}]
{INCLUDE }
[/MATRIX=[OUT({* })] [IN({* })]]
{'savfile'|'dataset'} {'savfile'|'dataset'}

[/HISTORY={STEP**} ]
{NONE }
[/ROTATE={NONE** }]
{COEFF }
{STRUCTURE}
[/CLASSIFY={NONMISSING } {POOLED } [MEANSUB]]
{UNSELECTED } {SEPARATE}
{UNCLASSIFIED}
[/STATISTICS=[MEAN] [COV ] [FPAIR] [RAW ] [STDDEV]
[GCOV] [UNIVF] [COEFF] [CORR] [TCOV ]
[BOXM] [TABLE] [CROSSVALID]
[ALL]]
[/PLOT=[MAP] [SEPARATE] [COMBINED] [CASES[(n)]] [ALL]]
**Default if subcommand or keyword is omitted.


Bibliography
Dunteman, George H. (1984). Introduction to multivariate analysis. Thousand
Oaks, CA: Sage Publications. Chapter 5 covers classification procedures and
discriminant analysis.
Huberty, Carl J. (1994). Applied discriminant analysis. NY: Wiley-Interscience.
(Wiley Series in Probability and Statistics).
Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in the
Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.
Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.
McLachlan, Geoffrey J. (2004). Discriminant analysis and statistical pattern
recognition. NY: Wiley-Interscience. (Wiley Series in Probability and
Statistics).
Press, S. J. and S. Wilson (1978). Choosing between logistic regression and
discriminant analysis. Journal of the American Statistical Association, Vol.
73: 699-705. The authors make the case for the superiority of logistic
regression for situations where the assumptions of multivariate normality
are not met (e.g., when dummy variables are used), though discriminant
analysis is held to be better when assumptions are met. They conclude that
logistic and discriminant analyses will usually yield the same conclusions,
except in the case when there are independents which result in predictions
very close to 0 and 1 in logistic analysis.
Tabachnick, Barbara G. and Linda S. Fidell (2001). Using multivariate statistics,
Fourth ed. Boston: Allyn and Bacon. Chapter 11 covers discriminant analysis.

Copyright 1998, 2008, 2012 by G. David Garson and Statistical Associates Publishers.
Worldwide rights reserved in all languages and all media. Do not copy or post in any format.
Last update 8/3/2012.


Statistical Associates Publishing


Blue Book Series

Association, Measures of
Assumptions, Testing of
Canonical Correlation
Case Studies
Cluster Analysis
Content Analysis
Correlation
Correlation, Partial
Correspondence Analysis
Cox Regression
Creating Simulated Datasets
Crosstabulation
Curve Fitting & Nonlinear Regression
Data Distributions and Random Numbers
Data Levels
Delphi Method
Discriminant Function Analysis
Ethnographic Research
Evaluation Research
Event History Analysis
Factor Analysis
Focus Groups
Game Theory
Generalized Linear Models/Generalized Estimating Equations
GLM (Multivariate), MANOVA, and MANCOVA
GLM (Univariate), ANOVA, and ANCOVA
GLM Repeated Measures
Grounded Theory
Hierarchical Linear Modeling/Multilevel Analysis/Linear Mixed Models
Integrating Theory in Research Articles and Dissertations
Kaplan-Meier Survival Analysis
Latent Class Analysis
Life Tables


Logistic Regression
Log-linear Models
Longitudinal Analysis
Missing Values Analysis & Data Imputation
Multidimensional Scaling
Multiple Regression
Narrative Analysis
Network Analysis
Ordinal Regression
Parametric Survival Analysis
Partial Least Squares Regression
Participant Observation
Path Analysis
Power Analysis
Probability
Probit Regression and Response Models
Reliability Analysis
Resampling
Research Designs
Sampling
Scales and Standard Measures
Significance Testing
Structural Equation Modeling
Survey Research
Two-Stage Least Squares Regression
Validity
Variance Components Analysis
Weighted Least Squares Regression

Statistical Associates Publishing


http://www.statisticalassociates.com
sa.publishers@gmail.com
