DFA
BASICS
QUESTIONS
The primary goal is to find one or more dimensions on which
groups differ and to create classification functions
Can group membership be accurately predicted by
a set of predictors?
Along how many dimensions do groups differ
reliably?
Loadings
Which predictors are most important in predicting
group membership?
Can we predict group membership after removing
the effects of one or more covariates?
Can we use discriminant function analysis to
estimate population parameters?
ASSUMPTIONS
$Z = a + B_1X_1 + B_2X_2 + \dots + B_kX_k$
Unequal samples, sample size and power
With DFA, unequal sample sizes are not necessarily an
issue
If there are more DVs than cases in any cell, the covariance matrix
for that cell becomes singular and cannot be inverted.
If a cell has only a few more cases than DVs, the assumption of equal
covariance matrices is likely to be rejected.
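The singularity problem can be illustrated with a minimal sketch. The data here are made up (3 cases on 4 DVs is an assumption chosen only to have more DVs than cases); the point is that the covariance matrix of such a cell cannot have full rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cell: 3 cases measured on 4 DVs (more DVs than cases).
cell = rng.normal(size=(3, 4))

# Sample covariance matrix of the DVs (variables are in columns).
cov = np.cov(cell, rowvar=False)

# The rank is at most n_cases - 1, so this 4 x 4 matrix is singular
# and has no ordinary inverse.
rank = np.linalg.matrix_rank(cov)
print("rank:", rank, "matrix size:", cov.shape[0])
```

Any procedure that needs the inverse of this matrix would fail or return numerically meaningless values for this cell.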
Multivariate normality assumes that the sampling distributions of the
means of the various DVs in each cell, and of all linear
combinations of the DVs, are normally distributed.
Homogeneity of Covariance Matrices
When inference is the goal, DFA is typically robust
to violations of this assumption (with respect to Type
I error)
When classification is the primary goal, the
analysis is highly influenced by violations, because
subjects will tend to be classified into the groups with
the largest variance
Linearity
Absence of multicollinearity/singularity in
each cell of the design.
EQUATIONS
To begin with, we'll focus on interpretation
Significance of the overall analysis; do the
predictors separate the groups?
SPATIAL INTERPRETATION
[Figures: three scatterplots of the groups plotted on Var #1 and Var #2]
EQUATIONS
$$S_{total} = S_{bg} + S_{wg}$$
$$\Lambda = \frac{|S_{wg}|}{|S_{bg} + S_{wg}|}$$
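As a rough numerical check, the sketch below builds the total, between-groups and within-groups cross-products matrices, then forms Wilks' lambda as the ratio of the determinant of the within-groups matrix to the determinant of the between-plus-within matrix. The three groups, two predictors and group means are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 groups of 20 cases on 2 predictors, shifted means.
groups = [rng.normal(loc=m, size=(20, 2)) for m in ([0, 0], [1, 0], [0, 2])]
X = np.vstack(groups)
grand_mean = X.mean(axis=0)

def sscp(data, center):
    """Sum-of-squares and cross-products matrix of `data` around `center`."""
    dev = data - center
    return dev.T @ dev

S_total = sscp(X, grand_mean)

# Within groups: deviations of cases from their own group mean.
S_wg = sum(sscp(g, g.mean(axis=0)) for g in groups)

# Between groups: deviations of group means from the grand mean, weighted by n.
S_bg = sum(
    len(g) * np.outer(g.mean(axis=0) - grand_mean, g.mean(axis=0) - grand_mean)
    for g in groups
)

wilks_lambda = np.linalg.det(S_wg) / np.linalg.det(S_bg + S_wg)
print("partition holds:", np.allclose(S_total, S_bg + S_wg))
print("Wilks' lambda:", round(wilks_lambda, 3))
```

Values of lambda near 0 indicate that most of the generalized variance lies between groups; values near 1 indicate little separation.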
STATISTICAL INFERENCE
World data
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.041a       89.0            89.0           .714
2          .128a        11.0            100.0          .337

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2           .434            65.049       6    .000
2                     .886            9.402        2    .009
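The quantities in this output are tied together by simple formulas, and the sketch below recomputes them from the two reported eigenvalues (1.041 and .128). It assumes N = 82 cases, p = 3 predictors and k = 3 groups, which is consistent with the reported degrees of freedom but is not stated alongside this output, and it uses Bartlett's chi-square approximation. Small differences from the printed values come from the rounded eigenvalues.

```python
import math

eigenvalues = [1.041, 0.128]   # as reported in the Eigenvalues output
N, p, k = 82, 3, 3             # assumed: cases, predictors, groups

total = sum(eigenvalues)
results = []
for i, ev in enumerate(eigenvalues, start=1):
    pct_variance = 100 * ev / total
    canonical_r = math.sqrt(ev / (1 + ev))
    # Wilks' lambda for functions i through the last one:
    # product of 1 / (1 + eigenvalue) over the remaining functions.
    lam = math.prod(1 / (1 + e) for e in eigenvalues[i - 1:])
    # Bartlett's approximation to chi-square.
    chi_square = -(N - 1 - (p + k) / 2) * math.log(lam)
    df = (p - i + 1) * (k - i)
    results.append((pct_variance, canonical_r, lam, chi_square, df))
    print(f"function {i}: %var={pct_variance:.1f} r={canonical_r:.3f} "
          f"lambda={lam:.3f} chi2={chi_square:.2f} df={df}")
```

The first test asks whether the two functions together separate the groups; the second asks whether anything remains after the first function is removed.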
Function
1
2
1.740
-.887
-1.596
.069
.652
1.073
INTERPRETING DISCRIMINANT FUNCTIONS
2 FUNCTION PLOT
Note that in a one-function situation we could inspect the histograms for each group along the function values
TERRITORIAL MAPS
Provide a picture (absolutely hideous in SPSS) of the relationship between predicted group and two discriminant functions
Asterisks are group centroids
This is just another way in which to see the previous graphic but with how cases would be classified given a particular score on the two functions
Functions at Group Centroids
religion3   Function 1   Function 2
Catholic    .317         -.342
Muslim      -1.346       .207
Protstnt    1.394        .519
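One way to read the centroids is to assign a case to the group whose centroid lies closest in the space of the two functions. The sketch below does this with plain Euclidean distance, which is an assumption: it matches the territorial-map picture only under equal priors and a common within-groups covariance. The case scores are hypothetical.

```python
import numpy as np

# Group centroids on the two discriminant functions
# (values from the Functions at Group Centroids output).
centroids = {
    "Catholic": np.array([0.317, -0.342]),
    "Muslim": np.array([-1.346, 0.207]),
    "Protstnt": np.array([1.394, 0.519]),
}

def nearest_centroid(scores):
    """Return the group whose centroid is closest (Euclidean) to `scores`."""
    return min(centroids, key=lambda g: np.linalg.norm(scores - centroids[g]))

case = np.array([-1.0, 0.5])   # hypothetical discriminant scores for one case
print(nearest_centroid(case))
```

A case scoring low on function 1 lands near the Muslim centroid, which is the separation the first function mainly provides.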
LOADINGS
Loadings (structure coefficients) are the correlations between each predictor and a function.
The squared loading tells you how much variance of a variable is accounted for by the function
Function 1: perhaps representative of country affluence (positive correlations on all)
Function 2: seems mostly related to GDP
Structure Matrix
Function
1
2
.666*
-.305
.315*
.530
-.054
.683*
$A = R_w D$
A is the loading matrix, $R_w$ is the within-groups correlation matrix, and $D$ is the matrix of standardized discriminant function coefficients.
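The matrix product can be sketched in a few lines. Both input matrices below are hypothetical (they are not the values behind the structure matrix shown here), and the rescaling step is an added assumption so that each function has unit within-groups variance and the resulting loadings are valid correlations.

```python
import numpy as np

# Hypothetical pooled within-groups correlation matrix for 3 predictors.
R_w = np.array([[1.0, 0.4, 0.2],
                [0.4, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

# Hypothetical discriminant function coefficients
# (rows = predictors, columns = functions).
D = np.array([[0.9, -0.3],
              [0.2, 0.1],
              [0.1, 1.0]])

# Rescale each column so the function has unit within-groups variance
# (d' R_w d = 1); this keeps every loading between -1 and 1.
D = D / np.sqrt(np.diag(D.T @ R_w @ D))

A = R_w @ D                     # loading (structure) matrix
print(np.round(A, 3))           # correlations of predictors with functions
print(np.round(A ** 2, 3))      # share of each predictor's variance per function
```

Because the loadings fold in the correlations among predictors, a predictor with a small coefficient can still load highly on a function.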
CLASSIFICATION
As mentioned previously, the primary goal in
DFA may be geared more towards classification
Classification is a separate procedure in which
the discriminating variables (or functions) are
used to predict group membership
EQUATIONS
$$C_j = c_{j0} + c_{j1}x_1 + \dots + c_{jp}x_p$$
religion3     Catholic   Muslim    Protstnt
              -.392      -.570     -.333
              1.608      1.867     1.449
              -.001      -.001     -.001
(Constant)    -39.384    -43.934   -35.422
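To show how the classification functions are used, the sketch below scores one case on each group's function and assigns it to the group with the highest score. The coefficients are the ones in the output; treating the last row as the constant is an assumption, the predictor names are not shown in the output, and the case values are hypothetical.

```python
import numpy as np

groups = ["Catholic", "Muslim", "Protstnt"]

# Rows = groups, columns = coefficients for the three predictors
# (predictor order as listed in the output; names not shown there).
coef = np.array([[-0.392, 1.608, -0.001],
                 [-0.570, 1.867, -0.001],
                 [-0.333, 1.449, -0.001]])

# Assumed to be the constant term c_j0 for each group.
const = np.array([-39.384, -43.934, -35.422])

x = np.array([3.0, 25.0, 5000.0])   # hypothetical raw scores for one case

scores = const + coef @ x           # C_j for each group
predicted = groups[int(np.argmax(scores))]
print(dict(zip(groups, np.round(scores, 3))), "->", predicted)
```

Only the ordering of the scores matters for the assignment; the raw values of the functions have no direct interpretation.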
ALTERNATIVE METHODS
$$\Pr(G_k \mid X) = \frac{\Pr(X \mid G_k)}{\sum_{i=1}^{g} \Pr(X \mid G_i)}$$
PRIOR PROBABILITY
What we've just discussed involves posterior
probabilities regarding group membership
However, we've been treating the situation thus far
as though the groups were equally likely in
the population
What if this is obviously not the case?
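A minimal sketch of the difference: with equal priors the posterior is just each group's likelihood divided by the sum of the likelihoods, while unequal priors weight each likelihood first. The likelihood values are hypothetical, and the unequal priors assume group proportions of 40, 26 and 16 out of 82 cases.

```python
import numpy as np

def posterior(likelihoods, priors=None):
    """Pr(G_k | X) for each group, given Pr(X | G_k) and optional priors."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    if priors is None:
        # Equal priors: every group is treated as equally likely beforehand.
        priors = np.full(likelihoods.shape, 1.0 / likelihoods.size)
    weighted = np.asarray(priors, dtype=float) * likelihoods
    return weighted / weighted.sum()

lik = [0.02, 0.05, 0.03]                       # hypothetical Pr(X | G_k)
equal = posterior(lik)
unequal = posterior(lik, [0.488, 0.317, 0.195])
print(np.round(equal, 3), np.round(unequal, 3))
```

With unequal priors, a case is pulled toward the larger groups even when its likelihood under a smaller group is similar.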
EVALUATING CLASSIFICATION
If the groups are not equal in size, then there are a couple of steps
Calculate the expected probability for each group relative to
the whole sample.
Prior probabilities
CLASSIFICATION OUTPUT
Classification coefficients for each group
The results:
Prior probabilities and cases used in analysis
religion3   Prior   Unweighted   Weighted
Catholic    .333    40           40.000
Muslim      .333    26           26.000
Protstnt    .333    16           16.000
Total       1.000   82           82.000
religion3     Catholic   Muslim    Protstnt
              -.392      -.570     -.333
              1.608      1.867     1.449
              -.001      -.001     -.001
(Constant)    -39.384    -43.934   -35.422
Classification Results (a)
Original   religion3   Catholic   Muslim   Protstnt   Total
Count      Catholic                                   40
           Muslim                                     26
           Protstnt                                   16
%          Catholic                                   100.0
           Muslim                                     100.0
           Protstnt                                   100.0
Predominant religion   Prior
Catholic               .488
Muslim                 .317
Protstnt               .195
Total                  1.000
Classification Results (a)
Original   Predominant religion   Catholic   Muslim   Protstnt   Total
Count      Catholic                                              40
           Muslim                                                26
           Protstnt                                              16
%          Catholic                                              100.0
           Muslim                                                100.0
           Protstnt                                              100.0
EVALUATING CLASSIFICATION
One can actually perform a test of sorts on the
overall classification
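$n_c$ = number of cases correctly classified
$p_i$ = prior probability of membership in group $i$
$n_i$ = number of cases in group $i$
$n_{\cdot}$ = total number of cases
$$\tau = \frac{n_c - \sum_{i=1}^{g} p_i n_i}{n_{\cdot} - \sum_{i=1}^{g} p_i n_i}$$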
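The tau statistic compares the number of correct classifications with the number expected by chance under the priors. A short sketch follows, using the group sizes from this example (40, 26, 16) with equal priors; the number of correctly classified cases is hypothetical, since the cell counts are not reproduced here.

```python
def tau_statistic(n_correct, priors, group_sizes):
    """Proportional reduction in classification errors relative to chance."""
    # Number of cases expected to be classified correctly by chance.
    expected = sum(p * n for p, n in zip(priors, group_sizes))
    total = sum(group_sizes)
    return (n_correct - expected) / (total - expected)

group_sizes = [40, 26, 16]
equal_priors = [1 / 3, 1 / 3, 1 / 3]
n_correct = 60                      # hypothetical count of correct classifications

tau = tau_statistic(n_correct, equal_priors, group_sizes)
print(round(tau, 3))
```

A tau of 0 means classification is no better than chance, and a tau of 1 means every case is classified correctly.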
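              Actual +   Actual -
Predicted +   a          b
Predicted -   c          d

Measure                       Calculation
Prevalence                    (a + c)/N
Overall diagnostic power      (b + d)/N
Correct classification rate   (a + d)/N
Sensitivity                   a/(a + c)
Specificity                   d/(b + d)
False positive rate           b/(b + d)
False negative rate           c/(a + c)
Positive predictive power     a/(a + b)
Negative predictive power     d/(c + d)
Misclassification rate        (b + c)/N
Odds ratio                    (ad)/(cb)
Kappa                         [(a + d) - ((a + c)(a + b) + (b + d)(c + d))/N] / [N - ((a + c)(a + b) + (b + d)(c + d))/N]
NMI                           1 - [-a.ln(a) - b.ln(b) - c.ln(c) - d.ln(d) + (a+b).ln(a+b) + (c+d).ln(c+d)] / [N.lnN - ((a+c).ln(a+c) + (b+d).ln(b+d))]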
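These measures are easy to compute from the four cells of a two-group classification table. The sketch below takes a = predicted + and actual +, b = predicted + and actual -, c = predicted - and actual +, d = predicted - and actual -. The counts are hypothetical, and the kappa formula used is the standard Cohen's kappa for a 2 x 2 table, which is an addition here rather than something taken from the table.

```python
import math

def xlnx(v):
    """v * ln(v), with the usual convention that 0 * ln(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0

def classification_measures(a, b, c, d):
    n = a + b + c + d
    # Agreement expected by chance from the marginal totals (used for kappa).
    chance = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    nmi_num = (-xlnx(a) - xlnx(b) - xlnx(c) - xlnx(d)
               + xlnx(a + b) + xlnx(c + d))
    nmi_den = n * math.log(n) - (xlnx(a + c) + xlnx(b + d))
    return {
        "prevalence": (a + c) / n,
        "correct_rate": (a + d) / n,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "false_negative_rate": c / (a + c),
        "positive_predictive_power": a / (a + b),
        "negative_predictive_power": d / (c + d),
        "misclassification_rate": (b + c) / n,
        "odds_ratio": (a * d) / (c * b),
        "kappa": (a + d - chance) / (n - chance),
        "nmi": 1 - nmi_num / nmi_den,
    }

measures = classification_measures(a=40, b=10, c=5, d=45)  # hypothetical counts
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity do not depend on prevalence, whereas the predictive powers and the overall correct rate do, which is why several measures are usually reported together.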
Cross-Validation
With larger datasets one can also test the classification
performance using the cross-validation techniques we've
discussed in the past
Estimate the classification coefficients for one part of the data
and then apply the coefficients to the other part to see if they
perform similarly
This allows you to see how well the classification generalizes
to new data
In fact, for PDA (predictive discriminant analysis), methodologists suggest that this is the way
one should be doing it, period; i.e. the classification
coefficients used should not be derived from the data to which they
are applied
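The split-sample idea can be sketched as follows. Everything here is illustrative: the data are simulated, the half-and-half split is an arbitrary choice, and the classification functions are computed from a pooled within-groups covariance matrix with equal priors, which is one common way to obtain them rather than necessarily what SPSS reports.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 3 groups, 2 predictors, 60 cases per group.
means = np.array([[0.0, 0.0], [2.5, 0.0], [0.0, 2.5]])
X = np.vstack([rng.normal(m, 1.0, size=(60, 2)) for m in means])
y = np.repeat(np.arange(3), 60)

# Random split into an estimation half and a holdout half.
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

def fit_classification_functions(X, y):
    """Linear classification functions from a pooled within-groups covariance."""
    labels = np.unique(y)
    centroids = np.array([X[y == g].mean(axis=0) for g in labels])
    resid = X - centroids[np.searchsorted(labels, y)]
    pooled = resid.T @ resid / (len(y) - len(labels))
    coef = np.linalg.solve(pooled, centroids.T).T       # c_j1 ... c_jp
    const = -0.5 * np.sum(coef * centroids, axis=1)     # c_j0 with equal priors
    return labels, coef, const

def classify(X, labels, coef, const):
    """Assign each case to the group with the highest classification score."""
    return labels[np.argmax(X @ coef.T + const, axis=1)]

model = fit_classification_functions(X[train], y[train])
acc_train = np.mean(classify(X[train], *model) == y[train])
acc_test = np.mean(classify(X[test], *model) == y[test])
print(f"estimation sample: {acc_train:.2f}  holdout sample: {acc_test:.2f}")
```

The holdout accuracy is the figure to report; a large drop from the estimation sample suggests the coefficients are capitalizing on chance.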
Standard (direct)
All predictors enter the equation at the same time and each
predictor is credited only for its unique variance
Sequential (hierarchical)
Predictors are given priority according to their theoretical importance
User-defined approach
Can be used to assess a set of predictors in the presence of
covariates that are given highest priority
DESIGN COMPLEXITY
Factorial DFA designs
Really best to just analyze through MANOVA
(e.g. if you have gender and IQ, both with two levels, you
would make four groups: high males, high females, low males,
low females)
If the interaction is not significant then run the DFA on each
main effect separately for loadings etc.
Note that it will not produce the same results as the MANOVA
would