Multinomial Logistic Regression Basic Relationships

SW388R7
Data Analysis & Computers II
Slide 1
Multinomial Logistic Regression

Basic Relationships
Multinomial Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Multinomial logistic regression
Slide 2
Multinomial logistic regression is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables.
Multinomial logistic regression compares multiple groups through a combination of binary logistic regressions.
The group comparisons are equivalent to the comparisons for a dummy-coded dependent variable, with the group with the highest numeric score used as the reference group.
For example, if we wanted to study differences in BSW, MSW, and PhD students using multinomial logistic regression, the analysis would compare BSW students to PhD students and MSW students to PhD students. For each independent variable, there would be two comparisons.
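The pairing rule can be written as a short Python sketch. The codes 1 = BSW, 2 = MSW, 3 = PhD are hypothetical; the example only needs PhD to carry the highest code so that it serves as the reference group. With K groups the rule yields K - 1 comparisons.

```python
def reference_comparisons(codes):
    """Pair each group code with the reference group (the highest code)."""
    reference = max(codes)
    return [(code, reference) for code in sorted(codes) if code != reference]


# Hypothetical coding for the student example.
labels = {1: "BSW", 2: "MSW", 3: "PhD"}
for group, reference in reference_comparisons(labels):
    print(f"{labels[group]} students vs {labels[reference]} students")
```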
What multinomial logistic regression predicts
Slide 3
Multinomial logistic regression provides a set of coefficients for each of the two comparisons. The coefficients for the reference group are all zeros, similar to the coefficients for the reference group for a dummy-coded variable.
Thus, there are three equations, one for each of the groups defined by the dependent variable.
The three equations can be used to compute the probability that a subject is a member of each of the three groups. A case is predicted to belong to the group associated with the highest probability.
Predicted group membership can be compared to actual group membership to obtain a measure of classification accuracy.
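As a minimal sketch of how the three equations become group probabilities, the Python below uses the coefficients from the Parameter Estimates table for the highways and bridges example later in this section. Group 3 (TOO MUCH) is the reference, so its coefficients are all zero. Each group's probability is exp(logit) divided by the sum of exp(logit) over the three groups. The respondent's scores (age 45, 14 years of school, conlegis score of 2) are hypothetical values chosen for illustration.

```python
import math

# Coefficients (Intercept, AGE, EDUC, CONLEGIS) from the Parameter Estimates
# table for the highways and bridges example. Group 3 (TOO MUCH) is the
# reference group, so all of its coefficients are zero.
COEFFICIENTS = {
    1: (3.240, 0.019, 0.071, -1.373),  # TOO LITTLE vs TOO MUCH
    2: (3.639, 0.003, 0.172, -1.657),  # ABOUT RIGHT vs TOO MUCH
    3: (0.0, 0.0, 0.0, 0.0),           # TOO MUCH (reference)
}


def group_probabilities(age, educ, conlegis):
    """Return {group: probability} computed from the three equations."""
    logits = {
        group: b0 + b1 * age + b2 * educ + b3 * conlegis
        for group, (b0, b1, b2, b3) in COEFFICIENTS.items()
    }
    denominator = sum(math.exp(value) for value in logits.values())
    return {group: math.exp(value) / denominator for group, value in logits.items()}


def predicted_group(age, educ, conlegis):
    """Assign the case to the group with the highest probability."""
    probabilities = group_probabilities(age, educ, conlegis)
    return max(probabilities, key=probabilities.get)


# Hypothetical respondent: age 45, 14 years of school, conlegis score of 2.
probs = group_probabilities(45, 14, 2)
for group, p in probs.items():
    print(f"group {group}: {p:.3f}")
print("predicted group:", predicted_group(45, 14, 2))
```

Because the reference group's logit is fixed at zero, each comparison's coefficients describe the log odds of that group relative to TOO MUCH, and the three probabilities always sum to 1.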
Level of measurement requirements
Slide 4
Multinomial logistic regression analysis requires that the dependent variable be non-metric. Dichotomous, nominal, and ordinal variables satisfy the level of measurement requirement.
Multinomial logistic regression analysis requires that the independent variables be metric or dichotomous. Because SPSS automatically dummy-codes nominal-level variables, they can also be included; they are dichotomized in the analysis.
In SPSS, non-metric independent variables are included as factors, and SPSS dummy-codes them.
In SPSS, metric independent variables are included as covariates. If an independent variable is ordinal, we will attach the usual caution.
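The dummy-coding applied to a factor can be sketched as follows. This is a minimal illustration assuming indicator coding with the highest code as the reference category, which is consistent with the [SEX=1] and [SEX=2] rows in the childcare example later in this section (where [SEX=2] is set to 0). The function name `dummy_code` and the sample values are hypothetical.

```python
def dummy_code(name, values):
    """Indicator-code a nominal variable, using its highest code as the reference.

    Returns one 0/1 column for each non-reference category; the reference
    category is the one coded 0 on every column.
    """
    categories = sorted(set(values))
    reference = categories[-1]
    return {
        f"[{name}={category}]": [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical cases: sex coded 1 = male, 2 = female.
sex = [1, 2, 2, 1, 2]
print(dummy_code("SEX", sex))
```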
Assumptions and outliers
Slide 5
Multinomial logistic regression does not make any assumptions of normality, linearity, or homogeneity of variance for the independent variables.
Because it does not impose these requirements, it is preferred to discriminant analysis when the data do not satisfy these assumptions.
SPSS does not compute any diagnostic statistics for outliers. To evaluate outliers, the advice is to run multiple binary logistic regressions and use those results to test the exclusion of outliers or influential cases.
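One way to set up those binary logistic regressions is to pair each non-reference group with the reference group, matching the comparisons the multinomial model makes. The sketch below only builds the case subsets and a 0/1 outcome for each comparison; fitting the binary models and computing outlier diagnostics is left to the binary logistic procedure. The function name and the six sample cases are hypothetical.

```python
def binary_subsets(groups):
    """For each non-reference group, return the indices of the cases in that group
    or in the reference group (the highest code), plus a 0/1 outcome where
    1 marks the non-reference group."""
    reference = max(groups)
    subsets = {}
    for group in sorted(set(groups)):
        if group == reference:
            continue
        indices = [i for i, g in enumerate(groups) if g in (group, reference)]
        outcome = [1 if groups[i] == group else 0 for i in indices]
        subsets[(group, reference)] = (indices, outcome)
    return subsets


# Hypothetical natroad codes for six cases (1 = too little, 2 = about right, 3 = too much).
natroad = [1, 2, 3, 2, 1, 3]
for comparison, (indices, outcome) in binary_subsets(natroad).items():
    print(comparison, indices, outcome)
```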
Sample size requirements
Slide 6
The minimum number of cases per independent variable is 10, using a guideline provided by Hosmer and Lemeshow, authors of Applied Logistic Regression, one of the main resources for logistic regression.
For preferred case-to-variable ratios, we will use 20 to 1.
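A small sketch of the case-to-variable check, using the 10-to-1 minimum and 20-to-1 preferred ratios above. The counts in the usage line come from the highways and bridges example later in this section (167 valid cases, three independent variables); the function name is hypothetical.

```python
def sample_size_rating(valid_cases, n_independent_variables, minimum=10, preferred=20):
    """Classify the case-to-variable ratio against the minimum and preferred guidelines."""
    ratio = valid_cases / n_independent_variables
    if ratio >= preferred:
        return ratio, "meets preferred ratio"
    if ratio >= minimum:
        return ratio, "meets minimum ratio only"
    return ratio, "below minimum ratio"


ratio, rating = sample_size_rating(167, 3)
print(f"{ratio:.1f} cases per variable: {rating}")
```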
Methods for including variables
Slide 7
The only method for selecting independent variables in SPSS is simultaneous or direct entry.
Overall test of relationship - 1
Slide 8
The overall test of relationship among the independent variables and groups defined by the dependent variable is based on the reduction in the -2 log likelihood from a model that does not contain any independent variables (intercept only) to the model that contains the independent variables.
This difference in -2 log likelihood follows a chi-square distribution, and is referred to as the model chi-square.
The significance test for the final model chi-square (after the independent variables have been added) is our statistical evidence of the presence of a relationship between the dependent variable and the combination of the independent variables.
Slide 9
Overall test of relationship - 2
Model Fitting Information
Model             -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only    284.429
Final             265.972             18.457       6    .005
The presence of a relationship between the dependent variable and combination of independent variables is based on the statistical significance of the final model chi-square in the SPSS table titled "Model Fitting Information".
In this analysis, the probability of the model chi-square (18.457) was 0.005, less than or equal to the level of significance of 0.05. The null hypothesis that there was no difference between the model without independent variables and the model with independent variables was rejected. The existence of a relationship between the independent variables and the dependent variable was supported.
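A minimal sketch of the overall test, using the -2 log likelihood values from the Model Fitting Information table. It assumes df = 6 (three independent variables, each with one coefficient in each of the two comparisons), which matches the table. To stay within the Python standard library it uses the closed-form chi-square tail probability, which holds only for even degrees of freedom; for general df, a library routine such as scipy.stats.chi2.sf would be the usual choice.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!,
    which is exact only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


neg2ll_intercept_only = 284.429  # -2 Log Likelihood, Intercept Only
neg2ll_final = 265.972           # -2 Log Likelihood, Final
model_chi_square = neg2ll_intercept_only - neg2ll_final
df = 3 * 2  # 3 independent variables x 2 comparisons
p_value = chi_square_sf_even_df(model_chi_square, df)

print(f"model chi-square = {model_chi_square:.3f}, df = {df}, Sig. = {p_value:.3f}")
print("relationship supported:", p_value <= 0.05)
```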
Strength of multinomial logistic regression relationship
Slide 10
While multinomial logistic regression does compute correlation measures to estimate the strength of the relationship (pseudo R-square measures, such as Nagelkerke's R²), these correlation measures do not really tell us much about the accuracy or errors associated with the model.
A more useful measure to assess the utility of a multinomial logistic regression model is classification accuracy, which compares predicted group membership based on the logistic model to the actual, known group membership, which is the value for the dependent variable.
Slide 11
Evaluating usefulness for logistic models
The benchmark that we will use to characterize a multinomial logistic regression model as useful is a 25% improvement over the rate of accuracy achievable by chance alone.
Even if the independent variables had no relationship to the groups defined by the dependent variable, we would still expect to be correct in our predictions of group membership some percentage of the time. This is referred to as by chance accuracy.
The estimate of by chance accuracy that we will use is the proportional by chance accuracy rate, computed by summing the squared proportion of cases in each group. The only difference between by chance accuracy for binary logistic models and by chance accuracy for multinomial logistic models is the number of groups defined by the dependent variable.
Slide 12
Computing by chance accuracy
The percentage of cases in each group defined by the dependent variable is found in the Case Processing Summary table.
Case Processing Summary
                                   N        Marginal Percentage
HIGHWAYS AND BRIDGES    1          62       37.1%
                        2          93       55.7%
                        3          12       7.2%
Valid                              167      100.0%
Missing                            103
Total                              270
Subpopulation                      153(a)
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.
The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the 'Case Processing Summary', and then squaring and summing the proportion of cases in each group (0.371² + 0.557² + 0.072² = 0.453).
The proportional by chance accuracy criterion is 56.6% (1.25 x 45.3% = 56.6%).
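The same calculation in Python, as a minimal sketch using the group counts from the Case Processing Summary (62, 93, and 12 of 167 valid cases). Working from the counts rather than the rounded percentages gives about 0.4531, which agrees with the 45.3% above.

```python
# Group sizes from the Case Processing Summary (valid cases only).
group_counts = {1: 62, 2: 93, 3: 12}  # 1 = too little, 2 = about right, 3 = too much

total_cases = sum(group_counts.values())
proportions = {group: n / total_cases for group, n in group_counts.items()}

# Proportional by chance accuracy: sum of the squared group proportions.
by_chance = sum(p ** 2 for p in proportions.values())
# Usefulness benchmark: 25% improvement over by chance accuracy.
criterion = 1.25 * by_chance

print(f"by chance accuracy = {by_chance:.3f}")
print(f"criterion = {criterion:.3f}")
```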
Slide 13
Comparing accuracy rates
To characterize our model as useful, we compare the overall percentage accuracy rate produced by SPSS to 25% more than the proportional by chance accuracy. (Note: SPSS does not compute a cross-validated accuracy rate for multinomial logistic regression.)
Classification
                      Predicted
Observed              1        2        3       Percent Correct
1                     15       47       0       24.2%
2                     7        86       0       92.5%
3                     5        7        0       .0%
Overall Percentage    16.2%    83.8%    .0%     60.5%
The classification accuracy rate was 60.5%, which was greater than or equal to the proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is satisfied in this example.
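A minimal sketch of the comparison, working directly from the Classification table (rows are observed groups, columns are predicted groups). Because the row totals are the observed group sizes, the by chance criterion can be recomputed from the same table; the variable names are illustrative only.

```python
# Classification table: rows are observed groups, columns are predicted groups.
classification = [
    [15, 47, 0],  # observed 1 (too little)
    [7, 86, 0],   # observed 2 (about right)
    [5, 7, 0],    # observed 3 (too much)
]

n = sum(sum(row) for row in classification)
correct = sum(classification[i][i] for i in range(len(classification)))
accuracy = correct / n

# Percent correct within each observed group (row).
percent_correct = [row[i] / sum(row) for i, row in enumerate(classification)]

# Row totals are the observed group sizes, so by chance accuracy
# can be recomputed from the same table.
by_chance = sum((sum(row) / n) ** 2 for row in classification)
criterion = 1.25 * by_chance

print(f"overall accuracy = {accuracy:.3f}")
print("percent correct by group:", [f"{p:.1%}" for p in percent_correct])
print(f"criterion = {criterion:.3f}, useful = {accuracy >= criterion}")
```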
Slide 14
Numerical problems
The maximum likelihood method used to calculate multinomial logistic regression is an iterative fitting process that cycles through repeated estimates to find an answer.
Sometimes, the method will break down and not be able to converge or find an answer.
Sometimes the method will produce wildly improbable results, reporting that a one-unit change in an independent variable increases the odds of the modeled event by hundreds of thousands or millions. These implausible results can be produced by multicollinearity, categories of predictors having no cases or zero cells, and complete separation, whereby the groups being compared are perfectly separated by the scores on one or more independent variables.
The clue that we have numerical problems, and should not interpret the results, is standard errors larger than 2.0 for some independent variables.
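A minimal sketch of the standard error check, using the values from the Parameter Estimates table for the highways and bridges example. Skipping the intercept rows is an interpretation of the guideline's wording ("for some independent variables"); note that both intercepts here have standard errors above 2.0 while none of the independent variables do. The function name is hypothetical.

```python
# Standard errors from the Parameter Estimates table (highways and bridges example).
standard_errors = {
    ("TOO LITTLE", "Intercept"): 2.478,
    ("TOO LITTLE", "AGE"): 0.020,
    ("TOO LITTLE", "EDUC"): 0.108,
    ("TOO LITTLE", "CONLEGIS"): 0.620,
    ("ABOUT RIGHT", "Intercept"): 2.456,
    ("ABOUT RIGHT", "AGE"): 0.020,
    ("ABOUT RIGHT", "EDUC"): 0.110,
    ("ABOUT RIGHT", "CONLEGIS"): 0.613,
}


def flag_numerical_problems(std_errors, limit=2.0):
    """Return the (comparison, variable) pairs whose standard error exceeds the limit.

    Intercepts are skipped because the guideline refers to independent variables.
    """
    return [
        key
        for key, se in std_errors.items()
        if key[1] != "Intercept" and se > limit
    ]


flagged = flag_numerical_problems(standard_errors)
print("possible numerical problems:", flagged or "none")
```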
Relationship of individual independent variables and the dependent variable
Slide 15
There are two types of tests for individual independent variables:
The likelihood ratio test evaluates the overall relationship between an independent variable and the dependent variable.
The Wald test evaluates whether or not the independent variable is statistically significant in differentiating between the two groups in each of the embedded binary logistic comparisons.
If an independent variable has an overall relationship to the dependent variable, it might or might not be statistically significant in differentiating between pairs of groups defined by the dependent variable.
Slide 16
The interpretation for an independent variable focuses on its ability to distinguish between pairs of groups and the contribution which it makes to changing the odds of being in one dependent variable group rather than the other.
We should not interpret the significance of an independent variable's role in distinguishing between pairs of groups unless the independent variable also has an overall relationship to the dependent variable in the likelihood ratio test.
The interpretation of an independent variable's role in differentiating dependent variable groups is the same as we used in binary logistic regression. The difference in multinomial logistic regression is that we can have multiple interpretations for an independent variable in relation to different pairs of groups.
Slide 17
Parameter Estimates
                                                                                        95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES(a)    B        Std. Error   Wald    df   Sig.   Exp(B)   Lower Bound   Upper Bound
1    Intercept              3.240    2.478        1.709   1    .191
     AGE                    .019     .020         .906    1    .341   1.019    .980          1.0
     EDUC                   .071     .108         .427    1    .514   1.073    .868          1.3
     CONLEGIS               -1.373   .620         4.913   1    .027   .253     .075          .8
2    Intercept              3.639    2.456        2.195   1    .138
     AGE                    .003     .020         .017    1    .897   1.003    .963          1.0
     EDUC                   .172     .110         2.463   1    .117   1.188    .958          1.4
     CONLEGIS               -1.657   .613         7.298   1    .007   .191     .057          .6
a. The reference category is: 3.
SPSS identifies the comparisons it makes for groups defined by the dependent variable in the table of Parameter Estimates, using either the value codes or the value labels, depending on the options settings for pivot table labeling. With value labels, the rows above are labeled TOO LITTLE (code 1) and ABOUT RIGHT (code 2), and the footnote reads: a. The reference category is: TOO MUCH.
The reference category is identified in the footnote to the table.
In this analysis, two comparisons will be made:
the TOO LITTLE group (coded 1) will be compared to the TOO MUCH group (coded 3)
the ABOUT RIGHT group (coded 2) will be compared to the TOO MUCH group (coded 3).
The reference category plays the same role in multinomial logistic regression that it plays in the dummy-coding of a nominal variable: it is the category that would be coded with zeros for all of the dummy-coded variables, and all other categories are interpreted against it.
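The Wald statistic, its significance, Exp(B), and the confidence interval for Exp(B) can all be recovered from B and its standard error. The sketch below applies the standard single-parameter formulas (Wald = (B / SE)², df = 1; Exp(B) = e^B; interval = exp(B ± 1.96 x SE)) to the CONLEGIS rows above. Because the table's B and SE are rounded to three decimals, the Wald values come out slightly different (about 4.904 and 7.307 rather than 4.913 and 7.298); Sig., Exp(B), and the lower bounds agree to about three decimals. The function name is hypothetical.

```python
import math


def wald_summary(b, se, z=1.96):
    """Wald chi-square (df = 1), its p-value, Exp(B), and a 95% CI for Exp(B)."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with df = 1: P(Z**2 > wald) = erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return {
        "wald": wald,
        "sig": p_value,
        "exp_b": math.exp(b),
        "ci_lower": math.exp(b - z * se),
        "ci_upper": math.exp(b + z * se),
    }


# CONLEGIS rows of the Parameter Estimates table (rounded B and Std. Error).
too_little = wald_summary(-1.373, 0.620)   # TOO LITTLE vs TOO MUCH
about_right = wald_summary(-1.657, 0.613)  # ABOUT RIGHT vs TOO MUCH

for label, s in [("TOO LITTLE vs TOO MUCH", too_little),
                 ("ABOUT RIGHT vs TOO MUCH", about_right)]:
    print(f"{label}: Wald={s['wald']:.3f} Sig={s['sig']:.3f} "
          f"Exp(B)={s['exp_b']:.3f} CI=({s['ci_lower']:.3f}, {s['ci_upper']:.3f})")
```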
Slide 18
Likelihood Ratio Tests
Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept    268.323                              2.350        2    .309
AGE          268.625                              2.652        2    .265
EDUC         270.395                              4.423        2    .110
CONLEGIS     275.194                              9.221        2    .010
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.
In this example, there is a statistically significant relationship between the independent variable CONLEGIS and the dependent variable. (0.010 < 0.05)
As well, in the Parameter Estimates table, the independent variable CONLEGIS is significant in distinguishing category 1 of the dependent variable from category 3 of the dependent variable. (0.027 < 0.05)
And the independent variable CONLEGIS is significant in distinguishing category 2 of the dependent variable from category 3 of the dependent variable. (0.007 < 0.05)
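Each likelihood ratio chi-square is the reduced model's -2 log likelihood minus the final model's (265.972, from the Model Fitting Information table). A minimal sketch, reusing the even-df closed form for the chi-square tail probability (each effect here has df = 2, one parameter per comparison); small differences in the third decimal of the chi-square come from rounding in the table.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


FINAL_NEG2LL = 265.972  # -2 Log Likelihood of the final model
REDUCED_NEG2LL = {
    "Intercept": 268.323,
    "AGE": 268.625,
    "EDUC": 270.395,
    "CONLEGIS": 275.194,
}
DF = 2  # one parameter per effect in each of the two comparisons

results = {}
for effect, reduced in REDUCED_NEG2LL.items():
    chi_square = reduced - FINAL_NEG2LL
    results[effect] = (chi_square, chi_square_sf_even_df(chi_square, DF))
    print(f"{effect}: chi-square={chi_square:.3f}, Sig={results[effect][1]:.3f}")
```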
Interpreting relationship of individual independent variables to the dependent variable
Slide 19
Survey respondents who had less confidence in congress (higher values correspond to lower confidence) were less likely to be in the group of survey respondents who thought we spend too little money on highways and bridges (DV category 1), rather than the group of survey respondents who thought we spend too much money on highways and bridges (DV category 3).
For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend too little money on highways and bridges decreased by 74.7%. (0.253 - 1.0 = -0.747)
Interpreting relationship of individual independent variables to the dependent variable
Slide 20
Survey respondents who had less confidence in congress (higher values correspond to lower confidence) were less likely to be in the group of survey respondents who thought we spend about the right amount of money on highways and bridges (DV category 2), rather than the group of survey respondents who thought we spend too much money on highways and bridges (DV category 3).
For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend about the right amount of money on highways and bridges decreased by 80.9%. (0.191 - 1.0 = -0.809)
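The percentage changes in the two interpretations come from Exp(B) - 1. A minimal sketch using the CONLEGIS Exp(B) values from the Parameter Estimates table; each result is a change in the odds of being in the named group rather than the TOO MUCH reference group. The function name is hypothetical.

```python
def percent_change_in_odds(exp_b):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100


# CONLEGIS Exp(B) values from the Parameter Estimates table.
for label, exp_b in [("TOO LITTLE vs TOO MUCH", 0.253),
                     ("ABOUT RIGHT vs TOO MUCH", 0.191)]:
    change = percent_change_in_odds(exp_b)
    print(f"{label}: {change:+.1f}% change in odds per unit increase in conlegis")
```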
Slide 21
Likelihood Ratio Tests
Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept    327.463(a)                           .000         0    .
AGE          333.440                              5.976        2    .050
EDUC         329.606                              2.143        2    .343
POLVIEWS     334.636                              7.173        2    .028
SEX          338.985                              11.521       2    .003
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.
Parameter Estimates
                                                                                      95% Confidence Interval for Exp(B)
NATCHLD(a)                 B        Std. Error   Wald     df   Sig.   Exp(B)   Lower Bound   Upper Bound
TOO LITTLE    Intercept    8.434    2.233        14.261   1    .000
              AGE          -.023    .017         1.756    1    .185   .977     .944          1.
              EDUC         -.066    .102         .414     1    .520   .936     .766          1.
              POLVIEWS     -.575    .251         5.234    1    .022   .563     .344          .
              [SEX=1]      -2.167   .805         7.242    1    .007   .115     .024          .
              [SEX=2]      0(b)     .            .        0    .      .        .             .
ABOUT RIGHT   Intercept    4.485    2.255        3.955    1    .047
              AGE          -.001    .018         .003     1    .955   .999     .965          1.
              EDUC         .011     .104         .011     1    .916   1.011    .824          1.
              POLVIEWS     -.397    .257         2.375    1    .123   .673     .406          1.
              [SEX=1]      -1.606   .824         3.800    1    .051   .201     .040          1.
              [SEX=2]      0(b)     .            .        0    .      .        .             .
In this example, there is a statistically significant relationship between SEX and the dependent variable, spending on childcare assistance. (0.003 < 0.05)
As well, SEX plays a statistically significant role in differentiating the TOO LITTLE group from the TOO MUCH (reference) group. (0.007 < 0.05)
However, SEX does not differentiate the ABOUT RIGHT group from the TOO MUCH (reference) group. (0.051 > 0.05)
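The order of the checks above (likelihood ratio test first, then the Wald test for each comparison) can be sketched as a small gate. This is a minimal illustration of the rule stated earlier that a Wald test is not interpreted unless the variable's likelihood ratio test is significant; the function name is hypothetical, and the Sig. values are from the SEX and EDUC rows of the tables above.

```python
ALPHA = 0.05


def interpretable_comparisons(lr_sig, wald_sigs, alpha=ALPHA):
    """Interpret Wald tests only if the likelihood ratio test is significant.

    wald_sigs maps a comparison label to its Wald Sig. value.
    Returns {comparison: significant?}, or {} when the overall test fails.
    """
    if lr_sig > alpha:
        return {}
    return {comparison: sig <= alpha for comparison, sig in wald_sigs.items()}


# SEX in the childcare example: significant overall, then checked per comparison.
sex_results = interpretable_comparisons(
    lr_sig=0.003,
    wald_sigs={"TOO LITTLE vs TOO MUCH": 0.007, "ABOUT RIGHT vs TOO MUCH": 0.051},
)
print("SEX:", sex_results)

# EDUC: the likelihood ratio test is not significant, so no comparison is interpreted.
educ_results = interpretable_comparisons(
    lr_sig=0.343,
    wald_sigs={"TOO LITTLE vs TOO MUCH": 0.520, "ABOUT RIGHT vs TOO MUCH": 0.916},
)
print("EDUC:", educ_results)
```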
Slide 22
Survey respondents who were male (code 1 for sex) were less likely to be in the group of survey respondents who thought we spend too little money on childcare assistance (DV category 1), rather than the group of survey respondents who thought we spend too much money on childcare assistance (DV category 3).
Survey respondents who were male had 88.5% lower odds (0.115 - 1.0 = -0.885) of being in the group of survey respondents who thought we spend too little money on childcare assistance.
Interpreting relationships for an independent variable in problems
Slide 23
In the multinomial logistic regression problems, the problem statement will ask about only one of the independent variables. The answer will be true or false based on only the relationship between the specified independent variable and the dependent variable. The individual relationships between other independent variables and the dependent variable are not used in determining whether or not the answer is true or false.
Slide 24
Problem 1
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on highways and bridges" [natroad]. These predictors differentiate
survey respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges
from survey respondents who thought we spend too much money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the odds
of being in the group of survey respondents who thought we spend about the right amount of
money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 25
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the multinomial logistic regression.
Slide 26
The variables listed first in the problem statement are the independent variables (IVs): "age" [age], "highest year of school completed" [educ] and "confidence in Congress" [conlegis].
The variable used to define groups is the dependent variable (DV): "opinion about spending on highways and bridges" [natroad].
SPSS only supports direct or simultaneous entry of independent variables in multinomial logistic regression, so we have no choice of method for entering variables.
Slide 27
SPSS multinomial logistic regression models the relationship by comparing each of the groups defined by the dependent variable to the group with the highest code value.
The responses to opinion about spending on highways and bridges were: 1 = Too little, 2 = About right, and 3 = Too much.
The analysis will result in two comparisons:
survey respondents who thought we spend too little money versus survey respondents who thought we spend too much money on highways and bridges
survey respondents who thought we spend about the right amount of money versus survey respondents who thought we spend too much money on highways and bridges.
Slide 28
Each problem includes a statement about the relationship between one independent variable and the dependent variable. The answer to the problem is based on the stated relationship, ignoring the relationships between the other independent variables and the dependent variable.
This problem identifies a difference for both of the comparisons among groups modeled by the multinomial logistic regression.
Slide 29
In order for the multinomial logistic regression question to be true, the overall relationship must be statistically significant, there must be no evidence of numerical problems, the classification accuracy rate must be substantially better than could be obtained by chance alone, and the stated individual relationship must be statistically significant and interpreted correctly.
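The four conditions above work as one decision rule. The Python sketch below encodes that rule, along with the caution these slides add when ordinal variables are treated as metric. The function and argument names are hypothetical. The inputs are judgments an analyst reads off the SPSS output; SPSS does not compute them for you. Only the "True", "True with caution" and "False" answers are modeled; "Inappropriate application of a statistic" is not.

```python
def answer_question(overall_p, max_b_se, accuracy, by_chance_rate,
                    stated_ps, interpreted_correctly, ordinal_as_metric,
                    alpha=0.05):
    """Hypothetical helper applying the slide's criteria for a 'True' answer.

    max_b_se is the largest standard error among the b coefficients,
    excluding the intercepts.
    """
    overall_ok = overall_p <= alpha                      # overall relationship significant
    no_numerical_problems = max_b_se <= 2.0              # no standard error larger than 2.0
    accuracy_ok = accuracy >= 1.25 * by_chance_rate      # 25% better than by-chance rate
    stated_ok = interpreted_correctly and all(p <= alpha for p in stated_ps)
    if not (overall_ok and no_numerical_problems and accuracy_ok and stated_ok):
        return "False"
    # Treating ordinal variables as metric is a convention some analysts reject.
    return "True with caution" if ordinal_as_metric else "True"


# Values reported for problem 1 (highways and bridges) on the output slides.
print(answer_question(0.005, 0.620, 0.605, 0.453, [0.027, 0.007], True, True))
# True with caution
```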
Slide 30
Request multinomial logistic regression
Select the Regression | Multinomial Logistic command from the Analyze menu.
Slide 31
Selecting the dependent variable
First, highlight the dependent variable natroad in the list of variables.
Second, click on the right arrow button to move the dependent variable to the Dependent text box.
Slide 32
Selecting metric independent variables
Metric independent variables are specified as covariates in multinomial logistic regression. Metric variables can be either interval or, by convention, ordinal.
Move the metric independent variables, age, educ and conlegis, to the Covariate(s) list box.
In this analysis, there are no non-metric independent variables. Non-metric independent variables would be moved to the Factor(s) list box.
Slide 33
Specifying statistics to include in the output
While we will accept most of the SPSS defaults for the analysis, we need to specifically request the classification table.
Click on the Statistics button to make a request.
Slide 34
Requesting the classification table
First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter estimates.
Second, mark the checkbox for the Classification table.
Third, click on the Continue button to complete the request.
Slide 35
Completing the multinomial logistic regression request
Click on the OK button to request the output for the multinomial logistic regression.
The multinomial logistic procedure supports additional commands to specify the model computed for the relationships (we will use the default main effects model), additional specifications for computing the regression, and saving classification results. We will not make use of these options.
Slide 36
LEVEL OF MEASUREMENT - 1
Multinomial logistic regression requires that the dependent variable be non-metric and the independent variables be metric or dichotomous.
"Opinion about spending on highways and bridges" [natroad] is ordinal, satisfying the non-metric level of measurement requirement for the dependent variable.
It contains three categories: survey respondents who thought we spend too little money, about the right amount of money, and too much money on highways and bridges.
Slide 37
"Age" [age] and "highest year of school completed" [educ] are interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.
"Confidence in Congress" [conlegis] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables.
If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for the analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
Slide 38
Sample size ratio of cases to variables

Case Processing Summary
                                N     Marginal Percentage
HIGHWAYS AND BRIDGES    1      62     37.1%
                        2      93     55.7%
                        3      12      7.2%
Valid                         167    100.0%
Missing                       103
Total                         270
Subpopulation                 153a
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

Multinomial logistic regression requires that the minimum ratio of valid cases to independent variables be at least 10 to 1. The ratio of valid cases (167) to number of independent variables (3) was 55.7 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.
The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 55.7 to 1 was equal to or greater than the preferred ratio. The preferred ratio of cases to independent variables was satisfied.
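The ratio check is simple arithmetic. The sketch below shows it as a small Python function. The function name is our own, and the 10-to-1 and 20-to-1 thresholds are the ones stated on this slide.

```python
def check_sample_size(valid_cases, n_independent, minimum=10, preferred=20):
    """Ratio of valid cases to independent variables, checked against the
    minimum (10 to 1) and preferred (20 to 1) ratios."""
    ratio = valid_cases / n_independent
    return round(ratio, 1), ratio >= minimum, ratio >= preferred


print(check_sample_size(167, 3))  # (55.7, True, True)  highways and bridges
print(check_sample_size(208, 3))  # (69.3, True, True)  space exploration
```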
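Slide 39
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   284.429
Final            265.972             18.457       6    .005

The statistical significance of the overall relationship between the independent variables and the dependent variable is based on the statistical significance of the model chi-square statistic in the SPSS table titled "Model Fitting Information".
For this analysis, the probability of the model chi-square statistic (18.457) was 0.005, less than or equal to the level of significance of 0.05. The null hypothesis that there was no difference between the model without independent variables and the model with independent variables was rejected. The existence of a relationship between the independent variables and the dependent variable was supported.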
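The model chi-square is the drop in -2 log likelihood from the intercept-only model to the final model. It has 6 degrees of freedom: three independent variables, each with one b coefficient in each of the two comparisons. The Python sketch below checks the arithmetic using the rounded table values. For the p-value it uses the closed-form chi-square upper tail, which is valid only for even degrees of freedom. The helper name is our own.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic, even df only:
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


intercept_only, final = 284.429, 265.972   # -2 log likelihoods from the table
model_chi2 = intercept_only - final
p = chi2_sf_even_df(model_chi2, df=6)
print(round(model_chi2, 3), round(p, 3))   # 18.457 0.005
```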
Slide 40
NUMERICAL PROBLEMS

Parameter Estimates
                                                                            95% Confidence Interval
HIGHWAYS AND BRIDGES        B      Std. Error   Wald    df   Sig.   Exp(B)   for Exp(B): Lower Bound
1   Intercept            3.240    2.478        1.709    1   .191
    AGE                   .019     .020         .906    1   .341   1.019    .980
    EDUC                  .071     .108         .427    1   .514   1.073    .868
    CONLEGIS            -1.373     .620        4.913    1   .027    .253    .075
2   Intercept            3.639    2.456        2.195    1   .138
    AGE                   .003     .020         .017    1   .897   1.003    .963
    EDUC                  .172     .110        2.463    1   .117   1.188    .958
    CONLEGIS            -1.657     .613        7.298    1   .007    .191    .057

Multicollinearity in the multinomial logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems. Examples include multicollinearity among the independent variables; zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable; and 'complete separation', whereby the groups in the dependent variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a standard error larger than 2.0. (We are not interested in the standard errors associated with the intercept.)
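The standard-error screen can be written as a filter over the table. The sketch below copies the standard errors from the Parameter Estimates table and flags any coefficient above 2.0, skipping the intercepts as the slide does. The dictionary layout and function name are illustrative choices, not SPSS output structures.

```python
# Standard errors of the b coefficients, keyed by (comparison, variable),
# copied from the highways-and-bridges Parameter Estimates table.
standard_errors = {
    ("too little vs too much", "Intercept"): 2.478,
    ("too little vs too much", "AGE"): 0.020,
    ("too little vs too much", "EDUC"): 0.108,
    ("too little vs too much", "CONLEGIS"): 0.620,
    ("about right vs too much", "Intercept"): 2.456,
    ("about right vs too much", "AGE"): 0.020,
    ("about right vs too much", "EDUC"): 0.110,
    ("about right vs too much", "CONLEGIS"): 0.613,
}


def flag_numerical_problems(ses, threshold=2.0):
    """Return (comparison, variable) pairs whose standard error exceeds
    the threshold, ignoring the intercepts."""
    return [key for key, se in ses.items()
            if key[1] != "Intercept" and se > threshold]


print(flag_numerical_problems(standard_errors))  # []
```

The intercept standard errors (2.478 and 2.456) exceed 2.0 but are excluded, matching the slide's note.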
Slide 41
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1

Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   268.323                              2.350        2    .309
AGE         268.625                              2.652        2    .265
EDUC        270.395                              4.423        2    .110
CONLEGIS    275.194                              9.221        2    .010
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model.

The statistical significance of the relationship between confidence in Congress and opinion about spending on highways and bridges is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests".
For this relationship, the probability of the chi-square statistic (9.221) was 0.010, less than or equal to the level of significance of 0.05. The null hypothesis that all of the b coefficients associated with confidence in Congress were equal to zero was rejected. The existence of a relationship between confidence in Congress and opinion about spending on highways and bridges was supported.
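Each effect in the Likelihood Ratio Tests table has 2 degrees of freedom, one b coefficient per comparison. For 2 df, the chi-square upper-tail probability is simply exp(-chi²/2). The sketch below recomputes each test from the -2 log likelihoods. It subtracts rounded table values, so some chi-squares differ from SPSS in the third decimal (for example, 9.222 rather than 9.221).

```python
import math

final_model = 265.972   # -2 log likelihood of the final model
reduced = {             # -2 log likelihood with each effect omitted
    "Intercept": 268.323,
    "AGE": 268.625,
    "EDUC": 270.395,
    "CONLEGIS": 275.194,
}

results = {}
for effect, neg2ll in reduced.items():
    chi2 = neg2ll - final_model      # likelihood ratio chi-square
    p = math.exp(-chi2 / 2)          # exact upper tail of a chi-square with df = 2
    results[effect] = (chi2, p)
    print(f"{effect:<10} chi2={chi2:6.3f} p={p:.3f} significant={p <= 0.05}")
```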
Slide 42
In the comparison of survey respondents who thought we spend too little money on highways and bridges to survey respondents who thought we spend too much money on highways and bridges, the probability of the Wald statistic (4.913) for the variable confidence in Congress [conlegis] was 0.027. Since the probability was less than or equal to the level of significance of 0.05, the null hypothesis that the b coefficient for confidence in Congress was equal to zero for this comparison was rejected.
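The Wald statistic in the Parameter Estimates table is (B / Std. Error)², tested against a chi-square distribution with 1 df. For 1 df, the upper-tail probability equals `math.erfc(sqrt(w / 2))`. The sketch below uses the rounded B and standard error, which give about 4.90 rather than SPSS's 4.913, because SPSS works from unrounded estimates. The p-value still rounds to 0.027.

```python
import math


def wald_test(b, se):
    """Wald chi-square (b / se)**2 and its 1-df upper-tail probability."""
    w = (b / se) ** 2
    return w, math.erfc(math.sqrt(w / 2))


w, p = wald_test(-1.373, 0.620)     # CONLEGIS: too little vs too much
w2, p2 = wald_test(-1.657, 0.613)   # CONLEGIS: about right vs too much
print(f"{w:.2f} {p:.3f}")
print(f"{w2:.2f} {p2:.3f}")
```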
Slide 43
The value of Exp(B) was 0.253, which implies that for each unit increase in confidence in Congress the odds decreased by 74.7% (0.253 - 1.0 = -0.747).
The relationship stated in the problem is supported. Survey respondents who had less confidence in congress were less likely to be in the group of survey respondents who thought we spend too little money on highways and bridges, rather than the group of survey respondents who thought we spend too much money on highways and bridges. For each unit increase in confidence in Congress, the odds of being in the group of survey respondents who thought we spend too little money on highways and bridges decreased by 74.7%.
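The percent change in odds is (Exp(B) - 1) × 100, where Exp(B) = e^B. The sketch below applies this to the two CONLEGIS coefficients from the Parameter Estimates table. The function name is our own.

```python
import math


def percent_change_in_odds(b):
    """Odds ratio Exp(B) and percent change in odds per one-unit increase."""
    odds_ratio = math.exp(b)
    return odds_ratio, (odds_ratio - 1.0) * 100


for label, b in [("too little vs too much", -1.373),
                 ("about right vs too much", -1.657)]:
    ratio, pct = percent_change_in_odds(b)
    print(f"{label}: Exp(B) = {ratio:.3f}, change in odds = {pct:.1f}%")
# too little vs too much: Exp(B) = 0.253, change in odds = -74.7%
# about right vs too much: Exp(B) = 0.191, change in odds = -80.9%
```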
Slide 44
In the comparison of survey respondents who thought we spend about the right amount of money on highways and bridges to survey respondents who thought we spend too much money on highways and bridges, the probability of the Wald statistic (7.298) for the variable confidence in Congress [conlegis] was 0.007. Since the probability was less than or equal to the level of significance of 0.05, the null hypothesis that the b coefficient for confidence in Congress was equal to zero for this comparison was rejected.
Slide 45
The value of Exp(B) was 0.191, which implies that for each unit increase in
confidence in Congress the odds decreased by 80.9% (0.191 - 1.0 = -0.809).
The relationship stated in the problem is supported. Survey respondents
who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend about the right amount of
money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and
bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by 80.9%.
Slide 46
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: BY CHANCE ACCURACY RATE
The independent variables could be characterized as useful predictors distinguishing survey respondents who thought we spend too little money on highways and bridges, survey respondents who thought we spend about the right amount of money on highways and bridges, and survey respondents who thought we spend too much money on highways and bridges if the classification accuracy rate was substantially higher than the accuracy attainable by chance alone. Operationally, the classification accuracy rate should be at least 25% higher than (that is, at least 1.25 times) the proportional by chance accuracy rate.

Case Processing Summary
                                N     Marginal Percentage
HIGHWAYS AND BRIDGES    1      62     37.1%
                        2      93     55.7%
                        3      12      7.2%
Valid                         167    100.0%
Missing                       103
Total                         270
Subpopulation                 153a
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the 'Case Processing Summary', and then squaring and summing the proportion of cases in each group (0.371² + 0.557² + 0.072² = 0.453).
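The proportional by chance rate is the sum of the squared group proportions. The sketch below computes it from the group counts in the Case Processing Summary, along with the 1.25 × criterion. The function name is our own.

```python
def proportional_by_chance(group_counts):
    """Sum of squared group proportions."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


counts = [62, 93, 12]            # too little, about right, too much
rate = proportional_by_chance(counts)
criterion = 1.25 * rate          # accuracy must reach at least this value
print(f"{rate:.3f} {criterion:.3f}")   # 0.453 0.566
```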
Slide 47
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: CLASSIFICATION ACCURACY

Classification
                          Predicted
Observed                 1        2        3      Percent Correct
1                       15       47        0      24.2%
2                        7       86        0      92.5%
3                        5        7        0        .0%
Overall Percentage    16.2%    83.8%      .0%     60.5%

The classification accuracy rate was 60.5%, which was greater than or equal to the proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is satisfied.
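The classification accuracy rate is the diagonal of the Classification table (correct predictions) divided by the number of valid cases. The sketch below reproduces the percent-correct column and compares the overall rate with the 1.25 × by-chance criterion. The list-of-lists layout is an illustrative choice.

```python
# Classification table: rows = observed group, columns = predicted group.
table = [
    [15, 47, 0],   # observed 1: too little
    [7, 86, 0],    # observed 2: about right
    [5, 7, 0],     # observed 3: too much
]

correct = sum(table[i][i] for i in range(len(table)))   # diagonal = correct predictions
total = sum(sum(row) for row in table)
accuracy = correct / total
percent_correct = [round(100 * row[i] / sum(row), 1) for i, row in enumerate(table)]

by_chance = 0.453                 # proportional by chance accuracy rate
criterion = 1.25 * by_chance      # 25% better than chance
print(percent_correct)            # [24.2, 92.5, 0.0]
print(f"accuracy = {accuracy:.3f}, criterion = {criterion:.3f}, "
      f"satisfied = {accuracy >= criterion}")
```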
Slide 48
Answering the question in problem 1 - 1
We found a statistically significant overall relationship between the combination of independent variables and the dependent variable.
There was no evidence of numerical problems in the solution.
Moreover, the classification accuracy surpassed the proportional by chance accuracy criterion, supporting the utility of the model.
Slide 49
Answering the question in problem 1 - 2
We verified that each statement about the relationship between an independent variable and the dependent variable was correct in both the direction of the relationship and the change in likelihood associated with a one-unit change of the independent variable, for both of the comparisons between groups stated in the problem.
The answer to the question is true with caution.
A caution is added because of the inclusion of ordinal level variables.
Slide 50
Problem 2
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income" [income98] were useful predictors for distinguishing between groups based on responses to "opinion about spending on space exploration" [natspac]. These predictors differentiate survey respondents who thought we spend too little money on space exploration from survey respondents who thought we spend too much money on space exploration and survey respondents who thought we spend about the right amount of money on space exploration from survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups defined by responses to opinion about spending on space exploration. Survey respondents who had higher total family incomes were more likely to be in the group of survey respondents who thought we spend about the right amount of money on space exploration, rather than the group of survey respondents who thought we spend too much money on space exploration. For each unit increase in total family income, the odds of being in the group of survey respondents who thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 51
For these problems, we will assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results.
In this problem, we are told to use 0.05 as alpha for the multinomial logistic regression.
Slide 52
The variables listed first in the problem statement are the independent variables (IVs): "highest year of school completed" [educ], "sex" [sex] and "total family income" [income98].
The variable used to define groups is the dependent variable (DV): "opinion about spending on space exploration" [natspac].
SPSS only supports direct or simultaneous entry of independent variables in multinomial logistic regression, so we have no choice of method for entering variables.
Slide 53
SPSS multinomial logistic regression models the relationship by comparing each of the groups defined by the dependent variable to the group with the highest code value.
The responses to opinion about spending on the space program were: 1 = Too little, 2 = About right, and 3 = Too much.
The analysis will result in two comparisons: survey respondents who thought we spend too little money versus survey respondents who thought we spend too much money on space exploration, and survey respondents who thought we spend about the right amount of money versus survey respondents who thought we spend too much money on space exploration.
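The sketch below makes the reference-group logic concrete. It plugs the B coefficients from the problem 1 (highways and bridges) Parameter Estimates table into the two comparison equations. The reference group (code 3, too much) gets coefficients of zero. The three scores are then converted into probabilities, and the case is assigned to the group with the highest probability. The respondent's values (age 45, 12 years of school, conlegis = 2) are hypothetical. Because the coefficients are rounded, this only approximates what SPSS would compute.

```python
import math

# B coefficients from the highways-and-bridges Parameter Estimates table.
# Group 3 ("too much") is the reference, so its coefficients are all zero.
coefficients = {
    1: {"Intercept": 3.240, "AGE": 0.019, "EDUC": 0.071, "CONLEGIS": -1.373},
    2: {"Intercept": 3.639, "AGE": 0.003, "EDUC": 0.172, "CONLEGIS": -1.657},
    3: {"Intercept": 0.0, "AGE": 0.0, "EDUC": 0.0, "CONLEGIS": 0.0},
}


def predict(case):
    """Return group probabilities and the group with the highest probability."""
    logits = {g: c["Intercept"] + sum(c[v] * x for v, x in case.items())
              for g, c in coefficients.items()}
    denom = sum(math.exp(z) for z in logits.values())
    probs = {g: math.exp(z) / denom for g, z in logits.items()}
    return probs, max(probs, key=probs.get)


# Hypothetical respondent.
probs, group = predict({"AGE": 45, "EDUC": 12, "CONLEGIS": 2})
print({g: round(p, 3) for g, p in probs.items()}, "predicted group:", group)
```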
Slide 54
Each problem includes a statement about the relationship between one independent variable and the dependent variable. The answer to the problem is based on the stated relationship, ignoring the relationships between the other independent variables and the dependent variable.
This problem identifies a difference for only one of the two comparisons based on the three values of the dependent variable. Other problems will specify both of the possible comparisons.
Slide 55
In order for the multinomial logistic regression question to be true, the overall relationship must be statistically significant, there must be no evidence of numerical problems, the classification accuracy rate must be substantially better than could be obtained by chance alone, and the stated individual relationship must be statistically significant and interpreted correctly.
Slide 56
Multinomial logistic regression requires that the dependent variable be non-metric and the independent variables be metric or dichotomous.
"Opinion about spending on space exploration" [natspac] is ordinal, satisfying the non-metric level of measurement requirement for the dependent variable.
It contains three categories: survey respondents who thought we spend too little money, about the right amount of money, and too much money on space exploration.
Slide 57
"Highest year of school completed" [educ] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.
"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.
"Total family income" [income98] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables.
If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for the analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
Slide 58
Request multinomial logistic regression
Select the Regression | Multinomial Logistic command from the Analyze menu.
Slide 59
Selecting the dependent variable
First, highlight the dependent variable natspac in the list of variables.
Second, click on the right arrow button to move the dependent variable to the Dependent text box.
Slide 60
Selecting non-metric independent variables
Non-metric independent variables are specified as factors in multinomial logistic regression. Non-metric variables can be dichotomous, nominal, or ordinal. These variables will be dummy coded as needed, and each value will be listed separately in the output.
Select the dichotomous variable sex.
Move the non-metric independent variables listed in the problem to the Factor(s) list box.
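Dummy coding a factor can be sketched as one 0/1 indicator per category, except for a reference category that gets no indicator. This sketch assumes the last category is the reference. That assumption is consistent with the Parameter Estimates output for this problem, which lists [SEX=2] with a B of 0. The function is a hypothetical illustration, not SPSS's internal code.

```python
def dummy_code(name, values, categories):
    """One 0/1 indicator per category except the last (the reference)."""
    return [{f"[{name}={c}]": int(v == c) for c in categories[:-1]}
            for v in values]


rows = dummy_code("SEX", [1, 2, 2, 1], categories=[1, 2])
print(rows)
# [{'[SEX=1]': 1}, {'[SEX=1]': 0}, {'[SEX=1]': 0}, {'[SEX=1]': 1}]
```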
Slide 61
Selecting metric independent variables
Metric independent variables are specified as covariates in multinomial logistic regression. Metric variables can be either interval or, by convention, ordinal.
Move the metric independent variables, educ and income98, to the Covariate(s) list box.
Slide 62
Specifying statistics to include in the output
While we will accept most of the SPSS defaults for the analysis, we need to specifically request the classification table.
Click on the Statistics button to make a request.
Slide 63
Requesting the classification table
First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter estimates.
Second, mark the checkbox for the Classification table.
Third, click on the Continue button to complete the request.
Slide 64
Completing the multinomial logistic regression request
Click on the OK button to request the output for the multinomial logistic regression.
The multinomial logistic procedure supports additional commands to specify the model computed for the relationships (we will use the default main effects model), additional specifications for computing the regression, and saving classification results. We will not make use of these options.
Slide 65
Sample size ratio of cases to variables

                                        N      Marginal Percentage
SPACE EXPLORATION PROGRAM     1         33     15.9%
                              2         90     43.3%
                              3         85     40.9%
RESPONDENTS SEX               1         94     45.2%
                              2        114     54.8%
Valid                                  208     100.0%
Missing                                 62
Total                                  270
Subpopulation                          138a
a. The dependent variable has only one value observed in 112 (81.2%) subpopulations.

Multinomial logistic regression requires that the minimum ratio of valid cases to independent variables be at least 10 to 1. The ratio of valid cases (208) to number of independent variables (3) was 69.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.
The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 69.3 to 1 was equal to or greater than the preferred ratio. The preferred ratio of cases to independent variables was satisfied.
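The ratio check above is simple arithmetic. This is a minimal Python sketch that assumes the 10 to 1 minimum and 20 to 1 preferred thresholds used in these slides. The function name `check_case_ratio` is hypothetical.

```python
def check_case_ratio(valid_cases: int, n_independent: int,
                     minimum: float = 10.0, preferred: float = 20.0) -> dict:
    """Compare the cases-to-independent-variables ratio with both thresholds."""
    ratio = valid_cases / n_independent
    return {
        "ratio": round(ratio, 1),
        "meets_minimum": ratio >= minimum,      # otherwise: inappropriate application
        "meets_preferred": ratio >= preferred,  # otherwise: answer "true with caution"
    }


# 208 valid cases; educ, income98 and sex count as 3 independent variables
print(check_case_ratio(208, 3))
# {'ratio': 69.3, 'meets_minimum': True, 'meets_preferred': True}
```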
Slide 66
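OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   354.268
Final            334.967             19.301       6    .004

The presence of a relationship between the dependent variable and the combination of independent variables is based on the statistical significance of the final model chi-square in the SPSS table titled "Model Fitting Information". In this analysis, the probability of the model chi-square (19.301) was 0.004, less than or equal to the level of significance of 0.05. The existence of a relationship between the independent variables and the dependent variable was supported.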
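The model chi-square is the drop in -2 log likelihood from the intercept-only model to the final model. The sketch below reproduces 19.301 and its probability using only the Python standard library. It relies on the closed form of the chi-square upper tail for an even number of degrees of freedom, which covers df = 6 here. The helper name `chi2_sf_even_df` is hypothetical. For odd df you would need a general routine, such as a statistics library's chi-square survival function.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with EVEN df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


intercept_only = 354.268   # -2LL of the intercept-only model
final = 334.967            # -2LL of the final model
chi_square = intercept_only - final
p = chi2_sf_even_df(chi_square, df=6)
print(f"chi-square = {chi_square:.3f}, p = {p:.3f}")
```

Since p is less than or equal to 0.05, the overall relationship counts as statistically significant under the convention used in these slides.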
Slide 67
NUMERICAL PROBLEMS

Parameter Estimates
SPACE EXPLORATION PROGRAM
Group  Term        B       Std. Error  Wald     df  Sig.  Exp(B)  Lower Bound (95% CI for Exp(B))
1      Intercept   -4.136  1.157       12.779   1   .000
       EDUC        .101    .089        1.276    1   .259  1.106   .929
       INCOME98    .097    .050        3.701    1   .054  1.102   .998
       [SEX=1]     .672    .426        2.488    1   .115  1.959   .850
       [SEX=2]     0b      .           .        0   .     .       .
2      Intercept   -2.487  .840        8.774    1   .003
       EDUC        .108    .068        2.521    1   .112  1.114   .975
       INCOME98    .058    .034        2.932    1   .087  1.060   .992
       [SEX=1]     .501    .317        2.492    1   .114  1.650   .886
       [SEX=2]     0b      .           .        0   .     .       .
b. This parameter is set to zero because it is redundant.

Multicollinearity in the multinomial logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems. These include multicollinearity among the independent variables, zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable, and 'complete separation', whereby the groups in the dependent variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.
None of the independent variables in this analysis had a standard error larger than 2.0.
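The Wald statistics, Exp(B) values, and the standard error screen can be recomputed from the B and Std. Error columns. This minimal sketch assumes the 2.0 cutoff used in these slides, and the names are hypothetical. Because the table's B values and standard errors are rounded to three decimals, the recomputed Wald values will differ slightly from the SPSS output. For example, EDUC in group 1 gives about 1.29 instead of 1.276.

```python
import math
from typing import Dict, List, Tuple

# (B, Std. Error) for each non-redundant row of the Parameter Estimates table;
# [SEX=2] is omitted because SPSS fixes it at 0.
ESTIMATES: Dict[Tuple[int, str], Tuple[float, float]] = {
    (1, "Intercept"): (-4.136, 1.157),
    (1, "EDUC"): (0.101, 0.089),
    (1, "INCOME98"): (0.097, 0.050),
    (1, "[SEX=1]"): (0.672, 0.426),
    (2, "Intercept"): (-2.487, 0.840),
    (2, "EDUC"): (0.108, 0.068),
    (2, "INCOME98"): (0.058, 0.034),
    (2, "[SEX=1]"): (0.501, 0.317),
}

SE_LIMIT = 2.0  # rule of thumb used in these slides


def wald_test(b: float, se: float) -> Tuple[float, float]:
    """Wald chi-square with 1 df and its p-value."""
    wald = (b / se) ** 2
    # P(chi-square_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    return wald, math.erfc(math.sqrt(wald / 2.0))


def numerical_problems(estimates: Dict[Tuple[int, str], Tuple[float, float]]) -> List[Tuple[int, str]]:
    """Rows whose standard error exceeds SE_LIMIT."""
    return [key for key, (_, se) in estimates.items() if se > SE_LIMIT]


for (group, term), (b, se) in ESTIMATES.items():
    wald, p = wald_test(b, se)
    exp_b = "" if term == "Intercept" else f"{math.exp(b):.3f}"  # odds ratio
    print(f"group {group} {term:<9} Wald={wald:7.3f} p={p:.3f} Exp(B)={exp_b}")

print("rows with s.e. > 2.0:", numerical_problems(ESTIMATES))
```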
Slide 68

Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   334.967a                             .000         0    .
EDUC        337.788                              2.821        2    .244
INCOME98    340.154                              5.187        2    .075
SEX         338.511                              3.544        2    .170
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.

The statistical significance of the relationship between total family income and opinion about spending on space exploration is based on the statistical significance of the chi-square statistic in the SPSS table titled "Likelihood Ratio Tests".
For this relationship, the probability of the chi-square statistic (5.187) was 0.075, greater than the level of significance of 0.05. The null hypothesis that all of the b coefficients associated with total family income were equal to zero was not rejected. The existence of a relationship between total family income and opinion about spending on space exploration was not supported.
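Each likelihood ratio chi-square is the reduced model's -2 log likelihood minus the final model's. Every effect has 2 degrees of freedom here because it contributes one coefficient to each of the two non-reference equations. Sex contributes only [SEX=1], since [SEX=2] is redundant. With 2 df the chi-square upper tail is simply exp(-x/2), so the sketch below needs only the standard library. The names are hypothetical, and the output should reproduce the Chi-Square and Sig. columns to three decimals.

```python
import math

FINAL_M2LL = 334.967  # -2 log likelihood of the final model
REDUCED_M2LL = {"EDUC": 337.788, "INCOME98": 340.154, "SEX": 338.511}
ALPHA = 0.05


def lr_test(reduced_m2ll: float, final_m2ll: float, df: int = 2):
    """Likelihood ratio chi-square and p-value for dropping one effect."""
    if df != 2:
        raise ValueError("this sketch only implements the 2-df case")
    chi_square = reduced_m2ll - final_m2ll
    p = math.exp(-chi_square / 2.0)  # P(chi-square_2 > x) = exp(-x/2)
    return chi_square, p


results = {name: lr_test(m2ll, FINAL_M2LL) for name, m2ll in REDUCED_M2LL.items()}
for name, (chi_square, p) in results.items():
    verdict = "supported" if p <= ALPHA else "not supported"
    print(f"{name:<8} chi-square={chi_square:.3f} p={p:.3f} relationship {verdict}")
```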
Slide 69
Answering the question in problem 2
We found a statistically significant overall relationship between the combination of independent variables and the dependent variable. There was no evidence of numerical problems in the solution.

Among this set of predictors, total family income was helpful in distinguishing among the groups defined by responses to opinion about spending on space exploration. Survey respondents who had higher total family incomes were more likely to be in the group of survey respondents who thought we spend about the right amount of money on space exploration, rather than the group of survey respondents who thought we spend too much money on space exploration. For each unit increase in total family income, the odds of being in the group of survey respondents who thought we spend about the right amount of money on space exploration increased by 6.0%.

However, the individual relationship between total family income and spending on space was not statistically significant.
The answer to the question is false.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
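The 6.0% figure comes from Exp(B) for total family income in the second equation (1.060). The percent change in the odds for a one-unit increase is (Exp(B) - 1) * 100. The small sketch below uses hypothetical function names. It also keeps the significance test in view, since a sizeable Exp(B) does not support the statement if the predictor's likelihood ratio test is not significant. Labeling group 2 as "about right" and group 3 ("too much") as the reference follows the interpretation given in this answer.

```python
def percent_change_in_odds(exp_b: float) -> float:
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100.0


def describe(exp_b: float, lr_p: float, alpha: float = 0.05) -> str:
    """Plain-language reading of Exp(B), flagged when the predictor is not significant."""
    change = percent_change_in_odds(exp_b)
    direction = "increased" if change > 0 else "decreased"
    text = f"odds {direction} by {abs(change):.1f}% per unit"
    if lr_p > alpha:
        text += " (but the predictor's likelihood ratio test was not significant)"
    return text


# INCOME98, group 2 (about right) versus the reference group 3 (too much)
print(describe(1.060, lr_p=0.075))
```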
Slide 70
Steps in multinomial logistic regression: level of measurement and initial sample size
The following is a guide to the decision process for answering problems about the basic relationships in multinomial logistic regression:
Dependent non-metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: Ratio of cases to independent variables at least 10 to 1?
    No: Inappropriate application of a statistic
    Yes: Run multinomial logistic regression
Slide 71

Steps in multinomial logistic regression: overall relationship and numerical problems
Overall relationship statistically significant? (model chi-square test)
  No: False
  Yes: Standard errors of coefficients indicate no numerical problems (s.e. <= 2.0)?
    No: False
    Yes: continue to the next step
Slide 72

Steps in multinomial logistic regression: relationships between IVs and DV
Overall relationship between specific IV and DV is statistically significant? (likelihood ratio test)
  No: False
  Yes: Role of specific IV and DV groups statistically significant and interpreted correctly? (Wald test and Exp(B))
    No: False
    Yes: continue to the next step
Slide 73

Steps in multinomial logistic regression: classification accuracy and adding cautions
Overall accuracy rate is 25% greater than the proportional by chance accuracy rate?
  No: False
  Yes: Satisfies preferred ratio of cases to IVs of 20 to 1?
    No: True with caution
    Yes: One or more IVs are ordinal level treated as metric?
      No: True
      Yes: True with caution
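The decision guide on these slides can be expressed as a single function. This is a hypothetical sketch. The `Evidence` fields and function names are illustrative labels for the flowchart's questions. The comparisons treat p less than or equal to 0.05 as significant, following the wording used earlier. The proportional by chance rate is computed as the sum of squared group proportions of the dependent variable, using the group sizes 33, 90, and 85 from the sample size table. Later-stage inputs are optional because the flowchart may stop before it needs them, as it does for problem 2.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def proportional_by_chance(group_sizes: Sequence[int]) -> float:
    """Sum of squared group proportions of the dependent variable."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


@dataclass
class Evidence:
    dv_nonmetric: bool
    ivs_metric_or_dichotomous: bool
    valid_cases: int
    n_ivs: int
    model_p: float                         # model chi-square test
    max_se: float                          # largest coefficient standard error
    iv_lr_p: float                         # likelihood ratio test for the IV in question
    role_correct: Optional[bool] = None    # Wald test and Exp(B) interpreted correctly?
    accuracy: Optional[float] = None       # overall classification accuracy (0 to 1)
    by_chance: Optional[float] = None      # proportional by chance accuracy (0 to 1)
    ordinal_as_metric: Optional[bool] = None
    alpha: float = 0.05


def _need(value, label):
    """Fail loudly if the flowchart reaches a step whose input was not supplied."""
    if value is None:
        raise ValueError(f"{label} is needed to reach a decision")
    return value


def answer(e: Evidence) -> str:
    inappropriate = "Inappropriate application of a statistic"
    # Slide 70: level of measurement and minimum sample size
    if not (e.dv_nonmetric and e.ivs_metric_or_dichotomous):
        return inappropriate
    ratio = e.valid_cases / e.n_ivs
    if ratio < 10:
        return inappropriate
    # Slide 71: overall relationship and numerical problems
    if e.model_p > e.alpha or e.max_se > 2.0:
        return "False"
    # Slide 72: relationship of the specific IV (short-circuits before role_correct)
    if e.iv_lr_p > e.alpha or not _need(e.role_correct, "role_correct"):
        return "False"
    # Slide 73: classification accuracy, then cautions
    if _need(e.accuracy, "accuracy") <= 1.25 * _need(e.by_chance, "by_chance"):
        return "False"
    if ratio < 20 or _need(e.ordinal_as_metric, "ordinal_as_metric"):
        return "True with caution"
    return "True"


problem2 = Evidence(dv_nonmetric=True, ivs_metric_or_dichotomous=True,
                    valid_cases=208, n_ivs=3, model_p=0.004, max_se=1.157,
                    iv_lr_p=0.075)
print(answer(problem2))                                # False
print(round(proportional_by_chance([33, 90, 85]), 3))  # 0.379
```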
