
SW388R7

Data Analysis &


Computers II
Slide 1

Multinomial Logistic Regression


Basic Relationships
Multinomial Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems


Multinomial logistic regression

Slide 2

Multinomial logistic regression is used to analyze relationships


between a non-metric dependent variable and metric or
dichotomous independent variables.

Multinomial logistic regression compares multiple groups


through a combination of binary logistic regressions.

The group comparisons are equivalent to the comparisons for a


dummy-coded dependent variable, with the group with the
highest numeric score used as the reference group.

For example, if we wanted to study differences in BSW, MSW,


and PhD students using multinomial logistic regression, the
analysis would compare BSW students to PhD students and MSW
students to PhD students. For each independent variable, there
would be two comparisons.


What multinomial logistic regression predicts

Slide 3

Multinomial logistic regression provides a set of coefficients for


each of the two comparisons. The coefficients for the
reference group are all zeros, similar to the coefficients for the
reference group for a dummy-coded variable.

Thus, there are three equations, one for each of the groups
defined by the dependent variable.

The three equations can be used to compute the probability


that a subject is a member of each of the three groups. A case
is predicted to belong to the group associated with the highest
probability.

Predicted group membership can be compared to actual group


membership to obtain a measure of classification accuracy.
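
As a minimal sketch of how the three equations become probabilities, the snippet below uses the coefficients reported in the Parameter Estimates table for the highways-and-bridges example in this deck, with the reference group's coefficients set to zero. The function name `group_probabilities` and the case values (age 40, 12 years of school, confidence score 2) are illustrative assumptions, not part of the original slides.

```python
import math

# Coefficients from the highways-and-bridges Parameter Estimates table.
# Order: intercept, AGE, EDUC, CONLEGIS. Group 3 (TOO MUCH) is the reference group.
COEFFICIENTS = {
    1: [3.240, 0.019, 0.071, -1.373],  # TOO LITTLE vs. TOO MUCH
    2: [3.639, 0.003, 0.172, -1.657],  # ABOUT RIGHT vs. TOO MUCH
    3: [0.0, 0.0, 0.0, 0.0],           # reference group: all zeros
}


def group_probabilities(age, educ, conlegis):
    """Return {group: probability} for one case (hypothetical helper)."""
    x = [1.0, age, educ, conlegis]
    # One linear equation per group; the reference group's equation is always 0.
    logits = {g: sum(b * v for b, v in zip(coefs, x))
              for g, coefs in COEFFICIENTS.items()}
    denom = sum(math.exp(z) for z in logits.values())
    return {g: math.exp(z) / denom for g, z in logits.items()}


# Hypothetical case, chosen only for illustration.
probs = group_probabilities(age=40, educ=12, conlegis=2)
# The case is predicted to belong to the group with the highest probability.
predicted_group = max(probs, key=probs.get)
print(probs, predicted_group)
```

The three probabilities sum to 1, and the predicted group is simply the one with the largest probability.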


Level of measurement requirements

Slide 4

Multinomial logistic regression analysis requires that the


dependent variable be non-metric. Dichotomous, nominal, and
ordinal variables satisfy the level of measurement requirement.

Multinomial logistic regression analysis requires that the


independent variables be metric or dichotomous. Because SPSS
automatically dummy-codes nominal level variables, they can also be
included; they are dichotomized in the analysis.

In SPSS, non-metric independent variables are included as


factors. SPSS will dummy-code non-metric IVs.

In SPSS, metric independent variables are included as


covariates. If an independent variable is ordinal, we will
attach the usual caution.
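
To illustrate what dummy-coding a factor means, here is a small sketch. It assumes the highest code is used as the reference category, which matches the [SEX=2] row with a zero coefficient in the childcare example later in this deck. The function `dummy_code` and the sample codes are hypothetical; SPSS does this step internally.

```python
def dummy_code(values, name):
    """Dummy-code a nominal variable, using the highest code as the reference.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    reference = categories[-1]
    indicators = [c for c in categories if c != reference]
    # One 0/1 indicator per non-reference category; the reference is all zeros.
    rows = [{f"[{name}={c}]": int(v == c) for c in indicators} for v in values]
    return reference, rows


# Hypothetical sex codes: 1 = male, 2 = female.
reference, rows = dummy_code([1, 2, 2, 1], name="SEX")
print(reference)
print(rows)
```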


Assumptions and outliers

Slide 5

Multinomial logistic regression does not make any assumptions


of normality, linearity, or homogeneity of variance for the
independent variables.

Because it does not impose these requirements, it is preferred


to discriminant analysis when the data do not satisfy these
assumptions.

SPSS does not compute any diagnostic statistics for outliers. To


evaluate outliers, the advice is to run multiple binary logistic
regressions and use those results to test the exclusion of
outliers or influential cases.


Sample size requirements

Slide 6

The minimum number of cases per independent variable is 10,


using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
Logistic Regression.

For preferred case-to-variable ratios, we will use 20 to 1.
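
A quick sketch of the ratio check, using the 167 valid cases and 3 independent variables from the highways-and-bridges example in this deck. The function name is an assumption made for illustration.

```python
def cases_per_variable(valid_cases, n_independent_variables):
    """Ratio of valid cases to independent variables (hypothetical helper)."""
    return valid_cases / n_independent_variables


# 167 valid cases; AGE, EDUC, and CONLEGIS are the three independent variables.
ratio = cases_per_variable(valid_cases=167, n_independent_variables=3)
meets_minimum = ratio >= 10    # minimum guideline of 10 to 1
meets_preferred = ratio >= 20  # preferred ratio of 20 to 1
print(round(ratio, 1), meets_minimum, meets_preferred)
```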


Methods for including variables

Slide 7

The only method for selecting independent variables in SPSS is


simultaneous or direct entry.


Overall test of relationship - 1

Slide 8

The overall test of relationship among the independent


variables and groups defined by the dependent variable is based on the
reduction in the -2 log-likelihood values between a model which does not
contain any independent variables and the model that contains
the independent variables.

This difference in -2 log-likelihood follows a chi-square distribution,
and is referred to as the model chi-square.

The significance test for the final model chi-square (after the
independent variables have been added) is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.

Slide 9

Overall test of relationship - 2

Model Fitting Information

Model             -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only              284.429
Final                       265.972       18.457    6   .005

The presence of a relationship between the dependent


variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".
In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
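
The arithmetic behind the model chi-square can be sketched as follows. The -2 log-likelihood values and the 6 degrees of freedom come from the Model Fitting Information table. The helper `chi_square_sf_even_df` is a hypothetical name and uses a closed form that holds only for even degrees of freedom, so it is an illustration rather than a general-purpose routine.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


neg2ll_intercept_only = 284.429
neg2ll_final = 265.972

# Model chi-square: reduction in -2 log-likelihood when the IVs are added.
model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi_square_sf_even_df(model_chi_square, df=6)
print(round(model_chi_square, 3), round(p_value, 3))
```

Comparing `p_value` to the 0.05 level of significance reproduces the decision described above.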


Strength of multinomial logistic regression


relationship

Slide
10

While multinomial logistic regression does compute correlation


measures to estimate the strength of the relationship (pseudo R-square
measures, such as Nagelkerke's R²), these correlation
measures do not really tell us much about the accuracy or
errors associated with the model.

A more useful measure to assess the utility of a multinomial


logistic regression model is classification accuracy, which
compares predicted group membership based on the logistic
model to the actual, known group membership, which is the
value for the dependent variable.

Slide
11

Evaluating usefulness for logistic models

The benchmark that we will use to characterize a multinomial


logistic regression model as useful is a 25% improvement over
the rate of accuracy achievable by chance alone.

Even if the independent variables had no relationship to the


groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.

The estimate of by chance accuracy that we will use is the


proportional by chance accuracy rate, computed by summing
the squared proportions of cases in each group. The only
difference between by chance accuracy for binary logistic
models and by chance accuracy for multinomial logistic models
is the number of groups defined by the dependent variable.

Slide
12

Computing by chance accuracy


The percentage of cases in each group defined by the dependent
variable is found in the Case Processing Summary table.
Case Processing Summary

                                  N    Marginal Percentage
HIGHWAYS AND BRIDGES    1        62          37.1%
                        2        93          55.7%
                        3        12           7.2%
Valid                           167         100.0%
Missing                         103
Total                           270
Subpopulation                   153a

a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was
computed by calculating the proportion of cases for
each group based on the number of cases in each
group in the 'Case Processing Summary', and then
squaring and summing the proportion of cases in each
group (0.371² + 0.557² + 0.072² = 0.453).
The proportional by chance accuracy criterion is 56.6%
(1.25 x 45.3% = 56.6%).
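
The by chance accuracy calculation can be reproduced directly from the group counts in the Case Processing Summary (62, 93, and 12 cases). This is a minimal sketch; the function name is an assumption made for illustration.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions (hypothetical helper)."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


# Cases in each group of the dependent variable (Case Processing Summary).
group_counts = [62, 93, 12]

by_chance = proportional_by_chance_accuracy(group_counts)
# The benchmark is a 25% improvement over by chance accuracy.
criterion = 1.25 * by_chance
print(round(by_chance, 3), round(criterion, 3))
```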

Slide
13

Comparing accuracy rates

To characterize our model as useful, we compare the overall


percentage accuracy rate produced by SPSS at the last step in which
variables are entered to 25% more than the proportional by chance
accuracy. (Note: SPSS does not compute a cross-validated accuracy
rate for multinomial logistic regression.)

Classification

                             Predicted
Observed                 1        2        3    Percent Correct
1                       15       47        0         24.2%
2                        7       86        0         92.5%
3                        5        7        0           .0%
Overall Percentage    16.2%    83.8%      .0%        60.5%

The classification accuracy rate was 60.5%, which was greater than or equal to the
proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is satisfied in this example.
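
The overall accuracy rate is the number of correctly classified cases divided by the number of valid cases. The sketch below recomputes it from the Classification table and compares it with the 56.6% criterion stated on this slide; the dictionary layout and function name are illustrative assumptions.

```python
# Classification table: classification[observed][predicted] = number of cases.
classification = {
    1: {1: 15, 2: 47, 3: 0},
    2: {1: 7, 2: 86, 3: 0},
    3: {1: 5, 2: 7, 3: 0},
}


def overall_accuracy(table):
    """Share of cases whose predicted group equals the observed group."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


accuracy = overall_accuracy(classification)
criterion = 0.566  # 1.25 x proportional by chance accuracy, from the slide
model_is_useful = accuracy >= criterion
print(round(accuracy, 3), model_is_useful)
```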

Slide
14

Numerical problems

The maximum likelihood method used to calculate multinomial


logistic regression is an iterative fitting process that attempts
to cycle through repetitions to find an answer.
Sometimes, the method will break down and not be able to
converge or find an answer.
Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation whereby the
two groups are perfectly separated by the scores on one or
more independent variables.
The clue that we have numerical problems and should not
interpret the results is the presence of standard errors for some independent
variables that are larger than 2.0.
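
The standard-error screen is easy to automate. The sketch below applies the 2.0 threshold to the independent variables' standard errors from the TOO LITTLE comparison in the highways-and-bridges example (the intercept is left out because the rule refers to independent variables). The function name and the second, deliberately inflated set of values are hypothetical.

```python
def flag_numerical_problems(standard_errors, threshold=2.0):
    """Return the independent variables whose standard error exceeds the threshold."""
    return [name for name, se in standard_errors.items() if se > threshold]


# Standard errors reported for the TOO LITTLE vs. TOO MUCH comparison.
standard_errors = {"AGE": 0.020, "EDUC": 0.108, "CONLEGIS": 0.620}
flagged = flag_numerical_problems(standard_errors)

# Hypothetical values showing what a problematic result would look like.
hypothetical = {"AGE": 0.020, "EDUC": 15.3}
flagged_hypothetical = flag_numerical_problems(hypothetical)
print(flagged, flagged_hypothetical)
```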


Relationship of individual independent


variables and the dependent variable

Slide
15

There are two types of tests for individual independent


variables:
The likelihood ratio test evaluates the overall relationship
between an independent variable and the dependent
variable
The Wald test evaluates whether or not the independent
variable is statistically significant in differentiating between
the two groups in each of the embedded binary logistic
comparisons.

If an independent variable has an overall relationship to the


dependent variable, it might or might not be statistically
significant in differentiating between pairs of groups defined by
the dependent variable.


Relationship of individual independent


variables and the dependent variable

Slide
16

The interpretation for an independent variable focuses on its


ability to distinguish between pairs of groups and the
contribution which it makes to changing the odds of being in
one dependent variable group rather than the other.

We should not interpret the significance of an independent


variable's role in distinguishing between pairs of groups unless
the independent variable also has an overall relationship to the
dependent variable in the likelihood ratio test.

The interpretation of an independent variable's role in


differentiating dependent variable groups is the same as we
used in binary logistic regression. The difference in
multinomial logistic regression is that we can have multiple
interpretations for an independent variable in relation to
different pairs of groups.


Relationship of individual independent


variables and the dependent variable

Slide
17

Parameter Estimates

                                                                              95% CI for Exp(B)
HIGHWAYS AND BRIDGES (a)             B   Std. Error    Wald   df   Sig.   Exp(B)   Lower Bound
1 (TOO LITTLE)    Intercept      3.240        2.478   1.709    1   .191
                  AGE             .019         .020    .906    1   .341    1.019          .980
                  EDUC            .071         .108    .427    1   .514    1.073          .868
                  CONLEGIS      -1.373         .620   4.913    1   .027     .253          .075
2 (ABOUT RIGHT)   Intercept      3.639        2.456   2.195    1   .138
                  AGE             .003         .020    .017    1   .897    1.003          .963
                  EDUC            .172         .110   2.463    1   .117    1.188          .958
                  CONLEGIS      -1.657         .613   7.298    1   .007     .191          .057

a. The reference category is: 3 (TOO MUCH).

SPSS identifies the comparisons it makes for groups defined by the dependent
variable in the table of Parameter Estimates, using either the value codes or
the value labels, depending on the options settings for pivot table labeling.

The reference category is identified in the footnote to the table.

In this analysis, two comparisons will be made:
the TOO LITTLE group (coded 1) will be compared to the TOO MUCH group (coded 3)
the ABOUT RIGHT group (coded 2) will be compared to the TOO MUCH group (coded 3).

The reference category plays the same role in multinomial logistic regression
that it plays in the dummy-coding of a nominal variable: it is the category that
would be coded with zeros for all of the dummy-coded variables that all other
categories are interpreted against.


Relationship of individual independent


variables and the dependent variable

Slide
18

Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept                               268.323        2.350    2   .309
AGE                                     268.625        2.652    2   .265
EDUC                                    270.395        4.423    2   .110
CONLEGIS                                275.194        9.221    2   .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

In this example, there is a statistically significant relationship between the
independent variable CONLEGIS and the dependent variable. (0.010 < 0.05)

Parameter Estimates

                                                                          95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES (a)       B   Std. Error    Wald   df   Sig.   Exp(B)   Lower Bound   Upper Bound
1   Intercept              3.240        2.478   1.709    1   .191
    AGE                     .019         .020    .906    1   .341    1.019          .980          1.0…
    EDUC                    .071         .108    .427    1   .514    1.073          .868          1.3…
    CONLEGIS              -1.373         .620   4.913    1   .027     .253          .075           .8…
2   Intercept              3.639        2.456   2.195    1   .138
    AGE                     .003         .020    .017    1   .897    1.003          .963          1.0…
    EDUC                    .172         .110   2.463    1   .117    1.188          .958          1.4…
    CONLEGIS              -1.657         .613   7.298    1   .007     .191          .057           .6…

a. The reference category is: 3.

As well, the independent variable CONLEGIS is significant in distinguishing
category 1 of the dependent variable from category 3 of the dependent
variable. (0.027 < 0.05)

And the independent variable CONLEGIS is significant in
distinguishing category 2 of the dependent variable from
category 3 of the dependent variable. (0.007 < 0.05)
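
The likelihood ratio test for each independent variable can be checked by hand. The sketch below subtracts the final model's -2 log-likelihood (265.972, from the Model Fitting Information table) from each reduced model's value. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); the function name is an assumption, and small differences from the SPSS output in the third decimal reflect rounding in the printed tables.

```python
import math

NEG2LL_FINAL = 265.972  # final model, from the Model Fitting Information table

# -2 log-likelihood of the reduced model when each effect is omitted.
neg2ll_reduced = {"AGE": 268.625, "EDUC": 270.395, "CONLEGIS": 275.194}


def likelihood_ratio_test_df2(reduced, final):
    """Chi-square and p-value for a likelihood ratio test with df = 2."""
    chi_square = reduced - final
    # Closed form valid only for 2 degrees of freedom.
    p_value = math.exp(-chi_square / 2.0)
    return chi_square, p_value


results = {effect: likelihood_ratio_test_df2(value, NEG2LL_FINAL)
           for effect, value in neg2ll_reduced.items()}
significant = [effect for effect, (_, p) in results.items() if p < 0.05]
print(results, significant)
```

Each effect has 2 degrees of freedom because omitting one independent variable removes one coefficient from each of the two group comparisons.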

Interpreting relationship of individual independent
variables to the dependent variable

Slide
19

Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept                               268.323        2.350    2   .309
AGE                                     268.625        2.652    2   .265
EDUC                                    270.395        4.423    2   .110
CONLEGIS                                275.194        9.221    2   .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

Survey respondents who had less confidence in congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend too little money
on highways and bridges (DV category 1), rather than the group of
survey respondents who thought we spend too much money on
highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score, the odds of being
in the group of survey respondents who thought we spend too little
money on highways and bridges decreased by 74.7%. (0.253 - 1.0 = -0.747)

Parameter Estimates

                                                                          95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES (a)       B   Std. Error    Wald   df   Sig.   Exp(B)   Lower Bound   Upper Bound
1   Intercept              3.240        2.478   1.709    1   .191
    AGE                     .019         .020    .906    1   .341    1.019          .980          1.0…
    EDUC                    .071         .108    .427    1   .514    1.073          .868          1.3…
    CONLEGIS              -1.373         .620   4.913    1   .027     .253          .075           .8…
2   Intercept              3.639        2.456   2.195    1   .138
    AGE                     .003         .020    .017    1   .897    1.003          .963          1.0…
    EDUC                    .172         .110   2.463    1   .117    1.188          .958          1.4…
    CONLEGIS              -1.657         .613   7.298    1   .007     .191          .057           .6…

a. The reference category is: 3.

Interpreting relationship of individual independent
variables to the dependent variable

Slide
20

Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept                               268.323        2.350    2   .309
AGE                                     268.625        2.652    2   .265
EDUC                                    270.395        4.423    2   .110
CONLEGIS                                275.194        9.221    2   .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

Survey respondents who had less confidence in congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend about the right
amount of money on highways and bridges (DV category 2), rather
than the group of survey respondents who thought we spend too
much money on highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score, the odds of being
in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by
80.9%. (0.191 - 1.0 = -0.809)

Parameter Estimates

                                                                          95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES (a)       B   Std. Error    Wald   df   Sig.   Exp(B)   Lower Bound   Upper Bound
1   Intercept              3.240        2.478   1.709    1   .191
    AGE                     .019         .020    .906    1   .341    1.019          .980          1.0…
    EDUC                    .071         .108    .427    1   .514    1.073          .868          1.3…
    CONLEGIS              -1.373         .620   4.913    1   .027     .253          .075           .8…
2   Intercept              3.639        2.456   2.195    1   .138
    AGE                     .003         .020    .017    1   .897    1.003          .963          1.0…
    EDUC                    .172         .110   2.463    1   .117    1.188          .958          1.4…
    CONLEGIS              -1.657         .613   7.298    1   .007     .191          .057           .6…

a. The reference category is: 3.
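
The percentage changes in odds quoted for CONLEGIS follow from exponentiating the B coefficients. The sketch below recomputes Exp(B), the percentage change in odds, the Wald statistic, and the lower confidence bound from B and its standard error. The function name is hypothetical, and the 1.96 multiplier for a 95% interval is an assumption about how the bounds were obtained; the Wald values differ slightly from the table because the printed B and standard errors are rounded.

```python
import math


def summarize_coefficient(b, se):
    """Derive the usual interpretation quantities from B and its standard error."""
    odds_ratio = math.exp(b)
    return {
        "wald": (b / se) ** 2,
        "exp_b": odds_ratio,
        # Exp(B) - 1, expressed as a percentage change in the odds.
        "percent_change_in_odds": (odds_ratio - 1.0) * 100.0,
        # Assumed normal-theory bound: exp(B - 1.96 * SE).
        "ci_lower": math.exp(b - 1.96 * se),
    }


# CONLEGIS rows from the Parameter Estimates table.
too_little = summarize_coefficient(b=-1.373, se=0.620)   # group 1 vs. group 3
about_right = summarize_coefficient(b=-1.657, se=0.613)  # group 2 vs. group 3
print(too_little)
print(about_right)
```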


Relationship of individual independent


variables and the dependent variable

Slide
21

Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept                              327.463a         .000    0      .
AGE                                    333.440         5.976    2   .050
EDUC                                   329.606         2.143    2   .343
POLVIEWS                               334.636         7.173    2   .028
SEX                                    338.985        11.521    2   .003

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.

In this example, there is a statistically significant
relationship between SEX and the dependent
variable, spending on childcare assistance.

Parameter Estimates

                                                                           95% Confidence Interval for Exp(B)
NATCHLD (a)                    B   Std. Error     Wald   df   Sig.   Exp(B)   Lower Bound   Upper Bound
TOO LITTLE    Intercept    8.434        2.233   14.261    1   .000
              AGE          -.023         .017    1.756    1   .185     .977          .944           1.…
              EDUC         -.066         .102     .414    1   .520     .936          .766           1.…
              POLVIEWS     -.575         .251    5.234    1   .022     .563          .344            .…
              [SEX=1]     -2.167         .805    7.242    1   .007     .115          .024            .…
              [SEX=2]       0(b)            .        .    0      .        .             .             .
ABOUT RIGHT   Intercept    4.485        2.255    3.955    1   .047
              AGE          -.001         .018     .003    1   .955     .999          .965           1.…
              EDUC          .011         .104     .011    1   .916    1.011          .824           1.…
              POLVIEWS     -.397         .257    2.375    1   .123     .673          .406           1.…
              [SEX=1]     -1.606         .824    3.800    1   .051     .201          .040           1.…
              [SEX=2]       0(b)            .        .    0      .        .             .             .

a. The reference category is: TOO MUCH.

As well, SEX plays a statistically significant role
in differentiating the TOO LITTLE group from the TOO
MUCH (reference) group. (0.007 < 0.05)

However, SEX does not differentiate the ABOUT RIGHT
group from the TOO MUCH (reference) group. (0.051 > 0.05)

Slide
22

Interpreting relationship of individual independent


variables and the dependent variable
Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept                              327.463a         .000    0      .
AGE                                    333.440         5.976    2   .050
EDUC                                   329.606         2.143    2   .343
POLVIEWS                               334.636         7.173    2   .028
SEX                                    338.985        11.521    2   .003

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.

Survey respondents who were male (code 1 for sex) were less likely
to be in the group of survey respondents who thought we spend too
little money on childcare assistance (DV category 1), rather than the
group of survey respondents who thought we spend too much
money on childcare assistance (DV category 3).

Survey respondents who were male were 88.5% less likely (0.115 - 1.0
= -0.885) to be in the group of survey respondents who thought
we spend too little money on childcare assistance.

Parameter Estimates

                                                                           95% Confidence Interval for Exp(B)
NATCHLD (a)                    B   Std. Error     Wald   df   Sig.   Exp(B)   Lower Bound   Upper Bound
TOO LITTLE    Intercept    8.434        2.233   14.261    1   .000
              AGE          -.023         .017    1.756    1   .185     .977          .944           1.…
              EDUC         -.066         .102     .414    1   .520     .936          .766           1.…
              POLVIEWS     -.575         .251    5.234    1   .022     .563          .344            .…
              [SEX=1]     -2.167         .805    7.242    1   .007     .115          .024            .…
              [SEX=2]       0(b)            .        .    0      .        .             .             .
ABOUT RIGHT   Intercept    4.485        2.255    3.955    1   .047
              AGE          -.001         .018     .003    1   .955     .999          .965           1.…
              EDUC          .011         .104     .011    1   .916    1.011          .824           1.…
              POLVIEWS     -.397         .257    2.375    1   .123     .673          .406           1.…
              [SEX=1]     -1.606         .824    3.800    1   .051     .201          .040           1.…
              [SEX=2]       0(b)            .        .    0      .        .             .             .

a. The reference category is: TOO MUCH.


Interpreting relationships for independent


variable in problems

Slide
23

In the multinomial logistic regression problems, the problem


statement will ask about only one of the independent variables.
The answer will be true or false based on only the relationship
between the specified independent variable and the dependent
variable. The individual relationships between other
independent variables and the dependent variable are not used
in determining whether or not the answer is true or false.

Slide
24

Problem 1
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on highways and bridges" [natroad]. These predictors differentiate
survey respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges
from survey respondents who thought we spend too much money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the odds
of being in the group of survey respondents who thought we spend about the right amount of
money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide
25

Dissecting problem 1 - 1
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the multinomial logistic regression.

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to "opinion
about spending on highways and bridges" [natroad]. These predictors differentiate survey
respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges
from survey respondents who thought we spend too much money on highways and bridges.

Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges. For
each unit increase in confidence in Congress, the odds of being in the group of survey respondents
who thought we spend too little money on highways and bridges decreased by 74.7%. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend about the right amount of money on highways and bridges,
rather than the group of survey respondents who thought we spend too much money on highways
and bridges. For each unit increase in confidence in Congress, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on highways and
bridges decreased by 80.9%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide
26

Dissecting problem 1 - 2
The variables listed first in the problem statement are the independent variables
(IVs): "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis].

The variable used to define groups is the dependent variable (DV): "opinion about
spending on highways and bridges" [natroad].

SPSS only supports direct or simultaneous entry of independent variables in multinomial
logistic regression, so we have no choice of method for entering variables.

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.

Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the odds
of being in the group of survey respondents who thought we spend about the right amount of
money on highways and bridges decreased by 80.9%.

Slide
27

Dissecting problem 1 - 3
SPSS multinomial logistic regression models the relationship by
comparing each of the groups defined by the dependent variable to the
group with the highest code value.

The responses to opinion about spending on highways and bridges were:
1 = Too little, 2 = About right, and 3 = Too much.

The analysis will result in two comparisons:
survey respondents who thought we spend too little money versus
survey respondents who thought we spend too much money on highways and bridges
survey respondents who thought we spend about the right amount of money versus
survey respondents who thought we spend too much money on highways and bridges.

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that
the validation analysis will confirm the generalizability of the results. Use a level of significance of
0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to "opinion
about spending on highways and bridges" [natroad]. These predictors differentiate survey
respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges
from survey respondents who thought we spend too much money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the groups
defined by responses to opinion about spending on highways and bridges. Survey respondents who had
less confidence in congress were less likely to be in the group of survey respondents who thought we
spend too little money on highways and bridges, rather than the group of survey respondents who
thought we spend too much money on highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents who thought we spend too little money
on highways and bridges decreased by 74.7%. Survey respondents who had less confidence in congress
were less likely to be in the group of survey respondents who thought we spend about the right
amount of money on highways and bridges, rather than the group of survey respondents who thought
we spend too much money on highways and bridges. For each unit increase in confidence in Congress,
the odds of being in the group of survey respondents who thought we spend about the right amount of
money on highways and bridges decreased by 80.9%.

Slide
28

Dissecting problem 1 - 4

Each problem includes a statement about the relationship between
one independent variable and the dependent variable. The answer
to the problem is based on the stated relationship, ignoring the
relationships between the other independent variables and the
dependent variable.

This problem identifies a difference for both of the comparisons
among groups modeled by the multinomial logistic regression.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend too little money on highways and bridges, rather
than the group of survey respondents who thought we spend too much money on highways
and bridges. For each unit increase in confidence in Congress, the odds of being in the
group of survey respondents who thought we spend too little money on highways and
bridges decreased by 74.7%. Survey respondents who had less confidence in congress were
less likely to be in the group of survey respondents who thought we spend about the right
amount of money on highways and bridges, rather than the group of survey respondents
who thought we spend too much money on highways and bridges. For each unit increase in
confidence in Congress, the odds of being in the group of survey respondents who thought
we spend about the right amount of money on highways and bridges decreased by 80.9%.

Slide
29

Dissecting problem 1 - 5
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the results. Use a level of significance of 0.05 for
evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to "opinion about
spending on highways and bridges" [natroad]. These predictors differentiate survey respondents who
thought we spend too little money on highways and bridges from survey respondents who thought we
spend too much money on highways and bridges and survey respondents who thought we spend about the
right amount of money on highways and bridges from survey respondents who thought we spend too much
money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the groups
defined by responses to opinion about spending on highways and bridges. Survey respondents who had less
confidence in congress were less likely to be in the group of survey respondents who thought we spend too
little money on highways and bridges, rather than the group of survey respondents who thought we spend
too much money on highways and bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend too little money on highways and bridges
decreased by 74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on highways and
bridges, rather than the group of survey respondents who thought we spend too much money on highways
and bridges. For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend about the right amount of money on highways and bridges decreased
by 80.9%.

In order for the multinomial logistic regression question to be true, the overall relationship must
be statistically significant, there must be no evidence of numerical problems, the classification
accuracy rate must be substantially better than could be obtained by chance alone, and the
stated individual relationship must be statistically significant and interpreted correctly.

Slide
30

Request multinomial logistic regression

Select the Regression | Multinomial Logistic command from the Analyze menu.

Slide
31

Selecting the dependent variable

First, highlight the dependent variable natroad in the list of variables.

Second, click on the right arrow button to move the dependent variable to the Dependent text box.

Slide
32

Selecting metric independent variables


Metric independent variables are specified as covariates
in multinomial logistic regression. Metric variables can
be either interval or, by convention, ordinal.

Move the metric independent variables, age, educ and conlegis, to the Covariate(s) list box.

In this analysis, there are no nonmetric independent variables. Nonmetric independent variables would be
moved to the Factor(s) list box.

Slide
33

Specifying statistics to include in the output

While we will accept most of the SPSS defaults for the analysis, we need to specifically
request the classification table. Click on the Statistics button to make a request.

Slide
34

Requesting the classification table

First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter estimates.

Second, mark the checkbox for the Classification table.

Third, click on the Continue button to complete the request.

Slide
35

Completing the multinomial logistic regression request

Click on the OK button to request the output for the multinomial logistic regression.

The multinomial logistic procedure supports additional commands to specify the model
computed for the relationships (we will use the default main effects model), additional
specifications for computing the regression, and saving classification results. We will not
make use of these options.

Slide
36

LEVEL OF MEASUREMENT - 1
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on highways and bridges" [natroad]. These predictors differentiate
survey respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges
from survey respondents who thought we spend too much money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the odds
of being in the group of survey respondents who thought we spend about the right amount of
money on highways and bridges decreased by 80.9%.

1. True
2. True with caution

Multinomial logistic regression requires that the dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on highways and bridges" [natroad] is ordinal, satisfying the non-metric
level of measurement requirement for the dependent variable.

It contains three categories: survey respondents who thought we spend too little money, about
the right amount of money, and too much money on highways and bridges.

Slide
37

LEVEL OF MEASUREMENT - 2
"Age" [age] and "highest year of school completed" [educ] are interval, satisfying the metric
or dichotomous level of measurement requirement for independent variables.

"Confidence in Congress" [conlegis] is ordinal, satisfying the metric or dichotomous level of
measurement requirement for independent variables. If we follow the convention of treating
ordinal level variables as metric variables, the level of measurement requirement for the
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.

11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that
the validation analysis will confirm the generalizability of the results. Use a level of significance of
0.05 for evaluating the statistical relationships.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on responses
to "opinion about spending on highways and bridges" [natroad]. These predictors differentiate
survey respondents who thought we spend too little money on highways and bridges from survey
respondents who thought we spend too much money on highways and bridges and survey
respondents who thought we spend about the right amount of money on highways and bridges from
survey respondents who thought we spend too much money on highways and bridges.

Among this set of predictors, confidence in Congress was helpful in distinguishing among the groups
defined by responses to opinion about spending on highways and bridges. Survey respondents who
had less confidence in congress were less likely to be in the group of survey respondents who
thought we spend too little money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and bridges. For each unit
increase in confidence in Congress, the odds of being in the group of survey respondents who
thought we spend too little money on highways and bridges decreased by 74.7%. Survey respondents
who had less confidence in congress were less likely to be in the group of survey respondents who
thought we spend about the right amount of money on highways and bridges, rather than the group
of survey respondents who thought we spend too much money on highways and bridges. For each
unit increase in confidence in Congress, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on highways and bridges decreased by 80.9%.

Slide
38

Sample size ratio of cases to variables


Case Processing Summary

                             N      Marginal Percentage
HIGHWAYS AND BRIDGES   1     62     37.1%
                       2     93     55.7%
                       3     12     7.2%
Valid                        167    100.0%
Missing                      103
Total                        270
Subpopulation                153a

a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

Multinomial logistic regression requires that the minimum ratio
of valid cases to independent variables be at least 10 to 1. The
ratio of valid cases (167) to number of independent variables
(3) was 55.7 to 1, which was equal to or greater than the
minimum ratio. The requirement for a minimum ratio of cases
to independent variables was satisfied.
The preferred ratio of valid cases to independent variables is
20 to 1. The ratio of 55.7 to 1 was equal to or greater than the
preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
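
The ratio check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on this slide; the variable names are our own, and the counts are copied from the Case Processing Summary.

```python
# Counts taken from the Case Processing Summary on this slide.
valid_cases = 167
independent_variables = 3  # age, educ, conlegis

MINIMUM_RATIO = 10    # minimum cases per independent variable
PREFERRED_RATIO = 20  # preferred cases per independent variable

ratio = valid_cases / independent_variables
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO

print(f"ratio of cases to variables = {ratio:.1f} to 1")
print(f"minimum satisfied: {meets_minimum}, preferred satisfied: {meets_preferred}")
```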

Slide
39

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   284.429
Final            265.972             18.457       6    .005

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".
In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
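
As a rough check on the table, the model chi-square can be recomputed as the difference between the two -2 log likelihood values. The sketch below uses only the Python standard library; the helper `chi2_sf_even_df` is our own name, and it relies on a closed form for the chi-square upper-tail probability that holds only when the degrees of freedom are even (here 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square value; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Values from the Model Fitting Information table.
neg2ll_intercept_only = 284.429
neg2ll_final = 265.972
df = 6  # 3 independent variables x 2 group comparisons

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(model_chi_square, df)

print(f"model chi-square = {model_chi_square:.3f}, p = {p_value:.3f}")
print("overall relationship supported:", p_value <= 0.05)
```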

Slide
40

NUMERICAL PROBLEMS
Parameter Estimates

HIGHWAYS                                                              95% Confidence Interval
AND BRIDGES(a)              B        Std. Error   Wald    df   Sig.   Exp(B)   for Exp(B): Lower Bound
1            Intercept      3.240    2.478        1.709   1    .191
             AGE            .019     .020         .906    1    .341   1.019    .980
             EDUC           .071     .108         .427    1    .514   1.073    .868
             CONLEGIS       -1.373   .620         4.913   1    .027   .253     .075
2            Intercept      3.639    2.456        2.195   1    .138
             AGE            .003     .020         .017    1    .897   1.003    .963
             EDUC           .172     .110         2.463   1    .117   1.188    .958
             CONLEGIS       -1.657   .613         7.298   1    .007   .191     .057

a. The reference category is: 3.

Multicollinearity in the multinomial logistic regression solution is detected by examining
the standard errors for the b coefficients. A standard error larger than 2.0 indicates
numerical problems, such as multicollinearity among the independent variables, zero cells for
a dummy-coded independent variable because all of the subjects have the same value for the
variable, and 'complete separation' whereby the two groups in the dependent event variable can
be perfectly separated by scores on one of the independent variables. Analyses that indicate
numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard error larger than 2.0.
(We are not interested in the standard errors associated with the intercept.)
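
The standard error screen can be written out as a short Python sketch. The dictionary layout and names are our own; the standard errors are copied from the Parameter Estimates table, and the intercepts are left out because the slide says they are not of interest.

```python
# Standard errors of the b coefficients, by comparison group (reference group is 3).
standard_errors = {
    "1 vs 3": {"AGE": 0.020, "EDUC": 0.108, "CONLEGIS": 0.620},
    "2 vs 3": {"AGE": 0.020, "EDUC": 0.110, "CONLEGIS": 0.613},
}

THRESHOLD = 2.0  # rule of thumb used in these slides

# Collect every (comparison, variable) pair whose standard error exceeds the threshold.
flagged = [
    (comparison, variable)
    for comparison, errors in standard_errors.items()
    for variable, se in errors.items()
    if se > THRESHOLD
]

print("evidence of numerical problems:", bool(flagged))
```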

Slide
41

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1

Likelihood Ratio Tests

            -2 Log Likelihood
Effect      of Reduced Model    Chi-Square   df   Sig.
Intercept   268.323             2.350        2    .309
AGE         268.625             2.652        2    .265
EDUC        270.395             4.423        2    .110
CONLEGIS    275.194             9.221        2    .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

The statistical significance of the relationship between
confidence in Congress and opinion about spending on
highways and bridges is based on the statistical significance of
the chi-square statistic in the SPSS table titled "Likelihood
Ratio Tests".
For this relationship, the probability of the chi-square statistic
(9.221) was 0.010, less than or equal to the level of
significance of 0.05. The null hypothesis that all of the b
coefficients associated with confidence in Congress were equal
to zero was rejected. The existence of a relationship between
confidence in Congress and opinion about spending on
highways and bridges was supported.
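
The likelihood ratio tests can be approximated from the printed values. The sketch below is an illustration only: the names are our own, the final model -2 log likelihood (265.972) comes from the "Model Fitting Information" table, and because the inputs are rounded to three decimals the recomputed chi-square can differ from the SPSS value in the last digit. With 2 degrees of freedom the chi-square upper-tail probability reduces to exp(-x/2).

```python
import math

NEG2LL_FINAL = 265.972  # final model, from "Model Fitting Information"

# -2 log likelihood of each reduced model, from "Likelihood Ratio Tests".
neg2ll_reduced = {
    "Intercept": 268.323,
    "AGE": 268.625,
    "EDUC": 270.395,
    "CONLEGIS": 275.194,
}

ALPHA = 0.05
results = {}
for effect, neg2ll in neg2ll_reduced.items():
    chi_square = neg2ll - NEG2LL_FINAL
    p_value = math.exp(-chi_square / 2.0)  # exact chi-square tail for df = 2
    results[effect] = (chi_square, p_value)

significant = [effect for effect, (_, p) in results.items() if p <= ALPHA]
print("effects significant at 0.05:", significant)
```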

Slide
42

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2

Parameter Estimates

HIGHWAYS                                                              95% Confidence Interval
AND BRIDGES(a)              B        Std. Error   Wald    df   Sig.   Exp(B)   for Exp(B): Lower Bound
1            Intercept      3.240    2.478        1.709   1    .191
             AGE            .019     .020         .906    1    .341   1.019    .980
             EDUC           .071     .108         .427    1    .514   1.073    .868
             CONLEGIS       -1.373   .620         4.913   1    .027   .253     .075
2            Intercept      3.639    2.456        2.195   1    .138
             AGE            .003     .020         .017    1    .897   1.003    .963
             EDUC           .172     .110         2.463   1    .117   1.188    .958
             CONLEGIS       -1.657   .613         7.298   1    .007   .191     .057

a. The reference category is: 3.

In the comparison of survey respondents who thought we spend
too little money on highways and bridges to survey respondents
who thought we spend too much money on highways and
bridges, the probability of the Wald statistic (4.913) for the
variable confidence in Congress [conlegis] was 0.027. Since the
probability was less than or equal to the level of significance of
0.05, the null hypothesis that the b coefficient for confidence in
Congress was equal to zero for this comparison was rejected.
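
The Wald statistic in the table can be approximated as the squared ratio of a coefficient to its standard error, referred to a chi-square distribution with 1 degree of freedom. The sketch below is a hedged illustration using only the standard library; `wald_test` is our own helper name, and because B and the standard error are printed to three decimals the result only approximates the SPSS values.

```python
import math

def wald_test(b, std_error):
    """Return (Wald statistic, p-value) for one coefficient, df = 1."""
    wald = (b / std_error) ** 2
    # For df = 1 the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# CONLEGIS in the comparison of group 1 (too little) to group 3 (too much).
wald, p_value = wald_test(b=-1.373, std_error=0.620)

print(f"Wald = {wald:.3f}, p = {p_value:.3f}")
print("reject null hypothesis at 0.05:", p_value <= 0.05)
```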

Slide
43

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3

Parameter Estimates

HIGHWAYS                                                              95% Confidence Interval
AND BRIDGES(a)              B        Std. Error   Wald    df   Sig.   Exp(B)   for Exp(B): Lower Bound
1            Intercept      3.240    2.478        1.709   1    .191
             AGE            .019     .020         .906    1    .341   1.019    .980
             EDUC           .071     .108         .427    1    .514   1.073    .868
             CONLEGIS       -1.373   .620         4.913   1    .027   .253     .075
2            Intercept      3.639    2.456        2.195   1    .138
             AGE            .003     .020         .017    1    .897   1.003    .963
             EDUC           .172     .110         2.463   1    .117   1.188    .958
             CONLEGIS       -1.657   .613         7.298   1    .007   .191     .057

a. The reference category is: 3.

The value of Exp(B) was 0.253 which implies that for each unit
increase in confidence in Congress the odds decreased by 74.7%
(0.253 - 1.0 = -0.747).
The relationship stated in the problem is supported. Survey
respondents who had less confidence in congress were less likely
to be in the group of survey respondents who thought we spend
too little money on highways and bridges, rather than the group of
survey respondents who thought we spend too much money on
highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents
who thought we spend too little money on highways and bridges
decreased by 74.7%.
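
The conversion from a b coefficient to a percentage change in odds can be sketched as follows. The function name is our own; the coefficients are the two CONLEGIS values from the Parameter Estimates table.

```python
import math

def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase: (Exp(B) - 1) x 100."""
    return (math.exp(b) - 1.0) * 100.0

# b coefficients for CONLEGIS in each comparison against reference group 3.
conlegis_b = {"1 vs 3": -1.373, "2 vs 3": -1.657}

for comparison, b in conlegis_b.items():
    odds_ratio = math.exp(b)
    change = percent_change_in_odds(b)
    print(f"{comparison}: Exp(B) = {odds_ratio:.3f}, change in odds = {change:.1f}%")
```

A negative b gives an Exp(B) below 1.0, so the change is reported as a decrease in the odds.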

Slide
44

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4

Parameter Estimates

HIGHWAYS                                                              95% Confidence Interval
AND BRIDGES(a)              B        Std. Error   Wald    df   Sig.   Exp(B)   for Exp(B): Lower Bound
1            Intercept      3.240    2.478        1.709   1    .191
             AGE            .019     .020         .906    1    .341   1.019    .980
             EDUC           .071     .108         .427    1    .514   1.073    .868
             CONLEGIS       -1.373   .620         4.913   1    .027   .253     .075
2            Intercept      3.639    2.456        2.195   1    .138
             AGE            .003     .020         .017    1    .897   1.003    .963
             EDUC           .172     .110         2.463   1    .117   1.188    .958
             CONLEGIS       -1.657   .613         7.298   1    .007   .191     .057

a. The reference category is: 3.

In the comparison of survey respondents who thought we spend
about the right amount of money on highways and bridges to
survey respondents who thought we spend too much money on
highways and bridges, the probability of the Wald statistic
(7.298) for the variable confidence in Congress [conlegis] was
0.007. Since the probability was less than or equal to the level
of significance of 0.05, the null hypothesis that the b coefficient
for confidence in Congress was equal to zero for this comparison
was rejected.

Slide
45

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5

Parameter Estimates

HIGHWAYS
AND BRIDGES(a)              B        Std. Error   Wald    df   Sig.   Exp(B)
1            Intercept      3.240    2.478        1.709   1    .191
             AGE            .019     .020         .906    1    .341   1.019
             EDUC           .071     .108         .427    1    .514   1.073
             CONLEGIS       -1.373   .620         4.913   1    .027   .253
2            Intercept      3.639    2.456        2.195   1    .138
             AGE            .003     .020         .017    1    .897   1.003
             EDUC           .172     .110         2.463   1    .117   1.188
             CONLEGIS       -1.657   .613         7.298   1    .007   .191

a. The reference category is: 3.

The value of Exp(B) was 0.191 which implies that for each unit increase in
confidence in Congress the odds decreased by 80.9% (0.191-1.0=-0.809).
The relationship stated in the problem is supported. Survey respondents
who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend about the right amount of
money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and
bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by 80.9%.

Slide
46

CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: BY CHANCE ACCURACY RATE
The independent variables could be characterized as useful
predictors distinguishing survey respondents who thought we
spend too little money on highways and bridges, survey
respondents who thought we spend about the right amount
of money on highways and bridges and survey respondents
who thought we spend too much money on highways and
bridges if the classification accuracy rate was substantially
higher than the accuracy attainable by chance alone.
Operationally, the classification accuracy rate should be 25%
or more higher than the proportional by chance accuracy
rate.

Case Processing Summary

                             N      Marginal Percentage
HIGHWAYS AND BRIDGES   1     62     37.1%
                       2     93     55.7%
                       3     12     7.2%
Valid                        167    100.0%
Missing                      103
Total                        270
Subpopulation                153a

a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the 'Case Processing
Summary', and then squaring and summing the proportion of
cases in each group (0.371² + 0.557² + 0.072² = 0.453).
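
The proportional by chance accuracy rate is the sum of the squared group proportions. A minimal Python sketch of that arithmetic, with names of our own choosing and the proportions taken from the Case Processing Summary:

```python
# Marginal proportions of the three groups of the dependent variable.
group_proportions = [0.371, 0.557, 0.072]

# Square each proportion, then sum the squares.
by_chance_accuracy = sum(p ** 2 for p in group_proportions)

print(f"proportional by chance accuracy rate = {by_chance_accuracy:.3f}")
```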

Slide
47

CLASSIFICATION USING THE MULTINOMIAL LOGISTIC REGRESSION MODEL: CLASSIFICATION ACCURACY

Classification

                     Predicted
Observed             1       2       3      Percent Correct
1                    15      47      0      24.2%
2                    7       86      0      92.5%
3                    5       7       0      .0%
Overall Percentage   16.2%   83.8%   .0%    60.5%

The classification accuracy rate was 60.5%
which was greater than or equal to the
proportional by chance accuracy criterion of
56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is
satisfied.
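
The accuracy comparison can be reproduced from the counts in the Classification table. The sketch below is an illustration with names of our own choosing; the by chance rate of 0.453 is the value computed on the previous slide.

```python
# Rows are observed groups 1-3, columns are predicted groups 1-3.
classification = [
    [15, 47, 0],
    [7, 86, 0],
    [5, 7, 0],
]

total_cases = sum(sum(row) for row in classification)
correct_cases = sum(classification[i][i] for i in range(len(classification)))
accuracy = correct_cases / total_cases

BY_CHANCE_ACCURACY = 0.453          # from the previous slide
criterion = 1.25 * BY_CHANCE_ACCURACY  # 25% higher than the by chance rate

print(f"accuracy = {accuracy * 100:.1f}%, criterion = {criterion * 100:.1f}%")
print("criterion satisfied:", accuracy >= criterion)
```

The counts also show that no case was predicted to belong to group 3, so all of the correct classifications come from groups 1 and 2.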

Slide
48

Answering the question in problem 1 - 1


11. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that
the validation analysis will confirm the generalizability of the results. Use a level of significance of
0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in Congress"
[conlegis] were useful predictors for distinguishing between groups based on responses to "opinion
about spending on highways and bridges" [natroad]. These predictors differentiate survey respondents
who thought we spend too little money on highways and bridges from survey respondents who
thought we spend too much money on highways and bridges and survey respondents who thought we
spend about the right amount of money on highways and bridges from survey respondents who
thought we spend too much money on highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the groups
defined by responses to opinion about spending on highways and bridges. Survey respondents who had
less confidence in congress were less likely to be in the group of survey respondents who thought we
spend too little money on highways and bridges, rather than the group of survey respondents who
thought we spend too much money on highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents who thought we spend too little
money on highways and bridges decreased by 74.7%. Survey respondents who had less confidence in
congress were less likely to be in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges, rather than the group of survey respondents who
thought we spend too much money on highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents who thought we spend about the right
amount of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall relationship between the combination of
independent variables and the dependent variable.

There was no evidence of numerical problems in the solution.

Moreover, the classification accuracy surpassed the proportional by chance accuracy criteria,
supporting the utility of the model.

Slide
49

Answering the question in problem 1 - 2


We verified that each statement about the relationship between an independent variable and the
dependent variable was correct in both the direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable, for both of the
comparisons between groups stated in the problem.

The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.

Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The answer to the question is true


with caution.
A caution is added because of the
inclusion of ordinal level variables.

Slide
50

Problem 2
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide
51

Dissecting problem 2 - 1
1. In the dataset GSS2000, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results. Use a level of significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the multinomial logistic regression.

Slide
52

Dissecting problem 2 - 2
The variables listed first in the problem statement are the independent variables (IVs):
"highest year of school completed" [educ], "sex" [sex] and "total family income" [income98].

The variable used to define groups is the dependent variable (DV): "opinion about
spending on space exploration" [natspac].

1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of a
statistic? Assume that there is no problem with missing data, outliers, or influential cases, and that
the validation analysis will confirm the generalizability of the results. Use a level of significance of
0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey respondents
who thought we spend too much money on space exploration and survey respondents who thought we
spend about the right amount of money on space exploration from survey respondents who thought
we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who had
higher total family incomes were more likely to be in the group of survey respondents who thought
we spend about the right amount of money on space exploration, rather than the group of survey
respondents who thought we spend too much money on space exploration. For each unit increase in
total family income, the odds of being in the group of survey respondents who thought we spend
about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

SPSS only supports direct or


simultaneous entry of independent
variables in multinomial logistic
regression, so we have no choice of
method for entering variables.

Slide 53

Dissecting problem 2 - 3

SPSS multinomial logistic regression models the relationship by comparing each of the groups
defined by the dependent variable to the group with the highest code value.

The responses to opinion about spending on the space program were:
1 = Too little, 2 = About right, and 3 = Too much.

The analysis will result in two comparisons:
survey respondents who thought we spend too little money versus survey respondents who thought
we spend too much money on space exploration
survey respondents who thought we spend about the right amount of money versus survey
respondents who thought we spend too much money on space exploration.
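
The two comparisons above follow mechanically from the coding of the dependent variable. The sketch below only illustrates that rule; the names `natspac_labels` and `comparisons_against_highest` are hypothetical and are not part of SPSS. The code values and labels are the ones listed on the slide.

```python
# Hypothetical illustration: list the binary comparisons implied by using
# the group with the highest code value as the reference category.

natspac_labels = {1: "Too little", 2: "About right", 3: "Too much"}


def comparisons_against_highest(labels):
    """Return (group, reference) label pairs; the reference is the highest code."""
    reference_code = max(labels)
    return [
        (labels[code], labels[reference_code])
        for code in sorted(labels)
        if code != reference_code
    ]


pairs = comparisons_against_highest(natspac_labels)
for group, reference in pairs:
    print(f"{group} versus {reference}")
```

With three groups the helper yields two comparisons, which is why each independent variable has two coefficients in the output.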

Slide 54

Dissecting problem 2 - 4

Each problem includes a statement about the relationship between one independent variable and
the dependent variable. The answer to the problem is based on the stated relationship, ignoring
the relationships between the other independent variables and the dependent variable.

This problem identifies a difference for only one of the two comparisons based on the three
values of the dependent variable. Other problems will specify both of the possible comparisons.

Slide 55

Dissecting problem 2 - 5
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of survey
respondents who thought we spend about the right amount of money on space exploration,
rather than the group of survey respondents who thought we spend too much money on space
exploration. For each unit increase in total family income, the odds of being in the group of
survey respondents who thought we spend about the right amount of money on space
exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

In order for the multinomial logistic regression question to be true, the overall relationship
must be statistically significant, there must be no evidence of numerical problems, the
classification accuracy rate must be substantially better than could be obtained by chance
alone, and the stated individual relationship must be statistically significant and interpreted
correctly.

Slide 56

LEVEL OF MEASUREMENT - 1

Multinomial logistic regression requires that the dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on space exploration" [natspac] is ordinal, satisfying the non-metric
level of measurement requirement for the dependent variable.

It contains three categories: survey respondents who thought we spend too little money, about
the right amount of money, and too much money on space exploration.

Slide 57

LEVEL OF MEASUREMENT - 2

"Highest year of school completed" [educ] is interval, satisfying the metric or dichotomous
level of measurement requirement for independent variables.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement
requirement for independent variables.

"Total family income" [income98] is ordinal, satisfying the metric or dichotomous level of
measurement requirement for independent variables. If we follow the convention of treating
ordinal level variables as metric variables, the level of measurement requirement for the
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.

Slide 58

Request multinomial logistic regression

Select the Regression | Multinomial Logistic command from the Analyze menu.

Slide 59

Selecting the dependent variable

First, highlight the dependent variable natspac in the list of variables.

Second, click on the right arrow button to move the dependent variable to the Dependent text
box.

Slide 60

Selecting non-metric independent variables

Non-metric independent variables are specified as factors in multinomial logistic regression.
Non-metric variables can be dichotomous, nominal, or ordinal. These variables will be dummy
coded as needed and each value will be listed separately in the output.

Select the dichotomous variable sex.

Move the non-metric independent variables listed in the problem to the Factor(s) list box.

Slide 61

Selecting metric independent variables

Metric independent variables are specified as covariates in multinomial logistic regression.
Metric variables can be either interval or, by convention, ordinal.

Move the metric independent variables, educ and income98, to the Covariate(s) list box.

Slide 62

Specifying statistics to include in the output

While we will accept most of the SPSS defaults for the analysis, we need to specifically
request the classification table. Click on the Statistics button to make a request.

Slide 63

Requesting the classification table

First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter
estimates.

Second, mark the checkbox for the Classification table.

Third, click on the Continue button to complete the request.

Slide 64

Completing the multinomial logistic regression request

Click on the OK button to request the output for the multinomial logistic regression.

The multinomial logistic procedure supports additional commands to specify the model computed
for the relationships (we will use the default main effects model), additional specifications
for computing the regression, and saving classification results. We will not make use of these
options.

Slide 65

Sample size ratio of cases to variables

Case Processing Summary

                                      N     Marginal Percentage
SPACE EXPLORATION PROGRAM    1       33     15.9%
                             2       90     43.3%
                             3       85     40.9%
RESPONDENTS SEX              1       94     45.2%
                             2      114     54.8%
Valid                               208     100.0%
Missing                              62
Total                               270
Subpopulation                       138a

a. The dependent variable has only one value observed in 112 (81.2%) subpopulations.

Multinomial logistic regression requires that the minimum ratio of valid cases to independent
variables be at least 10 to 1. The ratio of valid cases (208) to number of independent
variables (3) was 69.3 to 1, which was equal to or greater than the minimum ratio. The
requirement for a minimum ratio of cases to independent variables was satisfied.

The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 69.3 to 1
was equal to or greater than the preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
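
The ratio check is simple arithmetic, sketched below for illustration. The counts come from the Case Processing Summary; the variable names are hypothetical and chosen only for this example.

```python
# Sketch of the sample-size check; counts come from the Case Processing Summary.
valid_cases = 208
independent_variables = 3  # educ, sex, income98

MINIMUM_RATIO = 10    # required cases per independent variable
PREFERRED_RATIO = 20  # preferred cases per independent variable

ratio = valid_cases / independent_variables
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO

print(f"ratio = {ratio:.1f} to 1")
print("minimum ratio satisfied:", meets_minimum)
print("preferred ratio satisfied:", meets_preferred)
```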

Slide 66

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES

Model Fitting Information

Model             -2 Log Likelihood    Chi-Square    df    Sig.
Intercept Only    354.268
Final             334.967              19.301        6     .004

The presence of a relationship between the dependent variable and combination of independent
variables is based on the statistical significance of the final model chi-square in the SPSS
table titled "Model Fitting Information".

In this analysis, the probability of the model chi-square (19.301) was 0.004, less than or
equal to the level of significance of 0.05. The null hypothesis that there was no difference
between the model without independent variables and the model with independent variables was
rejected. The existence of a relationship between the independent variables and the dependent
variable was supported.
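
As a rough check on the table, the model chi-square can be recomputed as the difference between the two -2 log likelihood values. The sketch below uses only the Python standard library; the helper `chi2_sf_even_df` is a hypothetical name, and it relies on the closed-form upper tail of the chi-square distribution that holds only for even degrees of freedom. SPSS may compute the probability differently.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Values copied from the "Model Fitting Information" table.
neg2ll_intercept_only = 354.268
neg2ll_final = 334.967
df = 6
alpha = 0.05

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(model_chi_square, df)
overall_relationship_supported = p_value <= alpha

print(f"chi-square = {model_chi_square:.3f}, p = {p_value:.3f}")
print("overall relationship supported:", overall_relationship_supported)
```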

Slide 67

NUMERICAL PROBLEMS

Parameter Estimates

SPACE EXPLORATION                                                            95% Confidence Interval
PROGRAM a                  B        Std. Error   Wald     df   Sig.   Exp(B)   for Exp(B), Lower Bound
1        Intercept         -4.136   1.157        12.779   1    .000
         EDUC              .101     .089         1.276    1    .259   1.106    .929
         INCOME98          .097     .050         3.701    1    .054   1.102    .998
         [SEX=1]           .672     .426         2.488    1    .115   1.959    .850
         [SEX=2]           0 b      .            .        0    .      .        .
2        Intercept         -2.487   .840         8.774    1    .003
         EDUC              .108     .068         2.521    1    .112   1.114    .975
         INCOME98          .058     .034         2.932    1    .087   1.060    .992
         [SEX=1]           .501     .317         2.492    1    .114   1.650    .886
         [SEX=2]           0 b      .            .        0    .      .        .

a. The reference category is: 3.
b. This parameter is set to zero because it is redundant.

Multicollinearity in the multinomial logistic regression solution is detected by examining the
standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical
problems, such as multicollinearity among the independent variables, zero cells for a
dummy-coded independent variable because all of the subjects have the same value for the
variable, and 'complete separation' whereby the two groups in the dependent event variable can
be perfectly separated by scores on one of the independent variables. Analyses that indicate
numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard error larger than 2.0.
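
The standard-error screen can be sketched as follows. The dictionary `estimates` and the helper `wald_test` are hypothetical names for this illustration; the B and standard error values are copied from the Parameter Estimates table. Because the table values are rounded to three decimals, a Wald statistic recomputed this way can differ slightly from the one SPSS prints.

```python
import math

# (comparison group, term) -> (B, standard error), copied from the
# Parameter Estimates table; the redundant [SEX=2] rows are omitted.
estimates = {
    (1, "Intercept"): (-4.136, 1.157),
    (1, "EDUC"): (0.101, 0.089),
    (1, "INCOME98"): (0.097, 0.050),
    (1, "[SEX=1]"): (0.672, 0.426),
    (2, "Intercept"): (-2.487, 0.840),
    (2, "EDUC"): (0.108, 0.068),
    (2, "INCOME98"): (0.058, 0.034),
    (2, "[SEX=1]"): (0.501, 0.317),
}

SE_THRESHOLD = 2.0  # rule of thumb stated on the slide

# Terms whose standard error signals a possible numerical problem.
flagged = [key for key, (_, se) in estimates.items() if se > SE_THRESHOLD]


def wald_test(b, se):
    """Return the Wald chi-square (1 df) and its upper-tail probability."""
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


for (group, term), (b, se) in estimates.items():
    wald, p_value = wald_test(b, se)
    print(f"group {group} {term:<10} Wald = {wald:6.3f}  p = {p_value:.3f}")

print("terms flagged for numerical problems:", flagged)
```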

Slide 68

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1

Likelihood Ratio Tests

             -2 Log Likelihood
Effect       of Reduced Model     Chi-Square    df    Sig.
Intercept    334.967a             .000          0     .
EDUC         337.788              2.821         2     .244
INCOME98     340.154              5.187         2     .075
SEX          338.511              3.544         2     .170

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a
reduced model. The reduced model is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting the effect does not
increase the degrees of freedom.

The statistical significance of the relationship between total family income and opinion about
spending on space exploration is based on the statistical significance of the chi-square
statistic in the SPSS table titled "Likelihood Ratio Tests".

For this relationship, the probability of the chi-square statistic (5.187) was 0.075, greater
than the level of significance of 0.05. The null hypothesis that all of the b coefficients
associated with total family income were equal to zero was not rejected. The existence of a
relationship between total family income and opinion about spending on space exploration was
not supported.
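
The likelihood ratio test for total family income can be reproduced from the table as a small worked example. This is a sketch with hypothetical variable names; it uses the fact that, for exactly 2 degrees of freedom, the upper-tail chi-square probability is exp(-x / 2).

```python
import math

# Values copied from the "Likelihood Ratio Tests" and "Model Fitting Information" tables.
neg2ll_final = 334.967
neg2ll_without_income98 = 340.154
df = 2  # one INCOME98 coefficient for each of the two comparisons
alpha = 0.05

chi_square = neg2ll_without_income98 - neg2ll_final

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi_square / 2.0)
reject_null = p_value <= alpha

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print("relationship supported:", reject_null)
```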

Slide 69

Answering the question in problem 2

1. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential
cases, and that the validation analysis will confirm the generalizability of the results. Use a
level of significance of 0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

We found a statistically significant overall relationship between the combination of
independent variables and the dependent variable.

There was no evidence of numerical problems in the solution.

However, the individual relationship between total family income and spending on space was
not statistically significant.

The answer to the question is false.
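
The "6.0%" figure in the problem statement corresponds to the Exp(B) value for INCOME98 in the "about right" versus "too much" comparison. The sketch below shows that conversion, with hypothetical variable names. The arithmetic is correct, but the answer is still false because the relationship itself was not statistically significant.

```python
import math

# B for INCOME98 in comparison group 2 ("about right" versus "too much"),
# copied from the Parameter Estimates table.
b_income98 = 0.058

odds_ratio = math.exp(b_income98)            # Exp(B)
percent_change = (odds_ratio - 1.0) * 100.0  # change in odds per unit increase

print(f"Exp(B) = {odds_ratio:.3f}")
print(f"odds change per unit increase = {percent_change:.1f}%")
```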

Slide 70

Steps in multinomial logistic regression:
level of measurement and initial sample size

The following is a guide to the decision process for answering problems about the basic
relationships in multinomial logistic regression:

Dependent non-metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: continue to the next question

Ratio of cases to independent variables at least 10 to 1?
  No: Inappropriate application of a statistic
  Yes: Run multinomial logistic regression

Slide 71

Steps in multinomial logistic regression:
overall relationship and numerical problems

Overall relationship statistically significant? (model chi-square test)
  No: False
  Yes: continue to the next question

Standard errors of coefficients indicate no numerical problems (s.e. <= 2.0)?
  No: False
  Yes: continue to the next question

Slide 72

Steps in multinomial logistic regression:
relationships between IV's and DV

Overall relationship between specific IV and DV is statistically significant?
(likelihood ratio test)
  No: False
  Yes: continue to the next question

Role of specific IV and DV groups statistically significant and interpreted correctly?
(Wald test and Exp(B))
  No: False
  Yes: continue to the next question

Slide 73

Steps in multinomial logistic regression:
classification accuracy and adding cautions

Overall accuracy rate is 25% greater than the proportional by chance accuracy rate?
  No: False
  Yes: continue to the next question

Satisfies preferred ratio of cases to IV's of 20 to 1?
  No: True with caution
  Yes: continue to the next question

One or more IV's are ordinal level treated as metric?
  No: True
  Yes: True with caution
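
The decision steps on slides 70 through 73 can be summarized as one function. This is a minimal sketch, not part of the course materials: the names `evaluate_problem` and `by_chance_accuracy` and all parameter names are hypothetical. It also assumes that the proportional by chance accuracy rate is the sum of squared group proportions, which these slides do not restate.

```python
def by_chance_accuracy(group_counts):
    """Assumed definition: sum of squared proportions of cases in each group."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


def evaluate_problem(levels_ok, valid_cases, n_ivs, model_sig, max_std_error,
                     iv_sig, interpretation_ok, accuracy_rate, chance_rate,
                     ordinal_as_metric, alpha=0.05):
    """Walk the decision steps in order and return the answer label."""
    ratio = valid_cases / n_ivs
    # Slide 70: level of measurement and minimum sample size.
    if not levels_ok or ratio < 10:
        return "Inappropriate application of a statistic"
    # Slide 71: overall relationship, then numerical problems.
    if model_sig > alpha or max_std_error > 2.0:
        return "False"
    # Slide 72: likelihood ratio test, then Wald test and Exp(B) interpretation.
    if iv_sig > alpha or not interpretation_ok:
        return "False"
    # Slide 73: classification accuracy, then cautions.
    if accuracy_rate is None:
        raise ValueError("classification accuracy is needed at this step")
    if accuracy_rate < 1.25 * chance_rate:
        return "False"
    if ratio < 20 or ordinal_as_metric:
        return "True with caution"
    return "True"


# Problem 2: the classification accuracy is not reported on these slides,
# so it is left as None; the function returns before that step is reached.
chance = by_chance_accuracy([33, 90, 85])
answer = evaluate_problem(
    levels_ok=True, valid_cases=208, n_ivs=3, model_sig=0.004,
    max_std_error=1.157, iv_sig=0.075, interpretation_ok=True,
    accuracy_rate=None, chance_rate=chance, ordinal_as_metric=True,
)
print(answer)
```

For problem 2 the function stops at the likelihood ratio step, matching the answer given on slide 69.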
