
Prepared by:

Assoc. Prof. Dr Bahaman Abu Samah


Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia
Serdang

Logistic regression is an alternative to multiple linear regression


Used to predict an outcome variable that is a categorical dichotomy from a set of categorical or continuous predictor variables
Used because a dichotomous outcome variable violates the assumption of linearity in ordinary linear regression
Logistic regression emphasizes the probability of a particular outcome for each case

The outcome variable ($\hat{Y}$) is the probability of having one outcome or another, based on the best linear combination of predictors, estimated by maximum likelihood
The probability of Y is calculated with the following formula:

$$P(Y) = \hat{Y} = \frac{e^{u}}{1 + e^{u}}$$
(Formula 1)

where
$P$ = probability
$e$ = the base of natural logarithms ($\approx 2.718$)
$u = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$

With one predictor variable, the formula becomes:

$$P(Y) = \hat{Y}_i = \frac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}$$

With multiple predictor variables (p), the formula becomes:

$$P(Y) = \hat{Y}_i = \frac{e^{b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p}}{1 + e^{b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p}}$$

where
$P$ = probability
$e$ = the base of natural logarithms ($\approx 2.718$)
$u = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$
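As a minimal sketch of the probability formula, the Python function below converts the linear combination u into a probability. The function name `predicted_probability` and the example coefficients are illustrative assumptions, not values from these slides.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Return P(Y) = e^u / (1 + e^u) for u = b0 + b1*X1 + ... + bp*Xp."""
    u = intercept + sum(b * x for b, x in zip(coefficients, values))
    return math.exp(u) / (1.0 + math.exp(u))

# Hypothetical coefficients (b0 = -1.0, b1 = 0.5, b2 = 0.25) and predictor values.
p_hat = predicted_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(round(p_hat, 3))
```

Whatever value u takes, the result stays between 0 and 1, which is what makes the function suitable for a dichotomous outcome.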

The resulting value (a probability) ranges between 0 and 1
A value close to 0 means Y is very unlikely to occur
A value close to 1 means Y is very likely to occur

Example: 1 = Pass, 0 = Fail

1. Predict an outcome variable from a set of categorical or continuous predictor variables. Logistic regression calculates the probability of success over the probability of failure; the result is presented as an odds ratio or likelihood ratio
2. Determine relationships among constructs

DV: Dichotomous, coded as 1 and 0
IV: Continuous or categorical

Can the outcome be predicted from a set of predictor variables?
Which predictor variables predict the outcome?
How strong is the relationship between the outcome and the predictor variables?

Assessing Model Fit
Assessing the Predictors
Relationship between Predictors and Outcome
Odds Ratio
Classification of Cases

Use the observed and predicted values of the outcome to assess the fit of the model.
The statistic used to measure the fit of the model is called the log-likelihood:

$$\text{Log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln(\hat{Y}_i) + (1 - Y_i) \ln(1 - \hat{Y}_i) \right]$$
(Formula 2)

The log-likelihood sums the log probabilities associated with the predicted and actual outcomes
The log-likelihood statistic is comparable to the residual sum of squares (SSE) in multiple regression
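The log-likelihood can be sketched in a few lines of Python. The function name and the small data vectors below are hypothetical; they only illustrate that predictions closer to the observed outcomes give a log-likelihood closer to zero.

```python
import math

def log_likelihood(observed, predicted):
    """Sum of Y*ln(Yhat) + (1 - Y)*ln(1 - Yhat) over all cases."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(observed, predicted)
    )

# Hypothetical outcomes and predicted probabilities, for illustration only.
observed = [1, 0, 1, 1]
good_fit = [0.9, 0.1, 0.8, 0.7]   # predictions close to the outcomes
poor_fit = [0.5, 0.5, 0.5, 0.5]   # uninformative predictions

print(log_likelihood(observed, good_fit))
print(log_likelihood(observed, poor_fit))
```

Like SSE, a larger magnitude means more unexplained information; unlike SSE, the value is always negative or zero.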

The log-likelihood is calculated for two different models (a bigger and a smaller one)
The two models are compared by computing the difference in their log-likelihoods, which is tested as a chi-square ($\chi^2$) statistic:

$$\chi^2 = 2\,[LL(B) - LL(0)]$$
(Formula 3)

LL(B) is the log-likelihood for the bigger model, which includes all the predictors
LL(0) is the log-likelihood for the smaller model, which includes only the intercept

Degrees of freedom (df) = $k_B - k_0$, where k is the number of parameters in each model
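The model comparison can be sketched as follows. The intercept-only value of -10.095 is what Formula 2 gives for 9 falls in 15 cases (the ski data later in these slides); the value -8.740 for the bigger model is an assumption chosen so that the difference reproduces the model chi-square of 2.710 reported for that example. `chi2_sf` is a hypothetical stdlib-only helper for the upper-tail probability; a statistics library would normally supply it.

```python
import math

def model_chi_square(ll_big, ll_small):
    """Chi-square = 2 * [LL(B) - LL(0)]."""
    return 2.0 * (ll_big - ll_small)

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from erfc(sqrt(x/2)) (df = 1) and add one term per extra 2 df
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

ll_small = -10.095   # intercept only: 9*ln(0.6) + 6*ln(0.4)
ll_big = -8.740      # assumed value, chosen to match the reported chi-square
df = 4 - 1           # four parameters in the bigger model, one in the smaller

chi_square = model_chi_square(ll_big, ll_small)
p_value = chi2_sf(chi_square, df)
print(round(chi_square, 3), p_value)  # p should be close to the reported .439
```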

Test the null hypothesis $H_0: b_i = 0$
Test the individual contribution of each predictor variable using the Wald statistic
The Wald statistic is comparable to the t-test in multiple regression
The Wald statistic is the squared ratio of the unstandardized logistic coefficient to its standard error:

$$\text{Wald} = \left( \frac{b}{SE(b)} \right)^2$$

The Wald statistic and its corresponding p-value are part of the SPSS output in the "Variables in the Equation" table.
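A minimal sketch of the Wald test, assuming the squared form described above, which is referred to a chi-square distribution with 1 df. The coefficient and standard error in the first call are hypothetical; the second call uses the Wald value 1.27 reported for Difficulty in the ski example later in these slides.

```python
import math

def wald_statistic(b, se):
    """Squared ratio of a coefficient to its standard error."""
    return (b / se) ** 2

def wald_p_value(wald):
    """Upper-tail chi-square probability with 1 df: erfc(sqrt(wald / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))

# Hypothetical coefficient and standard error, for illustration only.
w = wald_statistic(0.8, 0.4)
print(w, wald_p_value(w))

# Wald value reported for Difficulty in the ski example; the reported p is .259.
print(wald_p_value(1.27))
```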

A number of statistics can be used as measures of association between predictors and outcome
The measures include:

1. R-statistic
2. Cox and Snell's $R^2_{CS}$
3. Nagelkerke's $R^2_N$
4. Hosmer and Lemeshow's $R^2_L$

The R-statistic is comparable to the multiple correlation coefficient
Formula:

$$R = \pm\sqrt{\frac{\text{Wald} - (2 \times df)}{-2LL(0)}}$$
(Formula 4)

The R-statistic ranges from -1 to +1
A positive value means that as the predictor increases, the likelihood of the outcome occurring increases; a negative value means that the likelihood decreases

$R^2_{CS}$ is comparable to $R^2$ in multiple linear regression
The value is displayed in the SPSS Logistic Regression output
Formula:

$$R^2_{CS} = 1 - e^{-\frac{2}{n}\,[LL(B) - LL(0)]}$$
(Formula 5)

However, the value of $R^2_{CS}$ never reaches its theoretical maximum of 1

Nagelkerke suggested an amendment to the earlier $R^2_{CS}$
The value is displayed in the SPSS Logistic Regression output
Formula:

$$R^2_N = \frac{R^2_{CS}}{1 - e^{\frac{2\,LL(0)}{n}}}$$
(Formula 6)

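Formula to calculate Hosmer and Lemeshow's $R^2_L$:

$$R^2_L = \frac{-2LL(0) - \left(-2LL(B)\right)}{-2LL(0)}$$
(Formula 7)

The numerator is the model chi-square of Formula 3, so $R^2_L$ is the proportional reduction in $-2LL$ achieved by adding the predictors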
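The three R-squared measures can be sketched from the two log-likelihoods and the sample size. The sketch assumes the standard definitions written in the docstrings; the function names and the input values (LL(0) = -20, LL(B) = -15, n = 30) are hypothetical and not taken from any example in these slides.

```python
import math

def cox_snell_r2(ll_big, ll_small, n):
    """1 - exp(-(2/n) * [LL(B) - LL(0)])."""
    return 1.0 - math.exp(-(2.0 / n) * (ll_big - ll_small))

def nagelkerke_r2(ll_big, ll_small, n):
    """Cox and Snell's value divided by its maximum, 1 - exp(2 * LL(0) / n)."""
    return cox_snell_r2(ll_big, ll_small, n) / (1.0 - math.exp(2.0 * ll_small / n))

def hosmer_lemeshow_r2(ll_big, ll_small):
    """Proportional reduction in -2LL: [-2LL(0) - (-2LL(B))] / -2LL(0)."""
    return (-2.0 * ll_small - (-2.0 * ll_big)) / (-2.0 * ll_small)

# Hypothetical log-likelihoods and sample size, for illustration only.
ll_small, ll_big, n = -20.0, -15.0, 30

r2_cs = cox_snell_r2(ll_big, ll_small, n)
r2_n = nagelkerke_r2(ll_big, ll_small, n)
r2_l = hosmer_lemeshow_r2(ll_big, ll_small)
print(r2_cs, r2_n, r2_l)
```

Because the Nagelkerke denominator is below 1, its value is always at least as large as Cox and Snell's for the same model.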

The odds ratio is an indicator of the change in odds resulting from a unit change in the predictor
It is the factor by which the odds of being in one outcome category increase (or decrease, if the ratio is less than 1) when the predictor increases by one unit
It is similar to the b coefficient but is easier to interpret (it does not involve a logarithmic transformation)
The odds of an event occurring are defined as the probability of the event occurring divided by the probability of the event not occurring:

$$\text{Odds} = \frac{P(\text{event})}{P(\text{no event})}$$
(Formula 8)

The coefficients (b) are the natural logs of the odds ratios, so the odds ratio can be calculated using the following formula:

$$\text{odds ratio} = e^{b}$$
(Formula 9)

The odds ratio indicates the change in odds resulting from a unit change in the predictor
Odds ratio > 1: as the predictor increases, the probability of the outcome occurring increases
Odds ratio < 1: as the predictor increases, the probability of the outcome occurring decreases

X is income (in RM1,000), used to predict home ownership (1 = Yes, 0 = No)
If b = 1.25:

$$\text{odds ratio} = e^{1.25} = 3.49$$

A 1-unit increase in income (RM1,000) multiplies the odds of home ownership by 3.49
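The income example can be checked with a short Python sketch. The function names `odds` and `odds_ratio` are illustrative assumptions; the coefficient 1.25 is the one used in the example.

```python
import math

def odds(probability):
    """Odds = P(event) / P(no event)."""
    return probability / (1.0 - probability)

def odds_ratio(b):
    """Odds ratio for a one-unit increase in the predictor: e^b."""
    return math.exp(b)

print(round(odds_ratio(1.25), 2))   # 3.49, as in the income example
print(odds(0.8))                    # a probability of .8 corresponds to odds of about 4 to 1
```

A coefficient of 0 gives an odds ratio of exactly 1, meaning the predictor leaves the odds unchanged.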

One method of assessing the success of a model is to evaluate its ability to predict the outcome correctly
The cut-off value for classification is .50
A case is assigned to category 1 if the model predicts an outcome probability greater than .5

i.e. Y = 1 if $\hat{Y} > .5$
     Y = 0 if $\hat{Y} < .5$
SPSS provides:
1. Percentage of category 1 cases correctly classified
2. Percentage of category 0 cases correctly classified
3. Overall percentage correctly classified
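The classification step can be sketched as below. The function name, the data, and the handling of a probability of exactly .5 (assigned to category 0 here) are assumptions made for the illustration; the sketch also assumes both categories occur in the observed data.

```python
def classification_summary(observed, predicted, cutoff=0.5):
    """Percentage of cases correctly classified, per category and overall."""
    assigned = [1 if p > cutoff else 0 for p in predicted]

    def percent_correct(category):
        # Restrict to cases whose observed outcome is the given category.
        hits = [a == y for y, a in zip(observed, assigned) if y == category]
        return 100.0 * sum(hits) / len(hits)

    overall = 100.0 * sum(a == y for y, a in zip(observed, assigned)) / len(observed)
    return {
        "category_1": percent_correct(1),
        "category_0": percent_correct(0),
        "overall": overall,
    }

# Hypothetical observed outcomes and predicted probabilities.
observed = [1, 1, 0, 0, 1, 0]
predicted = [0.8, 0.4, 0.3, 0.6, 0.9, 0.2]

summary = classification_summary(observed, predicted)
print(summary)
```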

1. Enter
   All variables entered simultaneously
2. Sequential/Hierarchical
   Variables entered in blocks
   Blocks should be based on past research or the theory being tested
3. Stepwise
   Variables entered on the basis of statistical criteria (relative contribution to predicting the outcome)
   Should be employed only for exploratory analysis

(From Tabachnick)

The following data set includes three variables:
1. FALL
   0 - Not falling
   1 - Falling
2. DIFFICULTY
   Rated on a 1 to 3 scale
3. SEASON
   1 - autumn
   2 - winter
   3 - spring
Data set:

Fall   Difficulty   Season
1      3            1
1      1            1
0      1            3
1      2            3
1      3            2
0      2            2
0      1            2
1      3            1
1      2            3
1      2            1
0      2            2
0      2            3
1      3            2
1      2            2
0      3            1

Data: Logistic Regression Tabachnick SKI
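A hedged sketch of how maximum-likelihood coefficients could be obtained for this data set with Newton-Raphson iterations in plain Python. This is not the SPSS procedure itself. The dummy coding (SEAS1 = 1 for autumn, SEAS2 = 1 for winter, spring as the reference category), the fixed number of iterations, and all function names are assumptions. If the coding matches the one used for the slides, the printed coefficients should be close to those reported for this example.

```python
import math

# (fall, difficulty, season) for the 15 cases in the data set.
DATA = [
    (1, 3, 1), (1, 1, 1), (0, 1, 3), (1, 2, 3), (1, 3, 2),
    (0, 2, 2), (0, 1, 2), (1, 3, 1), (1, 2, 3), (1, 2, 1),
    (0, 2, 2), (0, 2, 3), (1, 3, 2), (1, 2, 2), (0, 3, 1),
]

def design_row(difficulty, season):
    # Assumed coding: intercept, DIFF, SEAS1 (autumn), SEAS2 (winter).
    return [1.0, float(difficulty), float(season == 1), float(season == 2)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def probabilities(xs, beta):
    return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in xs]

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson: beta <- beta + (X'WX)^-1 X'(y - p), starting from zero."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = probabilities(xs, beta)
        gradient = [sum((y - pi) * row[j] for row, y, pi in zip(xs, ys, p)) for j in range(k)]
        information = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(xs, p))
                        for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(information, gradient))]
    return beta

def log_likelihood(ys, p):
    return sum(y * math.log(pi) + (1 - y) * math.log(1 - pi) for y, pi in zip(ys, p))

ys = [row[0] for row in DATA]
xs_full = [design_row(d, s) for _, d, s in DATA]
xs_null = [[1.0] for _ in DATA]          # intercept-only model

beta_full = fit_logistic(xs_full, ys)
beta_null = fit_logistic(xs_null, ys)
ll_full = log_likelihood(ys, probabilities(xs_full, beta_full))
ll_null = log_likelihood(ys, probabilities(xs_null, beta_null))

print([round(b, 3) for b in beta_full])
print(round(ll_null, 3), round(ll_full, 3), round(2 * (ll_full - ll_null), 3))
```

The last line prints LL(0), LL(B) and their chi-square difference, which is the quantity tested with 3 degrees of freedom for this model.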

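$$\text{Prob}(\text{Fall}) = \hat{Y}_i = \frac{e^{-1.776 + (1.010)(\text{DIFF}) + (0.928)(\text{SEAS1}) + (-0.418)(\text{SEAS2})}}{1 + e^{-1.776 + (1.010)(\text{DIFF}) + (0.928)(\text{SEAS1}) + (-0.418)(\text{SEAS2})}}$$
(Formula 1)

$$\text{Log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln(\hat{Y}_i) + (1 - Y_i) \ln(1 - \hat{Y}_i) \right]$$
(Formula 2)

$$\chi^2 = 2\,[LL(B) - LL(0)]$$
(Formula 3)

Excel Computation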
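The same computation can be sketched in Python instead of Excel. The coefficients are those of the fitted equation for the ski example (constant -1.776, DIFF 1.010, SEAS1 0.928, SEAS2 -0.418, with the signs taken from the results table); the dummy coding (SEAS1 = 1 for season 1, SEAS2 = 1 for season 2) and the function name are assumptions.

```python
import math

def probability_of_fall(difficulty, season):
    # Assumed dummy coding: SEAS1 = 1 for autumn, SEAS2 = 1 for winter, spring = reference.
    seas1 = 1 if season == 1 else 0
    seas2 = 1 if season == 2 else 0
    u = -1.776 + 1.010 * difficulty + 0.928 * seas1 + (-0.418) * seas2
    return math.exp(u) / (1.0 + math.exp(u))

# First case in the data set: difficulty 3, season 1 (autumn).
print(round(probability_of_fall(3, 1), 3))   # about 0.899

# An easy run in spring gives a much lower predicted probability.
print(probability_of_fall(1, 3))
```

Applying the function to every case and feeding the results into Formula 2 gives LL(B) for the model.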

Table 1: Logistic Regression Analysis of Falling on a Ski Run as a Function of Difficulty of Run and Season

Variables     B        Wald Test   p       Odds ratio
Constant      -1.776   0.88        .347     .169
Difficulty     1.010   1.27        .259    2.747
Season(1)       .927   0.34        .560    2.527
Season(2)      -.418   0.09        .763     .658

Note: $R^2$ = .165 (Cox & Snell), .227 (Nagelkerke)
Model $\chi^2(3)$ = 2.710, p = .439
May also want to report the CI for the odds ratio
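One way such a confidence interval might be approximated from the reported values alone is sketched below. The standard error is not reported on the slide, so it is backed out from the rounded coefficient and Wald value (SE = |b| / sqrt(Wald)); that step, the normal-approximation multiplier 1.96 for a 95% interval, and the function name are all assumptions. SPSS computes the interval from unrounded estimates, so its values may differ slightly.

```python
import math

def odds_ratio_ci(b, wald, z=1.96):
    """Approximate CI for e^b, with SE recovered from Wald = (b / SE)^2."""
    se = abs(b) / math.sqrt(wald)
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)

# Coefficient 1.010 and Wald value 1.27, as reported for Difficulty.
lower, ratio, upper = odds_ratio_ci(1.010, 1.27)
print(lower, ratio, upper)
```

An interval that contains 1 is consistent with a non-significant Wald test for the same predictor.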

(Adapted from Andy Field)

Variable    Label/Value
PERFORM     Performance in Subject (0 = No, 1 = Yes)
INTEREST    Interest in the Subject (0 = No, 1 = Yes)
AGE         Age in years

Data: Logistic Regression PERFORM

Table 2: Logistic Regression Analysis of Performance as a Function of Interest and Age

Variables    B      Wald Test   p      Odds ratio
Constant     ___    ___         ___    ___
Interest     ___    ___         ___    ___
Age          ___    ___         ___    ___

Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___

(From Tabachnick)

Variable      Label/Value
WorkStatus    Work status (1 = Working, 2 = Housewives)
Children      Presence of children (0 = No, 1 = Yes)
Control       Locus of control
AttMar        Attitudes toward current marriage
AttHouse      Attitudes toward housework
AttRole       Attitudes toward role of women
Age           Age group
Educ          Years of education

Data: Logistic Regression Tabachnick WORK STATUS

Table 3: Logistic Regression Analysis of Work Status as a Function of Attitudinal Variables

Variables                          B      Wald Test   p      Odds ratio
Constant                           ___    ___         ___    ___
Locus of control                   ___    ___         ___    ___
Attitude towards marital status    ___    ___         ___    ___
Attitude towards role of women     ___    ___         ___    ___
Attitude towards housework         ___    ___         ___    ___

Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___

Table 3: Logistic Regression Analysis of Work Status as a Function of Attitudinal Variables and Children

Variables                          B      Wald Test   p      Odds ratio
Constant                           ___    ___         ___    ___
Presence of children               ___    ___         ___    ___
Locus of control                   ___    ___         ___    ___
Attitude towards marital status    ___    ___         ___    ___
Attitude towards role of women     ___    ___         ___    ___
Attitude towards housework         ___    ___         ___    ___

Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___
