
Logistic Regression & Survival Analysis

Analysis of binary-outcome & time-to-event data

Larry Holmes, Jr. and Joabyer Hossain

Stats Research, Lecture 7

November 13, 2008

Presentation Objectives

At the end of this presentation, participants should be able to:
- Explain the rationale for logistic regression, conduct the analysis, and interpret the results
- Survival analysis:
  - Measure time and events
  - Understand truncation and censoring
  - Understand survival and hazard functions
  - Define competing risks
  - Understand models and hypothesis testing
  - Log-rank test
  - Kaplan-Meier survival curve and estimates
  - Cox proportional hazards model (semi-parametric model)

What is Logistic Regression?

Logistic regression is often used when the dependent variable (DV) is discrete, because the relationship between such a DV and a predictor is non-linear. Examples:
- Blood glucose level and diabetes mellitus
- Hypertension and LDL level

Logistic Regression
In logistic regression:
- The outcome variable is binary.
- The purpose of the analysis is to assess the effects of multiple explanatory variables, which can be numeric and/or categorical, on the outcome variable.

Requirements for Logistic Regression


The following need to be specified:
1) An outcome variable with two possible categorical outcomes (1 = success; 0 = failure).
2) Estimating the probability P of the outcome variable.
3) Linking the outcome variable to the explanatory variables.
4) Estimating the coefficients of the regression equation, as well as their confidence intervals.
5) Testing the goodness of fit of the regression model.

Measuring the Probability of Outcome

The probability of the outcome can be expressed as the odds of occurrence of an event. If P is the probability of an event, then (1 - P) is the probability of it not occurring, and

$$\text{Odds of success} = \frac{P}{1 - P}$$
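A minimal Python sketch of the odds formula and its inverse. The function names `odds` and `probability` are illustrative only, not taken from any library.

```python
def odds(p: float) -> float:
    """Odds of success, P / (1 - P), for a probability 0 <= P < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability(odds_value: float) -> float:
    """Invert the odds: P = odds / (1 + odds)."""
    if odds_value < 0.0:
        raise ValueError("odds must be non-negative")
    return odds_value / (1.0 + odds_value)


print(odds(0.8))               # about 4: success is 4 times as likely as failure
print(odds(0.5))               # 1.0: even odds
print(probability(odds(0.2)))  # about 0.2 (round trip)
```

Odds run from 0 to infinity while P runs from 0 to 1, which is why the model is built on the (log) odds rather than on P directly.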


The logistic function


$$\hat{Y}_i = \frac{e^{u}}{1 + e^{u}}$$

where Ŷi is the estimated probability that the ith case is in a category and u is the regular linear regression equation:

$$u = A + B_1 X_1 + B_2 X_2 + \cdots + B_K X_K$$
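The two formulas above can be sketched in Python as follows. The coefficient values are hypothetical, chosen only so that u = 0; the stable branching is a common implementation choice, not something the slides prescribe.

```python
import math


def logistic(u: float) -> float:
    """Estimated probability Y-hat = e^u / (1 + e^u).

    Written in two algebraically equivalent forms so that large |u|
    never overflows math.exp.
    """
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    eu = math.exp(u)
    return eu / (1.0 + eu)


def linear_predictor(a: float, bs: list[float], xs: list[float]) -> float:
    """u = A + B1*X1 + ... + BK*XK."""
    if len(bs) != len(xs):
        raise ValueError("need one coefficient per predictor")
    return a + sum(b * x for b, x in zip(bs, xs))


# Hypothetical coefficients, for illustration only.
u = linear_predictor(a=-1.0, bs=[0.5, 2.0], xs=[2.0, 0.0])  # u = 0
print(logistic(u))       # 0.5 when u = 0
print(logistic(-800.0))  # 0.0 (underflows) rather than raising OverflowError
```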

Logistic function
For a response variable y with P(y = 1) = P and P(y = 0) = 1 - P:

[Figure: S-shaped logistic curve; y-axis: probability of disease, 0.0 to 1.0]

Logistic regression allows us to estimate an equation whose curve fits the relationship between age and the probability of coronary heart disease (CHD). It is a regression method for the case when the dependent variable y is binary (dichotomous).

The change in probability is not constant (linear) with constant changes in X. This means that the probability of a success (Y = 1) given the predictor variable X is a non-linear function, specifically a logistic function.

When the model is written this way, it is not obvious how the regression coefficients for X are related to changes in the dependent variable Y: the change in Y (in probability units) per unit change in X depends on the value of X. Look at the S-shaped function.
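To make the point concrete, here is a small sketch with a hypothetical model, logit P = -4 + 1·X. Every step adds one unit to X, but the resulting change in probability is largest near P = 0.5 and small in the tails. (For a logistic curve the slope is dP/dX = β·P(1 - P).)

```python
import math


def logistic(u: float) -> float:
    """P(Y = 1) for linear predictor u (stable for both signs of u)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    eu = math.exp(u)
    return eu / (1.0 + eu)


# Hypothetical model: logit P = ALPHA + BETA * X.
ALPHA, BETA = -4.0, 1.0

for x in range(0, 8):
    p_now = logistic(ALPHA + BETA * x)
    p_next = logistic(ALPHA + BETA * (x + 1))
    # Same one-unit step in X, different change in probability.
    print(f"X={x}: P={p_now:.3f}, change for X+1 = {p_next - p_now:+.3f}")
```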

The Logistic Regression


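The joint effect of all explanatory variables on the odds is

$$\text{Odds} = \frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}$$

Taking the logarithm of both sides:

$$\log\left[\frac{P}{1-P}\right] = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

$$\operatorname{logit} P = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

The coefficients α, β1, β2, ..., βp are estimated by maximum likelihood: they are the values that make the observed outcomes most probable (equivalently, that minimize -2 log likelihood, the quantity reported in the SPSS output). This replaces the least-squares criterion of linear regression, which minimizes the sum of squared distances between observed and predicted values.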
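A minimal sketch of maximum-likelihood fitting by Newton-Raphson (the iteration that statistical packages typically use, also known as iteratively reweighted least squares), restricted to one predictor. The data are hypothetical: 30 subjects with x = 1 (7 events) and 30 with x = 0 (23 events), counts inferred as an assumption from the SPSS demo output later in this lecture, so the fitted values can be compared with its Constant (1.190) and B (-2.379).

```python
import math


def fit_logistic_1d(xs, ys, max_iter=25, tol=1e-8):
    """Maximum-likelihood fit of logit P = alpha + beta * x by Newton-Raphson.

    xs: predictor values; ys: 0/1 outcomes. Returns (alpha, beta).
    Sketch only: one predictor, no checks for complete separation.
    """
    alpha = beta = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score (gradient) for alpha
            g1 += (y - p) * x      # score (gradient) for beta
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1].
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        alpha += step0
        beta += step1
        if max(abs(step0), abs(step1)) < tol:
            return alpha, beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical data: x = 1 group has 7 events in 30, x = 0 group has 23 in 30.
xs = [1] * 30 + [0] * 30
ys = [1] * 7 + [0] * 23 + [1] * 23 + [0] * 7
alpha_hat, beta_hat = fit_logistic_1d(xs, ys)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")  # alpha = 1.190, beta = -2.379
```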

The Logistic Regression


$$\operatorname{logit} P = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

- α is the baseline log odds of disease (the log odds when all X are 0).
- β1 is the change in the log odds of disease for a one-unit change in X1, holding the other variables constant.
- β2 is the change in the log odds for a one-unit change in X2, and so on.

What changes linearly is the log odds. The odds themselves are multiplied by e^β for each one-unit change in X.

If β = 1.6, the odds are multiplied by e^1.6 ≈ 4.95 (the odds ratio).
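A quick numeric check of the odds-ratio interpretation, using the slide's β = 1.6 and a hypothetical intercept: the ratio of odds at X + 1 to odds at X is e^β no matter where X starts.

```python
import math

beta = 1.6
odds_ratio = math.exp(beta)
print(f"{odds_ratio:.2f}")  # 4.95

# Hypothetical intercept: the ratio of odds does not depend on alpha or on X.
alpha = -2.0
for x in (0.0, 1.0, 5.0):
    odds_x = math.exp(alpha + beta * x)
    odds_next = math.exp(alpha + beta * (x + 1))
    print(x, round(odds_next / odds_x, 3))  # 4.953 each time
```

Contrast this with the changes in probability on the previous slide, which do depend on X.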

Logistic Regression: Demo

MS-Excel: no default functions.

SPSS: Analyze > Regression > Binary Logistic > select the dependent variable > select the independent variable (covariate).

Logistic Regression SPSS output

Dependent Variable Encoding
Original Value | Internal Value
0 | 0
1 | 1

Categorical Variables Codings
Shades | Frequency | Parameter coding (1)
1 | 30 | 1.000
2 | 30 | .000

Step 0: Classification Table(a,b)
Observed pc | Predicted pc = 0 | Predicted pc = 1 | Percentage Correct
0 | 0 | 30 | .0
1 | 0 | 30 | 100.0
Overall Percentage | | | 50.0
a. Constant is included in the model.
b. The cut value is .500

Step 0: Variables in the Equation
 | B | S.E. | Wald | df | Sig. | Exp(B)
Constant | .000 | .258 | .000 | 1 | 1.000 | 1.000

Step 0: Variables not in the Equation
 | Score | df | Sig.
Variables: Shades(1) | 17.067 | 1 | .000
Overall Statistics | 17.067 | 1 | .000

Step 1: Omnibus Tests of Model Coefficients
 | Chi-square | df | Sig.
Step | 17.985 | 1 | .000
Block | 17.985 | 1 | .000
Model | 17.985 | 1 | .000

Step 1: Model Summary
-2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
65.193(a) | .259 | .345
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

Step 1: Classification Table(a)
Observed pc | Predicted pc = 0 | Predicted pc = 1 | Percentage Correct
0 | 23 | 7 | 76.7
1 | 7 | 23 | 76.7
Overall Percentage | | | 76.7
a. The cut value is .500

Step 1(a): Variables in the Equation
 | B | S.E. | Wald | df | Sig. | Exp(B)
Shades(1) | -2.379 | .610 | 15.189 | 1 | .000 | .093
Constant | 1.190 | .432 | 7.594 | 1 | .006 | 3.286
a. Variable(s) entered on step 1: Shades.
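As a check on the SPSS output above, this sketch recomputes the main Step 1 numbers by hand. The 2 × 2 counts are an assumption inferred from the reported Exp(B) values and classification tables (30 subjects per Shades category; 7 of 30 vs 23 of 30 with pc = 1). With a single binary predictor, the maximum-likelihood estimates have closed forms, so no iteration is needed.

```python
import math

# Assumed counts (inferred from the output): 30 subjects per Shades category.
a1, a0 = 7, 23   # Shades(1) = 1: pc = 1, pc = 0
b1, b0 = 23, 7   # Shades(1) = 0 (reference): pc = 1, pc = 0
n = a1 + a0 + b1 + b0


def group_loglik(events, non_events):
    """Bernoulli log-likelihood of one group at its own fitted proportion."""
    p = events / (events + non_events)
    return events * math.log(p) + non_events * math.log(1.0 - p)


# With one binary predictor the ML estimates have closed forms.
constant = math.log(b1 / b0)            # log odds in the reference group
coef = math.log(a1 / a0) - constant     # log odds ratio for Shades(1)
se_coef = math.sqrt(1 / a1 + 1 / a0 + 1 / b1 + 1 / b0)
se_const = math.sqrt(1 / b1 + 1 / b0)
wald = (coef / se_coef) ** 2

neg2ll_model = -2.0 * (group_loglik(a1, a0) + group_loglik(b1, b0))
neg2ll_null = -2.0 * group_loglik(a1 + b1, a0 + b0)   # constant-only model
chi_sq = neg2ll_null - neg2ll_model                   # omnibus test
cox_snell = 1.0 - math.exp(-chi_sq / n)
nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))

print(f"Shades(1): B = {coef:.3f}, S.E. = {se_coef:.3f}, "
      f"Wald = {wald:.3f}, Exp(B) = {math.exp(coef):.3f}")
print(f"Constant: B = {constant:.3f}, S.E. = {se_const:.3f}")
print(f"-2LL = {neg2ll_model:.3f}, chi-square = {chi_sq:.3f}")
print(f"Cox & Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")
```

Rounded to three decimals, these values agree with the Step 1 tables (B = -2.379, S.E. = .610, Wald = 15.189, Constant = 1.190, -2LL = 65.193, chi-square = 17.985, R² = .259 and .345).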

Regression vs. Survival Analysis

Technique | Predictor variables | Outcome variable | Censoring permitted?
Linear regression | Categorical or continuous | Normally distributed | No
Logistic regression | Categorical or continuous | Binary (except in polytomous logistic regression) | No
Survival analyses | Time and categorical or continuous | Binary | Yes

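Regression vs. Survival Analysis

Technique | Mathematical model | Yields
Linear regression | Y = B1X + B0 (linear) | Linear changes
Logistic regression | ln[P/(1 - P)] = B1X + B0 (sigmoidal prob.) | Odds ratios
Survival analyses | h(t) = h0(t) exp(B1X) | Hazard rates

In the survival (Cox) model, any constant term is absorbed into the baseline hazard h0(t), so no separate B0 appears.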

What is survival analysis?

Model time to failure or time to event.
- Unlike linear regression, survival analysis has a dichotomous (binary) outcome.
- Unlike logistic regression, survival analysis analyzes the time to an event.

Why is that important?
- Able to account for censoring
- Can compare survival between 2 or more groups
- Assess the relationship between covariates and survival time

Survival Analysis

Survival analysis deals with making inference about EVENT RATES:
- Rate at t = rate among those at risk at t.
- Deals with median survival (50%), not mean survival. Why? The mean requires everyone to have had the event; with censoring, some event times are never observed, whereas the median can be read off once the survival curve drops to 50%.

Survival vs. time-to-event: the outcome variable is the event time. Examples of events: death, infection, MI, prostate cancer death, hospitalization.

Types of censoring
- Subject does not experience the event of interest
- Incomplete follow-up:
  - Lost to follow-up
  - Withdraws from study
  - Dies (if death is not the event being studied)
- Left or right censored

Survival Function

$$S(t) = P[T > t] = 1 - P[T \le t]$$

Proportion of the population still without the event by time t. Plot: Y axis = % alive, X axis = time.
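When no observations are censored, S(t) = P[T > t] can be estimated simply as the fraction of subjects whose event time exceeds t. A minimal sketch with hypothetical, fully observed event times:

```python
def empirical_survival(event_times, t):
    """S(t) = P[T > t], estimated as the fraction of subjects with T > t.

    Valid only when every subject's event time is observed (no censoring).
    """
    if not event_times:
        raise ValueError("need at least one event time")
    return sum(1 for time in event_times if time > t) / len(event_times)


times = [2, 3, 3, 5, 8, 13]  # hypothetical, uncensored event times
for t in (0, 3, 5, 13):
    print(t, empirical_survival(times, t))
```

With censored subjects this simple fraction is biased, which is what motivates the Kaplan-Meier estimator.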

Survival Curve

[Figure: survival curve; y-axis: proportion alive (0.0 to 1.0); x-axis: months since surgery]

Hazard Function

Also termed incidence rate, instantaneous risk, or force of mortality.
- h(t): event rate at t among those at risk for an event
- Key function
- Estimated in a straightforward way

[Figure labels: Censored, Truncated]

[Figure: Time to Cardiovascular Adverse Event in VIGOR Trial]

Hazard Function

Event = death, scale = months since Tx.
h(t) = 1% at t = 12 months means:
- "At 1 year, patients are dying at a rate of 1% per month."
- "At 1 year, the chance of dying in the following month is approximately 1%."
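Why "approximately": if the hazard stays constant at h over the next month, the conditional probability of dying in that month is 1 - exp(-h), which is close to h only when h is small. A sketch under that constant-hazard assumption:

```python
import math

monthly_hazard = 0.01  # h(t) = 1% per month at t = 12 months
# Assumption: the hazard stays at this value for the whole following month.
p_die_next_month = 1.0 - math.exp(-monthly_hazard * 1.0)
print(f"{p_die_next_month:.5f}")  # 0.00995, i.e. about 1%

# The approximation degrades as the hazard grows (hypothetical 30% per month).
print(f"{1.0 - math.exp(-0.30):.3f}")  # 0.259, noticeably below 0.30
```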

Relationship between survivor function and hazard function

The survivor function S(t) defines the probability of surviving longer than time t; this is what the Kaplan-Meier curves show.

The hazard function is the instantaneous risk of the event at time t (conditional failure rate). It is the rate of decrease of the survivor function, scaled by the proportion still surviving:

$$h(t) = -\frac{dS(t)/dt}{S(t)} = -\frac{d}{dt}\ln S(t)$$

The survivor and hazard functions can be converted into each other:

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right)$$
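A numerical sketch of both conversions, using a hypothetical hazard h(t) = 0.02t whose exact survivor function is S(t) = exp(-0.01t²). The trapezoidal rule and the central finite difference are illustrative numerical choices, not part of the slides.

```python
import math


def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of h from 0 to t), trapezoidal rule."""
    dt = t / steps
    cumulative = 0.0
    for i in range(steps):
        cumulative += 0.5 * (hazard(i * dt) + hazard((i + 1) * dt)) * dt
    return math.exp(-cumulative)


def hazard_from_survival(survival, t, eps=1e-6):
    """h(t) = -d ln S(t) / dt, central finite difference (needs t > eps)."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


# Hypothetical example: h(t) = 0.02 t, whose exact survivor function is
# S(t) = exp(-0.01 t**2).
def h(t):
    return 0.02 * t


def S(t):
    return math.exp(-0.01 * t ** 2)


print(survival_from_hazard(h, 5.0), S(5.0))  # both about 0.7788
print(hazard_from_survival(S, 5.0), h(5.0))  # both about 0.1
```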

Use of survival analysis: clinical trial
- Accrual into the study over 2 years
- Data analysis at year 3
- Reasons for exiting a study:
  - Died
  - Alive at study end
  - Withdrawal for non-study-related reasons (LTFU)
  - Died from other causes

Kaplan-Meier
- One way to estimate survival
- Nice, simple, can compute by hand
- Can add stratification factors
- Cannot evaluate covariates like the Cox model
- No sensible interpretation for competing risks

Kaplan-Meier estimate

Multiply together a series of conditional probabilities:

Time t_i | # at risk | # events | Ŝ(t)
0 | 20 | 0 | 1.00
5 | 20 | 2 | [1 - (2/20)] × 1.00 = 0.90
6 | 18 | 0 | [1 - (0/18)] × 0.90 = 0.90
10 | 15 | 1 | [1 - (1/15)] × 0.90 = 0.84
13 | 14 | 2 | [1 - (2/14)] × 0.84 = 0.72
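The table above can be reproduced with a few lines of Python. This sketch works from the (time, number at risk, number of events) rows as given; the drops in the at-risk count between rows (e.g. 18 to 15) reflect censoring and are taken as input rather than recomputed.

```python
def kaplan_meier_from_counts(rows):
    """Product-limit estimate from (time, n_at_risk, n_events) rows.

    Each row multiplies the running estimate by the conditional probability
    of surviving past that time: 1 - d_i / n_i.
    """
    s = 1.0
    estimates = []
    for time, at_risk, events in rows:
        s *= 1.0 - events / at_risk
        estimates.append((time, s))
    return estimates


table = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1), (13, 14, 2)]
for time, s in kaplan_meier_from_counts(table):
    print(time, round(s, 2))
```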

Kaplan-Meier Curve

[Figure: Kaplan-Meier curve; y-axis: proportion surviving (95% confidence); x-axis: survival time]

Limits of Kaplan-Meier curves
- What happens when you have several covariates that you believe contribute to survival?
- Example: smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarction.
- Can use stratified K-M curves for 2 or maybe 3 covariates.
- Need another approach for many covariates: the multivariable Cox proportional hazards model is the most common (think multivariable regression or logistic regression rather than a Student's t-test or the odds ratio from a 2 × 2 table).

Multivariable method: Cox proportional hazards
- Needed to assess the effect of multiple covariates on survival
- Cox proportional hazards is the most commonly used multivariable survival method

Cox proportional hazards model
- Works with the hazard model
- Conveniently separates the baseline hazard function (over time) from the covariates:

$$h(t) = h_0(t)\,\exp(B_1 X)$$

  (any constant term is absorbed into the baseline hazard h0(t), so the model has no separate intercept)
- Covariates are time independent
- B1 is used to calculate the hazard ratio, HR = exp(B1), which is similar to the relative risk
- Semi-parametric
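A sketch of why the hazard ratio exp(B1) is the quantity of interest: whatever shape the baseline hazard h0(t) takes, it cancels in the ratio of hazards for X = 1 vs X = 0, so the ratio is the same at every t. The baseline hazard and B1 below are hypothetical.

```python
import math


def cox_hazard(t, x, b1, baseline):
    """h(t | x) = h0(t) * exp(b1 * x) for a single covariate x."""
    return baseline(t) * math.exp(b1 * x)


def baseline_hazard(t):
    # Hypothetical, time-varying baseline; the Cox model leaves h0(t) unspecified.
    return 0.05 + 0.01 * t


B1 = 0.7  # hypothetical coefficient
for t in (1.0, 10.0, 50.0):
    ratio = cox_hazard(t, 1, B1, baseline_hazard) / cox_hazard(t, 0, B1, baseline_hazard)
    print(t, round(ratio, 4))  # exp(0.7), about 2.0138, at every t
```

This constant ratio over time is exactly the "proportional hazards" assumption.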

Cox Proportional Hazards Model
- Add covariates to the model
- Change in a prognostic factor → proportional change in the hazard (a linear change on the log scale)
- Can test the effect of the prognostic factor as in linear regression: H0: β = 0
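Two common ways to test H0: β = 0 are the Wald test, (B/SE)² compared with a chi-square on 1 df, and the likelihood-ratio test, the drop in -2 log likelihood. This sketch plugs in the numbers printed in the SPSS Cox output later in this lecture (variable psa: B = -1.393, SE = 2.305; -2LL 7.378 before and 6.732 after). It uses the identity P(χ²₁ > s) = erfc(√(s/2)) so that only the standard library is needed.

```python
import math


def chi2_1df_p_value(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Uses P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


def wald_test(b, se):
    """Wald test of H0: beta = 0. Returns (Wald chi-square, p-value)."""
    wald = (b / se) ** 2
    return wald, chi2_1df_p_value(wald)


# Coefficient and standard error as printed in the SPSS Cox output (psa).
wald, p = wald_test(b=-1.393, se=2.305)
print(f"Wald = {wald:.3f}, p = {p:.3f}")  # Wald = 0.365, p = 0.546

# Likelihood-ratio test: change in -2 log likelihood when psa is added.
lr = 7.378 - 6.732
print(f"LR = {lr:.3f}, p = {chi2_1df_p_value(lr):.3f}")  # LR = 0.646, p = 0.422
```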

Limitations of Cox PH model
- Does not accommodate variables that change over time
  - Most variables (e.g., gender, ethnicity, or congenital condition) are constant
  - If necessary, one can program time-dependent variables. When might you want this?
- Baseline hazard function, h0(t), is never specified
  - You can estimate h0(t) accurately if you need to estimate S(t).

Summary
- Survival analysis quantifies time to a single, dichotomous event
- Handles censored data well
- Survival and hazard can be mathematically converted to each other
- Kaplan-Meier survival curves can be compared statistically and graphically
- Cox proportional hazards models help distinguish the individual contributions of covariates to survival, provided certain assumptions are met.
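The usual statistical comparison of two Kaplan-Meier curves is the log-rank test (listed in the objectives). A minimal two-group sketch, assuming the standard form: at each event time, compare observed events in group A with the number expected if both groups shared one hazard, using the hypergeometric variance. The data are hypothetical.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events: 1 = event observed, 0 = censored.

    Returns (chi-square statistic with 1 df, p-value).
    """
    # Tag each subject with its group: 0 = group A, 1 = group B.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk in A
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)      # events
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0.0:
        raise ValueError("no information to compare the groups")
    statistic = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical data: group A has its events early, group B late; no censoring.
stat, p = log_rank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

Identical groups give a statistic of 0 and p = 1; the quadratic loops keep the sketch readable rather than fast.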

Output of survival functions

Survival Table
Case | Time | Status | Cumulative Proportion Surviving at the Time: Estimate | Std. Error | N of Cumulative Events | N of Remaining Cases
1 | 6.000 | 1 | .800 | .179 | 1 | 4
2 | 14.000 | 1 | .600 | .219 | 2 | 3
3 | 21.000 | 0 | | | 2 | 2
4 | 44.000 | 1 | .300 | .239 | 3 | 1
5 | 62.000 | 1 | .000 | .000 | 4 | 0

Means and Medians for Survival Time
 | Estimate | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound
Mean(a) | 35.800 | 11.810 | 12.652 | 58.948
Median | 44.000 | 23.875 | .000 | 90.794
a. Estimation is limited to the largest survival time if it is censored.
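The Survival Table above can be reproduced from the raw times and statuses. This sketch assumes Greenwood's formula for the standard error (it matches the printed .179, .219, and .239), takes the mean as the area under the Kaplan-Meier step curve up to the largest observed time (consistent with footnote a), and takes the median as the first event time at which S(t) ≤ 0.5.

```python
import math


def kaplan_meier(times, status):
    """Kaplan-Meier estimate with Greenwood standard errors.

    times: follow-up times; status: 1 = event, 0 = censored.
    Returns a list of (time, S(t), SE) at each distinct event time.
    """
    pairs = sorted(zip(times, status))
    n_at_risk = len(pairs)
    s = 1.0
    greenwood = 0.0
    rows = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        tied = [e for ti, e in pairs if ti == t]   # everyone leaving at time t
        deaths = sum(tied)
        if deaths:
            s *= 1.0 - deaths / n_at_risk
            if deaths < n_at_risk:
                greenwood += deaths / (n_at_risk * (n_at_risk - deaths))
                se = s * math.sqrt(greenwood)
            else:
                se = 0.0  # the curve has dropped to zero
            rows.append((t, s, se))
        n_at_risk -= len(tied)   # events and censored both leave the risk set
        i += len(tied)
    return rows


def restricted_mean(rows, end):
    """Area under the KM step curve from 0 up to `end`."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s, _ in rows:
        if t > end:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (end - prev_t)


def km_median(rows):
    """First event time at which S(t) <= 0.5, or None if never reached."""
    for t, s, _ in rows:
        if s <= 0.5:
            return t
    return None


times = [6, 14, 21, 44, 62]
status = [1, 1, 0, 1, 1]
rows = kaplan_meier(times, status)
for t, s, se in rows:
    print(f"{t:>3}  S = {s:.3f}  SE = {se:.3f}")
print(f"mean = {restricted_mean(rows, max(times)):.3f}")  # 35.800
print(f"median = {km_median(rows)}")                      # 44
```

The standard errors of the mean and median use further formulas that are not reproduced here.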

[Figure: SPSS output of KM plot]

[Figure: SPSS output of cumulative hazard]

Output of Cox Regression

Omnibus Tests of Model Coefficients(a,b)
-2 Log Likelihood: 6.732
Test | Chi-square | df | Sig.
Overall (score) | .468 | 1 | .494
Change From Previous Step | .646 | 1 | .422
Change From Previous Block | .646 | 1 | .422
a. Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 7.378
b. Beginning Block Number 1. Method = Enter

Variables in the Equation
 | B | SE | Wald | df | Sig. | Exp(B)
psa | -1.393 | 2.305 | .365 | 1 | .546 | .248
