
Chapter 4 Modelling with Cox

In the last chapter, we saw some of the mechanics of the Cox model. We covered its derivation, how to fit it, how to compare hazard ratios, how to test if a covariate has an effect and how to plot adjusted survival functions. In this chapter, we look at the art of modelling with the Cox proportional hazards model. There are two main parts to the chapter: assessing the validity of the assumptions of the Cox model, namely that hazards are proportional and that the effect of covariates is linear in the log hazard; and building models that contain multiple predictors, that may modify or confound the effect of predictors of interest. In the chapter that follows this, we will extend the Cox model to deal with cases when hazards are non-proportional. For now, we limit ourselves to detecting non-proportionality. In the next few sections, we consider three kinds of approaches for doing this detection: graphical, testing using residuals, and extending the model to be non-proportional.


4.1 Graphical methods to assess the validity of the proportional hazards assumption

Consider a standard linear regression. The model is $y_i = a + bx_i + e_i$ where the errors are $e_i \sim N(0, \sigma^2)$. Note that there are various assumptions in this model, i.e. that the mean is linear in $x$ and that the errors have a Gaussian distribution with constant variance. You have undoubtedly been told to assess the validity of these assumptions in previous classes. How might one do this? One could plot the line of best fit through the data. If it corresponds quite closely to the observed data (bearing in mind the additional error not explained by the mean) then we might be happy that the model is appropriate (although we'd wish to check the homoskedasticity assumption also). We might alternatively plot residuals against $x$. If these appear independent of $x$, the model may be appropriate. A final alternative would be to fit a more general model, with a quadratic term for the mean, and test whether the quadratic coefficient is equal to zero. If it is, we can be fairly safe in assuming that the mean is indeed linear in $x$. All three approaches have parallels in the Cox proportional hazards model. We may plot expected versus observed estimates of $S(t)$ to check if they are close. We may use suitably defined residuals to test the goodness-of-fit. And we may generalise the model to be non-proportional and then assess whether the generalised model fits better than the proportional one. A fourth alternative is to plot log–log functions of the estimated survival function as another graphical check of proportionality.
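For concreteness, here is a minimal R sketch of the three linear-regression checks just described; the data are simulated purely for illustration and are not from the notes.

set.seed(1)
x = runif(100)
y = 2 + 3*x + rnorm(100)                # simulated data, for illustration only
fit = lm(y ~ x)
plot(x, y); abline(fit)                 # check 1: line of best fit versus the data
plot(x, resid(fit)); abline(h=0)        # check 2: residuals against x
fit2 = lm(y ~ x + I(x^2))               # check 3: a more general model...
summary(fit2)$coefficients["I(x^2)",]   # ...is the quadratic coefficient zero?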

4.1.1 Expected versus observed plots

Again, cast your mind back to when you studied linear regression. The model is $y_i = a + bx_i + e_i$, $e_i \sim N(0, \sigma^2)$. For a particular value of $x_i$, the expected value of $y_i$ is $E(Y_i|x_i) = a + bx_i$,


so that for individual $i$, $y_i$ is the observed response and $a + bx_i$ is the expected response under the assumption that the model is correct. Thus, if the model is correct, the expected values should be close to the observed values if we plot them.

Turn your mind back to the Cox model. Consider a single binary predictor $x \in \{0, 1\}$. We obtain the observed survival curves by using the Kaplan–Meier approach to obtain two survival curves, $\hat{S}_0(t)$ and $\hat{S}_1(t)$. Note that these are obtained by fitting to each subset of the data separately. We obtain expected survival curves under the assumption that the model is correct, i.e. that the hazards are indeed proportional, by using a Cox proportional hazards model fitted using $x$. Note that $h(t, x) = h_0(t)e^{\beta x}$, and that the model is fitted to both subsets jointly. If we plot these survival curves and they are close (in some vague sense) then the proportional hazards assumption is reasonable. If they are discrepant, we would reject this assumption. Here is a plot of survival among veterans with lung cancer (see Kleinbaum and Klein, 2005, for details) using prior treatment as a predictor. The dashed lines are the observed survival curves using Kaplan–Meier, the solid the expected under the proportional hazards assumption. They seem quite close.

[Figure: expected (solid) versus observed Kaplan–Meier (dashed) survival curves $\hat{S}(t)$ against $t$, for the groups "no prior therapy" and "prior therapy".]

The plot was created using the following R commands.


phm.prior = coxph(S~prior)
d.phm.prior = coxph.detail(phm.prior)
times = c(0,d.phm.prior$t)
h0 = c(0,d.phm.prior$hazard)
S0 = exp(-cumsum(h0))                 # baseline survival (at the covariate mean)
beta = c(phm.prior$coef)
x1 = c(0)-mean(prior)                 # no prior therapy (prior is coded 0/10)
Sx1 = S0 ^ exp(t(beta) %*% x1)
x2 = c(10)-mean(prior)                # prior therapy
Sx2 = S0 ^ exp(t(beta) %*% x2)
cm=1/2.54
xp=8*cm;yp=6*cm;xm1=2*cm;xm2=0.5*cm;ym1=2*cm;ym2=0.5*cm;hei=yp+ym1+ym2;wid=xp+xm1+xm2
pdf("v1.pdf",height=hei,width=wid);par(mai=c(ym1,xm1,ym2,xm2),mgp=c(2,0.75,0))
k = survfit(S~prior)                  # observed Kaplan-Meier curves
plot(k,col=1:2,lty=2,xlab="t",ylab=expression(hat(S)(t)))
lines(times,Sx1,col=1,type="s")       # expected curves under the PH model
lines(times,Sx2,col=2,type="s")
legend(max(t/365.25)*.4,1,c("no prior therapy","prior therapy"),lty=1,col=c(1,2),bty="n")
dev.off()


If we have a categorical covariate, the approach is the same, but with more lines. For example, using cell type as a predictor we obtain the following graph.

[Figure: expected (solid) versus observed (dashed) survival curves $\hat{S}(t)$ against $t$, by cell type: large, squamous, adeno, small.]

(This looks better in colour: see the pdf on the web.) These curves look fairly similar, but we can see some examples where the two curves of the same colour are quite far from each other, especially for squamous cells. For continuous covariates, we would have to categorise the covariate. This might mean splitting in two around the mean, the median or some significant number, or splitting in three around the terciles. It would be unwise to split into more than four categories unless you had a lot of data. The expected curves would be created by fitting to the continuous version of the covariate first, then plugging the mean value for each category into the fitted model. The observed curves are obtained by using Kaplan–Meier on the categorised covariate. For instance, consider performance (on the Karnofsky scale) of the veterans. The mean value is just under 60 for this covariate, so we might use 60 as our cut-off; the categorisation step might look like the sketch below.
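In R, the categorisation itself might look like the following sketch (the grouping variable names are illustrative, not from the notes):

group2 = perf >= mean(perf)                           # split in two around the mean
group3 = cut(perf, breaks=quantile(perf, c(0,1/3,2/3,1)),
             include.lowest=TRUE)                     # split in three around the terciles
k = survfit(S~group3)                                 # observed curves per category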

Using 60 as the cut-off yields the following graph.

[Figure: expected (solid) versus observed (dashed) survival curves $\hat{S}(t)$ against $t$, for the groups perf >= 60 and perf < 60.]

Here there is a clear discrepancy between the two groups. For low times, the observed curves (dashed) are further apart than we'd expect if the proportional hazards assumption were correct. This switches as time goes on. The suggestion from this plot is that having a good Karnofsky performance is initially very beneficial, but as time progresses it becomes less and less so. The R commands to make this plot were as follows.
phm.perf = coxph(S~perf)
d.phm.perf = coxph.detail(phm.perf)
times = c(0,d.phm.perf$t)
h0 = c(0,d.phm.perf$hazard)
S0 = exp(-cumsum(h0))
beta = c(phm.perf$coef)
meanx = mean(perf)
x1 = mean(perf[perf>=60])-meanx      # mean covariate value within each category
Sx1 = S0 ^ exp(t(beta) %*% x1)
x2 = mean(perf[perf<60])-meanx
Sx2 = S0 ^ exp(t(beta) %*% x2)
cm=1/2.54
xp=8*cm;yp=6*cm;xm1=2*cm;xm2=0.5*cm;ym1=2*cm;ym2=0.5*cm;hei=yp+ym1+ym2;wid=xp+xm1+xm2
pdf("v4.pdf",height=hei,width=wid);par(mai=c(ym1,xm1,ym2,xm2),mgp=c(2,0.75,0))
k = survfit(S~(perf<60))
plot(k,col=1:2,lty=2,xlab="t",ylab=expression(hat(S)(t)))
lines(times,Sx1,col=1,type="s")
lines(times,Sx2,col=2,type="s")
legend(max(t/365.25)*.4,1,c("perf >= 60","perf < 60"),lty=1,col=c(1,2),bty="n")
dev.off()

If we have more than one covariate for which we wish to assess the proportional hazards assumption, we can in principle proceed by categorising each variable separately and then forming every combination of the categories.

This is fine for large data sets, but for smaller ones the results are usually too sparse. In such situations, I recommend instead assessing each covariate individually.
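For completeness, a sketch of the combined approach, assuming two already-categorised covariates group.a and group.b (hypothetical names):

k = survfit(S ~ group.a + group.b)       # Kaplan-Meier curves for every combination
plot(k, col=seq_along(k$strata), lty=2)  # each stratum needs enough events to be stable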

4.1.2 Using log–log plots to assess proportionality

An alternative that is in some ways easier to interpret is to create a plot in which the curves for each group should be parallel if the assumption of proportionality is correct. Recall that if the proportional hazards model is correct, then $h(t, x_1) = h(t, x_2)e^{\beta(x_1 - x_2)}$, i.e. the log hazards can be written $\log h(t, x_1) = \log h(t, x_2) + \beta(x_1 - x_2)$. Clearly, these are parallel. Now, you may recall from chapter 3 that the estimates of the hazard function itself are not particularly nice:


they are quite stochastic and difficult to interpret. However, this motivates the following approach, which is much smoother. We estimate $\hat{S}(t, x)$ for binary, categorical or categorised-continuous $x$ using the Kaplan–Meier approach. Then, ignoring the limits $t \to 0$ and $t \to \infty$, $\hat{S}(t, x) \in (0, 1)$. So $\log \hat{S}(t, x) \in (-\infty, 0)$, and $-\log \hat{S}(t, x) \in (0, \infty)$, which is the cumulative hazard $\int_0^t h(\tau, x)\,d\tau$. If we take logs again, we get $\log\{-\log \hat{S}(t, x_1)\} = \log\{-\log \hat{S}(t, x_2)\} +$ a constant. Usually we multiply again by minus one so that the curves go down with time, i.e. we plot $-\log\{-\log \hat{S}(t, x)\}$ against $t$ for each category of $x$. If the proportional hazards assumption is correct then the resulting plots should be parallel for fixed $t$. Consider the following such plot for performance of veterans with lung cancer.

[Figure: $-\log\{-\log \hat{S}(t)\}$ against $t$ for the groups perf >= 60 and perf < 60.]

If we zoom in on the initial period we obtain the following.

[Figure: the same log–log plot, zoomed in on $t$ between 0 and 1.]

Clearly these are not parallel, supporting our previous finding that performance does not satisfy the proportional hazards assumption. The R code to create this plot follows.
cm=1/2.54
xp=8*cm;yp=6*cm;xm1=2*cm;xm2=0.5*cm;ym1=2*cm;ym2=0.5*cm;hei=yp+ym1+ym2;wid=xp+xm1+xm2
pdf("v7.pdf",height=hei,width=wid);par(mai=c(ym1,xm1,ym2,xm2),mgp=c(2,0.75,0))
k = survfit(S~(perf<60))
k$label = c(rep(1,k$strata[1]),rep(2,k$strata[2]))   # label each time point by stratum
t1 = c(0,subset(k$time,k$label==1))
t2 = c(0,subset(k$time,k$label==2))
St1 = c(1,subset(k$surv,k$label==1))
St2 = c(1,subset(k$surv,k$label==2))
plot(t1,-log(-log(St1)),col=1,type="s",xlab="t",
  ylab=expression(-log(-log(hat(S)(t)))),ylim=c(-3,5))
lines(t2,-log(-log(St2)),col=2,type="s")
legend(max(t/365.25)*.4,-1,c("perf >= 60","perf < 60"),lty=1,col=c(1,2),bty="n")
dev.off()


4.2 Testing the proportional hazards assumption

Recall that one aim of this chapter is to assess the validity of the proportional hazards assumption (pha), namely that $h(t, x) = h_0(t)\exp(\beta^T x)$. We considered two graphical approaches: expected survival curves (under the pha) versus observed, and log–log plots of the survival function. In the former, if the expected curves look the same as the observed, we deem the pha to be okay. If they look different, we deem it to be false.

[Figure: the expected versus observed survival plot for prior therapy, repeated from above.]

Question: are these the same? In the second approach, if the same vertical distance separates the two curves for all t, we deem the pha to be okay. If the distance changes, we deem it violated.


[Figure: the log–log plot for performance, repeated from above.]

Question: are these curves parallel? It can be difficult to answer these questions, and the answers to both are subjective. It would be nice to have an objective (though nothing in statistics is actually objective) quantity to say if we can reject the pha. Think again about linear regression, i.e. we model a variable $y_i$ as follows: $y_i = a + bx_i + e_i$ where $e_i \sim N(0, \sigma^2)$. (This is a model. Don't think it is true!) Then $E(Y_i) = a + bx_i$. The residual for individual $i$ is $\hat{r}_i = y_i - \hat{a} - \hat{b}x_i$. It is an estimate of $e_i$. If the model were true, then $\hat{r}_i$ should be $N(0, \sigma^2)$ regardless of $x_i$. Commonly, we plot $\hat{r}_i$ versus $x_i$. But we could also test whether $\hat{r}_i$ is related to $x_i$, e.g. by testing if the population correlation coefficient is equal to zero. The key thing about residuals is not that they are the difference between observed and expected values, but rather that they are variables whose distribution (or some properties of their distribution) is known and is simple under $H_0$.


4.2.1 Schoenfeld residuals

Now return to the Cox proportional hazards model. We will consider eponymous residuals due to Schoenfeld (1982) that are centred on zero and should be independent of time if the pha is true. Deviations from this, i.e. residuals that exhibit some trend in time, indicate that the pha is violated. Furthermore, we can perform a formal test of this hypothesis. Note that we can never accept the pha, merely fail to reject it (as with all classical hypothesis testing: though Bayesian hypothesis testing is different). The approach considers one covariate at a time, that is, we get one set of residuals and one p-value per covariate in the model. To make it simpler, we start with the case where there is only one covariate in the model; the more general case is trivially dealt with and described later. Recall that at any instant in time $t$, if a failure were to occur and the model were correct, the failure would happen to individual $k$ with probability

$$p(k \text{ fails}) = \frac{e^{\beta x_k}}{\sum_{j \in R(t)} e^{\beta x_j}}.$$

The individual who fails at time $t_i$ has covariate $x_i$. The expected value of this is

$$E(X_i) = \sum_{k \in R(t_i)} x_k \, p(k \text{ fails}) = \sum_{k \in R(t_i)} \frac{x_k e^{\beta x_k}}{\sum_{j \in R(t_i)} e^{\beta x_j}}.$$

And so the difference between the observed and expected covariate of the person who fails at time $t_i$ is

$$\hat{r}_i = x_i - \sum_{k \in R(t_i)} \frac{x_k e^{\beta x_k}}{\sum_{j \in R(t_i)} e^{\beta x_j}}.$$
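In R, these residuals can be extracted directly from a fitted Cox model; a minimal sketch (the model object is illustrative):

phm = coxph(S~x)
sch = residuals(phm, type="schoenfeld")   # one residual per event time
# for a single covariate, the event times are stored in the names of the result
plot(as.numeric(names(sch)), sch, xlab="t", ylab="residual for x")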


Clearly the residual $\hat{r}_i$ has expected value zero (if the pha is true). It should also be independent of time if the pha is true. What will happen if the pha is not true? To illustrate this, imagine that the true model is $h(t, x) = 0.1\exp(0.5x + 0.5xt)$, so that the pha is violated. I've simulated 1000 realisations of the process with $x \sim U(0, 1)$. Here is the resulting plot of the Schoenfeld residuals versus time.

[Figure: Schoenfeld residuals for $x$ against $t$ (log scale), for data simulated with a time-varying effect.]

Compare this to data from the model $h(t, x) = 0.1\exp(0.5x)$.

[Figure: Schoenfeld residuals for $x$ against $t$ (log scale), for data simulated under proportional hazards.]

Visually, there doesn't appear to be much difference. However, for the first data set, the correlation coefficient between the residuals and time is −0.32, which is significantly different from zero. For the second data set, it is −0.01, which is not.
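This informal correlation check can be done with a standard test; a sketch, assuming a fitted model phm as in the earlier sketch:

sch = residuals(phm, type="schoenfeld")
cor.test(as.numeric(names(sch)), sch)   # H0: correlation between residuals and time is zero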

4.2.2 Scaled Schoenfeld residuals

An alternative was proposed by Grambsch & Therneau (1994), who suggested scaling the residuals by an estimate of their variance. Their residuals are defined to be $\hat{r}_i^* = m\hat{V}(\hat\beta)\,\hat{r}_i$, where $m$ is the number of uncensored observations and $\hat{V}(\hat\beta)$ is the estimated variance of $\hat\beta$. This is what R and other programs plot.
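In R, the scaled residuals are available in the same way as the unscaled ones; a minimal sketch:

ssch = residuals(phm, type="scaledsch")   # Grambsch-Therneau scaled Schoenfeld residuals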


4.2.3 Models with multiple parameters

The same approach applies to models with more than one covariate. Let $x_i$ be the covariate of interest for individual $i$ (at any time there must be only one) and $\mathbf{x}_i$ be the vector of all covariates. Then

$$\hat{r}_i = x_i - \sum_{k \in R(t_i)} \frac{x_k e^{\beta^T \mathbf{x}_k}}{\sum_{j \in R(t_i)} e^{\beta^T \mathbf{x}_j}}.$$

This can be repeated for each covariate.

4.2.4 Testing the proportional hazards assumption using Schoenfeld's residuals

It is simple to do a formal test of whether Schoenfeld's residuals are correlated with time. Kleinbaum & Klein (2005) suggest ranking times, rather than using time itself, and this is what we choose to do here. The null hypothesis is that the (scaled) residuals are independent of failure times. This implies that they are uncorrelated. (However, technically, we cannot go from a lack of correlation to independence. We gloss over this.) The usual test statistic for testing the correlation coefficient is based on a t-test. However, most software packages use a chi-squared test, derived in Grambsch & Therneau (1994). The derivation is quite difficult and not particularly interesting (i.e. it is hard to see how it would be generalised to other situations), so we won't cover it. We don't need to, as R will calculate it automatically if we use the cox.zph command. This is illustrated below. We fit a model using body mass index (bmi) as the only predictor of survival following heart attacks, using the WHAS data described in Hosmer et al. (2008).

S=Surv(t,delta)
phm.bmi = coxph(S~bmi)
cox.zph(phm.bmi,transform="rank",global=F)

This gives the following output:

       rho chisq     p
bmi 0.0857  2.01 0.156

The p-value tells us there is no evidence to reject the hypothesis that the residuals are uncorrelated with time. We would therefore conclude that we can use the phm with bmi until further evidence comes to light.

4.2.5 What to do if it is violated

If the p-value is large, then we do not reject the pha. We do not accept it. However, we assume for the purposes of modelling that it holds. If the p-value is small, then we can reject the pha. This makes modelling more difficult. There are two solutions: extend the proportional hazards model so that it incorporates an interaction between the awkward covariate(s) $x$ and $t$; or stratify on the covariate(s). These are covered in the next chapter. There is another test that we can do, namely extending the proportional hazards model and seeing whether the extended model fits significantly better. This is covered at the end of this chapter.
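As a preview of the two fixes, a hedged sketch of what they might look like in R (sysbp.group is a hypothetical categorised version of sysbp; the details are deferred to the next chapter):

# stratify: a separate baseline hazard for each level of the awkward covariate
coxph(S ~ bmi + strata(sysbp.group))
# or let the effect vary with time, using the tt() facility of coxph
coxph(S ~ bmi + sysbp + tt(sysbp), tt=function(x,t,...) x*t)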


4.2.6 Example

We reconsider the heart attack data with more covariates. Now, we use in addition to bmi the subject's age and systolic blood pressure. The code and output are as follows:

> phm.bmi_age_sysbp = coxph(S~bmi+age+sysbp)
> cox.zph(phm.bmi_age_sysbp,transform="rank",global=F)
         rho chisq       p
bmi   0.0724 1.373 0.24135
age   0.0568 0.666 0.41434
sysbp 0.1647 6.771 0.00926

From this output, naïvely it appears that bmi and age satisfy the pha but systolic blood pressure does not.

4.2.7 Multiple tests warning

Question 1: if we choose a level of $\alpha = 0.05$ (as is common) to specify a significant departure from the null hypothesis, what is the probability of making a type I error?

Question 2: if we perform two tests when H0 is true, what is the probability of making at least one type I error?


In either case, the more tests, the greater the chance of spurious significance. In general, for $n$ independent tests, $p(\text{at least one type I error}) = 1 - (1 - \alpha)^n$. To account for this, we need to adjust the threshold for significance to account for multiple tests. The simplest way to do this (though there are many other, better, more complicated ways) is to use Bonferroni's correction, namely: if $n$ tests are being performed and an overall level $\alpha$ is desired, the significance threshold for any individual test is $\alpha/n$. So, for example, with two tests at level 0.05, each hypothesis is rejected if its $p < 0.025$. Note the following two points:

- It is not just in survival analysis that multiple tests necessitate correction of the level. This applies in all statistical settings, though it is often ignored. Despite this, for applied research to be publishable it will often/usually need to be corrected in this way.

- By lowering the p-value threshold for significance, you keep the level of the test at the desired level, i.e. the probability of a type I error is as advertised. However, the probability of a type II error increases, i.e. there is a greater chance of not rejecting $H_0$ when actually you should.

So, in the example we considered earlier with three predictors for survival following a heart attack, to obtain a significant result at the 5% level would require a p-value of 0.05/3 = 0.017. We still reject the hypothesis of pha for systolic blood pressure (p-value of 0.009), but the evidence against it is now weaker.
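In R, the correction can be applied directly to the vector of p-values; a small sketch using the three p-values from the example above:

pvals = c(0.24135, 0.41434, 0.00926)
pvals < 0.05/length(pvals)               # Bonferroni: compare each to alpha/n
p.adjust(pvals, method="bonferroni")     # equivalently, inflate the p-values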


4.3 Checking the scale of covariates

Recall a previous example where body mass index (bmi) was used to predict survival following a heart attack. The fitted model was $h(t, x) = h_0(t)\exp(-0.0985x)$ where $x_i$ is the bmi of individual $i$. You may find this surprising, as it implies that heavier people have longer survival times than people of normal weight. Indeed, taken to a limit, it suggests that someone who is clinically obese (say with a bmi of 35) has a hazard ratio of 0.23 relative to someone with a healthy bmi of 20. The explanation here is that two kinds of extreme weights are bad for survival: it is reduced among the obese and the severely underweight alike. This kind of effect cannot be handled by a model of the form $h(t, x) = h_0(t)\exp(\beta x)$. However, it can be trivially accommodated by using the square of bmi as a further covariate, viz: $h(t, x) = h_0(t)\exp(\beta_1 x + \beta_2 x^2)$. Similarly, there is no reason why a covariate should be modelled as $\exp(\beta x)$ and not $\exp(\beta\sqrt{x})$ or $\exp(\beta\log x)$, for example. These kinds of models are easy to fit, e.g. in R, but they introduce a new difficulty: if we are considering different possible models for bmi, say, which one should we use? We would therefore like to assess which of the competing models for the scale of a covariate best fits the data. If we were only considering models that were nested, like $h_0(t)\exp(\beta x)$ nested within $h_0(t)\exp(\beta_1 x + \beta_2 x^2)$, then we could use the likelihood ratio test (see earlier in the notes). However, this is not appropriate for non-nested models, such as when comparing $h_0(t)\exp(\beta x)$ and $h_0(t)\exp(\beta\log x)$. We need an alternative. The alternative I recommend is the Akaike Information Criterion. We briefly mentioned this earlier in the course.


4.3.1 Akaike Information Criterion

This is a method to select between two or more competing models, possibly with different numbers of parameters. It balances parsimony with goodness-of-fit by penalising models that have a greater number of parameters. Consider just two models: model 1 has $p_1$ parameters denoted $a$; model 2 has $p_2$ parameters denoted $b$. The maximum likelihood estimates are $\hat{a}$ and $\hat{b}$, respectively. The AIC for model $m$ with a $p_m$-length parameter vector $\theta_m$ is defined to be

$$AIC(m) = 2p_m - 2\log f(\text{data}|\hat\theta_m, m).$$

That is, for our two models above,

$$AIC(\text{model 1}) = 2p_1 - 2\log f(\text{data}|\hat{a}, \text{model 1})$$
$$AIC(\text{model 2}) = 2p_2 - 2\log f(\text{data}|\hat{b}, \text{model 2}).$$

The model with the smaller AIC is preferred. The approach generalises trivially to three or more competing models. Note that for the Cox proportional hazards model we can use the partial log-likelihood in place of the log-likelihood function in the above definitions. The AIC can readily be found for Cox models in R using the extractAIC function. Its use is exhibited in the following example, again based on bmi and survival following heart attacks.

library(survival)
x=read.table("whas500.dat",col.names=c("id","age","sex","heartrate",
  "sysbp","diabp","bmi","history","af","shock","chc","chb","mi_order",
  "mi_type","year","admission","discharge","followup","hospital_stay",
  "discharge_status","length_followup","delta"))
attach(x)
t=length_followup/365.25
S=Surv(t,delta)
phm.bmi1 =coxph(S~bmi)
phm.bmi2 =coxph(S~bmi+I(bmi^2))
phm.bmi3 =coxph(S~log(bmi))
phm.bmi4 =coxph(S~sqrt(bmi))
phm.bmi5 =coxph(S~bmi+I(bmi^2)+I(bmi^3))
aic1=extractAIC(phm.bmi1)[2]
aic2=extractAIC(phm.bmi2)[2]
aic3=extractAIC(phm.bmi3)[2]
aic4=extractAIC(phm.bmi4)[2]
aic5=extractAIC(phm.bmi5)[2]
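The AIC could equally be computed by hand from the maximised partial log-likelihood; a quick sketch:

m = coxph(S~bmi)
2*length(coef(m)) - 2*m$loglik[2]   # 2p - 2 log-lik; should match extractAIC(m)[2]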

The I() function in the model specification is needed when there are powers, to stop R thinking you mean interactions. (This is a silly thing about R, sorry.) The AIC scores are as follows:

Model             AIC      AIC − min AIC
1  linear         2408.3   4.4
2  quadratic      2403.9   0.0
3  log            2404.6   0.7
4  square root    2406.1   2.2
5  cubic          2404.6   0.7

The quadratic model is better than the linear model. The alternative formulations considered above do not improve on it. Unless we could think of a better model, we would use the quadratic form in all future models using bmi to predict survival in these data. The fitted quadratic model is $h(t, x) = h_0(t)\exp(-0.33x + 0.0044x^2)$, which is plotted below.

[Figure: fitted hazard ratio against bmi, for bmi between 15 and 45.]
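A sketch of how such a plot could be produced from the fitted coefficients above (the reference level of the ratio is left arbitrary here):

b = seq(15,45,by=0.1)
hr = exp(-0.33*b + 0.0044*b^2)   # fitted log-linear predictor for bmi
plot(b,hr,type="l",xlab="bmi",ylab="hazard ratio")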

4.3.2 Advice

There are an infinite number of scaling functions you could use. My suggestion is first to try a quadratic: if that is better, try a cubic. Then try a few basic functions of $x$, such as the log or square root. There are a lot of possible covariates in most models. My suggestion is to consider the covariates one at a time and see if a change of scale makes them fit better. If not, stick with linear (in the log-hazard) for that covariate throughout.


4.4 Selection of covariates

The next part of the chapter relates to selecting which covariates to include in a model. If we have only one covariate, this is not an issue. For example, we may have (partially right-censored) survival times and a binary covariate signifying if patients received the drug or a placebo. Here we would fit the Cox proportional hazards model with drug as a predictor and see if it improves the fit: if so we conclude there is a difference between drug and placebo, if not we conclude the opposite and revert to the simpler model with no covariates. On the other hand, we may have a host of covariates, some of which are clinically interesting, and others of which may affect the covariates we are primarily interested in. Alternatively, we may have no special covariates and simply want to find the model that best describes the data. If we have $p$ possible covariates, then we have a total of $2^p$ models, excluding interactions between covariates. If, say, we had $p = 5$ possible covariates, we only have 32 possible models (a fairly small number that could feasibly be fully considered without any special routines). If, on the other hand, we had $p = 10$, there would be 1024 possible models to be considered, and if $p = 20$, as well there might be for large cohort studies (such as the Singapore Chinese Health Study), there are over a million possible models, and this is before interactions are accounted for. There are two methods for building models in such cases. These impose guidelines that help make the task of selecting a model easier, by reducing the size of the problem from order $2^p$ to something that is much more linear in $p$. The first one we consider is fairly automatic and is called step-wise selection of covariates. Its use may be familiar to you from linear regression. The second is what Hosmer et al. (2008) call purposeful selection: this requires more contemplation on your behalf. In both approaches, you might bear in mind the following advice from Hosmer et al. (2008): ". . . statistical selection procedures suggest, but do not dictate, the model."


That is, if you have a reason to use an alternative model, perhaps because you feel strongly that a particular term should be included in the model, go ahead and do it.

4.4.1 Step-wise selection

Step-wise selection of covariates works by adding, one step at a time, the best covariate to the model from those that are currently excluded, and then checking to see if each term that is currently in the model should be removed. Conditions for inclusion or exclusion are imposed to define a stopping rule for the algorithm. There are two main ways to decide which parameter to include or drop. One approach is to use the AIC to determine the parameter which best improves the model. This is easy to implement in R, so we will describe it later, but it has a serious problem, in that it tends to include too many terms in the model. Adaptations of the AIC approach that penalise parameters more strongly do not have this problem. The other approach, which we describe below in more detail, is to use p-values to determine which parameters to include or exclude. These come from likelihood ratio tests for the models with and without the covariate of interest. It requires a little more work to do this in R.

Using p-values

Firstly, we define two thresholds $p_E < p_R$ for entering ($p_E$) or removing ($p_R$) a parameter from the model. One might think to make these both 5%, but Hosmer et al. (2008) recommend setting them higher (e.g. to $p_E = 0.15$ and $p_R = 0.2$) for the following reasons:

- the approach is intended to create a preliminary model that may later be changed based on its goodness of fit;


- being too harsh reduces the chance of important effect modifiers being included in the model.

This is the algorithm: fit each single-covariate model and add the covariate with the smallest likelihood ratio p-value, provided it is below $p_E$; after each addition, refit the model with each included covariate removed in turn, and drop any whose p-value exceeds $p_R$; repeat until no covariate can be added or removed. A hypothetical helper for the forward step is sketched below.
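The helper below is illustrative (its name and structure are mine, not from the notes); df=1 assumes single-parameter covariates, so factors would need a larger df:

# LRT p-value for adding each candidate covariate to the current model
forward.pvals = function(S, current, candidates) {
  base = coxph(as.formula(paste("S ~", current)))
  ll0 = base$loglik[length(base$loglik)]   # null model's loglik has length 1
  sapply(candidates, function(v) {
    m = coxph(as.formula(paste("S ~", current, "+", v)))
    pchisq(2*(m$loglik[2] - ll0), df=1, lower.tail=FALSE)
  })
}
# e.g. forward.pvals(S, "x1", c("x2","x3")): add the smallest if below pE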

Using AIC

The approach using the AIC is slightly simpler. At each step, we try to add in each excluded covariate and try to remove each included covariate. We then choose the model that minimises the AIC. This could be a model including a new covariate, in which case we add it in; a model excluding an old covariate, in which case we cut it out; or the current model, in which case we stop. This can be trivially done in R using the commands used in the following example.

library(survival); library(MASS)   # stepAIC is in MASS
attach(veteran)                    # using a built-in data set
# This defines the set of models being considered. This formulation
# allows at most 2-way interactions (ie the ^2 term):
Scope=list(upper=~(trt+celltype+karno+diagtime+age+prior)^2,lower=~1)
phm_0 = coxph(Surv(time,status)~1)               # the empty model
phm_f = stepAIC(phm_0,Scope,direction="both")    # the final model

The output of this is as follows.

Start:  AIC=1010.9
Surv(time, status) ~ 1

           Df     AIC
+ karno     1  970.87
+ celltype  3  992.05
<none>        1010.90
+ diagtime  1 1011.99
+ age       1 1012.27
+ prior     1 1012.38
+ trt       1 1012.89

Step:  AIC=970.87
Surv(time, status) ~ karno

           Df     AIC
+ celltype  3  959.53
<none>         970.87
+ trt       1  971.93
+ age       1  972.80
+ prior     1  972.82
+ diagtime  1  972.85
- karno     1 1010.90

Step:  AIC=959.53
Surv(time, status) ~ karno + celltype

                 Df    AIC
<none>              959.53
+ trt             1 959.83
+ age             1 961.09
+ prior           1 961.27
+ diagtime        1 961.35
+ celltype:karno  3 962.51
- celltype        3 970.87
- karno           1 992.05

This means in the first step, it seeks to add all covariates one at a time, and karno is the best, so it gets added. At the next step, it seeks either to remove karno or to add all other covariates one at a time, and finds the model with karno and celltype is best. At the last stage, it seeks either to remove one of the two covariates already selected, add a single other covariate, or add an interaction between the two existing covariates, but none of these models improves the AIC, so it stops there. However, if we had run stepAIC from the full model backwards,

phm_0 = coxph(Surv(time,status)~
  (trt+celltype+karno+diagtime+age+prior)^2)     # the full model
phm_f = stepAIC(phm_0,Scope,direction="both")    # the final model


we get a completely different final model. The last part of the output is:

Step:  AIC=948.07
Surv(time, status) ~ trt + celltype + karno + diagtime + age +
    prior + trt:karno + trt:prior + celltype:diagtime + karno:age +
    karno:prior

                     Df    AIC
<none>                  948.07
+ trt:diagtime        1 948.49
+ diagtime:prior      1 949.38
+ karno:diagtime      1 949.88
+ diagtime:age        1 949.94
+ trt:age             1 949.99
+ age:prior           1 950.07
+ trt:celltype        3 950.22
- trt:prior           1 950.37
- karno:age           1 950.54
- trt:karno           1 950.61
+ celltype:age        3 951.25
+ celltype:prior      3 951.83
+ celltype:karno      3 952.79
- celltype:diagtime   3 954.47
- karno:prior         1 955.66

This is a much more complex model with many interactions. The output of the fitted model shows that many of the parameters are not significant at the 5% level:

> phm_f
Call:
coxph(formula = Surv(time, status) ~ trt + celltype + karno +
    diagtime + age + prior + trt:karno + trt:prior + celltype:diagtime +
    karno:age + karno:prior)


                               coef exp(coef) se(coef)      z       p
trt                         2.09396     8.117 0.741526  2.824 4.7e-03
celltypesmallcell           1.74063     5.701 0.379889  4.582 4.6e-06
celltypeadeno               1.75210     5.767 0.452094  3.876 1.1e-04
celltypelarge               0.57337     1.774 0.494589  1.159 2.5e-01
karno                      -0.05186     0.949 0.034183 -1.517 1.3e-01
diagtime                    0.05285     1.054 0.016524  3.198 1.4e-03
age                        -0.07032     0.932 0.029999 -2.344 1.9e-02
prior                       0.38277     1.466 0.101817  3.759 1.7e-04
trt:karno                  -0.02438     0.976 0.011527 -2.115 3.4e-02
trt:prior                  -0.08963     0.914 0.043551 -2.058 4.0e-02
celltypesmallcell:diagtime -0.07747     0.925 0.021308 -3.636 2.8e-04
celltypeadeno:diagtime     -0.04136     0.959 0.044077 -0.938 3.5e-01
celltypelarge:diagtime     -0.02421     0.976 0.043754 -0.553 5.8e-01
karno:age                   0.00108     1.001 0.000521  2.079 3.8e-02
karno:prior                -0.00406     0.996 0.001312 -3.098 1.9e-03

Likelihood ratio test=92.8  on 15 df, p=2.93e-13  n= 137

As a result, you may wish to do it in two stages: first selecting the terms to go in the model without interactions, then repeating the procedure by adding interactions between these terms. Note, for both the AIC and the log-likelihood ratio test, it is imperative that the data are the same for all models being considered. This will not be the case if you have to remove observations corresponding to non-recorded covariates from one model fit but not the other. In this situation, it is safest to fit the model only to those with full covariate information during the model-building stage, as sketched below. There are more sophisticated ways of dealing with missing data (data augmentation, say), but these are beyond the scope of the course.
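A sketch of restricting to complete cases before model building (the covariate names are borrowed from the WHAS example; complete.cases is base R):

cc = complete.cases(bmi, age, sysbp)   # TRUE where all covariates are recorded
S.cc = Surv(t[cc], delta[cc])
coxph(S.cc ~ bmi[cc] + age[cc] + sysbp[cc])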

4.4.2 Example

We consider again the Worcester Heart Attack Study (see Hosmer et al., 2008). The data set we use has 500 individuals who have suffered a myocardial infarction and been treated in a hospital in Worcester, USA. As well


as the time of death or censoring and the censoring indicator, the following covariates have been recorded:

- age
- sex
- heart rate (hrate)
- systolic blood pressure (sysbp)
- diastolic blood pressure (diabp)
- body mass index (bmi)
- history of disease (history)
- atrial fibrillation (afb)
- cardiogenic shock (shock)
- congestive heart complications (chf)
- complete heart block (av3)
- MI order (miord)
- MI type (mitype)

There is no clear covariate of interest (such as treatment), so a priori all models are equally interesting. First, we process the data:

library(survival); library(MASS)
cols=c("id","age","sex","hrate","sysbp","diabp","bmi","history","afb","shock",
  "chf","av3","miord","mitype","year","c16","c17","c18","c19","c20","t",
  "delta")
attach(read.table("http://www.umass.edu/statdata/statdata/data/whas500.dat",
  col.names=cols))

S=Surv(t,delta)
x1=age;x2=sex;x3=hrate;x4=sysbp
x5=diabp;x6=bmi;x7=history;x8=afb
x9=shock;x10=chf;x11=av3;x12=miord;x13=mitype
pE=0.15;logpE=log(pE)
pR=0.2;logpR=log(pR)

Note that it will be easier to use the variables as x1 etc. rather than age etc. Now, fit the first batch of models:

m0=coxph(S~1) ;ll0=m0$logl[1] # a bit weird: the null model's loglik has length 1
m1=coxph(S~x1);ll1=m1$logl[2]
m2=coxph(S~x2);ll2=m2$logl[2]
m3=coxph(S~x3);ll3=m3$logl[2]
m4=coxph(S~x4);ll4=m4$logl[2]
m5=coxph(S~x5);ll5=m5$logl[2]
m6=coxph(S~x6);ll6=m6$logl[2]
m7=coxph(S~x7);ll7=m7$logl[2]
m8=coxph(S~x8);ll8=m8$logl[2]
m9=coxph(S~x9);ll9=m9$logl[2]
m10=coxph(S~x10);ll10=m10$logl[2]
m11=coxph(S~x11);ll11=m11$logl[2]
m12=coxph(S~x12);ll12=m12$logl[2]
m13=coxph(S~x13);ll13=m13$logl[2]

pchisq(2*(ll1-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll2-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll3-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll4-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll5-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll6-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll7-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll8-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll9-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll10-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll11-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll12-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll13-ll0),lower.tail=FALSE,log.p=TRUE,df=1)

The log p-value for the model with x1 is the smallest, with value -74. This is much less than our threshold of -1.9, so we proceed with x1 in the model (here, age). The next step seeks to add another covariate.

m0=coxph(S~x1) ;ll0=m0$logl[2]
m2=coxph(S~x1+x2);ll2=m2$logl[2]
m3=coxph(S~x1+x3);ll3=m3$logl[2]
m4=coxph(S~x1+x4);ll4=m4$logl[2]
m5=coxph(S~x1+x5);ll5=m5$logl[2]
m6=coxph(S~x1+x6);ll6=m6$logl[2]
m7=coxph(S~x1+x7);ll7=m7$logl[2]
m8=coxph(S~x1+x8);ll8=m8$logl[2]
m9=coxph(S~x1+x9);ll9=m9$logl[2]
m10=coxph(S~x1+x10);ll10=m10$logl[2]
m11=coxph(S~x1+x11);ll11=m11$logl[2]
m12=coxph(S~x1+x12);ll12=m12$logl[2]
m13=coxph(S~x1+x13);ll13=m13$logl[2]
pchisq(2*(ll2-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll3-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll4-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll5-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll6-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll7-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll8-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll9-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll10-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll11-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll12-ll0),lower.tail=FALSE,log.p=TRUE,df=1)
pchisq(2*(ll13-ll0),lower.tail=FALSE,log.p=TRUE,df=1)

The model with x1 and x10 has the lowest log p-value of −20. Again, this is small enough to include x10 in the model. We check to ensure that adding x10 doesn't make x1 non-significant.

m0=coxph(S~x1+x10) ;ll0=m0$logl[2]
m1=coxph(S~x10)    ;ll1=m1$logl[2]
pchisq(2*(ll0-ll1),lower.tail=FALSE,log.p=TRUE,df=1) # (note order of subtraction)

The log p-value for the reduced model is well below the $\log p_R$ that we have set, so we do not remove the first covariate. We proceed in this way, with models:

1. x1
2. x1 and x10
3. x1, x10 and x9
4. x1, x10, x9 and x3
5. x1, x10, x9, x3 and x2 (almost remove x2)
6. x1, x10, x9, x3, x2 and x5
7. x1, x10, x9, x3, x2, x5 and x6
8. x1, x10, x9, x3, x2, x5, x6 and x13 (only just include x13)
9. no further terms are significant


The final preliminary model therefore contains terms for age, sex, heart rate, diastolic blood pressure, body mass index, cardiogenic shock, congestive heart complications, and MI type. The output from this model is as follows:

Call:
coxph(formula = S ~ x1 + x10 + x9 + x3 + x2 + x5 + x6 + x13)

  n= 500
       coef exp(coef) se(coef)     z       p
x1   0.0480     1.049  0.00675  7.11 1.1e-12
x10  0.7471     2.111  0.14826  5.04 4.7e-07
x9   1.2157     3.373  0.26949  4.51 6.5e-06
x3   0.0116     1.012  0.00298  3.88 1.1e-04
x2  -0.2930     0.746  0.14324 -2.05 4.1e-02
x5  -0.0112     0.989  0.00349 -3.20 1.4e-03
x6  -0.0477     0.953  0.01641 -2.90 3.7e-03
x13 -0.2690     0.764  0.18202 -1.48 1.4e-01

    exp(coef) exp(-coef) lower .95 upper .95
x1      1.049      0.953     1.035     1.063
x10     2.111      0.474     1.579     2.823
x9      3.373      0.297     1.989     5.720
x3      1.012      0.988     1.006     1.018
x2      0.746      1.340     0.563     0.988
x5      0.989      1.011     0.982     0.996
x6      0.953      1.049     0.923     0.985
x13     0.764      1.309     0.535     1.092

Rsquare= 0.361   (max possible= 0.993 )
Likelihood ratio test= 224  on 8 df,   p=0
Wald test            = 201  on 8 df,   p=0
Score (logrank) test = 225  on 8 df,   p=0

Note that in deriving this model, we built fewer than 100 models, while the total number of models we could have built was $2^{13} \approx 8000$.


This is actually the same model as the AIC method would give us. However, the AIC doesn't perform very well when interactions are included. It is, however, much simpler to implement:

Scope=list(upper=~(x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13),lower=~1)
m0 = coxph(S~1)
m1 = stepAIC(m0,Scope,direction="both")

4.4.3 Next steps

The next steps are:

- Attempt to adjust the scale of the continuous covariates such as body mass index.
- Attempt to add interactions between these eight covariates.

Adjusting the scale is covered in the previous lecture. Adding interactions takes the same form as the algorithm described above. We start with the model derived above without interactions. Then we attempt to add each kind of interaction to the model, select the one with the lowest p-value, and include it if the p-value is less than some threshold. Some modifications to the approach are described in the next section for the purposeful selection routine, and these might be included here. These include using a more stringent p-value threshold for the interactions, such as the usual 5%.


4.4.4 Purposeful selection of covariates

In the last section we described an automatic method of selecting covariates to go into the model. This was based on a step-wise selection algorithm, which dictated which terms to add or remove. Much of the art of model building is removed from this process, and in this section we discuss an alternative, more pensive, approach described by Hosmer et al. (2008), which they term purposeful selection. It is still written in the form of an algorithm. However, you should feel free to ignore any aspect of it that clashes with your interpretation of the data at hand.


4.4.5 An example: recidivism

Here we consider some data on recidivism (that is, the return to crime of ex-criminals upon release from gaol) described by Rossi et al. (1980). There are 432 ex-cons in the study, and the following data have been recorded:

week    time of rearrest or censoring
arrest  arrest indicator
fin     indicator of receipt of financial aid upon release
age     age at release
race    race (1 for blacks, 0 for others)
wexp    indicator of work experience prior to imprisonment
mar     indicator of marital status at time of release
paro    indicator of release on parole
prio    number of prior convictions
educ    education level (2: grade 6 or less, 3: grade 6 to 9, 4: grades 10 and 11, 5: grade 12, 6: tertiary)

In addition, weekly indicators of employment for one year are recorded, but for now we will ignore these.

Univariate fits

We fit eight models with a single covariate in each. We then consider the p-values associated with these covariates to decide if that covariate should be included in the next stage. We have chosen to use a threshold of $p = 0.2$ to decide inclusion.

library(survival)
attach(read.table("Rossi.txt",header=T))
S=Surv(week,arrest)
m1=coxph(S~fin)
m2=coxph(S~age)
m3=coxph(S~race)

m4=coxph(S~wexp)
m5=coxph(S~mar)
m6=coxph(S~paro)
m7=coxph(S~prio)
m8=coxph(S~factor(educ))

Here are the summaries of the models, with comments:

> summary(m1)
Call:
coxph(formula = S ~ fin)

  n= 432
      coef exp(coef) se(coef)     z     p
fin -0.369     0.691    0.190 -1.95 0.052

    exp(coef) exp(-coef) lower .95 upper .95
fin     0.691       1.45     0.477      1.00

Rsquare= 0.009   (max possible= 0.956 )
Likelihood ratio test= 3.84  on 1 df,   p=0.0501
Wald test            = 3.78  on 1 df,   p=0.0517
Score (logrank) test = 3.83  on 1 df,   p=0.0504

Receipt of financial aid may possibly reduce the risk of reoffending. We include it in the next step.

> summary(m2)
Call:
coxph(formula = S ~ age)

  n= 432
       coef exp(coef) se(coef)     z       p
age -0.0728      0.93   0.0208 -3.50 0.00046

    exp(coef) exp(-coef) lower .95 upper .95
age      0.93       1.08     0.893     0.968

Rsquare= 0.035   (max possible= 0.956 )
Likelihood ratio test= 15.3  on 1 df,   p=9.33e-05
Wald test            = 12.3  on 1 df,   p=0.000459
Score (logrank) test = 12.7  on 1 df,   p=0.000368

Greater age is associated with a large reduction in the risk of reoffending. We include it in the next step.

> summary(m3)
Call:
coxph(formula = S ~ race)

  n= 432
      coef exp(coef) se(coef)     z    p
race 0.231      1.26    0.305 0.756 0.45

     exp(coef) exp(-coef) lower .95 upper .95
race      1.26      0.794     0.693      2.29

Rsquare= 0.001   (max possible= 0.956 )
Likelihood ratio test= 0.61  on 1 df,   p=0.436
Wald test            = 0.57  on 1 df,   p=0.449
Score (logrank) test = 0.57  on 1 df,   p=0.448

Race has no strong evidence of an effect when considered by itself. We drop it for now, but try to reincorporate it again later.

> summary(m4)
Call:
coxph(formula = S ~ wexp)

  n= 432
      coef exp(coef) se(coef)    z      p
wexp -0.583     0.558    0.188 -3.1 0.0019

     exp(coef) exp(-coef) lower .95 upper .95
wexp     0.558       1.79     0.386     0.807

Rsquare= 0.022   (max possible= 0.956 )
Likelihood ratio test= 9.63  on 1 df,   p=0.00191
Wald test            = 9.61  on 1 df,   p=0.00194
Score (logrank) test = 9.88  on 1 df,   p=0.00167

Work experience is associated with a strong reduction in the risk of reoffence. We include it.

> summary(m5)
Call:
coxph(formula = S ~ mar)

  n= 432
     coef exp(coef) se(coef)     z     p
mar -0.712     0.491    0.367 -1.94 0.052

    exp(coef) exp(-coef) lower .95 upper .95
mar     0.491       2.04     0.239      1.01

Rsquare= 0.011   (max possible= 0.956 )
Likelihood ratio test= 4.64  on 1 df,   p=0.0312
Wald test            = 3.77  on 1 df,   p=0.0521
Score (logrank) test = 3.93  on 1 df,   p=0.0473

Marital status may influence recidivism, so we include it.

> summary(m6)
Call:
coxph(formula = S ~ paro)

  n= 432
      coef exp(coef) se(coef)      z    p
paro -0.109     0.897    0.191 -0.568 0.57

     exp(coef) exp(-coef) lower .95 upper .95
paro     0.897       1.11     0.617      1.30

Rsquare= 0.001   (max possible= 0.956 )
Likelihood ratio test= 0.32  on 1 df,   p=0.571
Wald test            = 0.32  on 1 df,   p=0.57
Score (logrank) test = 0.32  on 1 df,   p=0.57

Parole seems to have no effect. We drop it for now.

> summary(m7)
Call:
coxph(formula = S ~ prio)

  n= 432
      coef exp(coef) se(coef)    z       p
prio 0.101      1.11   0.0267 3.78 0.00016

     exp(coef) exp(-coef) lower .95 upper .95
prio      1.11      0.904      1.05      1.17

Rsquare= 0.027   (max possible= 0.956 )
Likelihood ratio test= 12    on 1 df,   p=0.000531
Wald test            = 14.3  on 1 df,   p=0.000156
Score (logrank) test = 14.6  on 1 df,   p=0.00013

The number of prior times an ex-con has done porridge seems to increase the risk of going back inside. We include it.

> summary(m8)
Call:
coxph(formula = S ~ factor(educ))

  n= 432
                coef exp(coef) se(coef)      z    p
factor(educ)3  0.758     2.134    0.513  1.478 0.14
factor(educ)4  0.375     1.455    0.536  0.700 0.48
factor(educ)5 -0.238     0.788    0.671 -0.354 0.72
factor(educ)6 -0.656     0.519    1.118 -0.587 0.56

              exp(coef) exp(-coef) lower .95 upper .95
factor(educ)3     2.134      0.469     0.781      5.83
factor(educ)4     1.455      0.687     0.509      4.16
factor(educ)5     0.788      1.268     0.212      2.94
factor(educ)6     0.519      1.928     0.058      4.64

Rsquare= 0.027   (max possible= 0.956 )
Likelihood ratio test= 11.8  on 4 df,   p=0.0190
Wald test            = 9.78  on 4 df,   p=0.0444
Score (logrank) test = 10.4  on 4 df,   p=0.0335

No individual factor level is significantly different from an education class of 2. However, education as a whole is significant (witness the overall p-values at the bottom). We therefore include it in the next round. Thus the covariates we decide to keep for the next stage are: fin, age, wexp, mar, prio and educ. We drop race and parole for now.

First multivariate model

We fit a model with the six covariates that were found to be significant at the first stage. The fitted model's output is:

> m9=coxph(S~fin+age+wexp+mar+prio+factor(educ))
> summary(m9)
Call:
coxph(formula = S ~ fin + age + wexp + mar + prio + factor(educ))

  n= 432
                 coef exp(coef) se(coef)      z     p
fin           -0.3814     0.683   0.1920 -1.987 0.047
age           -0.0488     0.952   0.0219 -2.225 0.026
wexp          -0.1432     0.867   0.2125 -0.674 0.500
mar           -0.4806     0.618   0.3800 -1.265 0.210
prio           0.0802     1.083   0.0292  2.749 0.006
factor(educ)3  0.5771     1.781   0.5195  1.111 0.270
factor(educ)4  0.3624     1.437   0.5432  0.667 0.500
factor(educ)5 -0.1166     0.890   0.6749 -0.173 0.860
factor(educ)6 -0.4404     0.644   1.1221 -0.392 0.690

              exp(coef) exp(-coef) lower .95 upper .95
fin               0.683      1.464    0.4688     0.995
age               0.952      1.050    0.9123     0.994
wexp              0.867      1.154    0.5714     1.314
mar               0.618      1.617    0.2937     1.302
prio              1.083      0.923    1.0233     1.147
factor(educ)3     1.781      0.562    0.6434     4.930
factor(educ)4     1.437      0.696    0.4954     4.167
factor(educ)5     0.890      1.124    0.2371     3.340
factor(educ)6     0.644      1.553    0.0714     5.806

Rsquare= 0.082   (max possible= 0.956 )
Likelihood ratio test= 37    on 9 df,   p=2.60e-05
Wald test            = 33.8  on 9 df,   p=9.87e-05
Score (logrank) test = 35.9  on 9 df,   p=4.22e-05

The first thing we note is that (excluding education, which had an overall p-value much lower than the individual factor levels' p-values) the highest p-value is from the Wald test of work experience. Before, this was highly significant; now, it is not at all. It seems possible that it is correlated with one of the


other predictors. A moment's thought suggests that age and education are plausible confounders of work experience. The mean ages for those without and with work experience are:

> for(i in 0:1){print(mean(age[wexp==i]))}
[1] 22.12432
[1] 26.44939

i.e. 22 and 26 years, respectively. So it is possible that we can just use age instead of work experience or vice versa. The proportion of those with work experience by education level is:

> for(i in 2:6){print(mean(wexp[educ==i]))}
[1] 0.5833333
[1] 0.4769874
[1] 0.6806723
[1] 0.7948718
[1] 0.6363636

for which the general trend is for a greater education level to be more likely to result in work experience. Since education level is both significant overall and non-significant when we consider each category, we try a plot of the coefficients to see if any pattern emerges, before we drop work experience. Recall that the factor with education 2 must have coefficient 0.

cm=1/2.54;pdf("rec1.pdf",height=10*cm,width=10*cm)
par(mai=c(2,2,0.5,0.5)*cm,mgp=c(2,0.75,0))
plot(2:6,c(0,m9$coeff[6:9]),xlab="education level",
  ylab=expression(beta))
dev.off()

[Figure: estimated coefficients $\beta$ against education level (2 to 6), with education level 2 fixed at zero.]
Except for those with an education level of 2, this is remarkably linear. We therefore attempt a model that has a linear response in education with an additional dummy variable for education 2.

> e2=educ==2 > m10=coxph(S~fin+age+wexp+mar+prio+educ+e2) > summary(m10) Call: coxph(formula = S ~ fin + age + wexp + mar + prio + educ + e2) n= 432 fin age wexp mar prio coef exp(coef) se(coef) z p -0.3788 0.685 0.1916 -1.977 0.0480 -0.0494 0.952 0.0219 -2.260 0.0240 -0.1369 0.872 0.2115 -0.647 0.5200 -0.4848 0.616 0.3799 -1.276 0.2000 0.0795 1.083 0.0292 2.726 0.0064

CHAPTER 4. MODELLING WITH COX educ -0.2948 e2TRUE -0.8824 0.745 0.414 0.1565 -1.884 0.0600 0.5593 -1.578 0.1100

125

exp(coef) exp(-coef) lower .95 upper .95 fin 0.685 1.461 0.470 0.997 age 0.952 1.051 0.912 0.993 wexp 0.872 1.147 0.576 1.320 mar 0.616 1.624 0.292 1.297 prio 1.083 0.924 1.023 1.146 educ 0.745 1.343 0.548 1.012 e2TRUE 0.414 2.417 0.138 1.238 Rsquare= 0.082 (max possible= Likelihood ratio test= 36.8 on Wald test = 34.0 on Score (logrank) test = 35.8 on 0.956 ) 7 df, p=5.11e-06 7 df, p=1.75e-05 7 df, p=7.82e-06

Neither this nor removing age makes wexp signicant, so we drop it from the model. A quick check of the non-signicance of this parameter after removal can be done as follows: > m11=coxph(S~fin+age+mar+prio+factor(educ)) > pchisq(2*(m9$logl[2]-m11$logl[2]),df=1,lower.tail=T) [1] 0.5004285 using the likelihood ratio test.

Second multivariate model We remove wexp and see whether any further parameters can be removed.

> summary(m11) Call:

CHAPTER 4. MODELLING WITH COX coxph(formula = S ~ fin + age + mar + prio + factor(educ)) n= 432 fin age mar prio factor(educ)3 factor(educ)4 factor(educ)5 factor(educ)6 coef exp(coef) se(coef) z p -0.3808 0.683 0.1919 -1.984 0.0470 -0.0533 0.948 0.0211 -2.532 0.0110 -0.5270 0.590 0.3734 -1.411 0.1600 0.0851 1.089 0.0283 3.008 0.0026 0.5633 1.756 0.5193 1.085 0.2800 0.3317 1.393 0.5414 0.613 0.5400 -0.1484 0.862 0.6734 -0.220 0.8300 -0.4451 0.641 1.1220 -0.397 0.6900 exp(coef) exp(-coef) lower .95 upper .95 0.683 1.464 0.4691 0.995 0.948 1.055 0.9097 0.988 0.590 1.694 0.2840 1.227 1.089 0.918 1.0301 1.151 1.756 0.569 0.6347 4.860 1.393 0.718 0.4822 4.026 0.862 1.160 0.2303 3.227 0.641 1.561 0.0711 5.777 0.956 ) 8 df, p=1.38e-05 8 df, p=5.79e-05 8 df, p=2.59e-05

126

fin age mar prio factor(educ)3 factor(educ)4 factor(educ)5 factor(educ)6

Rsquare= 0.081 (max possible= Likelihood ratio test= 36.6 on Wald test = 33.1 on Score (logrank) test = 35.1 on

None of the other parameter values have been unduly aected by this removal. Education as a factor is clearly not going to be signicant. We thus try reverting to our previous two-part model for educations eect:

> m12=coxph(S~fin+age+mar+prio+educ+e2) > summary(m12) Call: coxph(formula = S ~ fin + age + mar + prio + educ + e2)

CHAPTER 4. MODELLING WITH COX

127

n= 432 fin age mar prio educ e2TRUE coef exp(coef) se(coef) z p -0.3782 0.685 0.1916 -1.97 0.0480 -0.0536 0.948 0.0210 -2.55 0.0110 -0.5292 0.589 0.3734 -1.42 0.1600 0.0843 1.088 0.0283 2.98 0.0029 -0.3037 0.738 0.1564 -1.94 0.0520 -0.8770 0.416 0.5598 -1.57 0.1200 exp(coef) exp(-coef) lower .95 upper .95 0.685 1.46 0.471 0.997 0.948 1.06 0.910 0.988 0.589 1.70 0.283 1.225 1.088 0.92 1.029 1.150 0.738 1.35 0.543 1.003 0.416 2.40 0.139 1.246 0.956 ) 6 df, p=2.32e-06 6 df, p=8.91e-06 6 df, p=4.24e-06

fin age mar prio educ e2TRUE

Rsquare= 0.081 (max possible= Likelihood ratio test= 36.4 on Wald test = 33.4 on Score (logrank) test = 35.0 on

This has made the linear part almost signicant at the usual level (5%), though the education 2 part is not brilliant. A linear t does not help this, nor does a quadratic:

> m13=coxph(S~fin+age+mar+prio+educ) > summary(m13) Call: coxph(formula = S ~ fin + age + mar + prio + educ) n= 432 coef exp(coef) se(coef) z p fin -0.3399 0.712 0.1910 -1.78 0.0750 age -0.0593 0.942 0.0207 -2.87 0.0042

CHAPTER 4. MODELLING WITH COX mar -0.5211 prio 0.0896 educ -0.1816 0.594 1.094 0.834 0.3731 -1.40 0.1600 0.0280 3.20 0.0014 0.1301 -1.40 0.1600

128

fin age mar prio educ

exp(coef) exp(-coef) lower .95 upper .95 0.712 1.405 0.490 1.035 0.942 1.061 0.905 0.981 0.594 1.684 0.286 1.234 1.094 0.914 1.035 1.155 0.834 1.199 0.646 1.076 0.956 ) 5 df, p=3.08e-06 5 df, p=7.31e-06 5 df, p=4.15e-06

Rsquare= 0.074 (max possible= Likelihood ratio test= 33.4 on Wald test = 31.5 on Score (logrank) test = 32.8 on

> m14=coxph(S~fin+age+mar+prio+educ+I(educ^2))
> summary(m14)
Call:
coxph(formula = S ~ fin + age + mar + prio + educ + I(educ^2))

  n= 432
             coef exp(coef) se(coef)     z      p
fin       -0.3679     0.692    0.192 -1.92 0.0550
age       -0.0546     0.947    0.021 -2.60 0.0092
mar       -0.5237     0.592    0.373 -1.40 0.1600
prio       0.0880     1.092    0.028  3.14 0.0017
educ       1.2517     3.496    1.022  1.23 0.2200
I(educ^2) -0.1956     0.822    0.139 -1.41 0.1600

          exp(coef) exp(-coef) lower .95 upper .95
fin           0.692      1.445     0.476     1.008
age           0.947      1.056     0.909     0.987
mar           0.592      1.688     0.285     1.231
prio          1.092      0.916     1.034     1.153
educ          3.496      0.286     0.472    25.897
I(educ^2)     0.822      1.216     0.626     1.080

Rsquare= 0.079   (max possible= 0.956 )
Likelihood ratio test= 35.8  on 6 df,   p=3.05e-06
Wald test            = 31.8  on 6 df,   p=1.77e-05
Score (logrank) test = 33.7  on 6 df,   p=7.67e-06

We therefore stick with the two-part model.

Can we reintroduce any covariates?

We have excluded race, parole and work experience. We have already checked the non-significance of work experience, but we have yet to double-check that race and parole are non-significant in the multivariate model. We check these now:

> m15=coxph(S~fin+age+mar+prio+educ+I(educ^2)+paro)
> summary(m15)
Call:
coxph(formula = S ~ fin + age + mar + prio + educ + I(educ^2) + paro)

  n= 432
             coef exp(coef) se(coef)      z      p
fin       -0.3729     0.689   0.1920 -1.942 0.0520
age       -0.0555     0.946   0.0211 -2.629 0.0086
mar       -0.5137     0.598   0.3739 -1.374 0.1700
prio       0.0861     1.090   0.0283  3.037 0.0024
educ       1.2496     3.489   1.0221  1.223 0.2200
I(educ^2) -0.1955     0.822   0.1393 -1.403 0.1600
paro      -0.0836     0.920   0.1949 -0.429 0.6700

          exp(coef) exp(-coef) lower .95 upper .95
fin           0.689      1.452     0.473     1.003
age           0.946      1.057     0.908     0.986
mar           0.598      1.671     0.288     1.245
prio          1.090      0.918     1.031     1.152
educ          3.489      0.287     0.471    25.866
I(educ^2)     0.822      1.216     0.626     1.081
paro          0.920      1.087     0.628     1.348

Rsquare= 0.08   (max possible= 0.956 )
Likelihood ratio test= 36.0  on 7 df,   p=7.38e-06
Wald test            = 31.9  on 7 df,   p=4.33e-05
Score (logrank) test = 33.8  on 7 df,   p=1.87e-05

> m16=coxph(S~fin+age+mar+prio+educ+I(educ^2)+race)
> summary(m16)
Call:
coxph(formula = S ~ fin + age + mar + prio + educ + I(educ^2) + race)

  n= 432
             coef exp(coef) se(coef)     z      p
fin       -0.3787     0.685   0.1917 -1.98 0.0480
age       -0.0557     0.946   0.0211 -2.64 0.0083
mar       -0.4741     0.622   0.3757 -1.26 0.2100
prio       0.0892     1.093   0.0279  3.20 0.0014
educ       1.2000     3.320   1.0183  1.18 0.2400
I(educ^2) -0.1907     0.826   0.1391 -1.37 0.1700
race       0.3347     1.398   0.3095  1.08 0.2800

          exp(coef) exp(-coef) lower .95 upper .95
fin           0.685      1.460     0.470     0.997
age           0.946      1.057     0.908     0.986
mar           0.622      1.607     0.298     1.300
prio          1.093      0.915     1.035     1.155
educ          3.320      0.301     0.451    24.432
I(educ^2)     0.826      1.210     0.629     1.085
race          1.398      0.716     0.762     2.563

Rsquare= 0.082   (max possible= 0.956 )
Likelihood ratio test= 37.0  on 7 df,   p=4.6e-06
Wald test            = 33.2  on 7 df,   p=2.4e-05
Score (logrank) test = 35.1  on 7 df,   p=1.06e-05

As before, neither warrants inclusion.

Scale

We have already briefly investigated the scale of education in an attempt to decide whether to retain it. We now consider its scale more thoroughly, as well as the scale of the continuous covariates, age and prior stints inside. First, consider education:

> m17=coxph(S~fin+age+mar+prio+I(educ^(-1)))
> m18=coxph(S~fin+age+mar+prio+I(educ^(-2)))
> m19=coxph(S~fin+age+mar+prio+I(educ^(-1))+I(educ^(-2)))
> m20=coxph(S~fin+age+mar+prio+I(educ^(-1))+e2)
> m21=coxph(S~fin+age+mar+prio+educ+I(educ^2)+I(educ^3))
> extractAIC(m12)[2]
[1] 1326.379
> extractAIC(m17)[2]
[1] 1328.726
> extractAIC(m18)[2]
[1] 1329.173
> extractAIC(m19)[2]
[1] 1326.285
> extractAIC(m20)[2]
[1] 1326.744
> extractAIC(m21)[2]
[1] 1328.193

A second-order term in one over education fits marginally better than the two-part model we developed earlier. However, I (for one) feel that the two-part model is easier to understand and reflects something special in those

CHAPTER 4. MODELLING WITH COX

132

with very low education levels. I have therefore decided to stick to the twopart model. Now consider age. > m22=coxph(S~fin+age+I(age^2)+mar+prio+educ+e2) > m23=coxph(S~fin+I(age^(-1))+mar+prio+educ+e2) > m24=coxph(S~fin+I(log(age))+mar+prio+educ+e2) > m25=coxph(S~fin+I(sqrt(age))+mar+prio+educ+e2) > extractAIC(m12)[2] [1] 1326.379 > extractAIC(m22)[2] [1] 1326.187 > extractAIC(m23)[2] [1] 1324.457 > extractAIC(m24)[2] [1] 1325.314 > extractAIC(m25)[2] [1] 1325.827 From these it appears that something involving the inverse of age might be better than age by itself. A few variants on this can be tried: > m26=coxph(S~fin+age+I(age^(-1))+mar+prio+educ+e2) > m27=coxph(S~fin+I(age^(-1))+I(age^(-2))+mar+prio+educ+e2) > extractAIC(m23)[2] [1] 1324.457 > extractAIC(m26)[2] [1] 1325.676 > extractAIC(m27)[2] [1] 1325.504 but neither improve the t. We therefore consider the model with just the inverse of age. We now consider the number of prior oences. Note that this can be 0, which limits our options.

> m28=coxph(S~fin+I(age^(-1))+mar+prio+I(prio^2)+educ+e2)
> m29=coxph(S~fin+I(age^(-1))+mar+I(sqrt(prio))+educ+e2)
> extractAIC(m23)[2]
[1] 1324.457
> extractAIC(m28)[2]
[1] 1325.721
> extractAIC(m29)[2]
[1] 1326.787

Neither transformation improves the fit. We therefore stick with a linear function of the number of prior offences (in the log hazard). Our final no-interaction model has the following R output:

> summary(m23)
Call:
coxph(formula = S ~ fin + I(age^(-1)) + mar + prio + educ + e2)

  n= 432

                coef exp(coef) se(coef)     z      p
fin          -0.3743  6.88e-01   0.1915 -1.95 0.0510
I(age^(-1))  38.2527  4.10e+16  12.8748  2.97 0.0030
mar          -0.4840  6.16e-01   0.3754 -1.29 0.2000
prio          0.0847  1.09e+00   0.0284  2.98 0.0029
educ         -0.2928  7.46e-01   0.1567 -1.87 0.0620
e2TRUE       -0.8822  4.14e-01   0.5581 -1.58 0.1100

            exp(coef) exp(-coef) lower .95 upper .95
fin          6.88e-01   1.45e+00  4.72e-01  1.00e+00
I(age^(-1))  4.10e+16   2.44e-17  4.51e+05  3.73e+27
mar          6.16e-01   1.62e+00  2.95e-01  1.29e+00
prio         1.09e+00   9.19e-01  1.03e+00  1.15e+00
educ         7.46e-01   1.34e+00  5.49e-01  1.01e+00
e2TRUE       4.14e-01   2.42e+00  1.39e-01  1.24e+00

Rsquare= 0.085   (max possible= 0.956 )
Likelihood ratio test= 38.3  on 6 df,   p=9.8e-07
Wald test            = 36.0  on 6 df,   p=2.71e-06
Score (logrank) test = 37.8  on 6 df,   p=1.22e-06

Note that some terms, such as marriage and education level, are not significant at the traditional level. We retain these for now, in case they have an interaction with any other effect.
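A side note on reading the m23 output: the enormous exp(coef) for I(age^(-1)) is not an error. It is the hazard ratio for a one-unit change in 1/age, and 1/age only varies over a few hundredths here. A more digestible summary is the hazard ratio between two concrete ages; a quick sketch (the ages 20 and 30 are illustrative):

> # hazard ratio for a 20-year-old versus a 30-year-old under m23
> exp(38.2527 * (1/20 - 1/30))   # roughly 1.89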

Interactions

Normally, when there are p covariates in a model, there are p(p - 1)/2 possible pairwise interactions. Here, though, there is no point in including an interaction between the two education variables, so of the 6 x 5/2 = 15 pairs we have only 14 possible interactions. These are specified using the following R commands:

mi1 =coxph(S~fin*I(age^(-1))+mar+prio+educ+e2)
mi2 =coxph(S~I(age^(-1))+fin*mar+prio+educ+e2)
mi3 =coxph(S~I(age^(-1))+mar+fin*prio+educ+e2)
mi4 =coxph(S~I(age^(-1))+mar+prio+fin*educ+e2)
mi5 =coxph(S~I(age^(-1))+mar+prio+educ+fin*e2)
mi6 =coxph(S~fin+I(age^(-1))*mar+prio+educ+e2)
mi7 =coxph(S~fin+mar+I(age^(-1))*prio+educ+e2)
mi8 =coxph(S~fin+mar+prio+I(age^(-1))*educ+e2)
mi9 =coxph(S~fin+mar+prio+educ+I(age^(-1))*e2)
mi10=coxph(S~fin+I(age^(-1))+mar*prio+educ+e2)
mi11=coxph(S~fin+I(age^(-1))+prio+mar*educ+e2)
mi12=coxph(S~fin+I(age^(-1))+prio+educ+mar*e2)
mi13=coxph(S~fin+I(age^(-1))+mar+prio*educ+e2)
mi14=coxph(S~fin+I(age^(-1))+mar+educ+prio*e2)

Warning messages about convergence issues come from models mi5 and mi12, presumably as a result of the very small number of cases in which both binary covariates in the interaction are positive. In any case, only the interaction between education and prior incarcerations is significant.
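With fourteen candidates, it may be easier to screen the interaction p-values programmatically than to read fourteen summaries. A minimal sketch is below; it assumes the interaction is the last row of each coefficient table (R places interactions after the main effects), and note that the p-value column name may differ in older versions of the survival package:

> mods <- list(mi1,mi2,mi3,mi4,mi5,mi6,mi7,
+              mi8,mi9,mi10,mi11,mi12,mi13,mi14)
> sapply(mods, function(m) {
+   ct <- summary(m)$coefficients   # matrix of estimates and tests
+   ct[nrow(ct), "Pr(>|z|)"]        # p-value of the interaction term
+ })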

Here is the fit with that interaction:

> mi13
Call:
coxph(formula = S ~ fin + I(age^(-1)) + mar + prio * educ + e2)

                coef exp(coef) se(coef)     z      p
fin          -0.3991  6.71e-01   0.1926 -2.07 0.0380
I(age^(-1))  37.9055  2.90e+16  12.9048  2.94 0.0033
mar          -0.4354  6.47e-01   0.3772 -1.15 0.2500
prio         -0.2183  8.04e-01   0.1574 -1.39 0.1700
educ         -0.5706  5.65e-01   0.2211 -2.58 0.0099
e2TRUE       -0.9128  4.01e-01   0.5624 -1.62 0.1000
prio:educ     0.0928  1.10e+00   0.0462  2.01 0.0450

Likelihood ratio test=42.0  on 7 df, p=5.33e-07  n= 432

We now have to decide whether to include this interaction or not. If we do, then the coefficient for prio itself is no longer significant. As we have almost finished building the final preliminary model, we must also decide whether to keep marital status as a predictor. Fitting the interaction model without marriage yields:

> mi15
Call:
coxph(formula = S ~ fin + I(age^(-1)) + prio * educ + e2)

                coef exp(coef) se(coef)     z       p
fin          -0.3865  6.79e-01   0.1921 -2.01 0.04400
I(age^(-1))  41.7700  1.38e+18  12.6183  3.31 0.00093
prio         -0.2359  7.90e-01   0.1572 -1.50 0.13000
educ         -0.5854  5.57e-01   0.2212 -2.65 0.00810
e2TRUE       -0.9031  4.05e-01   0.5621 -1.61 0.11000
prio:educ     0.0979  1.10e+00   0.0461  2.12 0.03400

Likelihood ratio test=40.5  on 6 df, p=3.7e-07  n= 432

The remaining parameter estimates are almost unchanged, and no other parameter gains significance. We therefore decide to drop marriage.
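Dropping marriage can also be backed by a formal test, since mi15 omits only mar from mi13. We can either call anova() on the nested pair, or note that the likelihood ratio statistic is the difference between the two reported test statistics:

> anova(mi15, mi13)            # 1 df likelihood ratio test for mar
> 1 - pchisq(42.0 - 40.5, 1)   # same test by hand: p is about 0.22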

The quandary remains: to keep the interaction between education and prior incarcerations or not? The next step in the model-building process is model evaluation. It therefore seems prudent to take two models forward to this stage: one with and one without the interaction. Thus our final models are mi15 and

> m30=coxph(S~fin+I(age^(-1))+prio+educ+e2)
> summary(m30)
Call:
coxph(formula = S ~ fin + I(age^(-1)) + prio + educ + e2)

  n= 432

                coef exp(coef) se(coef)     z       p
fin          -0.3597  6.98e-01   0.1910 -1.88 0.06000
I(age^(-1))  42.4985  2.86e+18  12.6309  3.36 0.00077
prio          0.0835  1.09e+00   0.0285  2.93 0.00340
educ         -0.2940  7.45e-01   0.1569 -1.87 0.06100
e2TRUE       -0.8714  4.18e-01   0.5577 -1.56 0.12000

            exp(coef) exp(-coef) lower .95 upper .95
fin          6.98e-01   1.43e+00  4.80e-01  1.01e+00
I(age^(-1))  2.86e+18   3.49e-19  5.08e+07  1.62e+29
prio         1.09e+00   9.20e-01  1.03e+00  1.15e+00
educ         7.45e-01   1.34e+00  5.48e-01  1.01e+00
e2TRUE       4.18e-01   2.39e+00  1.40e-01  1.25e+00

Rsquare= 0.081   (max possible= 0.956 )
Likelihood ratio test= 36.4  on 5 df,   p=7.83e-07
Wald test            = 34.7  on 5 df,   p=1.73e-06
Score (logrank) test = 36.3  on 5 df,   p=8.43e-07

Written mathematically, using the maximum likelihood estimates, the no-interaction model (m30) and the interaction model (mi15) are respectively

h(1)(t, x) = h0(t) exp{-0.36 xfin + 42/xage + 0.084 xprio - 0.29 xeduc - 0.87 xe2}     (1)

and

h(2)(t, x) = h0(t) exp{-0.39 xfin + 42/xage - 0.24 xprio - 0.59 xeduc - 0.90 xe2 + 0.098 xprio xeduc},     (2)

where x = (xfin, xage, xprio, xeduc, xe2)T denotes the covariate vector.
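To read model (2), note that the hazard ratio for one additional prior incarceration is exp{-0.24 + 0.098 xeduc}, so the effect of priors depends on education level. A small sketch using the mi15 estimates (the education levels 2 and 5 are illustrative values from the coded scale):

> # hazard ratio per extra prior incarceration, as a function of educ
> hr.prio <- function(educ) exp(-0.2359 + 0.0979*educ)
> hr.prio(2)   # roughly 0.96 at low education
> hr.prio(5)   # roughly 1.29 at higher education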

Is the proportional hazards assumption met?

The easiest way to assess this is via tests of the hypothesis that it holds, as described earlier in the chapter. If we do these tests for our models, we find that for the no-interaction model the pha is doubtful for both transformed age and education:

> cox.zph(m30,transform=rank,global=F)
                  rho    chisq      p
fin         -0.000837 8.07e-05 0.9928
I(age^(-1))  0.186255 4.30e+00 0.0380
prio        -0.113074 1.71e+00 0.1907
educ        -0.173203 3.37e+00 0.0663
e2TRUE      -0.013299 1.87e-02 0.8911

However, the model with an interaction solves the problem for education:

> cox.zph(mi15,transform=rank,global=F)
                  rho    chisq      p
fin          0.003147 1.16e-03 0.9729
I(age^(-1))  0.198099 4.82e+00 0.0281
prio        -0.027347 9.82e-02 0.7540
educ        -0.102920 1.40e+00 0.2375
e2TRUE      -0.000256 6.91e-06 0.9979
prio:educ    0.015642 3.42e-02 0.8533
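The cox.zph object can also be inspected graphically: plotting it shows the scaled Schoenfeld residuals against (transformed) time with a smooth curve, and a visible trend corresponds to the low p-values above. A quick sketch for the suspect age term:

> zi15 <- cox.zph(mi15, transform=rank)
> plot(zi15[2])        # residuals for I(age^(-1)) against rank time
> abline(h=0, lty=2)   # a trend away from 0 suggests the pha fails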

If we had used a linear scale for age, we would still have had this problem:

> mi16=coxph(S~fin+age+prio*educ+e2)
> zi16=cox.zph(mi16,transform=rank,global=F)
> zi16
               rho   chisq      p
fin        0.00857 0.00864 0.9259
age       -0.21510 6.30122 0.0121
prio      -0.02552 0.08457 0.7712
educ      -0.10309 1.39455 0.2376
e2TRUE     0.00899 0.00833 0.9273
prio:educ  0.01286 0.02290 0.8797

Therefore, although our interaction helps, it does not yield a model that fully satisfies the pha. We would thus not use this model as it stands. Instead we would have to adjust it to account for the non-proportionality in the hazards. This is discussed in the next chapter.

4.5 A further method for assessing the PHA

Thus far we have considered three methods for assessing the validity of the pha. These were:

- Graphical: a plot of expected (under the pha) versus observed (using the Kaplan-Meier method, without assuming the pha) survival curves.
- Graphical: complementary log-log plots of the Kaplan-Meier survival curves.
- Quantitative: a test of no correlation between the Schoenfeld residuals and (possibly rank-transformed) survival time.

There is a fourth method, which we left until last as it leads directly on to the next chapter. It is also a quantitative method, in that it gives a p-value for the hypothesis that the pha is satisfied.

4.5.1 Motivation for extending the PHM

Consider again linear regression. The model is yi = a + bxi + ei where ei ~ N(0, sigma^2). This model would be inappropriate if the mean of y were not linear in x but, say, quadratic. A test of the hypothesis of linearity could be performed by fitting the quadratic model yi = a + bxi + cxi^2 + ei and testing whether c = 0 (e.g. by the likelihood ratio test). That is, we could generalise the model and test whether this generalisation improved the fit. We can do a similar thing for the phm. We generalise the phm so that the pha need no longer be satisfied, by adding an interaction between covariates and time, and then assess whether the interaction is significant. The form of the extended Cox model is

h(t, x) = h0(t) exp{β1 x1 + β2 x2 + ... + δ1 x1 g1(t) + δ2 x2 g2(t) + ...}

where gi(t) is a function of time that we must specify (examples follow shortly) and δi is a parameter. If we fit this model, we can test the pha by testing whether the δi are zero: if H0: δ1 = δ2 = ... = 0 is true, the model reduces to the phm. There are several common choices for gi(t), including:

- gi(t) = t
- gi(t) = log t
- gi(t) = 1{t > T0}, where 1{A} = 1 if A is true and 0 otherwise, and T0 is some specified time.

Of these, it transpires that the last, called the Heaviside function, is considerably easier to fit, and it is the function I recommend. In the next chapter we show how to fit this model and interpret its results. Note that the choice of g, and of T0 if using the Heaviside function, is subjective. Your decision might be informed by the graphical assessment of the pha, but it still involves personal judgement. The disadvantage of this approach relative to the Schoenfeld residuals is that it requires precisely this subjective judgement.
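As a preview of the next chapter, here is roughly how the Heaviside version might be fitted in R. This is a sketch under stated assumptions: df, time, status and x stand for a generic data frame, its follow-up and event columns and a covariate of interest, and the cut point T0 = 26 weeks is illustrative, not a value taken from the text.

> library(survival)
> # split each subject's follow-up at T0 = 26, giving
> # counting-process records (tstart, time] and an episode number
> df2 <- survSplit(Surv(time, status) ~ ., data=df, cut=26,
+                  episode="period")
> df2$late <- as.numeric(df2$period == 2)   # Heaviside: 1{t > 26}
> me <- coxph(Surv(tstart, time, status) ~ x + x:late, data=df2)
> # a nonzero x:late coefficient means the effect of x differs
> # before and after T0, i.e. the pha fails for x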
