[R] Problem in anova with coxph object

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Tue Jan 8 18:54:37 CET 2008


Matthias Gondan wrote:
> Dear R users,
>
> I noticed a problem in the anova command when applied to
> a single coxph object if there are missing observations in
> the data:
>
> This example code was run on R-2.6.1:
>
>  > library(survival)
>  > data(colon)
>  > colondeath = colon[colon$etype==2, ]
>  > m = coxph(Surv(time, status) ~ rx + sex + age + perfor, data=colondeath)
>  > m
> Call:
> coxph(formula = Surv(time, status) ~ rx + sex + age + perfor,
>     data = colondeath)
>
>                coef exp(coef) se(coef)      z      p
> rxLev     -0.028895     0.972  0.11037 -0.262 0.7900
> rxLev+5FU -0.374286     0.688  0.11885 -3.149 0.0016
> sex       -0.000754     0.999  0.09431 -0.008 0.9900
> age        0.002442     1.002  0.00405  0.603 0.5500
> perfor     0.155695     1.168  0.26286  0.592 0.5500
>
> Likelihood ratio test=12.8  on 5 df, p=0.0251  n= 929
>
>  > anova(m, test='Chisq')
> Analysis of Deviance Table
>  Cox model: response is Surv(time, status)
> Terms added sequentially (first to last)
>
>         Df  Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL                       929     5860.4         
> rx       2      12.1       927     5848.2 2.302e-03
> sex      1 2.054e-05       926     5848.2       1.0
> age      1       0.3       925     5847.9       0.6
> perfor   1       0.3       924     5847.6       0.6
>
> Now I include nodes, which has some missing data:
>
>  > m = coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes, data=colondeath)
>  > m
> Call:
> coxph(formula = Surv(time, status) ~ rx + sex + age + perfor +
>     nodes, data = colondeath)
>
>               coef exp(coef) se(coef)      z       p
> rxLev     -0.08245     0.921  0.11168 -0.738 0.46000
> rxLev+5FU -0.40310     0.668  0.12054 -3.344 0.00083
> sex       -0.02854     0.972  0.09573 -0.298 0.77000
> age        0.00547     1.005  0.00405  1.350 0.18000
> perfor     0.19040     1.210  0.26335  0.723 0.47000
> nodes      0.09296     1.097  0.00889 10.460 0.00000
>
> Likelihood ratio test=88.3  on 6 df, p=1.11e-16  n=911 (18 observations deleted due to missingness)
>
>  > anova(m, test='Chisq')
> Analysis of Deviance Table
>  Cox model: response is Surv(time, status)
> Terms added sequentially (first to last)
>
>         Df  Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL                       911     5700.6         
> rx       2       0.0       909     5848.2       1.0
> sex      1 2.054e-05       908     5848.2       1.0
> age      1       0.3       907     5847.9       0.6
> perfor   1       0.3       906     5847.6       0.6
> nodes    1     235.3       905     5612.3 4.253e-53
>
> The strange thing is that rx is not significant anymore.
>
> In the documentation for anova.coxph, there is a warning that
>
>   
>> The comparison between two or more models by |anova| will only be
>> valid if they are fitted to the same dataset. This may be a problem if 
>> there are missing values.
>>
>>     
> However, I inserted a single object to be analyzed sequentially. Is
> this a bug in R, or is it covered by the warning?
>   
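Notice that you also lose the 18 observations in the comparison of .~rx
with the empty model: the NULL row is based on the 911 complete cases
(Resid. Dev 5700.6), whereas the rx row still shows 5848.2, the same
residual deviance as in your 929-observation table.

This is standard: losing observations on the way through an anova table
leads to madness.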
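
One way to see where the observations go missing is to count the NAs per
covariate and compare how many rows each nested fit actually uses. A minimal
sketch, assuming a current survival package in which colon is available
directly after library(survival); the names na_counts, m_rx and m_full are
made up for illustration:

```r
library(survival)

# Death records only, as in the original post
colondeath <- colon[colon$etype == 2, ]

# Number of missing values in each covariate of the full model
na_counts <- colSums(is.na(colondeath[, c("rx", "sex", "age", "perfor", "nodes")]))
print(na_counts)

# coxph() drops incomplete rows per fit, so nested models fitted
# separately can end up using different numbers of observations
m_rx   <- coxph(Surv(time, status) ~ rx, data = colondeath)
m_full <- coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes,
                data = colondeath)

# The n component holds the number of observations used in each fit
print(c(rx_only = m_rx$n, full = m_full$n))
```

If the two counts differ, the deviances of the two fits are not comparable.
They should correspond to the n= 929 and n=911 lines in the output quoted
above.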

What happens if you do something like

coxph(Surv(time, status) ~ rx,
      data=colondeath, subset=complete.cases(nodes))

or the corresponding survdiff() call?
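
One way to keep every row of the anova table on the same data is to drop the
incomplete cases once, up front, and fit everything to the reduced data
frame. This is only a sketch under that assumption, not a statement about how
anova.coxph works internally; cc, m_cc and sd_rx are illustrative names that
do not appear in the original post:

```r
library(survival)

# Death records only, as in the original post
colondeath <- colon[colon$etype == 2, ]

# Keep only the rows where nodes is observed
cc <- colondeath[complete.cases(colondeath$nodes), ]

# Full model on the complete cases; every sequential term in the
# anova table now refers to the same set of observations
m_cc <- coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes, data = cc)
print(anova(m_cc))

# Log-rank test for rx on the same complete cases
sd_rx <- survdiff(Surv(time, status) ~ rx, data = cc)
print(sd_rx)
```

Comparing the rx row of this table with the survdiff() result gives a quick
check on whether the treatment effect really disappears or whether the 0.0
deviance was an artifact of the changing sample.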

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



