[R] Apparently Conflicting Results with coxph

Tue Oct 2 18:36:17 CEST 2007

  In my experience, what you are seeing is almost certainly a patient 
selection effect.  (The number 1 reason for puzzling results is incorrect coding 
of a time-dependent covariate, but you appear to have been quite careful.)

   Assigning the implant as a non-time-dependent covariate almost guarantees 
that the estimated effect will be beneficial.  The only people who get an 
implant are those who live longer than average (long enough to get an implant).  
The size of such a bias is surprisingly large.  The problem is rediscovered in 
the cancer field every few years, in comparisons of responders to 
non-responders. 
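
   A small simulation makes the size of this bias concrete.  This is only a 
sketch under assumptions of my own, not the poster's data: exponential death 
times with a mean of 500 days, implant times with a mean of 300 days, follow-up 
ending at day 1000, and an implant with no effect at all on survival.  All of 
the variable names are made up for the illustration.

```r
library(survival)

set.seed(1)
n       <- 2000
death   <- rexp(n, rate = 1/500)   # days to death; the implant has NO effect
implant <- rexp(n, rate = 1/300)   # day on which an implant would be placed
censor  <- 1000                    # assumed administrative end of follow-up

end  <- pmin(death, censor)
dead <- as.integer(death <= censor)
got  <- implant < end              # implanted only if still alive and on study

# (a) "Ever implanted" as a fixed covariate: the time before the implant is
#     credited to the implant group, although no one could die in it and
#     still be counted as implanted.
imp_ever  <- as.integer(got)
fit_fixed <- coxph(Surv(end, dead) ~ imp_ever)

# (b) Counting-process layout: one row before the implant, one row after.
id   <- seq_len(n)
pre  <- data.frame(id = id, start = 0,
                   stop   = ifelse(got, implant, end),
                   status = ifelse(got, 0L, dead),
                   imp    = 0L)
post <- data.frame(id = id[got], start = implant[got],
                   stop   = end[got],
                   status = dead[got],
                   imp    = 1L)
cp     <- rbind(pre, post)
fit_td <- coxph(Surv(start, stop, status) ~ imp, data = cp)

# The fixed-covariate coefficient should come out clearly negative (an
# apparent benefit); the time-dependent one should be close to zero.
coef(fit_fixed)
coef(fit_td)
```

   Since the simulated implant does nothing, any "benefit" in fit (a) is pure 
selection: a subject has to survive until the implant date to be counted in the 
implant group.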

   As a time-dependent covariate, you have the problem of indication for 
treatment.  Say for instance that the devices were very expensive, and were only 
used for patients in imminent danger of death.  For a device that was a placebo 
you would find, not surprisingly, that being selected for implantation carried a 
major risk.  The device may need to be extremely effective to overcome this type 
of bias.  As a simple example, if you compare the death rate of those who have 
seen an oncologist (cancer doc) in the last month to those who have not done so, 
you find that the former group has a much higher death rate.
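
   The same kind of simulation illustrates the indication problem.  Again this 
is a sketch with invented numbers: 30% of subjects are "sick", with a much 
higher death rate and a much higher chance of early implantation, and the 
device is a placebo.  The variable `sick` stands in for whatever drives the 
decision to implant; in real data it is usually not recorded, so the adjusted 
fit at the end is possible only because this is a simulation.

```r
library(survival)

set.seed(2)
n       <- 2000
sick    <- rbinom(n, 1, 0.3)                               # severity indicator
death   <- rexp(n, rate = ifelse(sick == 1, 1/150, 1/800)) # sick die sooner
implant <- rexp(n, rate = ifelse(sick == 1, 1/100, 1/2000))# sick implanted sooner
censor  <- 1000                                            # assumed end of study

end  <- pmin(death, censor)
dead <- as.integer(death <= censor)
got  <- implant < end              # implant placed while alive and on study

# Counting-process layout with the implant as a time-dependent covariate.
id   <- seq_len(n)
pre  <- data.frame(id = id, start = 0,
                   stop   = ifelse(got, implant, end),
                   status = ifelse(got, 0L, dead),
                   imp    = 0L, sick = sick)
post <- data.frame(id = id[got], start = implant[got],
                   stop   = end[got],
                   status = dead[got],
                   imp    = 1L, sick = sick[got])
cp <- rbind(pre, post)

# Coding is correct, yet the placebo device looks harmful because being
# selected for it marks the sicker patients.
fit_td  <- coxph(Surv(start, stop, status) ~ imp, data = cp)

# Adjusting for the selection variable (available only in a simulation)
# should bring the implant coefficient back toward zero.
fit_adj <- coxph(Surv(start, stop, status) ~ imp + sick, data = cp)

coef(fit_td)
coef(fit_adj)
```

   A truly effective device would have to overcome the positive coefficient 
seen in the unadjusted fit before it showed any benefit at all.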

   Terry Therneau

> Kevin E. Thorpe wrote:
>> Dear List:
>>
>> I have a data frame prepared in the counting process style for including
>> a binary time-dependent covariate.  The first few rows look like this.
>>
>>     PtNo Start    End Status Imp
>> 1      1     0  608.0      0   0
>> 2      2     0  513.0      0   0
>> 3      2   513  887.0      0   1
>> 4      3     0   57.0      0   0
>> 5      3    57  604.0      0   1
>> 6      4     0  150.0      1   0
>>
>>
>> The outcome is mortality and the covariate is for an implantable
>> defibrillator, so it is expected that the implant would reduce the
>> risk of death.  The results of fitting coxph (survival package) are:
>>
>> Call:
>> coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)
>>
>>
>>      coef exp(coef) se(coef)     z    p
>> Imp 0.163      1.18    0.485 0.337 0.74
>>
>> Likelihood ratio test=0.11  on 1 df, p=0.738  n= 335
>>
>> Since this was unexpected, I created a non-counting process data
>> frame with an indicator variable representing received an implant
>> or not.  Here are the results:
>>
>> Call:
>> coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)
>>
>>
>>          coef exp(coef) se(coef)     z       p
>> Implant -1.77     0.171    0.426 -4.15 3.3e-05
>>
>> Likelihood ratio test=19.1  on 1 df, p=1.21e-05  n= 197
>>
>> I found this degree of discrepancy surprising, especially the point
>> estimate of the coefficient.  I have verified the data frames are
>> set up correctly.
>>
>> Here is what I have tried to understand what is going on.
>>
>> I tried fitting models adjusted for other covariates that I have in
>> the data frame.  This did not appreciably affect the coefficients
>> for the implant variable.
>>
>> I ran cox.zph on the two models shown above and plotted the results.
>> In both cases, the point estimate of Beta(t) is sort of parabolic
>> in that the curves are monotonically increasing to a local maximum
>> after which they are monotonically decreasing (the CIs are a bit
>> more wiggly).
>>
>> I would interpret this to mean that the effect of implant is probably
>> time-dependent.  If so, how do I actually get a "proper" estimate of
>> beta(t) for a variable like this?
>>
>> Are there some other things I should look at to understand what's
>> going on?