[R] Apparently Conflicting Results with coxph
Terry Therneau
therneau at mayo.edu
Tue Oct 2 18:36:17 CEST 2007
In my experience, what you are seeing is almost certainly a patient
selection effect. (The number one reason for puzzling results is incorrect coding
of a time-dependent covariate, but you appear to have been quite careful.)

Assigning the implant as a fixed (non-time-dependent) covariate almost guarantees
that the estimated effect will be beneficial. The only people who get an
implant are those who live long enough to get one, so the implant group
is credited with survival time that accrued before the device was placed.
The size of such a bias is surprisingly large. The problem is rediscovered in
the cancer field every few years, in comparisons of responders to
non-responders.
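
A small simulation can make the size of this bias concrete. The sketch below is
an illustration with made-up numbers, not an analysis of your data: the sample
size, the exponential rates, the 1000-day censoring time, and all variable names
are assumptions. The simulated device is a pure placebo, and the implant time is
independent of prognosis. The same patients are then analyzed twice, once with
the implant as a fixed covariate and once in counting-process form.

```r
library(survival)

set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 2000                         # hypothetical number of patients
death   <- rexp(n, rate = 1/500)  # days to death; the device has NO effect
implant <- rexp(n, rate = 1/500)  # scheduled implant day, unrelated to prognosis
cens    <- 1000                   # assumed administrative censoring at 1000 days

end    <- pmin(death, cens)             # observed follow-up time
status <- as.integer(death <= cens)     # 1 = died, 0 = censored
got    <- implant < end                 # implanted only if still under follow-up

# (a) Naive analysis: implant status treated as known from day 0
naive <- coxph(Surv(end, status) ~ got)

# (b) Counting-process layout: one row before the implant, one row after it
pre  <- data.frame(start = 0, stop = ifelse(got, implant, end),
                   status = ifelse(got, 0L, status), imp = 0L)
post <- data.frame(start = implant[got], stop = end[got],
                   status = status[got], imp = 1L)
td <- coxph(Surv(start, stop, status) ~ imp, data = rbind(pre, post))

coef(naive)   # should come out clearly negative: a spurious "benefit"
coef(td)      # should come out near 0, the true (null) effect
```

Under these assumptions the fixed-covariate fit should show a strong apparent
benefit for a device that does nothing, while the time-dependent fit should not.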
As a time-dependent covariate, you have the problem of indication for
treatment. Say, for instance, that the devices were very expensive and were only
used for patients in imminent danger of death. For a device that was a placebo
you would find, not surprisingly, that being selected for implantation carried a
major risk. The device may need to be extremely effective to overcome this type
of bias. As a simple example, if you compare the death rate of those who have
seen an oncologist (cancer doc) in the last month to those who have not done so,
you find that the former group has a much higher death rate.
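
The indication problem can be sketched the same way. Again this is only a toy
simulation under stated assumptions: a binary severity flag `sick`, the hazard
rates, and the sample size are all invented for illustration. The device is a
placebo, but sick patients both die sooner and are implanted sooner. The implant
is coded correctly as a time-dependent covariate in both fits.

```r
library(survival)

set.seed(7)                       # arbitrary seed, for reproducibility only
n    <- 4000                      # hypothetical number of patients
sick <- rbinom(n, 1, 0.3)         # 1 = in imminent danger of death (assumed flag)

# Placebo device: death depends on severity only, never on the implant
death   <- rexp(n, rate = ifelse(sick == 1, 1/200, 1/1000))
# Indication: sick patients are selected for implantation much sooner
implant <- rexp(n, rate = ifelse(sick == 1, 1/200, 1/2000))

end    <- pmin(death, 1000)             # assumed censoring at 1000 days
status <- as.integer(death <= 1000)
got    <- implant < end

# Counting-process rows: before the implant (imp = 0) and after it (imp = 1)
pre  <- data.frame(start = 0, stop = ifelse(got, implant, end),
                   status = ifelse(got, 0L, status), imp = 0L, sick = sick)
post <- data.frame(start = implant[got], stop = end[got],
                   status = status[got], imp = 1L, sick = sick[got])
cp <- rbind(pre, post)

crude    <- coxph(Surv(start, stop, status) ~ imp, data = cp)
adjusted <- coxph(Surv(start, stop, status) ~ imp + sick, data = cp)

coef(crude)["imp"]      # should be positive: the implant appears harmful
coef(adjusted)["imp"]   # should be near 0 once severity is in the model
```

The adjusted fit recovers the null effect only because the simulation records
severity exactly. In real data the reasons for selecting a patient are rarely
captured that completely, so adjusting for the available covariates may change
the estimate very little.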
Terry Therneau
> Kevin E. Thorpe wrote:
>> Dear List:
>>
>> I have a data frame prepared in the counting process style for including
>> a binary time-dependent covariate. The first few rows look like this.
>>
>>   PtNo Start   End Status Imp
>> 1    1     0 608.0      0   0
>> 2    2     0 513.0      0   0
>> 3    2   513 887.0      0   1
>> 4    3     0  57.0      0   0
>> 5    3    57 604.0      0   1
>> 6    4     0 150.0      1   0
>>
>>
>> The outcome is mortality and the covariate is for an implantable
>> defibrillator, so it is expected that the implant would reduce the
>> risk of death. The results of fitting coxph (survival package) are:
>>
>> Call:
>> coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)
>>
>>
>>      coef exp(coef) se(coef)     z    p
>> Imp 0.163      1.18    0.485 0.337 0.74
>>
>> Likelihood ratio test=0.11 on 1 df, p=0.738 n= 335
>>
>> Since this was unexpected, I created a non-counting process data
>> frame with an indicator variable representing received an implant
>> or not. Here are the results:
>>
>> Call:
>> coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)
>>
>>
>>          coef exp(coef) se(coef)     z       p
>> Implant -1.77     0.171    0.426 -4.15 3.3e-05
>>
>> Likelihood ratio test=19.1 on 1 df, p=1.21e-05 n= 197
>>
>> I found this degree of discrepancy surprising, especially the point
>> estimate of the coefficient. I have verified the data frames are
>> set up correctly.
>>
>> Here is what I have tried to understand what is going on.
>>
>> I tried fitting models adjusted for other covariates that I have in
>> the data frame. This did not appreciably affect the coefficients
>> for the implant variable.
>>
>> I ran cox.zph on the two models shown above and plotted the results.
>> In both cases, the point estimate of Beta(t) is sort of parabolic
>> in that the curves are monotonically increasing to a local maximum
>> after which they are monotonically decreasing (the CIs are a bit
>> more wiggly).
>>
>> I would interpret this to mean that the effect of implant is probably
>> time-dependent. If so, how do I actually get a "proper" estimate of
>> beta(t) for a variable like this?
>>
>> Are there some other things I should look at to understand what's
>> going on?