[R] Building a web risk calculator based on Cox, PH--definitive method for calculating probability?
Terry Therneau
therneau at mayo.edu
Wed Jul 18 15:06:47 CEST 2012
Here is an example of how to do it.
> library(survival)
> vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran)
> userinput <- data.frame(celltype="smallcell", trt = 1)
> usercurve <- survfit(vfit, newdata=userinput) #the entire predicted
survival curve
> user2 <- summary(usercurve, time= 2*365.25) # 2 year time point
> user2$surv
[1] 0.0007438084
Some comments:
1. The summary function for survival curves was written so that people
could print out shorter summaries, but it works nicely for picking off a
particular time point. Since the curve is a step function and there
isn't likely to be a step at exactly "x" years, this is a bit more work
to do yourself. You might want to include the confidence limits in your
web report as well.
2. Put the whole formula into your coxph call. I have never ever
understood why people use the construct
tempvar <- with(data, Surv(time, status))
coxph(tempvar ~ age + sex + ....
It leaves you with harder to read code, poorer documentation (the
printout from coxph no longer shows the actual response variable), leads
to hard-to-diagnose failures for certain uses of predict, ... the list
goes on. I have not yet thought of a single good reason for doing it
other than "because you can".
3. Make the user data the same as the original. In the veteran cancer
data set "trt" is a numeric 0/1 variable, you had it as a factor in the
new data set.
4. Your should get your keyboard fixed -- it appears that the spacebar
is disabled when writing code :-)
5. If you plot the survival curve for the veterans cancer data set it
only reaches to about 2 1/2 years, so the summary for 5 years will
return NULL.
Terry Therneau
On 07/18/2012 05:00 AM, r-help-request at r-project.org wrote:
> I am a medical student and as a capstone for my summer research project I am
> going to create a simple online web "calculator" for users to input their
> relevant data, and a probability of relapse within 5 years will be computed
> and returned based on the Cox PH model I have developed.
>
> The issue I'm having is finding a definitive method/function to feed the
> user's "newdata" and return the probability of relapse within 5 years. I
> have googled this and the answers seems to be inconsistent; I have variously
> seen people recommend survest(), survfit(), and predict.coxph(). Terry had
> a useful answer
> http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html
> here but I didn't quite understand what he meant in his last sentence.
>
> Here is some code for you to quickly illustrate what you suggest.
>
> library(rms)
> library(survival)
> library(Hmisc)
> data(veteran)
> dd=datadist(veteran)
> options(datadist='dd')
> options(digits=4)
> obj=with(veteran,Surv(time,status))
> vetcoxph=coxph(obj~celltype+trt,data=veteran) #I will fit models from
> both the survival and rms packages so you can
> #use what you like
> vetcph=cph(obj~celltype+trt,data=veteran,surv=TRUE,time.inc=5*365,x=T,y=T)
> #let's say the user inputted that their cell type was smallcell and their
> treatment was "1".
> userinput=data.frame(celltype='smallcell',trt=factor(1))
>
> I really appreciate your recommendations
>
> Best,
> Jahan
More information about the R-help
mailing list