[R] Cox model
darteta001 at ikasle.ehu.es
Tue Feb 12 16:52:41 CET 2008
Dear Eleni,
here is a quote from a thread about the maximum number of variables in a
multiple linear regression analysis, posted last Tuesday. I think it is
also relevant to Cox PH models:
"I can think of
no circumstance where multiple regression on "hundreds of thousands of
variables" is anything more than a fancy random number generator"
Later in the thread, someone reports the same problem you are having:
"When I try a regression problem with 3,000 coefficients in R running
under Windows XP 64 bit with 8Gb of memory on the machine and the /3Gb
option active (i.e., R can get up to 3Gb), R 2.6.1 runs out of memory
(apparently trying to duplicate the model matrix)"
but the author continues...
"...one must be careful doing ordinary linear regression with large
numbers of coefficients. It does seem a little unlikely that there is
sufficient data to get useful estimates of three thousand coefficients
using linear regression"
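To see concretely why there cannot be enough data, here is a small sketch on simulated data (the sizes, 10 observations and 20 predictors, are made up for illustration). `lm` can estimate at most as many coefficients as there are observations and reports the rest as `NA`:

```r
set.seed(42)
n <- 10   # observations (hypothetical size)
p <- 20   # predictors, more than observations (hypothetical size)
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)   # pure noise: no predictor is related to y

fit <- lm(y ~ X)        # intercept + 20 slopes = 21 coefficients requested
fit$rank                # only 10 are estimable: the rank is capped at n
sum(is.na(coef(fit)))   # the remaining 11 are reported as NA
max(abs(residuals(fit)))  # essentially zero: the model fits the noise exactly
```

The same limit applies with 80 observations: a linear model can estimate at most 80 coefficients (intercept included), and one that uses all of them reproduces the data exactly, noise and all, which is the "fancy random number generator" problem.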
I also work with genomic data, and filtering the data first seems to be a
well-accepted practice. I am sure that not all of your 18000 genes are
relevant to your study or have an effect on survival. Have a look at the
Bioconductor mailing list for information on this topic.
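One common way to do that filtering, which also gives a coefficient and a p-value for every gene, is univariate screening: fit a separate single-gene Cox model for each gene and rank the genes by p-value. This is a different technique from the joint 18000-gene model, so the coefficients are marginal effects and the p-values need a multiple-testing correction. A minimal sketch with the survival package on simulated data; the sizes, the gene names `g1`, `g2`, ... and the effect of `g1` are all hypothetical:

```r
library(survival)

set.seed(1)
n <- 80    # patients, as in the original question
p <- 200   # genes; kept small here (the real data have about 18000)
genes <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(NULL, paste0("g", seq_len(p))))
# Simulated outcome (hypothetical): only gene g1 affects the hazard
time <- rexp(n, rate = exp(0.8 * genes[, "g1"]))
relapse <- rbinom(n, size = 1, prob = 0.7)   # 1 = relapsed, 0 = censored

# One single-gene Cox model per column; keep the log hazard ratio and Wald p-value
uni <- t(apply(genes, 2, function(x) {
  s <- summary(coxph(Surv(time, relapse) ~ x))$coefficients
  c(coef = s[1, "coef"], p = s[1, "Pr(>|z|)"])
}))

# Rank genes by p-value and add Benjamini-Hochberg adjusted p-values
ranked <- uni[order(uni[, "p"]), ]
ranked <- cbind(ranked, p_BH = p.adjust(ranked[, "p"], method = "BH"))
head(ranked)
```

In this simulation `g1` should appear at or near the top of `ranked`. With real data, the genes that pass a chosen adjusted p-value cutoff (or the top few dozen) could then go into a multivariable Cox model.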
Best
David
> Hi David,
>
> The problem is that I need all these regressors. I need a coefficient
> for every one of them and then rank them according to that coefficient.
>
> Thanks,
> Eleni
>
> On Feb 12, 2008 4:54 PM, <darteta001 at ikasle.ehu.es> wrote:
>
> > Hi Eleni,
> >
> > I am not an expert in R or statistics, but in my opinion you have too
> > many regressors compared to the number of observations, and that might
> > be the reason why you get the error. Others may know better, but as far
> > as I know, with only 80 observations it is a good idea to first filter
> > your list of variables down to a few dozen.
> >
> >
> > HTH
> >
> > David
> >
> > > Hello R-community,
> > >
> > > I have been struggling for a week now with the implementation of a
> > > Cox model in R. I have 80 cancer patients, so 80 time measurements and
> > > 80 relapse indicators (the censoring status: 1 if the patient relapsed
> > > during the examined period, 0 if not). My microarray data contain
> > > around 18000 genes, so I have the expressions of 18000 genes in each of
> > > the 80 tumors (an 80 x 18000 matrix). I would like to build a Cox model
> > > in order to retrieve the most significant genes (according to the
> > > p-value). The command that I am using is:
> > >
> > > library(survival)
> > > test1 <- list(time = time, relapse = relapse, genes = genes)
> > > coxph(Surv(time, relapse) ~ genes, data = test1)
> > >
> > > where time is a vector of size 80 containing the times, relapse is a
> > > vector of size 80 containing the relapse values, and genes is an
> > > 80 x 18000 matrix. When I give the coxph command I get an error saying
> > > that it cannot allocate a vector of size 2.7Mb (in Windows). I also
> > > tried Linux, and there I get an error that the maximum memory has been
> > > reached. I increased the memory by starting R with the command:
> > > R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
> > >
> > > I think it cannot get any better than that, because if I try, for
> > > example, max-vsize=300, the memory capacity is stored as NA.
> > >
> > > Does anyone have any idea why this happens and how I can overcome it?
> > >
> > > I would be really grateful if you could help!
> > > It has been bothering me a lot!
> > >
> > > Thank you all,
> > > Eleni
> > >
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
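Some rough arithmetic shows why the full 18000-gene model runs out of memory. A Newton-Raphson fit of a Cox model works with a p x p information (variance) matrix of doubles, so with p = 18000 that one matrix alone needs about 2.4 GiB, far more than the 80 x 18000 covariate matrix. The figures below are back-of-the-envelope estimates assuming 8-byte doubles, not measured memory use:

```r
n <- 80      # patients
p <- 18000   # genes

bytes_per_double <- 8
model_matrix_gib <- n * p * bytes_per_double / 2^30   # the 80 x 18000 covariate matrix
info_matrix_gib  <- p^2 * bytes_per_double / 2^30     # one p x p matrix of doubles

round(c(model_matrix = model_matrix_gib, information = info_matrix_gib), 3)

# Even with enough memory, the information matrix is a sum of within-risk-set
# covariance matrices of the n rows, so its rank is at most n - 1
max_rank <- n - 1   # 79, far below the 18000 coefficients requested
```

So extra memory would not rescue the joint fit: with at most 79 estimable directions, the 18000 coefficients are not identifiable, which is why filtering or screening the genes first is needed.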