[R] Cox model

Duncan Murdoch murdoch at stats.uwo.ca
Wed Feb 13 15:30:42 CET 2008


On 2/13/2008 9:08 AM, Gustaf Rydevik wrote:
> On Feb 13, 2008 3:06 PM, Gustaf Rydevik <gustaf.rydevik at gmail.com> wrote:
>> On Feb 13, 2008 2:37 PM, Matthias Gondan <matthias-gondan at gmx.de> wrote:
>> > Hi Eleni,
>> >
>> > The problem with this approach is easily explained: under the null
>> > hypothesis, the P values of a significance test are random variables,
>> > uniformly distributed in the interval [0, 1]. It is easily seen that
>> > the lowest of these P values is not any 'better' than the highest of
>> > the P values.
>> >
>> > Best wishes,
>> >
>> > Matthias
>> >
>>
>> Correct me if I'm wrong, but isn't that the point? I assume that the
>> hypothesis is that one or more of these genes are true predictors,
>> i.e. for these genes the p-value should be significant. For all the
>> other genes, the p-value is uniformly distributed. Using a
>> significance level of 0.01, and an a priori knowledge that there are
>> significant genes, you will end up with on the order of 20 genes, some
>> of which are the "true" predictors, and the rest being false
>> positives. This set of 20 genes can then be further analysed. A much
>> smaller and easier problem to solve, no?
>>
>>
>> /Gustaf
> 
> Sorry, it should say 200 genes instead of 20.
> 

I agree with your general point, but want to make one small quibble: 
the choice of 0.01 as a cutoff depends pretty strongly on the 
distribution of the p-value under the alternative.  With a small sample 
size and/or a small effect size, a 0.01 cutoff may miss the majority of 
the true predictors.  You may need to raise it to 0.1 or higher to catch 
most of them, and then you'll have 10 times as many false positives to 
wade through (but still 10 times fewer than you started with, so your 
main point still holds).
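
To put rough numbers on that trade-off, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration, not something taken from the data in this thread: 20,000 genes in total (the count implied by about 200 false positives at 0.01), a hypothetical 50 true predictors, and a two-sided z-test standing in for the per-gene Wald test from the Cox fit, with the statistic distributed N(ncp, 1) under the alternative. The value of ncp plays the role of effect size times the square root of the information in the sample.

```python
from statistics import NormalDist

N_GENES = 20_000   # assumed total, implied by ~200 false positives at 0.01
N_TRUE = 50        # hypothetical number of true predictors
Z = NormalDist()   # standard normal

def power(alpha, ncp):
    """Power of a two-sided z-test at level alpha when the
    test statistic is N(ncp, 1) under the alternative."""
    crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(-crit + ncp) + Z.cdf(-crit - ncp)

def summary(alpha, ncp):
    """Expected (false positives, true predictors caught) at cutoff alpha."""
    exp_false = alpha * (N_GENES - N_TRUE)   # null p-values are uniform
    exp_true = power(alpha, ncp) * N_TRUE
    return exp_false, exp_true

if __name__ == "__main__":
    ncp = 2.0  # hypothetical: a modest effect or a small sample
    for alpha in (0.01, 0.1):
        exp_false, exp_true = summary(alpha, ncp)
        print(f"cutoff {alpha}: about {exp_false:.0f} false positives, "
              f"about {exp_true:.1f} of {N_TRUE} true predictors caught")
```

With ncp set to 2.0, the 0.01 cutoff catches fewer than half of the true predictors and the 0.1 cutoff catches more than half, at the price of ten times as many false positives. A larger ncp (bigger sample or effect) makes the stricter cutoff adequate.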

Duncan Murdoch


