[R] any other fast method for median calculation
    Dimitris Rizopoulos 
    d.rizopoulos at erasmusmc.nl
       
    Tue Apr 14 11:34:04 CEST 2009
    
    
  
S Ellison wrote:
> Sorting with an appropriate algorithm is O(n log n), so it's very hard to
> get the 'exact' median any faster. However, if you can cope with a less
> precise median, you could use a binary search between max(x) and min(x)
> with low tolerance or comparatively few iterations. In native R, though,
> that isn't going to be fast; interpreter overhead will likely more than
> wipe out any reduction in number of comparisons.
> 
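The bisection idea quoted above could be sketched roughly as follows.
This is only an illustration under stated assumptions: approx_median,
tol and max_iter are hypothetical names (not from base R or any
package), x is assumed to be non-empty, and for an even number of values
the result approaches the lower of the two middle values rather than
their mean. Each iteration is a single O(n) pass over x.

```r
# Approximate median by bisection on the value range.
# approx_median, tol and max_iter are illustrative names only.
approx_median <- function(x, tol = 1e-3, max_iter = 50L) {
  x <- x[!is.na(x)]                 # ignore missing values
  n <- length(x)                    # assumed to be at least 1
  lo <- min(x)
  hi <- max(x)
  k <- (n + 1L) %/% 2L              # rank of the (lower) median
  for (i in seq_len(max_iter)) {
    if (hi - lo <= tol) break       # bracket is narrow enough
    mid <- (lo + hi) / 2
    # One pass over the data: how many values lie at or below mid?
    if (sum(x <= mid) >= k) hi <- mid else lo <- mid
  }
  (lo + hi) / 2                     # midpoint of the final bracket
}

# Usage on an odd-length vector, compared with the exact median
set.seed(1)                         # arbitrary seed
y <- rnorm(1001)
abs(approx_median(y) - median(y)) <= 1e-3
```

Whether this beats median() in practice would need to be timed; as the
quoted text says, interpreter overhead may well cancel the saving.
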
> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though:
> > apply(df,2,median)
Well, for data frames I think sapply(...) or even unlist(lapply(...))
will be faster, because apply() first coerces the data frame to a
matrix, e.g.,
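
# 50 rows x 200,000 columns of standard normal draws
mat <- matrix(rnorm(50*2e05), 50, 2e05)
DF <- as.data.frame(mat)

# run gc() before each timing so garbage collection does not skew it
invisible({gc(); gc()})
system.time(apply(DF, 2, median))   # coerces DF to a matrix first
invisible({gc(); gc()})
system.time(sapply(DF, median))     # iterates over the columns directly
invisible({gc(); gc()})
system.time(unlist(lapply(DF, median), use.names = FALSE))  # no names kept
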
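
Before comparing timings it may be worth confirming that the variants
return the same values. The following is a small sketch on a reduced
data set; the object names (small, m_apply, ...) and the seed are
arbitrary choices for illustration. It also includes vapply(), which is
available in current versions of R and works like sapply() but checks
that each result is a single numeric value.

```r
set.seed(42)                                    # arbitrary seed
small <- as.data.frame(matrix(rnorm(50 * 1000), 50, 1000))

m_apply  <- apply(small, 2, median)             # via matrix coercion
m_sapply <- sapply(small, median)               # column by column
m_lapply <- unlist(lapply(small, median), use.names = FALSE)
m_vapply <- vapply(small, median, numeric(1))   # type-checked sapply

# All four should agree once names are dropped
stopifnot(
  isTRUE(all.equal(unname(m_apply),  m_lapply)),
  isTRUE(all.equal(unname(m_sapply), m_lapply)),
  isTRUE(all.equal(unname(m_vapply), m_lapply))
)
```
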
Best,
Dimitris
> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.
> 
> S
> 
> 
> 
>>>> "Zheng, Xin (NIH) [C]" <zhengxin at mail.nih.gov> 14/04/2009 05:29:40
>>>>
> Hi there,
> 
> I have a data frame with more than 200k columns. How can I get the median
> of each column quickly? mapply is the fastest function I know for that,
> but it is still not fast enough.
> 
> It seems the function "median" in R calculates the median using "sort"
> and "mean". I am wondering if there is another function with a better
> algorithm.
> 
> Any hint?
> 
> Thanks,
> 
> Xin Zheng
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code.
> 
-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
    
    