[R] complexity of operations in R
Paul Johnson
pauljohn32 at gmail.com
Thu Jul 19 19:06:58 CEST 2012
On Thu, Jul 19, 2012 at 11:11 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> Hadley et. al:
>
> Indeed. And using a loop is a poor way to do it anyway.
>
> v <- as.list(rep(FALSE,dotot))
>
> is way faster.
>
> -- Bert
>
It's not entirely clear to me what we are supposed to conclude from this.
I can confirm Bert's claim that his way is much faster. I ran
Johan's code, then Bert's suggestion. The speedup is almost
unbelievable.
> system.time(f(dotot))
   user  system elapsed
 25.249   0.332  25.606
> #system.time(g(dotot))
> system.time(h(dotot))
   user  system elapsed
 15.753   0.404  16.173
> p <- function(dotot){v <- as.list(rep(FALSE, dotot))}
> system.time(p(dotot))
   user  system elapsed
  0.004   0.000   0.004
Why is this so much faster?
> f
function(dotot){
    v <- matrix(ncol=1, nrow=0)
    for(i in 1:dotot){
        v <- rbind(v, FALSE)
    }
    return(v)
}
Obvious problems:
1. The return() at the end is unnecessary. Cutting it off saves 4%
or so of the run time.
>
> f2 <- function(dotot){
+     v <- matrix(ncol=1, nrow=0)
+     for(i in 1:dotot){
+         v <- rbind(v, FALSE)
+     }
+     v
+ }
> system.time(f2(dotot))
   user  system elapsed
 24.150   0.220  24.391
2. Use of rbind is SLOW. Calling rbind over and over again is VERY
slow (http://pj.freefaculty.org/R/WorkingExamples/stackListItems.Rout).
3. Re-creating the matrix over and over just to hold nothing is SLOW:
each rbind call reallocates and copies the whole object (a
preallocating alternative is sketched just below).
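For comparison, here is a sketch (f3 and the dotot value below are my
own stand-ins, not from the thread) that keeps the for loop but
allocates the whole matrix once and assigns into it:
f3 <- function(dotot){
    ## allocate the full matrix up front, then fill it in the loop;
    ## no rbind, so nothing is reallocated or copied on each pass
    v <- matrix(FALSE, nrow = dotot, ncol = 1)
    for(i in 1:dotot){
        v[i, 1] <- FALSE
    }
    v
}
dotot <- 100000           ## placeholder value; the thread never shows the real one
system.time(f3(dotot))    ## finishes in a small fraction of a second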
If we just want a giant empty list, do this:
mylst <- vector(mode="list", length=dotot)
> system.time(mylst <- vector(mode="list", length=dotot))
   user  system elapsed
  0.000   0.000   0.001
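A side note, not raised in the thread: the two preallocation idioms are
not quite identical. as.list(rep(FALSE, dotot)) fills every slot with
FALSE, while vector(mode="list", length=dotot) fills them with NULL. A
tiny sketch:
v1 <- as.list(rep(FALSE, 3))             ## list(FALSE, FALSE, FALSE)
v2 <- vector(mode = "list", length = 3)  ## list(NULL, NULL, NULL)
str(v1)
str(v2)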
In a practical application, where something has to be created to go
into the list, I think some speed considerations should be:
1. When a 'thing' is created and put in the list, do we need to
allocate memory twice, like this?
x <- do.whatever()
mylst[[999]] <- x
This creates x and then makes a duplicate of x to go into mylst. BORING!
If x is a big thing, I think it is better to avoid the copy by keeping
the result anonymous:
mylst[[999]] <- do.whatever()
If memory is a consideration at all, this is better, and it avoids the
problem of allocating memory twice.
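Whether that second allocation really happens can be checked with
tracemem(), which prints a message whenever R duplicates the traced
object. A sketch only: do.whatever() is hypothetical here, and copy
behavior varies across R versions (modern R often defers the copy until
one of the objects is modified).
do.whatever <- function() matrix(rnorm(1e6), ncol = 100)  ## hypothetical big result
mylst <- vector(mode = "list", length = 1000)
x <- do.whatever()
tracemem(x)        ## needs an R build with memory profiling (the default for binaries)
mylst[[999]] <- x  ## watch for a tracemem message here or when either copy is modified
mylst[[998]] <- do.whatever()  ## anonymous version: no named temporary hanging around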
Anyway, it seems to me the point of this thread should be that
allocating a big giant list of nothings is nearly instantaneous, but it
is still not settled
1. what is the efficient way to create the things that go into the list, and
2. what is the best way to make use of the results once they are
collected (one common pattern is sketched just below).
I'm sure that looping and calling rbind lots of times is a really bad
idea. Everything else is on the table for discussion. It's not the for
loop that makes the original f function slow; it's the repeated use of
matrix and rbind.
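For point 2, one common pattern (a sketch using a made-up make.row()
helper, not something from the thread) is to fill the preallocated list
inside the loop and then combine everything with a single rbind at the
end via do.call(), so the rbind cost is paid only once:
make.row <- function(i) c(id = i, value = sqrt(i))  ## hypothetical per-iteration result
dotot <- 10000                                      ## placeholder size
mylst <- vector(mode = "list", length = dotot)
for(i in 1:dotot){
    mylst[[i]] <- make.row(i)
}
result <- do.call(rbind, mylst)   ## one rbind over the whole list, not dotot calls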
pj
--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu