[R] complexity of operations in R

Fri Jul 20 01:00:27 CEST 2012

On 12-07-19 4:00 PM, Jan van der Laan wrote:
>
> When the length of the end result is not known, doubling the length of
> the list is also much faster than increasing the size of the list with
> single items.
>
> f <- function(n, preallocate) {
>       v <- if(preallocate) vector("list",n) else list() ;
>       for(i in seq_len(n)) {
>          v[[i]] <- i
>       }
>       v
> }
>
> g <- function(n) {
>       N <- 1000
>       v <- vector("list", N)
>       for(i in seq_len(n)) {
>          if (i > N) {
>              N <- 2 * N
>              length(v) <- N
>          }
>          v[[i]] <- i
>       }
>       v[1:i]
> }
>
>
>   > system.time(f(5E4, TRUE))
>      user  system elapsed
>     0.968   0.000   0.975
>   > system.time(f(5E4, FALSE))
>      user  system elapsed
>    52.611   0.136  54.197
>   > system.time(g(5E4))
>      user  system elapsed
>     1.388   0.008   1.424
>
>
> What causes these differences? I can imagine that the time needed for
> memory allocations play a role: multiple small allocations will be
> smaller than one large allocation. But that doesn't explain the
> quadratic growth in time. I would expect that to be linear. When doing
> v[[i]] <- i the list isn't copied, right?

If i is bigger than length(v), it has to be copied.  To make the list 
bigger, R allocates a bigger list, and copies all the pointers over, 
then does the assignment.  It could do this less frequently by 
over-allocating, but it was written in a time when memory was expensive, 
and programmers who cared about timing did pre-allocations.

Duncan Murdoch

>
> Jan
>
>
>
> On 07/19/2012 06:21 PM, William Dunlap wrote:
>> Preallocation of lists does speed things up.  The following shows
>> time quadratic in size when there is no preallocation and linear
>> growth when there is, for size in the c. 10^4 to 10^6 region:
>>> f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ; for(i in seq_len(n)) v[[i]] <- i ; v }
>>> identical(f(17,pre=TRUE), f(17,pre=FALSE))
>> [1] TRUE
>>> system.time(f(n=1e4, preallocate=FALSE))
>>      user  system elapsed
>>     0.324   0.000   0.326
>>> system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time
>>      user  system elapsed
>>     1.316   0.012   1.329
>>> system.time(f(n=4e4, preallocate=FALSE)) # ditto
>>      user  system elapsed
>>     5.720   0.028   5.754
>>>
>>> system.time(f(n=1e4, preallocate=TRUE))
>>      user  system elapsed
>>     0.016   0.000   0.017
>>> system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time
>>      user  system elapsed
>>     0.032   0.004   0.036
>>> system.time(f(n=4e4, preallocate=TRUE)) # ditto
>>      user  system elapsed
>>     0.068   0.000   0.069
>>>
>>> system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time
>>      user  system elapsed
>>     0.688   0.000   0.688
>>
>> Above 10^6 there is some superlinearity
>>> system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time
>>      user  system elapsed
>>    11.125   0.052  11.181
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>>> Behalf Of Bert Gunter
>>> Sent: Thursday, July 19, 2012 9:11 AM
>>> To: Hadley Wickham
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] complexity of operations in R
>>>
>>> Hadley et. al:
>>>
>>> Indeed. And using a loop is a poor way to do it anyway.
>>>
>>> v <- as.list(rep(FALSE,dotot))
>>>
>>> is way faster.
>>>
>>> -- Bert
>>>
>>> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham <hadley at rice.edu> wrote:
>>>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>>> Johan,
>>>>>
>>>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>>>
>>>>> The following function is faster than your g and easier to read:
>>>>>
>>>>> g2 <- function(dotot) {
>>>>>     v <- list()
>>>>>     for (i in seq_len(dotot)) {
>>>>>       v[[i]] <- FALSE
>>>>>     }
>>>>> }
>>>>
>>>> Except that you don't need to pre-allocate lists...
>>>>
>>>> Hadley
>>>>
>>>> --
>>>> Assistant Professor / Dobelman Family Junior Chair
>>>> Department of Statistics / Rice University
>>>> http://had.co.nz/
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
>>> biostatistics/pdb-ncb-home.htm
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>