[R] complexity of operations in R
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Jul 20 01:00:27 CEST 2012
On 12-07-19 4:00 PM, Jan van der Laan wrote:
>
> When the length of the end result is not known, doubling the length of
> the list is also much faster than increasing the size of the list with
> single items.
>
> f <- function(n, preallocate) {
> v <- if(preallocate) vector("list",n) else list() ;
> for(i in seq_len(n)) {
> v[[i]] <- i
> }
> v
> }
>
> g <- function(n) {
> N <- 1000
> v <- vector("list", N)
> for(i in seq_len(n)) {
> if (i > N) {
> N <- 2 * N
> length(v) <- N
> }
> v[[i]] <- i
> }
> v[1:i]
> }
>
>
> > system.time(f(5E4, TRUE))
> user system elapsed
> 0.968 0.000 0.975
> > system.time(f(5E4, FALSE))
> user system elapsed
> 52.611 0.136 54.197
> > system.time(g(5E4))
> user system elapsed
> 1.388 0.008 1.424
>
>
> What causes these differences? I can imagine that the time needed for
> memory allocations play a role: multiple small allocations will be
> smaller than one large allocation. But that doesn't explain the
> quadratic growth in time. I would expect that to be linear. When doing
> v[[i]] <- i the list isn't copied, right?
If i is bigger than length(v), it has to be copied. To make the list
bigger, R allocates a bigger list, and copies all the pointers over,
then does the assignment. It could do this less frequently by
over-allocating, but it was written in a time when memory was expensive,
and programmers who cared about timing did pre-allocations.
Duncan Murdoch
>
> Jan
>
>
>
> On 07/19/2012 06:21 PM, William Dunlap wrote:
>> Preallocation of lists does speed things up. The following shows
>> time quadratic in size when there is no preallocation and linear
>> growth when there is, for size in the c. 10^4 to 10^6 region:
>>> f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ; for(i in seq_len(n)) v[[i]] <- i ; v }
>>> identical(f(17,pre=TRUE), f(17,pre=FALSE))
>> [1] TRUE
>>> system.time(f(n=1e4, preallocate=FALSE))
>> user system elapsed
>> 0.324 0.000 0.326
>>> system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time
>> user system elapsed
>> 1.316 0.012 1.329
>>> system.time(f(n=4e4, preallocate=FALSE)) # ditto
>> user system elapsed
>> 5.720 0.028 5.754
>>>
>>> system.time(f(n=1e4, preallocate=TRUE))
>> user system elapsed
>> 0.016 0.000 0.017
>>> system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time
>> user system elapsed
>> 0.032 0.004 0.036
>>> system.time(f(n=4e4, preallocate=TRUE)) # ditto
>> user system elapsed
>> 0.068 0.000 0.069
>>>
>>> system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time
>> user system elapsed
>> 0.688 0.000 0.688
>>
>> Above 10^6 there is some superlinearity
>>> system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time
>> user system elapsed
>> 11.125 0.052 11.181
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>>> Behalf Of Bert Gunter
>>> Sent: Thursday, July 19, 2012 9:11 AM
>>> To: Hadley Wickham
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] complexity of operations in R
>>>
>>> Hadley et. al:
>>>
>>> Indeed. And using a loop is a poor way to do it anyway.
>>>
>>> v <- as.list(rep(FALSE,dotot))
>>>
>>> is way faster.
>>>
>>> -- Bert
>>>
>>> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham <hadley at rice.edu> wrote:
>>>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>>> Johan,
>>>>>
>>>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>>>
>>>>> The following function is faster than your g and easier to read:
>>>>>
>>>>> g2 <- function(dotot) {
>>>>> v <- list()
>>>>> for (i in seq_len(dotot)) {
>>>>> v[[i]] <- FALSE
>>>>> }
>>>>> }
>>>>
>>>> Except that you don't need to pre-allocate lists...
>>>>
>>>> Hadley
>>>>
>>>> --
>>>> Assistant Professor / Dobelman Family Junior Chair
>>>> Department of Statistics / Rice University
>>>> http://had.co.nz/
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
>>> biostatistics/pdb-ncb-home.htm
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list