[Rd] [R] Successive subsets from a vector?
Thomas Lumley
tlumley at u.washington.edu
Tue Aug 22 16:54:59 CEST 2006
On Tue, 22 Aug 2006, hadley wickham wrote:
>> The loop method took 195 secs. Just assigning to an answer of the correct
>> length reduced this to 5 secs. e.g. use
>>
>> ADDRESSES <- character(length(VECTOR)-4)
>>
>> Moral: don't grow vectors repeatedly.
>
> Other languages (eg. Java) grow the size of the vector independently
> of the number of observations in it (I think Java doubles the size
> whenever the vector is filled), thus changing O(n) behaviour to O(log
> n). I've always wondered why R doesn't do this.
>
(redirected to r-devel, a better location for wonder of this type)
This was apparently the intention at the beginning of time, thus the LENGTH
and TRUELENGTH macros in the source.
In many cases, though, there is duplication as well as length change, eg
x<-c(x, something)
will set NAMED(x) to 2 by the second iteration, forcing duplication at
each subsequent iteration. The doubling strategy would still leave us with
O(n) behaviour per iteration, just with a smaller constant.
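
A minimal sketch of the two patterns being compared, for illustration only.
The function names grow() and prealloc() and the size n are hypothetical, and
timings will depend on the machine and the R version, so none are shown.

```r
# Hypothetical size, chosen only so the loop finishes quickly.
n <- 20000

# Pattern 1: grow the vector with c() inside the loop.
# c() builds a new, longer vector on every iteration.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i)
  x
}

# Pattern 2: allocate the answer at its final length, then fill it.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i
  x
}

a <- grow(n)
b <- prealloc(n)
identical(a, b)   # both approaches produce the same vector
```

Wrapping each call in system.time() is one way to compare the two on a given
installation.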
The only case I can think of where the doubling strategy actually helps a
lot is the one in Atte's example, assigning off the end of an existing
vector. That wasn't legal in early versions of R (and I think most people
would agree that it shouldn't be encouraged).
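
For readers unfamiliar with the idiom, "assigning off the end" looks roughly
like the sketch below. The variable names are made up for illustration, and
the example shows only the semantics, not what R does internally.

```r
# Start with an empty vector and assign one position past its current end.
x <- numeric(0)
for (i in 1:5) x[i] <- i * 10   # each assignment extends x by one element
x

# Assigning further past the end fills the gap with NA.
y <- 1:3
y[6] <- 6L
y
```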
A reAllocVector() function would clearly have some benefits, but not as
many as one would expect. That's probably why it hasn't been done (which
doesn't mean that it shouldn't be done).
Providing the ability to write assignment functions that don't duplicate
is a more urgent problem.
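
As a reminder of what an assignment function is, here is a small sketch. The
name `second<-` is hypothetical. Whether calling it copies x depends on the R
version, so the example shows only the calling convention.

```r
# A user-defined assignment (replacement) function.
# It receives the object and the new value, and returns the modified object.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 99   # evaluated as v <- `second<-`(v, value = 99)
v
```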
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle