[Rd] internal copying in R (soon to be released R-3.1.0)
Simon Urbanek
simon.urbanek at r-project.org
Mon Mar 3 19:37:34 CET 2014
On Mar 2, 2014, at 12:37 PM, Jens Oehlschlägel <jens.oehlschlaegel at truecluster.com> wrote:
> Dear core group,
>
> Which operation in R is guaranteed to produce a true copy of an atomic vector, not just a second symbol pointing to the same shared memory?
>
None: there is no concept of "shared" memory at the R level. You seem to be mixing C-level API specifics with the R language. In the C API, duplicate() creates a new copy.
> y <- x[]
> #?
>
> y <- x
> y[1] <- y[1]
> #?
>
> Is there any function that returns its argument as a non-shared atomic but only copies if the argument was shared?
>
> Given an atomic vector x, what is the best official way to find out whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2 if the argument to .Call was a never-named expression!?
>
> > named(1:3)
> [1] 2
>
Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set NAMED to 2 at all: it passes the object through unchanged, so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).
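To make "the C code's responsibility" concrete, here is a minimal standalone sketch. It is a toy model, not R's actual object layout: toy_vec, toy_alloc, toy_duplicate, toy_free and toy_set_first are hypothetical names invented for illustration, with toy_duplicate standing in for what duplicate() does in the real API. The callee only reads the counter and copies when the object may be shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a vector object; NOT R's real header layout. */
typedef struct {
    int named;      /* 0 = temporary, 1 = one binding, 2 = possibly shared */
    size_t length;
    int *data;
} toy_vec;

/* Allocate a zero-filled vector with named = 0, like a fresh temporary. */
toy_vec *toy_alloc(size_t length) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;
    v->length = length;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Deep copy; plays the role that duplicate() plays in the C API. */
toy_vec *toy_duplicate(const toy_vec *src) {
    toy_vec *copy = toy_alloc(src->length);
    memcpy(copy->data, src->data, src->length * sizeof *src->data);
    return copy;
}

void toy_free(toy_vec *v) {
    free(v->data);
    free(v);
}

/* A callee that wants to modify its argument. It never writes `named`;
   it only reads it and copies first when the object may be shared. */
toy_vec *toy_set_first(toy_vec *x, int value) {
    toy_vec *target = (x->named >= 2) ? toy_duplicate(x) : x;
    if (target->length > 0)
        target->data[0] = value;
    return target;
}
```

With named at 2 the caller gets back a fresh object and the original stays untouched; with named below 2 the same object comes back, modified in place. Either way the decision sits in the callee, not in the calling interface.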
> And it seems to set it permanently, pure read-access can trigger copy-on-modify:
>
> > x <- integer(1e8)
> > system.time(x[1]<-1L)
> user system elapsed
> 0 0 0
> > system.time(x[1]<-2L)
> user system elapsed
> 0 0 0
>
> having called .Call now leads to an unnecessary copy on the next assignment
>
> > named(x)
> [1] 2
> > system.time(x[1]<-3L)
> user system elapsed
> 0.14 0.07 0.20
> > system.time(x[1]<-4L)
> user system elapsed
> 0 0 0
>
> this not only happens with user written functions doing read-access
>
> > is.unsorted(x)
> [1] TRUE
> > system.time(x[1]<-5L)
> user system elapsed
> 0.11 0.09 0.21
>
> Why don't you simply give package authors read-access to sxpinfo_struct.named in .Call (without setting it to 2)? That would give us more control and also save some unnecessary copying.
Again, you're barking up the wrong tree - .Call() doesn't bump NAMED at all - it simply passes the object:
#include <Rinternals.h>
SEXP nam(SEXP x) { return ScalarInteger(NAMED(x)); }
> .Call("nam", 1+1)
[1] 0
> x=1+1
> .Call("nam", x)
[1] 1
> y=x
> .Call("nam", x)
[1] 2
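The 0, 1, 2 sequence above can be mimicked with a small standalone model. This is only a sketch under stated assumptions: toy_obj, toy_bind and toy_nam are hypothetical names, and the counter is assumed to saturate at 2, matching the values discussed in this thread. It is not how R itself is implemented.

```c
#include <assert.h>

/* Toy object carrying only the counter and a payload; NOT R's real layout. */
typedef struct {
    int named;
    double value;
} toy_obj;

/* Binding a symbol to the object bumps the counter, saturating at 2
   (an assumption of this model). */
void toy_bind(toy_obj *obj) {
    if (obj->named < 2)
        obj->named++;
}

/* Analogue of nam(): a pure read that leaves the counter unchanged. */
int toy_nam(const toy_obj *obj) {
    return obj->named;
}
```

In this model a fresh temporary reports 0, one toy_bind() call (like x=1+1) gives 1, and a second (like y=x) gives 2. Calling toy_nam() any number of times changes nothing, which is the point about passing an object to .Call().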
Cheers,
Simon
> I guess once R switches to reference-counting preventive increasing in .Call could not be continued anyhow.
>
> Kind regards
>
>
> Jens Oehlschlägel
>
> P.S. please cc me in answers as I am not a member of r-devel
>
>
> P.P.S. function named() was tentatively defined as follows:
>
> named <- function(x)
> .Call("R_bit_named", x, PACKAGE="bit")
>
> SEXP R_bit_named(SEXP x){
> SEXP ret_;
> PROTECT( ret_ = allocVector(INTSXP,1) );
> INTEGER(ret_)[0] = NAMED(x);
> UNPROTECT(1);
> return ret_;
> }
>
>
> > version
> _
> platform x86_64-w64-mingw32
> arch x86_64
> os mingw32
> system x86_64, mingw32
> status Under development (unstable)
> major 3
> minor 1.0
> year 2014
> month 02
> day 28
> svn rev 65091
> language R
> version.string R Under development (unstable) (2014-02-28 r65091)
> nickname Unsuffered Consequences