[Rd] Deep copy of factor levels?

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 17 10:30:10 CET 2014


Do use current R (3.1.0 alpha at present; 3.0.2 is obsolete) before 
reporting such things.  I think you will see that this behaviour has changed.

On 17/03/2014 09:13, Kirill Müller wrote:
> Hi
>
>
> It seems that selecting an element of a factor will copy its levels
> (Ubuntu 13.04, R 3.0.2). Below is the output of a script that creates a
> factor with 10000 elements and then calls as.list() on it. The new
> object seems to use more than 700 MB, and inspection of the levels of
> the individual elements of the list suggests that they are distinct objects.
>
> Perhaps some performance gain could be achieved by copying the levels
> "by reference", but I don't know R internals well enough to see if it's
> possible. Is there a particular reason for creating a full copy of the
> factor levels?
>
> This has come up when looking at the performance of rbind.fill (in the
> plyr package) with factors: https://github.com/hadley/plyr/issues/206 .
>
>
> Best regards
>
> Kirill
>
>
>
>  > gc()
>            used (Mb) gc trigger  (Mb)  max used   (Mb)
> Ncells  325977 17.5    1074393  57.4  10049951  536.8
> Vcells 4617168 35.3   87439742 667.2 204862160 1563.0
>  > system.time(x <- factor(seq_len(1e4)))
>     user  system elapsed
>    0.008   0.000   0.007
>  > system.time(xx <- as.list(x))
>     user  system elapsed
>    4.263   0.000   4.322
>  > gc()
>              used  (Mb) gc trigger  (Mb)  max used   (Mb)
> Ncells    385991  20.7    1074393  57.4  10049951  536.8
> Vcells 104672187 798.6  112367694 857.3 204862160 1563.0
>  > .Internal(inspect(levels(xx[[1]])))
> @387f620 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
>    @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
>    @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
>    @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
>    @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
>    @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
>    ...
>  > .Internal(inspect(levels(xx[[2]])))
> @1b38cb90 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
>    @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
>    @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
>    @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
>    @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
>    @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
>    ...
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
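
A minimal sketch for re-running the quoted check on a current R build. It
mirrors the quoted transcript at the same size. The variable names timing
and same_values are illustrative and not from the original script, and no
particular timing or memory figure is assumed here.

```r
# Same construction as in the quoted transcript.
x <- factor(seq_len(1e4))

# Time the conversion that took about 4 s under R 3.0.2 in the report.
timing <- system.time(xx <- as.list(x))
print(timing)

# Memory in use after the conversion (compare the Vcells row with the report).
print(gc())

# Print the internal address of the levels vector for two list elements.
# .Internal(inspect()) is an undocumented debugging aid; its output format
# may differ between R versions.
.Internal(inspect(levels(xx[[1]])))
.Internal(inspect(levels(xx[[2]])))

# Independently of sharing, the levels should be equal by value.
same_values <- identical(levels(xx[[1]]), levels(xx[[2]]))
print(same_values)
```

If the first line of the two inspect() outputs shows the same address for
the STRSXP, the levels vector is shared between the elements rather than
copied.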


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


