[Rd] Deep copy of factor levels?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Mar 17 10:30:10 CET 2014
Do use current R (3.1.0 alpha at present; 3.0.2 is obsolete) before
reporting such things. I think you will see that this has changed.
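
For anyone wanting to check this on their own installation, here is a minimal sketch that is not part of the original report. It repeats the as.list() experiment on a smaller factor and uses gc() to estimate how much memory the resulting list adds. The size n and the variable names are assumptions chosen for illustration, and the timing and memory figures will differ by R version and platform.

```r
# Illustrative check only: n is smaller than the 1e4 used in the report
# so that it stays quick even if the levels are fully copied.
n <- 2000L
x <- factor(seq_len(n))

invisible(gc(reset = TRUE))
before <- sum(gc()[, 2])   # Mb currently in use (Ncells + Vcells)

elapsed <- system.time(xx <- as.list(x))[["elapsed"]]

after <- sum(gc()[, 2])    # xx is still alive here, so it is counted

cat("R version:          ", R.version.string, "\n")
cat("elapsed (s):        ", elapsed, "\n")
cat("approx. growth (Mb):", after - before, "\n")

# Each list element is a length-one factor that carries all n levels.
stopifnot(length(xx) == n, identical(levels(xx[[1]]), levels(x)))
```

Running the same script under two R versions and comparing the reported growth gives a rough indication of whether the levels are being duplicated per element.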
On 17/03/2014 09:13, Kirill Müller wrote:
> Hi
>
>
> It seems that selecting an element of a factor will copy its levels
> (Ubuntu 13.04, R 3.0.2). Below is the output of a script that creates a
> factor with 10000 elements and then calls as.list() on it. The new
> object seems to use more than 700 MB, and inspection of the levels of
> the individual elements of the list suggests that they are distinct objects.
>
> Perhaps some performance gain could be achieved by copying the levels
> "by reference", but I don't know R internals well enough to see if it's
> possible. Is there a particular reason for creating a full copy of the
> factor levels?
>
> This has come up when looking at the performance of rbind.fill (in the
> plyr package) with factors: https://github.com/hadley/plyr/issues/206 .
>
>
> Best regards
>
> Kirill
>
>
>
> > gc()
>             used  (Mb) gc trigger  (Mb)  max used   (Mb)
> Ncells    325977  17.5    1074393  57.4  10049951  536.8
> Vcells   4617168  35.3   87439742 667.2 204862160 1563.0
> > system.time(x <- factor(seq_len(1e4)))
>    user  system elapsed
>   0.008   0.000   0.007
> > system.time(xx <- as.list(x))
>    user  system elapsed
>   4.263   0.000   4.322
> > gc()
>             used  (Mb) gc trigger  (Mb)  max used   (Mb)
> Ncells    385991  20.7    1074393  57.4  10049951  536.8
> Vcells 104672187 798.6  112367694 857.3 204862160 1563.0
> > .Internal(inspect(levels(xx[[1]])))
> @387f620 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
> @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
> @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
> @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
> @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
> @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
> ...
> > .Internal(inspect(levels(xx[[2]])))
> @1b38cb90 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
> @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
> @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
> @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
> @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
> @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
> ...
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595