[Rd] Data frames and row names
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Aug 15 08:11:32 CEST 2006
On Mon, 14 Aug 2006, Henrik Bengtsson wrote:
> In R-devel v2.4.0 NEWS:
>
> o The 'row.names' of a data frame may be stored internally as an
> integer or character vector. This can result in considerably
> more compact storage (and more logical row names from rbind)
> when the row.names are 1:nrow(x). However, such data frames
> are not compatible with earlier versions of R: this can be
> ensured by supplying a character vector as 'row.names'.
>
> This is great.
>
> With row.names == NULL for 1:nrow(x) the storage would be even more
> compact.
A few bytes more compact. Some day you may get up to the next few lines
of NEWS, which say:

    The internal storage of row.names = 1:n just records 'n' for
    efficiency with very long vectors.
(BTW, this is four-month-old news, hence my 'some day' comment.)
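The sketch below shows the compact storage described in NEWS. It assumes a recent R release. `.row_names_info()` and `.set_row_names()` are internal base-R helpers rather than a documented public interface, so their exact behaviour may vary between versions.

```r
# Assumes a recent R release; .row_names_info() and .set_row_names() are
# internal base-R helpers, not a stable public API.
df <- data.frame(a = 1:10)

# type = 0L returns the row.names attribute exactly as stored.
# For automatic row names 1:n this is the compact pair c(NA, -n).
info0 <- .row_names_info(df, 0L)
print(info0)

# type = 2L returns the number of rows, i.e. abs(n)
.row_names_info(df, 2L)

# attr() expands the compact form back to the full integer vector 1:n
attr(df, "row.names")

# .set_row_names(n) builds the compact representation directly
.set_row_names(10L)
```

Only `n` is kept, so the cost of automatic row names does not grow with the number of rows.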
> I noticed that the number of rows is inferred from row
> names:
>
> > dim.data.frame
> function (x)
> c(length(attr(x, "row.names")), length(x))
> <environment: namespace:base>
>
> but couldn't the number of rows be inferred from the first column, if
> there are no row names? I realize that this would break the case with
> zero-column data frames, e.g.
>
> > df <- data.frame(a=1:10)
> > df[,-1]
> NULL data frame with 10 rows.
>
> ...but maybe there is a way around that too.
Yes, see above.
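This example shows why the row count cannot simply be read from the first column. It assumes a current version of R, whose printed messages differ from the R 2.4.0 output quoted above. A zero-column data frame still has rows, and `dim()` recovers them from the (compactly stored) row names.

```r
# Assumes a recent R release; print() output differs from R 2.4.0.
df <- data.frame(a = 1:10)

# Dropping the only column leaves a data frame with zero columns
# (drop = TRUE only simplifies when exactly one column remains).
z <- df[, -1]

# There is no first column to count, yet the row count survives
# because dim() takes it from the row.names attribute.
dim(z)
nrow(z)
length(z)  # number of columns
```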
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595