[R] Indexes on dataframe columns?
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Oct 25 16:10:37 CEST 2007
On 10/25/2007 9:27 AM, Ranjan Bagchi wrote:
> Hi --
>
> I'm working with some data frames with fairly high nrows (call it 8
> columns, by 20,000 rows). Are there any indexes on these columns?
>
> When I do a df[df$foo == 42,] [which I think is idiomatic], am I doing a linear
> search or something better? If the column contents is ordered, I'd like
> to at least be doing a naive binary search.
You're not doing a search at all: df$foo == 42 compares every element of
the column and produces a vector of TRUE and FALSE values, one per row;
the rows corresponding to TRUE values are then selected. No optimization
is done, so it doesn't matter if the values are unique or sorted.
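To make the two steps concrete, here is a small sketch. The data frame df and its columns foo and bar are made up for illustration; only the df[df$foo == 42, ] idiom comes from the question.

```r
# Hypothetical data: 20,000 rows, with foo cycling through 1..100
df <- data.frame(foo = rep(1:100, length.out = 20000),
                 bar = seq_len(20000))

# Step 1: one comparison per row, giving a logical vector of length nrow(df)
keep <- df$foo == 42

# Step 2: select the rows where keep is TRUE
hits <- df[keep, ]
```

Writing df[df$foo == 42, ] does both steps in one expression. If foo can contain NA, wrapping the condition in which() avoids getting rows of NAs back.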
20,000 rows is not a particularly large number nowadays, so this may be
fast enough. I believe you'll get a fast search if the foo column is
used as row names (which requires its values to be unique), but you'll
need to time it to be sure. Then the indexing would be df["42", ].
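A rough way to time the row-name approach against the logical-vector approach is sketched below. The data are hypothetical, foo is assumed to be unique, and the timings will depend on the machine and the R version, so treat this as something to run rather than as a result.

```r
# Hypothetical data: foo is unique here, since row names must be unique
n <- 20000
df <- data.frame(foo = seq_len(n), bar = rnorm(n))

# Use foo as the row names (stored as character strings)
df_named <- df
rownames(df_named) <- as.character(df_named$foo)

# The two lookups should return the same row
by_logical <- df[df$foo == 42, ]
by_name <- df_named["42", ]

# Time repeated lookups with each approach
system.time(for (i in 1:200) df[df$foo == 42, ])
system.time(for (i in 1:200) df_named["42", ])
```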
If it's still too slow, I'd advise against using data frames. Matrix
indexing is much faster.
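The same comparison can be made for a matrix. This sketch assumes all columns are numeric, because a matrix holds a single type; the data and names are again hypothetical, and the timings need to be measured locally.

```r
# Hypothetical numeric data, converted to a matrix with column names foo and bar
n <- 20000
df <- data.frame(foo = seq_len(n), bar = rnorm(n))
m <- as.matrix(df)

# Same logical-vector selection on the matrix;
# drop = FALSE keeps a single matching row as a matrix
hits_m <- m[m[, "foo"] == 42, , drop = FALSE]

# Time repeated lookups on the data frame and on the matrix
system.time(for (i in 1:200) df[df$foo == 42, ])
system.time(for (i in 1:200) m[m[, "foo"] == 42, , drop = FALSE])
```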
Duncan Murdoch