[Rd] stopping finalizers
Hadley Wickham
h.wickham at gmail.com
Sat Feb 16 02:32:41 CET 2013
> The subset table isn't a copy of the subset, it contains the unique key and
> an indicator column showing whether the element is in the subset. I need
> this even if the subset is never modified, so that I can join it to the main
> table and use it in SQL 'where' conditions to get computations for the right
> subset of the data.
Cool. Is that faster than storing a column that just contains the
indices of the included rows?
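
To make the comparison concrete, here is a minimal sketch of the two
layouts in question: an indicator table with one row per key, versus a
table holding only the included keys. The table and column names (main,
subset_ind, subset_idx, id, in_subset, income) and the data are
hypothetical, and SQLite via Python is used purely for illustration; it
is not how sqlsurvey is implemented, and it says nothing about relative
speed on MonetDB.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical main table: a unique key plus one analysis variable.
con.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, income REAL)")
con.executemany(
    "INSERT INTO main VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

# Layout 1: indicator table. Every key appears once, with a 0/1 flag
# saying whether that row belongs to the subset.
con.execute("CREATE TABLE subset_ind (id INTEGER PRIMARY KEY, in_subset INTEGER)")
con.executemany(
    "INSERT INTO subset_ind VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 1), (4, 0)],
)
total_ind = con.execute(
    "SELECT SUM(m.income) FROM main m "
    "JOIN subset_ind s ON m.id = s.id "
    "WHERE s.in_subset = 1"
).fetchone()[0]

# Layout 2: index table. Only the keys of included rows are stored,
# so the join alone restricts the computation to the subset.
con.execute("CREATE TABLE subset_idx (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO subset_idx VALUES (?)", [(1,), (3,)])
total_idx = con.execute(
    "SELECT SUM(m.income) FROM main m "
    "JOIN subset_idx s ON m.id = s.id"
).fetchone()[0]

print(total_ind, total_idx)
```

Both queries aggregate over the same rows (ids 1 and 3); which layout
is faster would have to be measured on the actual database.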
> The whole point of this new sqlsurvey package is that most of the
> aggregation operations happen in the database rather than in R, which is
> faster for very large data tables. The use case is things like the American
> Community Survey and the Nationwide Emergency Department Subsample, with
> millions or tens of millions of records and quite a lot of variables. At
> this scale, loading stuff into memory isn't feasible on commodity desktops
> and laptops, and even on computers with enough memory, the database
> (MonetDB) is faster.
Have you done any comparisons of MonetDB vs SQLite? I'm interested to
know how much faster it is. I'm working on a package
(https://github.com/hadley/dplyr) that compiles R data manipulation
expressions into other languages (e.g. SQL), and have been wondering
if it's worth considering a column store like MonetDB.
Hadley
--
Chief Scientist, RStudio
http://had.co.nz/
More information about the R-devel mailing list