[R] lean and mean lm/glm?
    Thomas Lumley 
    tlumley at u.washington.edu
       
    Wed Aug 23 19:15:29 CEST 2006
    
    
  
On Wed, 23 Aug 2006, Damien Moore wrote:
>
> Thomas Lumley < tlumley at u.washington.edu > wrote:
>
>> I have written most of a bigglm() function where the data= argument is a
>> function with a single argument 'reset'. When called with reset=FALSE the
>> function should return another chunk of data, or NULL if no data are
>> available, and when called with reset=TRUE it should go back to the
>> beginning of the data. I don't think this is too inelegant.
>
> Yes, that does sound like a pretty elegant solution. It would be even 
> more so if you could offer a default implementation of the data_function 
> that simply passes chunks of large X and y matrices held in memory.
I have done that for data frames.
> (Ideally you would just initialize the data_function to reference the X 
> and y data to avoid duplicating it; I don't know if that's possible in R.)
The part that is extracted is a copy. The whole thing isn't copied, 
though.
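
A minimal sketch of what such a data function could look like for a data
frame. This is not the actual bigglm() code: the name make_chunk_reader and
the chunk_size argument are made up for illustration. Only the rows returned
by each call are copied.

```r
# Hypothetical helper: wrap an in-memory data frame in a function that
# follows the 'reset' protocol described in the quoted text.
make_chunk_reader <- function(df, chunk_size = 1000) {
  pos <- 0L  # number of rows already handed out
  function(reset = FALSE) {
    if (reset) {
      pos <<- 0L                      # go back to the beginning of the data
      return(invisible(NULL))
    }
    if (pos >= nrow(df)) return(NULL) # no data left
    idx <- seq(pos + 1L, min(pos + chunk_size, nrow(df)))
    pos <<- pos + length(idx)
    df[idx, , drop = FALSE]           # row subset: a copy of this chunk only
  }
}

# Usage with a built-in data set
next_chunk <- make_chunk_reader(mtcars, chunk_size = 10)
first <- next_chunk()                 # first 10 rows of mtcars
next_chunk(reset = TRUE)              # rewind
```

The closure keeps a reference to df and a row counter, so the full data frame
is not duplicated just by creating the reader.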
The chunk would have to be a copy if it were an R matrix because matrices 
are stored in contiguous column-major format, so a chunk of rows won't be 
contiguous. I think an implementation that uses precomputed design 
matrices would want to be written in C and call the incremental QR 
decomposition routines row by row.  The reason for working in chunks in R 
is to allow model.frame and model.matrix to work reasonably efficiently, 
and they aren't needed if you already have the design matrix.
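
To illustrate why chunking in R fits with model.frame and model.matrix, here
is a rough sketch of a chunk-by-chunk linear fit. It is not the bigglm()
implementation: for brevity it accumulates the cross-products X'X and X'y
and solves the normal equations, which is less numerically stable than the
incremental QR decomposition mentioned above. The names chunks_of and
fit_lm_chunks are hypothetical.

```r
# Hypothetical data function over a data frame, using the 'reset' protocol.
chunks_of <- function(df, size) {
  pos <- 0L
  function(reset = FALSE) {
    if (reset) { pos <<- 0L; return(invisible(NULL)) }
    if (pos >= nrow(df)) return(NULL)
    idx <- seq(pos + 1L, min(pos + size, nrow(df)))
    pos <<- pos + length(idx)
    df[idx, , drop = FALSE]
  }
}

# Fit a linear model one chunk at a time via accumulated cross-products.
fit_lm_chunks <- function(formula, data_fun) {
  XtX <- NULL
  Xty <- NULL
  data_fun(reset = TRUE)
  while (!is.null(chunk <- data_fun(reset = FALSE))) {
    mf <- model.frame(formula, data = chunk)   # per-chunk model frame
    X  <- model.matrix(attr(mf, "terms"), mf)  # per-chunk design matrix
    y  <- model.response(mf)
    if (is.null(XtX)) {
      XtX <- crossprod(X)
      Xty <- crossprod(X, y)
    } else {
      XtX <- XtX + crossprod(X)
      Xty <- Xty + crossprod(X, y)
    }
  }
  drop(solve(XtX, Xty))                        # solve the normal equations
}

coefs <- fit_lm_chunks(mpg ~ wt + hp, chunks_of(mtcars, 10))
```

On a small data set the result can be compared against
coef(lm(mpg ~ wt + hp, data = mtcars)). Only one chunk's design matrix is
held in memory at a time.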
> How long before it's ready? :)
Depends on how many more urgent things intervene.
 	-thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
    
    