[Rd] "Default" accessor in S4 classes
Simon Urbanek
simon.urbanek at r-project.org
Tue Jan 8 01:50:59 CET 2013
Chris,
On Jan 7, 2013, at 6:23 PM, Chris Jewell wrote:
> Hi All,
>
> I'm currently trying to write an S4 class that mimics a data.frame, but stores data on disc in HDF5 format. The idea is that the dataset is likely to be too large to fit into a standard desktop machine, and by using subscripts, the user may load bits of the dataset at a time. eg:
>
>> myLargeData <- LargeData("/path/to/file")
>> mySubSet <- myLargeData[1:10, seq(1,15,by=3)]
>
> I've therefore defined my LargeData class thus
>
>> LargeData <- setClass("LargeData", representation(filename="character"))
>> setMethod("initialize", "LargeData", function(.Object, filename) { .Object@filename <- filename; .Object })
>
> I've then defined the "[" method to call a C++ function (Rcpp), opening the HDF5 file, and returning the required rows/cols as a data.frame.
>
> However, what if the user wants to load the entire dataset into memory? Which method do I overload to achieve the following?
>
>> fullData <- myLargeData
>> class(fullData)
> [1] "data.frame"
>
That makes no sense: a <- b is not a transformation, so "a" will have the same value as "b" by definition, and thus the same class. If you really meant
fullData <- as.data.frame(myLargeData)
then you just need to implement the as.data.frame() method for your class.
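
A minimal sketch of such a method follows. It is an illustration, not the poster's implementation: a temporary CSV file and read.csv() stand in for the HDF5/Rcpp back-end, and loadAll() is a hypothetical helper name.

```r
library(methods)

# Stand-in for the HDF5-backed class: the object only stores a file path.
setClass("LargeData", representation(filename = "character"))

# Hypothetical loader (assumption): a real version would call the HDF5 back-end.
loadAll <- function(x) read.csv(x@filename)

# Setting an S4 method on base::as.data.frame implicitly creates an S4 generic.
setMethod("as.data.frame", "LargeData",
          function(x, row.names = NULL, optional = FALSE, ...) loadAll(x))

# Small demo file so the example runs without HDF5.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), path, row.names = FALSE)

myLargeData <- new("LargeData", filename = path)
fullData <- as.data.frame(myLargeData)   # explicit conversion, not plain assignment
class(fullData)
```

The conversion is always an explicit call; plain assignment still just copies the reference object.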
Note, however, that a more common way to convert a big-data reference to the native format in its entirety is simply myLargeData[] -- you may want to have a look at the (many) existing big data packages (AFAIR bigmemory uses a C++ back-end as well). Also note that indexing is tricky in R and easy to get wrong (remember: negative indices, indexing by name, etc.).
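
As a rough sketch of the myLargeData[] idiom and of one way to cope with negative or named indices: let R resolve the index against a vector of positions before touching the back-end. The CSV file, read.csv() and the helper resolveIndex() are assumptions for illustration only; a real method would pass the resolved positions to the HDF5 reader instead of loading everything.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# Hypothetical helper: turn any valid R index (negative, logical, by name)
# into plain positive positions by indexing a vector of positions.
resolveIndex <- function(idx, n, nms = NULL) {
  pos <- seq_len(n)
  if (!is.null(nms)) names(pos) <- nms
  unname(pos[idx])
}

setMethod("[", "LargeData", function(x, i, j, ..., drop = TRUE) {
  full <- read.csv(x@filename)  # stand-in for the HDF5 read (assumption)
  rows <- if (missing(i)) seq_len(nrow(full)) else resolveIndex(i, nrow(full))
  cols <- if (missing(j)) seq_len(ncol(full))
          else resolveIndex(j, ncol(full), names(full))
  # 'drop' is ignored in this sketch: the result is always a data.frame.
  full[rows, cols, drop = FALSE]
})

path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), path, row.names = FALSE)
myLargeData <- new("LargeData", filename = path)

everything <- myLargeData[]          # both indices missing: whole dataset
part <- myLargeData[-1, "b"]         # negative row index, column by name
```

The sketch does not handle the single-index form x[i] separately, nor out-of-range or unknown names; those cases need explicit checks in real code.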
> or apply transformations:
>
>> myEigen <- eigen(myLargeData)
>
> In C++ I would normally overload the "double" or "float" operator to achieve this -- can I do the same thing in R?
>
Again, there is no implicit coercion in R (you cannot declare a variable's type in advance), so it doesn't make sense in the context you have in mind from C++ -- in R the equivalent is simply implementing an as.double() method, but I suspect that's not what you had in mind. For generics you can simply implement a method for your class (one that does the coercion, for example, or uses a more efficient approach). If you cannot define a generic or don't want to write your own methods, then it's a problem: the only theoretical way is to subclass the numeric vector class, but that is not possible in R if you want to change the representation, because the call falls through to the more efficient internal code too quickly (without extra dispatch) for you to intercept it.
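
To illustrate the explicit-coercion route, here is a small sketch of an as.numeric() method (S4 methods for as.double are set via as.numeric). The CSV stand-in for the HDF5 file and the 2 x 2 demo data are assumptions; eigen() then has to be given the coerced matrix explicitly, since nothing converts the object implicitly.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# as.numeric is internally S4 generic, so a method can be set directly.
setMethod("as.numeric", "LargeData", function(x, ...) {
  # Stand-in for reading the whole dataset from HDF5 (assumption).
  as.numeric(as.matrix(read.csv(x@filename)))
})

# Demo data: a symmetric 2 x 2 matrix stored column-wise.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c(2, 0), b = c(0, 3)), path, row.names = FALSE)
myLargeData <- new("LargeData", filename = path)

vals <- as.numeric(myLargeData)      # explicit coercion to a double vector
m <- matrix(vals, nrow = 2)          # caller rebuilds the matrix shape
ev <- eigen(m)                       # eigen() sees an ordinary matrix
ev$values
```

Making eigen(myLargeData) itself work would instead need a generic and a method for the class, for example via setGeneric("eigen").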
Cheers.
Simon
> Thanks,
>
> Chris