[R] compress data on read, decompress on write
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Thu Feb 28 23:38:38 CET 2008
Dear Christos,
Thanks for your reply. Actually, I should have been more careful with my
language: it's not really a sparse matrix, but rather a ragged array
that results from a more compact representation we thought of for the
hidden states in a Hidden Markov Model over many runs of MCMC. However,
it might make sense for us to check sparseMatrix and see how it's done
there.
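
To make the idea concrete, here is a minimal sketch of holding a ragged
array in compressed form inside a list component and expanding it again
on demand. It assumes an R version whose base package provides
memCompress() and memDecompress(); the names states, pack_ragged and
unpack_ragged, and the data themselves, are made up for illustration and
are not from our package.

```r
# Hypothetical data: hidden-state paths of unequal length from three
# MCMC runs (a ragged array held as a list of integer vectors).
states <- list(c(1L, 1L, 2L, 2L, 2L),
               c(3L, 1L),
               c(2L, 2L, 2L, 1L))

# Flatten the ragged array into one vector plus per-run lengths,
# serialize that to a raw vector, and compress it in memory.
pack_ragged <- function(x) {
  payload <- list(values = unlist(x, use.names = FALSE),
                  len    = vapply(x, length, integer(1)))
  memCompress(serialize(payload, connection = NULL), type = "gzip")
}

# Reverse the steps: decompress, unserialize, and split the flat
# vector back into one element per run (empty runs are preserved
# because the factor carries a level for every run).
unpack_ragged <- function(z) {
  payload <- unserialize(memDecompress(z, type = "gzip"))
  run <- factor(rep(seq_along(payload$len), payload$len),
                levels = seq_along(payload$len))
  unname(split(payload$values, run))
}

# Store the compressed raw vector as a list component of the R object.
fit <- list(hidden = pack_ragged(states))
restored <- unpack_ragged(fit$hidden)
```

With this layout the C code would only ever see the uncompressed flat
vector and the lengths, so it would not need to link against zlib
itself. Whether the compression pays off depends on how repetitive the
state paths are; for very short inputs the compressed form can be
larger than the original.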
Thanks,
R
On Thu, Feb 28, 2008 at 7:49 PM, Christos Hatzis
<christos.hatzis at nuverabio.com> wrote:
> Ramon,
>
> If you are looking for a solution to your specific application (as opposed
> to a general compression/decompression mechanism), it might be worth
> checking out the Matrix package, which has facilities for storing and
> manipulating sparse matrices. The sparseMatrix class stores matrices in the
> triplet representation (i.e. only indices and values of the non-zero
> elements) and this affords great compression ratios, depending on the size
> and degree of sparseness of the matrix.
>
> -Christos
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
> > Sent: Thursday, February 28, 2008 1:18 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] compress data on read, decompress on write
> >
> > Dear All,
> >
> > I'd like to be able to have R store (in a list component) a
> > compressed data set, and then write it out uncompressed.
> > gzcon and gzfile work in exactly the opposite direction. What
> > would be a good way to handle this?
> >
> > Details:
> > ----------
> >
> > We have a package that uses C; part of the C output is a
> > large sparse matrix. This is never manipulated directly by R,
> > but always by the C code. However, we need to store that data
> > somewhere (inside an R
> > object) for further calls to the functions in our package.
> > We'd like to store that matrix as part of the R object (say,
> > as an element of a list). Ideally, it would be stored in as
> > compressed a way as possible.
> > Then, when we need to use that information, it would be
> > decompressed and passed to the C function.
> >
> > I guess one way to do it is to have C deal with the
> > compression and uncompression (e.g., using zlib or the bzip2
> > libraries) and then use readBin, etc, from R. But, if I can,
> > I'd like to avoid our C code having to call zlib, etc, so as
> > to make our package easily portable.
> >
> >
> > Thanks,
> >
> > R.
> >
> > --
> > Ramon Diaz-Uriarte
> > Statistical Computing Team
> > Structural Biology and Biocomputing Programme Spanish
> > National Cancer Centre (CNIO) http://ligarto.org/rdiaz
> >
--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz