[R] compress data on read, decompress on write

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Thu Feb 28 23:38:38 CET 2008


Dear Christos,

Thanks for your reply. Actually, I should have been more careful with
my language: it's not really a sparse matrix, but rather a ragged array
that results from a more compact representation we thought of for the
hidden states in a Hidden Markov Model over many runs of MCMC. However,
it might make sense for us to check sparseMatrix and see how it's done
there.
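
A minimal sketch of the in-R route asked about in the original question
(keep the data compressed inside a list component, decompress it only when
it is needed), assuming an R version that provides memCompress() and
memDecompress(). The ragged array, the helper names pack() and unpack(),
and the list layout are all hypothetical, chosen only for illustration:

```r
# Hypothetical ragged array: one integer vector of hidden states per MCMC run,
# with long runs of repeated states (which compress well).
set.seed(1)
states <- lapply(1:50, function(i) {
  rep(sample(1:4, 5, replace = TRUE), times = sample(10:200, 5))
})

# Pack: serialize the object to a raw vector, then compress it in memory.
pack <- function(x, type = "gzip") {
  memCompress(serialize(x, connection = NULL), type = type)
}

# Unpack: decompress the raw vector, then rebuild the original object.
unpack <- function(z, type = "gzip") {
  unserialize(memDecompress(z, type = type))
}

# Store only the compressed raw vector as a component of the returned object.
fit <- list(call = "hypothetical fit", states_z = pack(states))

# Later: recover the data just before handing it on to the compiled code.
restored <- unpack(fit$states_z)
stopifnot(identical(restored, states))
```

With this layout the C code never needs to link against zlib or bzip2;
whether the compression ratio is worthwhile depends on how repetitive the
stored states are.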

Thanks,

R

On Thu, Feb 28, 2008 at 7:49 PM, Christos Hatzis
<christos.hatzis at nuverabio.com> wrote:
> Ramon,
>
>  If you are looking for a solution to your specific application (as opposed
>  to a general compression/decompression mechanism), it might be worth
>  checking out the Matrix package, which has facilities for storing and
>  manipulating sparse matrices.  Its sparse matrix classes store only the
>  non-zero elements (in the triplet representation, for example, just their
>  indices and values), and this affords great compression ratios, depending
>  on the size and degree of sparseness of the matrix.
>
>  -Christos
>
>
>
>  > -----Original Message-----
>  > From: r-help-bounces at r-project.org
>  > [mailto:r-help-bounces at r-project.org] On Behalf Of Ramon Diaz-Uriarte
>  > Sent: Thursday, February 28, 2008 1:18 PM
>  > To: r-help at stat.math.ethz.ch
>  > Subject: [R] compress data on read, decompress on write
>  >
>  > Dear All,
>  >
>  > I'd like to be able to have R store (in a list component) a
>  > compressed data set, and then write it out uncompressed.
>  > gzcon and gzfile work in exactly the opposite direction. What
>  > would be a good way to handle this?
>  >
>  > Details:
>  > ----------
>  >
>  > We have a package that uses C; part of the C output is a
>  > large sparse matrix. This is never manipulated directly by R,
>  > but always by the C code. However, we need to store that data
>  > somewhere (inside an R object) for further calls to the
>  > functions in our package.
>  > We'd like to store that matrix as part of the R object (say,
>  > as an element of a list). Ideally, it would be stored in as
>  > compressed a way as possible.
>  > Then, when we need to use that information, it would be
>  > decompressed and passed to the C function.
>  >
>  > I guess one way to do it is to have C deal with the
>  > compression and decompression (e.g., using the zlib or bzip2
>  > libraries) and then use readBin, etc., from R. But, if I can,
>  > I'd like to avoid our C code having to call zlib, etc, so as
>  > to make our package easily portable.
>  >
>  >
>  > Thanks,
>  >
>  > R.
>  >
>  > --
>  > Ramon Diaz-Uriarte
>  > Statistical Computing Team
>  > Structural Biology and Biocomputing Programme
>  > Spanish National Cancer Centre (CNIO)
>  > http://ligarto.org/rdiaz
>  >



-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz


