[Bioc-sig-seq] Cached GenomicRanges or RangedData Objects?

Mon Oct 11 19:51:52 CEST 2010

When you say you're bumping up against the limits of RAM, how much RAM
do you have?  It might be nice to have a GRanges that has
pass-by-reference semantics, but in general I find my GRanges objects
to be fairly small (at most a couple of GB).  My "problems" are with
the associated data.

Kasper

On Mon, Oct 11, 2010 at 11:59 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> On Mon, 11 Oct 2010, Steve Lianoglou wrote:
>
>> Hi Chuck,
>>
>> On Mon, Oct 11, 2010 at 11:24 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>>>
>>> We are liking the idioms that go with GenomicRanges and RangedData objects
>>> (follow, precede, findOverlaps, etc.), but we are bumping up against the
>>> memory demands of loading very large objects.
>>>
>>> Is there now, or will there soon be, a cached version of these that will
>>> lessen our memory requirements?
>>>
>>> If not, is there a cookbook on how to create and save cached versions of
>>> these objects?
>>>
>>> Or maybe a place to look in the Bioconductor codebase to get some ideas on
>>> how to go about constructing cached versions of these classes?
>>
>> I'm not sure what you mean by caching -- do you want them serialized
>> to disk so that you can read off parts when you need them, or something else?
>
> That's basically the idea. I looked at how BSgenome handles FASTA: it
> allows you to read in one chromosome, make apparent copies that do not
> physically copy the object unless it is modified, and then clean up
> afterwards, with most of the work hidden under the hood.
>
>
>>
>> Also -- I typically split my data and processing to work on a
>> chromosome by chromosome basis -- even though the GenomicRanges
>> infrastructure allows you to keep ranges spanning multiple chromosomes
>> in one object. Although it's a bit more bookkeeping code on my part,
>> I find that doing so helps to keep my RAM requirements down a bit.
>> Perhaps that obvious/marginal suggestion might help for the time
>> being?
>
> Thanks. We have bits and pieces of a pipeline that do that. But we are about
> to refactor that pipeline, so the hope is to make something that is fairly
> clean, will endure, and can handle the large objects that new sequencing
> technologies are likely to throw at us.
>
> Chuck
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>
> Charles C. Berry                            (858) 534-2098
>                                            Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu               UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>
>
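
The two ideas discussed in this thread (serializing ranges to disk, and working chromosome by chromosome so that only one piece is in memory at a time) can be sketched roughly as follows. This is only an illustration in Python, not Bioconductor code and not anyone's actual pipeline: the function names (write_chromosome_cache, load_chromosome, count_overlaps), the one-file-per-chromosome layout, and the use of pickle are all assumptions made for the example.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path


def write_chromosome_cache(ranges, cache_dir):
    """Split (chrom, start, end) tuples by chromosome and pickle each group.

    Hypothetical helper: writes one file per chromosome under cache_dir and
    returns the sorted list of chromosome names that were written.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in ranges:
        by_chrom[chrom].append((start, end))

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        with open(cache_dir / f"{chrom}.pkl", "wb") as fh:
            pickle.dump(intervals, fh)
    return sorted(by_chrom)


def load_chromosome(cache_dir, chrom):
    """Read back the intervals for a single chromosome only."""
    with open(Path(cache_dir) / f"{chrom}.pkl", "rb") as fh:
        return pickle.load(fh)


def count_overlaps(intervals, query_start, query_end):
    """Count closed intervals that overlap [query_start, query_end]."""
    return sum(1 for s, e in intervals if s <= query_end and e >= query_start)


if __name__ == "__main__":
    ranges = [
        ("chr1", 100, 200),
        ("chr2", 50, 80),
        ("chr1", 150, 300),
        ("chr2", 500, 900),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        chroms = write_chromosome_cache(ranges, tmp)
        # Only one chromosome's intervals are held in memory per iteration.
        for chrom in chroms:
            intervals = load_chromosome(tmp, chrom)
            print(chrom, count_overlaps(intervals, 120, 160))
```

The trade-off is the one Steve mentions: peak memory is bounded by the largest chromosome rather than by the whole data set, at the cost of extra bookkeeping around the per-chromosome files.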