[BioC] Cleaning up after getSeq(BSgenome, GRanges)
Steve Lianoglou
mailinglist.honeypot at gmail.com
Sat Jun 30 20:35:26 CEST 2012
Thanks a lot!
On Sat, Jun 30, 2012 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Steve,
>
> The intention was really that the DNAStringSet object returned by
> getSeq() would not hold any reference to the chromosomes that
> getSeq() would load in the cache during the extraction so everything
> would get automatically uncached at the first gc() opportunity after
> getSeq() returns.
> Unfortunately this was broken because of an issue with a low-level
> helper in IRanges (the "xvcopy" method for XRawList objects to be
> precise). The problem is fixed in IRanges 1.15.16 (I'll apply the
> fix to release too):
>
>> library(BSgenome.Hsapiens.UCSC.hg19)
>
>> gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1265019 67.6    1710298 91.4  1476915 78.9
> Vcells  585626  4.5    1162592  8.9   901241  6.9
>
>> options(verbose=TRUE) # so uncaching events will be reported
>
> ## Extracting the first 10 nucleotides from each chromosome:
>> first10 <- getSeq(Hsapiens, end=10)
> uncaching chr1
> uncaching chr10
> uncaching chr11_gl000202_random
> uncaching chr11
> uncaching chr12
> uncaching chr13
> uncaching chr15
> uncaching chr14
> uncaching chr16
> uncaching chr17_gl000203_random
> uncaching chr17_gl000206_random
> uncaching chr19
> uncaching chr19_gl000208_random
> uncaching chr18_gl000207_random
> uncaching chr18
> uncaching chr17_gl000205_random
> uncaching chr17_gl000204_random
> uncaching chr17_ctg5_hap1
> uncaching chr1_gl000192_random
> uncaching chr1_gl000191_random
> uncaching chr19_gl000209_random
> uncaching chr17
> uncaching chr2
> uncaching chr21_gl000210_random
> uncaching chr21
> uncaching chr20
> uncaching chr22
> uncaching chr3
> uncaching chr4_gl000193_random
> uncaching chr4_ctg9_hap1
> uncaching chr4_gl000194_random
> uncaching chr4
> uncaching chr5
> uncaching chr6_cox_hap2
> uncaching chr6_dbb_hap3
> uncaching chr6_apd_hap1
> uncaching chr6_mcf_hap5
> uncaching chr6_mann_hap4
> uncaching chr6
> uncaching chr7
> uncaching chr7_gl000195_random
> uncaching chr6_ssto_hap7
> uncaching chr6_qbl_hap6
> uncaching chr8_gl000197_random
> uncaching chr8_gl000196_random
> uncaching chr8
> uncaching chr9_gl000199_random
> uncaching chrM
> uncaching chrUn_gl000213
> uncaching chrUn_gl000214
> uncaching chrUn_gl000212
> uncaching chrUn_gl000211
> uncaching chr9_gl000201_random
> uncaching chr9_gl000200_random
> uncaching chr9_gl000198_random
> uncaching chrUn_gl000217
> uncaching chrUn_gl000220
> uncaching chrUn_gl000223
> uncaching chrUn_gl000227
> uncaching chrUn_gl000230
> uncaching chrUn_gl000234
> uncaching chrUn_gl000238
> uncaching chrUn_gl000242
> uncaching chrUn_gl000243
> uncaching chrUn_gl000241
> uncaching chrUn_gl000240
> uncaching chrUn_gl000239
> uncaching chrUn_gl000237
> uncaching chrUn_gl000236
> uncaching chrUn_gl000235
> uncaching chrUn_gl000233
> uncaching chrUn_gl000232
> uncaching chrUn_gl000231
> uncaching chrUn_gl000229
> uncaching chrUn_gl000228
> uncaching chrUn_gl000226
> uncaching chrUn_gl000225
> uncaching chrUn_gl000224
> uncaching chrUn_gl000222
> uncaching chrUn_gl000221
> uncaching chrUn_gl000219
> uncaching chrUn_gl000218
> uncaching chrUn_gl000216
> uncaching chrUn_gl000215
> uncaching chrUn_gl000246
> uncaching chrUn_gl000249
> uncaching chrUn_gl000248
> uncaching chrUn_gl000247
> uncaching chrUn_gl000245
> uncaching chrUn_gl000244
> uncaching chrX
> uncaching chr9
>
>> first10
> A DNAStringSet instance of length 93
> width seq
> [1] 10 NNNNNNNNNN
> [2] 10 NNNNNNNNNN
> [3] 10 NNNNNNNNNN
> [4] 10 NNNNNNNNNN
> [5] 10 NNNNNNNNNN
> [6] 10 NNNNNNNNNN
> [7] 10 NNNNNNNNNN
> [8] 10 NNNNNNNNNN
> [9] 10 NNNNNNNNNN
> ... ... ...
> [85] 10 GATCTGAAGA
> [86] 10 GATCATGCCT
> [87] 10 GATCTTCAGG
> [88] 10 GATCTGCGCA
> [89] 10 GATCAGATAG
> [90] 10 GATCTTAAGC
> [91] 10 GATCTAAGTT
> [92] 10 GATCTGTCAT
> [93] 10 GATCACCAAG
>
>> ls(Hsapiens@.seqs_cache)
> [1] "chrY"
>
>> gc()
> Garbage collection 177 = 120+21+36 (level 2) ...
> 69.6 Mbytes of cons cells used (66%)
> 61.8 Mbytes of vectors used (17%)
> uncaching chrY
>           used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 1301932 69.6    1967602 105.1  1967602 105.1
> Vcells 8094983 61.8   48876866 373.0 58058596 443.0
>
>> ls(Hsapiens@.seqs_cache)
> character(0)
>
>> gc()
> Garbage collection 178 = 120+21+37 (level 2) ...
> 69.5 Mbytes of cons cells used (66%)
> 4.6 Mbytes of vectors used (2%)
>           used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 1300073 69.5    1967602 105.1  1967602 105.1
> Vcells  600775  4.6   39101492 298.4 58058596 443.0
>
> Memory used is almost the same as before getSeq() was called (4.6 Mb of
> vectors now versus 4.5 Mb then).
>
> Thanks for reporting the issue!
>
> H.
>
>
>
> On 06/27/2012 10:20 AM, Steve Lianoglou wrote:
>>
>> Howdy,
>>
>> Say I'd like to fetch muchos sequences from hg19 that are defined in a
>> GRanges object that spans all hg19 chromosomes.
>>
>> I can make my life easy and do:
>>
>> R> library(BSgenome.Hsapiens.UCSC.hg19)
>> R> seqs <- getSeq(Hsapiens, my.GRanges)
>>
>> But while my life has been made easy, life for my machine's RAM has been
>> made harder, as I (think that I) now have all of the Hsapiens chromosomes
>> loaded up into (I think) the Hsapiens@.seqs_cache.
>>
>> I reckon I can do something like:
>>
>> R> rm(list=ls(Hsapiens@.seqs_cache), envir=Hsapiens@.seqs_cache)
>> R> gc()
>>
>> to try to remedy the situation myself, but I wonder if I'm missing
>> something else?
>>
>> Perhaps having a clearCache,BSgenome method to do some cleanup might be
>> handy?
>>
>> Thanks,
>> -steve
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
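For anyone who wants to try the manual cleanup idiom from the quoted message without loading a whole genome, here is a minimal base R sketch. It assumes only that the cache is an ordinary environment; `cache` and `clear_cache` are hypothetical names made up for illustration and are not part of BSgenome, and the raw vectors merely stand in for cached chromosomes.

```r
# A plain environment standing in for a sequence cache
# (hypothetical stand-in, not the real BSgenome slot).
cache <- new.env(parent = emptyenv())
assign("chr1", raw(1e6), envir = cache)
assign("chr2", raw(2e6), envir = cache)

# Hypothetical helper: remove every object held in an environment
# and return the names that were dropped.
clear_cache <- function(env) {
  cached <- ls(env, all.names = TRUE)
  rm(list = cached, envir = env)
  invisible(cached)
}

removed <- clear_cache(cache)
invisible(gc())  # let R reclaim the memory that is no longer referenced
ls(cache)
```

After the call the environment is empty, and the memory can be reclaimed at the next garbage collection as long as nothing else still references the removed objects.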
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact