[BioC] Cleaning up after getSeq(BSgenome, GRanges)

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat Jun 30 20:35:26 CEST 2012


Thank you very much!

On Sat, Jun 30, 2012 at 3:42 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Steve,
>
> The intention was that the DNAStringSet object returned by
> getSeq() would not hold any reference to the chromosomes that
> getSeq() loads into the cache during the extraction, so they would
> all be uncached automatically at the first gc() opportunity after
> getSeq() returns.
> Unfortunately this was broken by an issue in a low-level helper
> in IRanges (the "xvcopy" method for XRawList objects, to be
> precise). The problem is fixed in IRanges 1.15.16 (I'll apply the
> fix to the release version too):
>
>> library(BSgenome.Hsapiens.UCSC.hg19)
>
>> gc()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1265019 67.6    1710298 91.4  1476915 78.9
> Vcells  585626  4.5    1162592  8.9   901241  6.9
>
>> options(verbose=TRUE)  # so uncaching events will be reported
>
> ## Extracting the first 10 nucleotides from each chromosome:
>> first10 <- getSeq(Hsapiens, end=10)
> uncaching chr1
> uncaching chr10
> uncaching chr11_gl000202_random
> uncaching chr11
> uncaching chr12
> uncaching chr13
> uncaching chr15
> uncaching chr14
> uncaching chr16
> uncaching chr17_gl000203_random
> uncaching chr17_gl000206_random
> uncaching chr19
> uncaching chr19_gl000208_random
> uncaching chr18_gl000207_random
> uncaching chr18
> uncaching chr17_gl000205_random
> uncaching chr17_gl000204_random
> uncaching chr17_ctg5_hap1
> uncaching chr1_gl000192_random
> uncaching chr1_gl000191_random
> uncaching chr19_gl000209_random
> uncaching chr17
> uncaching chr2
> uncaching chr21_gl000210_random
> uncaching chr21
> uncaching chr20
> uncaching chr22
> uncaching chr3
> uncaching chr4_gl000193_random
> uncaching chr4_ctg9_hap1
> uncaching chr4_gl000194_random
> uncaching chr4
> uncaching chr5
> uncaching chr6_cox_hap2
> uncaching chr6_dbb_hap3
> uncaching chr6_apd_hap1
> uncaching chr6_mcf_hap5
> uncaching chr6_mann_hap4
> uncaching chr6
> uncaching chr7
> uncaching chr7_gl000195_random
> uncaching chr6_ssto_hap7
> uncaching chr6_qbl_hap6
> uncaching chr8_gl000197_random
> uncaching chr8_gl000196_random
> uncaching chr8
> uncaching chr9_gl000199_random
> uncaching chrM
> uncaching chrUn_gl000213
> uncaching chrUn_gl000214
> uncaching chrUn_gl000212
> uncaching chrUn_gl000211
> uncaching chr9_gl000201_random
> uncaching chr9_gl000200_random
> uncaching chr9_gl000198_random
> uncaching chrUn_gl000217
> uncaching chrUn_gl000220
> uncaching chrUn_gl000223
> uncaching chrUn_gl000227
> uncaching chrUn_gl000230
> uncaching chrUn_gl000234
> uncaching chrUn_gl000238
> uncaching chrUn_gl000242
> uncaching chrUn_gl000243
> uncaching chrUn_gl000241
> uncaching chrUn_gl000240
> uncaching chrUn_gl000239
> uncaching chrUn_gl000237
> uncaching chrUn_gl000236
> uncaching chrUn_gl000235
> uncaching chrUn_gl000233
> uncaching chrUn_gl000232
> uncaching chrUn_gl000231
> uncaching chrUn_gl000229
> uncaching chrUn_gl000228
> uncaching chrUn_gl000226
> uncaching chrUn_gl000225
> uncaching chrUn_gl000224
> uncaching chrUn_gl000222
> uncaching chrUn_gl000221
> uncaching chrUn_gl000219
> uncaching chrUn_gl000218
> uncaching chrUn_gl000216
> uncaching chrUn_gl000215
> uncaching chrUn_gl000246
> uncaching chrUn_gl000249
> uncaching chrUn_gl000248
> uncaching chrUn_gl000247
> uncaching chrUn_gl000245
> uncaching chrUn_gl000244
> uncaching chrX
> uncaching chr9
>
>> first10
>  A DNAStringSet instance of length 93
>     width seq
>  [1]    10 NNNNNNNNNN
>  [2]    10 NNNNNNNNNN
>  [3]    10 NNNNNNNNNN
>  [4]    10 NNNNNNNNNN
>  [5]    10 NNNNNNNNNN
>  [6]    10 NNNNNNNNNN
>  [7]    10 NNNNNNNNNN
>  [8]    10 NNNNNNNNNN
>  [9]    10 NNNNNNNNNN
>  ...   ... ...
> [85]    10 GATCTGAAGA
> [86]    10 GATCATGCCT
> [87]    10 GATCTTCAGG
> [88]    10 GATCTGCGCA
> [89]    10 GATCAGATAG
> [90]    10 GATCTTAAGC
> [91]    10 GATCTAAGTT
> [92]    10 GATCTGTCAT
> [93]    10 GATCACCAAG
>
>> ls(Hsapiens@.seqs_cache)
> [1] "chrY"
>
>> gc()
> Garbage collection 177 = 120+21+36 (level 2) ...
> 69.6 Mbytes of cons cells used (66%)
> 61.8 Mbytes of vectors used (17%)
> uncaching chrY
>          used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 1301932 69.6    1967602 105.1  1967602 105.1
> Vcells 8094983 61.8   48876866 373.0 58058596 443.0
>
>> ls(Hsapiens@.seqs_cache)
> character(0)
>
>> gc()
> Garbage collection 178 = 120+21+37 (level 2) ...
> 69.5 Mbytes of cons cells used (66%)
> 4.6 Mbytes of vectors used (2%)
>          used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells 1300073 69.5    1967602 105.1  1967602 105.1
> Vcells  600775  4.6   39101492 298.4 58058596 443.0
>
> Memory usage is now almost the same as it was before getSeq() was called.
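The intended behaviour described above is that cached chromosomes are dropped at the first gc() once nothing references them. The following toy model in base R illustrates that idea. It is a hypothetical sketch, not BSgenome's actual implementation. The names `seqs_cache`, `uncache_finalizer()`, `load_seq()` and `extract_first()` are made up for illustration. A finalizer registered with base R's `reg.finalizer()` stands in for whatever mechanism BSgenome really uses.

```r
# Toy model of a cache that empties itself once results no longer reference it.
# Hypothetical names throughout; this is NOT BSgenome's actual implementation.
seqs_cache <- new.env(parent = emptyenv())

# Finalizer defined at top level, so it captures no reference to the handle.
uncache_finalizer <- function(handle) {
  if (isTRUE(getOption("verbose"))) message("uncaching ", handle$name)
  if (exists(handle$name, envir = seqs_cache, inherits = FALSE))
    rm(list = handle$name, envir = seqs_cache)
}

# "Load" a fake chromosome into the cache and return a handle to it.
# When the handle becomes unreachable, gc() runs the finalizer.
load_seq <- function(name) {
  if (!exists(name, envir = seqs_cache, inherits = FALSE))
    assign(name, strrep("N", 1e6), envir = seqs_cache)
  handle <- new.env(parent = emptyenv())
  handle$name <- name
  reg.finalizer(handle, uncache_finalizer)
  handle
}

# Extract the first 'end' letters. The result is a fresh string that holds
# no reference to the handle, so the cached copy can go at the next gc().
extract_first <- function(name, end) {
  handle <- load_seq(name)
  substr(get(handle$name, envir = seqs_cache), 1, end)
}

first10 <- extract_first("chr1", 10)
invisible(gc())  # the handle is now unreachable, so its finalizer runs
ls(seqs_cache)   # expected to be empty again
```

Suppose the returned value held on to the handle, as the DNAStringSet effectively did before the IRanges fix. In that case the finalizer would never run and "chr1" would stay cached.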
>
> Thanks for reporting the issue!
>
> H.
>
>
>
> On 06/27/2012 10:20 AM, Steve Lianoglou wrote:
>>
>> Howdy,
>>
>> Say I'd like to fetch lots of sequences from hg19 that are defined in a
>> GRanges object spanning all hg19 chromosomes.
>>
>> I can make my life easy and do:
>>
>> R> library(BSgenome.Hsapiens.UCSC.hg19)
>> R> seqs <- getSeq(Hsapiens, my.GRanges)
>>
>> But while my life has been made easy, life for my machine has been
>> made harder, since (I think) all of the Hsapiens chromosomes are now
>> loaded into the Hsapiens@.seqs_cache environment.
>>
>> I reckon I can do something like:
>>
>> R> rm(list=ls(Hsapiens@.seqs_cache), envir=Hsapiens@.seqs_cache)
>> R> gc()
>>
>> to remedy the situation myself, but I wonder if I'm missing
>> something.
>>
>> Perhaps a clearCache,BSgenome method for this kind of cleanup would be
>> handy?
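A method like the one suggested above could look roughly like the following S4 sketch. It uses only base R and the methods package. Everything in it is hypothetical. `ToyGenome` is a made-up stand-in for a BSgenome object, holding only a cache environment. `clearCache()` is simply the name suggested above, used here for illustration. The method body wraps the same `rm(list=ls(...), envir=...)` approach shown earlier.

```r
library(methods)

# Hypothetical stand-in for a BSgenome object: it only holds a cache environment.
setClass("ToyGenome", slots = c(seqs_cache = "environment"))

# Hypothetical generic, named after the suggestion above.
setGeneric("clearCache", function(x, ...) standardGeneric("clearCache"))

setMethod("clearCache", "ToyGenome", function(x, ...) {
  cache <- x@seqs_cache
  # Environments have reference semantics, so this empties the cache
  # held by 'x' itself rather than a copy of it.
  cached <- ls(cache, all.names = TRUE)
  rm(list = cached, envir = cache)
  # Return the names of the evicted sequences, invisibly.
  invisible(cached)
})

genome <- new("ToyGenome", seqs_cache = new.env())
assign("chr1", strrep("N", 1e6), envir = genome@seqs_cache)
assign("chr2", strrep("N", 1e6), envir = genome@seqs_cache)
evicted <- clearCache(genome)
invisible(gc())  # give R a chance to release the freed memory
```

Because the slot is an environment, the method can modify the object's cache in place without returning a modified copy of `genome`.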
>>
>> Thanks,
>> -steve
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


