[Bioc-sig-seq] Rsamtools: Select reads on a chromosome

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 20 23:04:43 CET 2010


Hi,

About selecting all reads on a chromosome:

On Thu, Jan 7, 2010 at 1:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
<snip>

>> which <- RangesList(chr1=IRanges(start=1, end=247249719))
>> params <- ScanBamParams(which=which)
>> reads <- scanBam(my.bam.file, param=params)[[1]]
>>
>> Is there are "better" way to do it, eg. w/o making the IRanges object
>> that's stretches over the chromosome?
>
> I don't think so, though 'end' doesn't have to be a literal end, e.g,.
> .Machine$integer.max and 'stretches' doesn't really involve any cost --
> just two numbers.

I just tried to do this in another context, but this actually send R
into a tailspin, eg:

R> which <- RangesList(chr1=IRanges(start=1, end=.Machine$integer.max-1))
R> r <- scanBam('scratch-sorted.bam', param=ScanBamParam(what='pos',
which=which))

 *** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
 1: .Call(func, file, index, "rb", list(space(which),
.uunlist(start(which)),     .uunlist(end(which))), flag, simpleCigar,
...)
 2: .io_bam(.scan_bam, file, index, tmpl, param = param)
 3: .local(file, index, ...)
 4: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
  which = which))
 5: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
  which = which))

I have access to chromosome length information, so it's not really a
problem for me, but it seems as if something is happening which you
didn't expect, so I thought you'd like to know.

Thanks,
-steve

ps: I'm using IRanges(.., end=.Machine$integer.max-1) because using
.Machine$integer.max causes an integer overflow

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioc-sig-sequencing mailing list