[Bioc-sig-seq] Rsamtools: Select reads on a chromosome
Martin Morgan
mtmorgan at fhcrc.org
Thu Jan 21 05:21:29 CET 2010
Hi Steve --
Steve Lianoglou wrote:
> Hi,
>
> About selecting all reads on a chromosome:
>
> On Thu, Jan 7, 2010 at 1:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> <snip>
>
>>> which <- RangesList(chr1=IRanges(start=1, end=247249719))
>>> params <- ScanBamParam(which=which)
>>> reads <- scanBam(my.bam.file, param=params)[[1]]
>>>
>>> Is there a "better" way to do it, e.g., without making an IRanges object
>>> that stretches over the whole chromosome?
>> I don't think so, though 'end' doesn't have to be a literal end, e.g.,
>> .Machine$integer.max, and 'stretches' doesn't really involve any cost:
>> it's just two numbers.
>
> I just tried to do this in another context, but it actually sent R
> into a tailspin, e.g.:
>
> R> which <- RangesList(chr1=IRanges(start=1, end=.Machine$integer.max-1))
> R> r <- scanBam('scratch-sorted.bam', param=ScanBamParam(what='pos', which=which))
>
> *** caught segfault ***
> address 0x0, cause 'unknown'
Yes, my suggestion seems right in principle but is wrong in practice: the
limit is 536870912L (2^29), and Rsamtools now checks for this. An easy way to
specify 'an entire chromosome' sounds like a reasonable feature request.
Thanks for the report, Steve.
Martin
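As a quick sanity check on the numbers in this exchange, the base-R sketch below confirms that 536870912 is exactly 2^29 (consistent with the coordinate ceiling of the BAM index binning scheme) and that Steve's `end` of `.Machine$integer.max - 1` overshoots it by roughly a factor of four. `check_which_end()` is a hypothetical helper for validating ends in R before calling `scanBam()`; it is not part of Rsamtools, and the check Rsamtools itself performs may differ between versions.

```r
# Coordinate limit for 'which' ranges, as stated in the thread
bam_coord_limit <- 536870912L

# The limit is exactly 2^29, and Steve's end (2147483646) lies well beyond it
stopifnot(bam_coord_limit == 2^29)
stopifnot(.Machine$integer.max - 1L > bam_coord_limit)

# Hypothetical helper (not an Rsamtools function): stop with an R error,
# instead of failing later in compiled code, when any end exceeds the limit.
check_which_end <- function(end) {
    bad <- end > bam_coord_limit
    if (any(bad))
        stop("'which' end exceeds ", bam_coord_limit, ": ",
             paste(end[bad], collapse = ", "))
    invisible(TRUE)
}

# chr1 length used in the thread's first example is well within the limit
check_which_end(247249719L)
```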
>
> Traceback:
> 1: .Call(func, file, index, "rb", list(space(which),
> .uunlist(start(which)), .uunlist(end(which))), flag, simpleCigar,
> ...)
> 2: .io_bam(.scan_bam, file, index, tmpl, param = param)
> 3: .local(file, index, ...)
> 4: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
> which = which))
> 5: scanBam("scratch-sorted.bam", param = ScanBamParam(what = "pos",
> which = which))
>
> I have access to chromosome length information, so it's not really a
> problem for me, but it seems as if something is happening which you
> didn't expect, so I thought you'd like to know.
>
> Thanks,
> -steve
>
> ps: I'm using IRanges(.., end=.Machine$integer.max-1) because using
> .Machine$integer.max causes an integer overflow
>
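Since the chromosome lengths are available anyway, one workable pattern is to read them from the BAM header rather than hard-coding a large `end`. The sketch below assumes a current Bioconductor installation (Rsamtools and IRanges) and uses the small example file `ex1.bam` that ships with Rsamtools (references `seq1` and `seq2`). `whole_chromosome()` is a hypothetical helper, and `IRangesList` stands in for the `RangesList` constructor used earlier in this thread.

```r
library(Rsamtools)
library(IRanges)

# Sorted, indexed example BAM distributed with Rsamtools
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Hypothetical helper: a 'which' covering one entire reference sequence,
# with 'end' taken from the header's targets (named vector of lengths).
whole_chromosome <- function(file, chrom) {
    lens <- scanBamHeader(file)[[1]]$targets
    if (!chrom %in% names(lens))
        stop("reference '", chrom, "' not found in BAM header")
    which <- IRangesList(IRanges(start = 1L, end = lens[[chrom]]))
    names(which) <- chrom
    which
}

which <- whole_chromosome(bam, "seq1")
param <- ScanBamParam(what = "pos", which = which)
reads <- scanBam(bam, param = param)[[1]]
```

Because real chromosome lengths sit far below 2^29, this stays inside the limit discussed above without any special casing.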
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793