[Bioc-sig-seq] strandedness bug in Rsamtools?
Martin Morgan
mtmorgan at fhcrc.org
Sat Feb 6 17:36:13 CET 2010
Hi Steve --
On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
> Hi,
>
> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>
>> I think there might be a bug in Rsamtools. I have reason to believe
>> that the strand information that's being returned from
>> scanBam(...)$strand is flipped, ie. a read that should be on the '+'
>> strand is being reported as aligning to the '-' strand.
>
> I just wanted to include an example of what I'm seeing (sorry I didn't
> before, the need for shut-eye was competing with the desire to write a
> better message :-)
>
> Anyway, here's what I get from my bamfile:
>
> $ samtools view my-sorted.bam chr2:2355-2360
> 1805160 16 chr2 2355 255 20M * 0 0
> GTCTTACTTTGATTATGATC IIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:20
> NM:i:0
> 4681745 16 chr2 2355 255 20M * 0 0
> GTCTTACTTTGATTATGATC IIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:20
> NM:i:0
>
> I believe a value of 16 in the FLAG field indicates the negative strand ...
It's not a simple sign flip
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bam <- scanBam(fl)[[1]]
## samtools view ex1.bam > ex1.sam
tbl <- read.table("~/tmp/ex1.sam", fill=TRUE, sep="\t", quote="")
all.equal(as.character(tbl[[1]]), bam$qname)
[1] TRUE
> bitAnd(head(tbl[[2]], 20), 0x0010)
[1] 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0
> head(bam$strand, 20)
[1] + + + + + + + + - + + + + + + + + + + +
Levels: - + *
> table(bam$strand, useNA="always")
- + * <NA>
1624 1647 0 36
> table(bitAnd(tbl[[2]][!is.na(bam$stran)], 16)==0, useNA="always")
FALSE TRUE <NA>
1624 1647 0
though these tests don't cover the flag=16 case. Will investigate further.
Martin
>
> Now in R:
>
> R> bf <- 'my-sorted.bam
> R> r <- scanBam(bf,
> param=ScanBamParam(which=RangesList(chr2=IRanges(start=2355,
> end=2360))))[[1]]
> R> r$pos
> [1] 2355 2355
> R> r$strand
> [1] + +
> Levels: + - *
>
> If I'm still thinking about this straight, and it's a bug, maybe for
> now I'll just swap the levels of the r$strand to be c('-', '+', '*')
> when I get my reads back from scanBam.
>
> Thanks,
> -steve
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-sig-sequencing
mailing list