[Bioc-sig-seq] strandedness bug in Rsamtools?

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 6 23:13:30 CET 2010


On 02/06/2010 01:45 PM, Steve Lianoglou wrote:
> Ok, last one from me for now ... it seems like you may know where the
> problem is already, but perhaps this can help.
> 
> I knew that all of the "puzzle" pieces were definitely working "as I
> expected", when I ran some code on Jan 14th. By "working" I mean:
> 
> (i) querying a bam file by setting a strand and ensuring the strand
> was correct when returned; and
> (ii) not setting a strand, but filtering all returned reads by strand
> 
> So, I checked out versions of IRanges, Biostrings, BSgenome, and
> Rsamtools as of the latest revision on Jan13th ... this didn't work, I
> was getting some errors of "cyclic name space dependencies are not
> supported" when loading the Biostrings library (although it installed
> fine) ... but no matter about that.
> 
> So I went back and checked out the versions of these packages as of
> the latest commit on Jan4 (no particular reason for the date, just
> back tracking) ... these are working "as they should be" (from (i) and
> (ii)) above.
> 
> Perhaps that will help you find where the bugs are (if you don't know already).
> 
> Also, I should say that the FLAG params in my SAM/BAM files are all
> either 0, 4, 16 (for + strand, no alignment, and - strand (I think)),
> so perhaps my use case is too simple as a valid test case ... but
> bit-un/packing is bit-un/packing, no? :-)

Thanks Steve -- there's a single line change to BSgenome (in the early
lines of strand.R) that restores strand levels and behavior to
Rsamtools, but we are still not fully aware of what consequences this
might have for other packages so probably best to stick with your 'last
known good' installation of early January. Certainly your flags should
be parsed correctly.

With respect to Vince's example, I think the strand mix-up also
clarifies there -- his flag 16 is the minus strand, reads are presented
from 5' to 3', so the sample read on the reverse strand needs to be
reverse complemented to match the reference (plus strand), via Polonius
or otherwise.

Martin

> Thanks,
> -steve
> 
> 
> On Sat, Feb 6, 2010 at 3:57 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> On Sat, Feb 6, 2010 at 11:40 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>> On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
>>>> Hi,
>>>>
>>>> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
>>>> <mailinglist.honeypot at gmail.com> wrote:
>> <snip>
>>
>>>> R> r$strand
>>>> [1] + +
>>>> Levels: + - *
>>>
>>> I notice that your levels are different from mine; what does
>>> levels(strand()) say? what's your locale? (sesssionInfo() ;) --
>>> Rsamtools and strand() in general was trying to be locale-independent,
>>> but this seems not to be the case...
>>>
>>> The workaround is likely to work in a "C" locale, ?Sys.setlocale
>>
>> Sorry, I pasted the sessionInfo() in the first post and neglected to
>> do so in the second, but the locale is:
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> Also:
>>
>>> ok, I think it's this
>>>
>>> ------------------------------------------------------------------------
>>> r44085 | p.aboyoun | 2010-01-19 15:16:56 -0800 (Tue, 19 Jan 2010) | 2 lines
>>> Changed paths:
>>>   M /trunk/madman/Rpacks/BSgenome/R/strand.R
>>>
>>> Added support for Rle strand column in strand,DataTable method.
>>> Also made "+" the first level of strand factor.
>>> ------------------------------------------------------------------------
>>>
>>> which changed the order of strand levels. This will take a bit of
>>> coordination on our part. Thanks for the reports.
>>
>> So, being that you're more familiar with these codebases than I could
>> hope to be, do you think the problem lies strictly in the BSgenome
>> package? Or should I look at Rsamtools code, too?
>>
>> Do you reckon running with a checkout of the version of the BSgenome
>> repo at the revision before r44085 will be OK to sidestep the issue,
>> or should I backtrack my Rsamtools, too?
>>
>> I will, of course, try a few different permutations to see what works,
>> in the meanwhile, so I'm not expecting you to take time to try that
>> yourself. If you had some intuition, though, I'd appreciate your point
>> of view :-)
>>
>> Thanks,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
> 
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list