[Bioc-sig-seq] strandedness bug in Rsamtools?

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat Feb 6 22:45:37 CET 2010


Ok, last one from me for now ... it seems like you may know where the
problem is already, but perhaps this can help.

I knew that all of the "puzzle" pieces were definitely working "as I
expected", when I ran some code on Jan 14th. By "working" I mean:

(i) querying a bam file by setting a strand and ensuring the strand
was correct when returned; and
(ii) not setting a strand, but filtering all returned reads by strand

So, I checked out versions of IRanges, Biostrings, BSgenome, and
Rsamtools as of the latest revision on Jan13th ... this didn't work, I
was getting some errors of "cyclic name space dependencies are not
supported" when loading the Biostrings library (although it installed
fine) ... but no matter about that.

So I went back and checked out the versions of these packages as of
the latest commit on Jan4 (no particular reason for the date, just
back tracking) ... these are working "as they should be" (from (i) and
(ii)) above.

Perhaps that will help you find where the bugs are (if you don't know already).

Also, I should say that the FLAG params in my SAM/BAM files are all
either 0, 4, 16 (for + strand, no alignment, and - strand (I think)),
so perhaps my use case is too simple as a valid test case ... but
bit-un/packing is bit-un/packing, no? :-)

Thanks,
-steve


On Sat, Feb 6, 2010 at 3:57 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Sat, Feb 6, 2010 at 11:40 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/06/2010 07:29 AM, Steve Lianoglou wrote:
>>> Hi,
>>>
>>> On Sat, Feb 6, 2010 at 4:17 AM, Steve Lianoglou
>>> <mailinglist.honeypot at gmail.com> wrote:
> <snip>
>
>>> R> r$strand
>>> [1] + +
>>> Levels: + - *
>>
>> I notice that your levels are different from mine; what does
>> levels(strand()) say? what's your locale? (sesssionInfo() ;) --
>> Rsamtools and strand() in general was trying to be locale-independent,
>> but this seems not to be the case...
>>
>> The workaround is likely to work in a "C" locale, ?Sys.setlocale
>
> Sorry, I pasted the sessionInfo() in the first post and neglected to
> do so in the second, but the locale is:
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> Also:
>
>> ok, I think it's this
>>
>> ------------------------------------------------------------------------
>> r44085 | p.aboyoun | 2010-01-19 15:16:56 -0800 (Tue, 19 Jan 2010) | 2 lines
>> Changed paths:
>>   M /trunk/madman/Rpacks/BSgenome/R/strand.R
>>
>> Added support for Rle strand column in strand,DataTable method.
>> Also made "+" the first level of strand factor.
>> ------------------------------------------------------------------------
>>
>> which changed the order of strand levels. This will take a bit of
>> coordination on our part. Thanks for the reports.
>
> So, being that you're more familiar with these codebases than I could
> hope to be, do you think the problem lies strictly in the BSgenome
> package? Or should I look at Rsamtools code, too?
>
> Do you reckon running with a checkout of the version of the BSgenome
> repo at the revision before r44085 will be OK to sidestep the issue,
> or should I backtrack my Rsamtools, too?
>
> I will, of course, try a few different permutations to see what works,
> in the meanwhile, so I'm not expecting you to take time to try that
> yourself. If you had some intuition, though, I'd appreciate your point
> of view :-)
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioc-sig-sequencing mailing list