[BioC] problems with strand in predictCoding
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Apr 20 21:45:26 CEST 2012
Hi,
On Fri, Apr 20, 2012 at 3:32 PM, Jeremiah Degenhardt
<degenhardt.jeremiah at gene.com> wrote:
> Hi Steve,
>
> Out of curiosity, could you provide an example of an instance when you
> prefer the function to only return hits on the same strand? I have
> tried hard to come up with an example but can't think of one. It's
> probably due more to my background though...
Sure ... the stars have aligned in such a way that I spend most of my
time using different types of strand-specific RNA sequencing data.
In this case, when finding overlaps between reads and (annotated)
genes, it's handy that findOverlaps defaults to keeping the strand info.
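
To make the difference concrete, here is a minimal sketch in plain Python
(not Bioconductor code). The Interval class, the find_overlaps helper, and
all coordinates are made up for illustration; the only thing borrowed from
the thread is the idea of an ignore_strand switch that defaults to False.
It assumes that an unstranded "*" range is compatible with either strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic range; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+", "-", or "*" for unstranded


def strands_compatible(a, b):
    # Assumption: "*" matches any strand; otherwise strands must be equal.
    return a == "*" or b == "*" or a == b


def find_overlaps(queries, subjects, ignore_strand=False):
    """Return (query, subject) name pairs whose ranges overlap.

    With ignore_strand=False (the default here), a pair is reported only
    if the strands are also compatible.
    """
    hits = []
    for q in queries:
        for s in subjects:
            positions_overlap = q.start <= s.end and s.start <= q.end
            if positions_overlap and (
                ignore_strand or strands_compatible(q.strand, s.strand)
            ):
                hits.append((q.name, s.name))
    return hits


# Two strand-specific reads over a locus with a gene on each strand.
reads = [Interval("read1", 100, 150, "+"), Interval("read2", 120, 170, "-")]
genes = [Interval("geneA", 90, 200, "+"), Interval("geneB", 110, 180, "-")]

stranded = find_overlaps(reads, genes)
unstranded = find_overlaps(reads, genes, ignore_strand=True)
```

Honoring strand assigns each read to the gene on its own strand only, while
ignoring strand assigns both reads to both genes, which is not what you want
with a strand-specific protocol.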
-steve
>
> best,
>
> Jeremiah
>
> On Fri, Apr 20, 2012 at 10:07 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Seems like people are piling up on the "ignore.strand=FALSE is a bad
>> idea" bandwagon .. for what it's worth, I think the default to "honor
>> the strand" in overlap queries is a sensible one, to me.
>>
>> -steve
>>
>> On Fri, Apr 20, 2012 at 11:58 AM, Jeremiah Degenhardt
>> <degenhardt.jeremiah at gene.com> wrote:
>>>>
>>>>
>>>> There is an "ignore.strand" argument to findOverlaps, so we have a switch. I
>>>> have always thought that strand should be ignored by default in operations
>>>> like overlap detection, and only considered as a "direction" rather than as
>>>> separate in space. It's very useful for resize() and flank() to consider
>>>> strand, but not so useful for findOverlaps. The ignore.strand=FALSE default
>>>> in those cases would qualify for the eighth circle if there were a Bioc
>>>> Inferno book. It's only the default that I argue with, though; having the
>>>> capability to consider strand is useful.
>>>
>>> I had forgotten about the ignore.strand option, thanks for the
>>> reminder Michael. So, given that it's there I agree with you fully. It
>>> seems the default should be changed to TRUE for the overlap functions
>>> and for precede and follow as well.
>>>
>>> Note however, that this would not fully correct the issue in the
>>> predictCoding function as the function still needs to correctly
>>> reverse complement the varAllele to get the annotation correct.
>>>
>>> As a further note on how big of an issue this is, if you go to the
>>> BioC home page and look at the tutorial on "Using Bioconductor to
>>> annotate genetic variants" you will find that the example makes this
>>> exact mistake. The variants in the VCF are unstranded and two of the
>>> genes in the example are negative strand and one is positive.
>>> Following the code you will get incorrect annotations for all variants
>>> on the negative strand genes.
>>>
>>> Jeremiah
>>>
>>>
>>>
>>> --
>>> Jeremiah Degenhardt, Ph.D.
>>> Computational Biologist
>>> Bioinformatics and Computational Biology
>>> Genentech, Inc.
>>> degenhardt.jeremiah at gene.com
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact