[BioC] DEXSeq - too many exons in gene

Antonio Domingues amjdomingues at gmail.com
Thu Feb 6 21:17:02 CET 2014


Hi Devon,

thank you for the clarification. I thought DEXSeq used a union model, 
but under the "disjoint gene model" it all makes sense now.
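
For anyone finding this thread later, the difference can be sketched in a few lines of Python. This is only a toy illustration, not the code that DEXSeq or its annotation script actually runs: the coordinates are made up, the function names (`union_model`, `disjoint_bins`, `width`) are hypothetical, and it assumes GTF-style 1-based inclusive coordinates for the exons of a single gene on one strand.

```python
from typing import List, Tuple

# An interval is (start, end), 1-based and inclusive, as in GTF (assumption).
Interval = Tuple[int, int]


def width(iv: Interval) -> int:
    """Width in bp of an inclusive interval: end - start + 1."""
    start, end = iv
    return end - start + 1


def union_model(exons: List[Interval]) -> List[Interval]:
    """Flattened/union model: merge overlapping or directly abutting exons."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous merged exon: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def disjoint_bins(exons: List[Interval]) -> List[Interval]:
    """Disjoint model: cut at every exon boundary and keep covered pieces."""
    # Breakpoints in half-open form: each start, and each end + 1.
    cuts = sorted({s for s, _ in exons} | {e + 1 for _, e in exons})
    bins: List[Interval] = []
    for left, right in zip(cuts, cuts[1:]):
        # No exon boundary falls inside (left, right - 1), so the piece is
        # either fully inside an exon or fully outside all of them.
        if any(s <= left and right - 1 <= e for s, e in exons):
            bins.append((left, right - 1))
    return bins


# Toy exons from two hypothetical transcripts of one gene.
exons = [(100, 200), (300, 400),   # transcript A
         (102, 200), (300, 350)]   # transcript B

print(union_model(exons))    # 2 merged exons
print(disjoint_bins(exons))  # 4 counting bins, the first one 2 bp wide
print(width((102995728, 102995729)))  # the bin discussed in this thread
```

Two transcripts whose exons differ only slightly at their boundaries give two exons in the union model but four counting bins in the disjoint one, and abutting bins (such as the first two here) are expected rather than a sign of a broken annotation. The inclusive width formula also gives 2 bp for the 102995728-102995729 bin, as Devon pointed out.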

Best,
António

On 06/02/14 19:42, Devon Ryan wrote:
> Hi Antonio,
>
> I counted 13 exonic bins by eye. What do you find to be amiss there? Remember that you're not using a flattened/union gene model with DEXSeq, but rather pretty much the exact opposite (maybe it should be called a "disjoint gene model"?).
>
> BTW, that first bin is actually 2 bp wide.
>
> Regards,
> Devon
>
> ____________________________________________
> Devon Ryan, Ph.D.
> Email: dpryan at dpryan.com
> Tel: +49 (0)178 298-6067
> Molecular and Cellular Cognition Lab
> German Centre for Neurodegenerative Diseases (DZNE)
> Ludwig-Erhard-Allee 2
> 53175 Bonn, Germany
>
> On Feb 6, 2014, at 7:12 PM, António domingues wrote:
>
>> Hi Steve,
>>
>> thanks for the comments. First of all, my apologies, I sent the wrong screenshot. It should have been the one (attached) for Sike1. Long day. Anyway, see my replies below to the points that are still valid.
>>
>> On 02/06/2014 06:54 PM, Steve Lianoglou wrote:
>>> Hi,
>>>
>>> A few comments in line:
>>>
>>> On Thu, Feb 6, 2014 at 9:01 AM, António domingues
>>> <amjdomingues at gmail.com> wrote:
>>>> Hi Bioconductors,
>>>>
>>>> I happened upon a funny thing in DEXSeq: a gene which appears to have more
>>>> exons in the final DEXSeq output than the annotation suggests. The gene
>>>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) appears to have 3
>>>> exons in a flattened gene model. However, the DEXSeq results list 13 exons
>>>> (here showing the output of htseq-count):
>>> Not sure why you say the *gene* only has 3 exons ... you have
>>> highlighted one isoform of the gene which has very few exons, but you
>>> can see from both your picture and the exon definitions you pasted below
>>> for ENSMUSG00000027854 (presumably that's Csde1 :-) that if you
>>> consider all of the isoforms of the gene together, it has many more
>>> than just three exons.
>>>
>>> Know what I mean?
>> It is not Csde1 :s
>>
>>>> Exon 1 is only 1 base long (?) and exons 1 to 4 are contiguous. As far
>>>> as I am aware, the DEXSeq model should have flattened all of these into one
>>>> single "exon". Is this correct? Is the error coming from the GTF? (At the
>>>> end of the message there is also the gene annotation from the GTF.)
>>> I'm trying to parse the various exon annotations from your email, but
>>> I don't see where the 1-width exon is.
>> This one:
>> chr3    mm10_ensGene.gtf    exonic_part    102995728    102995729 .    +    .    transcripts "ENSMUST00000029447"; exonic_part_number "001"; gene_id "ENSMUSG00000027854"
>>
>> Unless I calculated it incorrectly.
>>
>>> Figure 1 from their paper shows pretty clearly how the "break down" of
>>> exons is calculated across isoforms to create *counting bins* -- just
>>> keep in mind that these things are not necessarily "exons" anymore.
>> Yes, I am aware of that, but I should have been clearer about the distinction between "exon" and counting bin. I think that with the new screenshot it will become more apparent what I mean.
>>>> This is especially concerning for me because I am interested in selecting the
>>>> first and last exon of genes, using the exon ranking from DEXSeq, to analyze
>>>> further.
>>> I'm not sure if what I posted was at all helpful, but if someone else
>>> doesn't do a better job of providing you with the answer you were
>>> looking for, you might try to draw a figure of a gene model (with a
>>> few splicing isoforms) and point out what it is, exactly, that you
>>> hope to extract from it.
>>>
>>> While it's clear what "First and last" exon of a *single transcript
>>> isoform* of a gene might be, it might get hairy when you start
>>> summarizing the "counting bins" across multiple isoforms of the same
>>> gene.
>> True. I am only using the DEXSeq results as a quick and dirty approach before I get data from other tools which handle this better. For example, MISO has annotations for alternative polyadenylation and Cufflinks provides some information on alternative promoter usage. Regardless, if the gene model is incorrect, which I hope it is not and this is only me being thick, then the DEXSeq results for some counting bins might not be trustworthy.
>>
>>
>>> Oh, and by the way:
>>>
>>>
>>>> Hi Bioconductors,
>>>>
>>>> I happened upon a funny thing in DEXSeq: a gene which appears to have more
>>>> exons in the final DEXSeq output than the annotation suggests. The gene
>>>> ENSMUSG00000027854 (screen-shot from UCSC in attachment) appears to have 3
>>>> exons in a flattened gene model.
>>> I'd argue that the isoform of the gene that you highlighted in your
>>> original screen shot only has *two* exons
>>>
>>> -steve
>> ehehe, correct.
>>
>>> HTH,
>>> -steve
>>>
>> Cheers,
>> António
>>
>>
>> <ENSMUSG00000027854.png>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
Fetscherstraße 105
01307 Dresden

Phone: +49 (351) 458 82362
Email: antonio.domingues(at)biotec.tu-dresden.de
--
The Unbearable Lightness of Molecular Biology


