[BioC] Recommended gene model for DESeq

Simon Anders anders at embl.de
Sat Apr 7 00:59:30 CEST 2012


Hi Gordon

On 2012-04-06 23:24, Assaf Gordon wrote:
> Once per gene - got it. What about a case where a read matches
> multiple genes? (described as "ambiguous" in
> HTSeq-Count/GenomicRanges "modes") Is it OK to count this read
> several times (once for each gene, multiple different genes), or
> would that invalidate the results?

HTSeq-count discards reads that map to several genes and counts them as
"ambiguous". I explained the reason a while ago in a SeqAnswers
thread, in post #4 here:
http://seqanswers.com/forums/showthread.php?t=9129

"Imagine we have two paralogous genes that have identical sequence at
one half of their length and divergent sequence at the other half, and
one of these genes is differentially expressed and the other is not. All
reads that stem from the identical-sequence parts of the transcripts
will map to both genes, and if we include them in our counts, both genes
will appear to be differentially expressed, even though only one is
really. If we count only the uniquely mapping reads (i.e., those
stemming from the divergent parts of the transcripts), we are safe."
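
To make the argument concrete, here is a small toy sketch in Python. It
is not HTSeq code and does not use the HTSeq API; the gene names, the
read numbers, and the helper functions count_reads and make_sample are
all made up for illustration. It assumes that half of each gene's reads
come from the shared part (and so hit both genes) and half from the
divergent part, and that only geneA changes between the two samples.

```python
from collections import Counter

def count_reads(read_hits, count_ambiguous=False):
    """Count reads per gene. read_hits holds, for each read, the set of
    genes it overlaps. Hypothetical helper, not part of HTSeq."""
    counts = Counter()
    for genes in read_hits:
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        elif len(genes) > 1:
            if count_ambiguous:
                # Count the read once for every gene it hits.
                for gene in genes:
                    counts[gene] += 1
            else:
                # Set the read aside instead of assigning it to any gene.
                counts["ambiguous"] += 1
    return counts

def make_sample(reads_a, reads_b):
    """Toy sample: half of each gene's reads are unique, half hit both."""
    only_a = frozenset(["geneA"])
    only_b = frozenset(["geneB"])
    both = frozenset(["geneA", "geneB"])
    return ([only_a] * (reads_a // 2) + [both] * (reads_a // 2)
            + [only_b] * (reads_b // 2) + [both] * (reads_b // 2))

control = make_sample(100, 100)
treated = make_sample(400, 100)   # only geneA is differentially expressed

unique_control = count_reads(control)
unique_treated = count_reads(treated)
multi_control = count_reads(control, count_ambiguous=True)
multi_treated = count_reads(treated, count_ambiguous=True)

for gene in ("geneA", "geneB"):
    print(gene,
          "unique-only fold change:",
          unique_treated[gene] / unique_control[gene],
          "multi-counted fold change:",
          multi_treated[gene] / multi_control[gene])
```

With these made-up numbers, counting only the unique reads gives a
fold change of 4 for geneA and 1 for geneB, while counting the shared
reads for both genes gives 3 for geneA and 2 for geneB, so geneB
wrongly looks changed.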


   Simon


