[Bioc-sig-seq] Dealing with pileups/duplicates in RNAseq
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Apr 23 18:50:12 CEST 2010
Hi all,
Sorry for abusing the list (and *-seq terminology), as this isn't
really a Bioconductor-related question, but I was curious how you all
deal with "pileups" in RNAseq data. By pileup I mean separate
observations of the same read (i.e., two or more different reads that
map to the exact same genomic locus), aka duplicate reads.
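As a concrete illustration of that definition, here is a minimal sketch that counts pileup sizes. It assumes a hypothetical `Read` record (not a real Bioconductor or pysam type) and treats two reads as duplicates when they share chromosome, strand and 5' start coordinate. For minus-strand reads the 5' end is the rightmost aligned base, so it is assumed to be computed upstream rather than taken directly from SAM's leftmost POS.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 5' end coordinate, assumed already strand-corrected


def pileup_key(read: Read) -> tuple:
    """Reads sharing this (chrom, strand, start) key count as duplicates."""
    return (read.chrom, read.strand, read.start)


def pileup_sizes(reads: list) -> Counter:
    """Count how many reads land on each (chrom, strand, start) locus."""
    return Counter(pileup_key(r) for r in reads)


reads = [
    Read("r1", "chr1", "+", 100),
    Read("r2", "chr1", "+", 100),  # duplicate of r1
    Read("r3", "chr1", "-", 100),  # same coordinate, other strand: distinct
    Read("r4", "chr1", "+", 250),
]
sizes = pileup_sizes(reads)
print(sizes[("chr1", "+", 100)])  # 2
```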
I'm pretty sure it's common practice to remove them in ChIP-seq
experiments since, I believe, they are usually assumed to be PCR
artifacts. In RNAseq, though, expression levels vary widely between
genes, and a highly expressed gene can legitimately produce many reads
starting at the same position, so removing all of them probably isn't
a given.
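The "remove them all" approach described above, as commonly done for ChIP-seq, amounts to keeping one read per locus. A minimal sketch, again using a hypothetical `Read` record and keeping the first read seen at each (chrom, strand, start) key:

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read."""
    name: str
    chrom: str
    strand: str
    start: int  # 5' end coordinate


def remove_all_duplicates(reads: Iterable[Read]) -> list:
    """Keep only the first read observed at each (chrom, strand, start)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


reads = [
    Read("r1", "chr1", "+", 100),
    Read("r2", "chr1", "+", 100),
    Read("r3", "chr1", "-", 100),
    Read("r4", "chr1", "+", 250),
]
deduped = remove_all_duplicates(reads)
print([r.name for r in deduped])
```

Which read survives is arbitrary here (input order); real tools may prefer, e.g., the best-quality alignment.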
That said, I have been removing them anyway. I've also seen some
references to keeping only N reads that map to the same place, where N
seems to be chosen arbitrarily and applied globally.
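A global cap of N reads per locus is a small generalization of full de-duplication (N = 1 recovers it). A sketch, with the hypothetical parameter name `max_per_locus` standing in for N:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read."""
    name: str
    chrom: str
    strand: str
    start: int  # 5' end coordinate


def cap_duplicates(reads: Iterable[Read], max_per_locus: int) -> list:
    """Keep at most `max_per_locus` reads at each (chrom, strand, start)."""
    if max_per_locus < 1:
        raise ValueError("max_per_locus must be at least 1")
    counts = defaultdict(int)  # reads kept so far at each locus
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start)
        if counts[key] < max_per_locus:
            counts[key] += 1
            kept.append(r)
    return kept


reads = [Read(f"r{i}", "chr1", "+", 100) for i in range(5)]
reads.append(Read("r5", "chr1", "+", 250))
capped = cap_duplicates(reads, max_per_locus=2)
print([r.name for r in capped])
```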
I guess it would make more sense to determine N on a gene-by-gene
basis, though, perhaps by quantifying each gene's expression from its
uniquely-appearing reads.
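One way the gene-by-gene idea could be sketched, under loudly stated assumptions: suppose reads start at positions within a gene as a Poisson process with rate lam reads per (strand, start) slot. Then the expected fraction of slots holding at least one read is 1 - exp(-lam), so lam can be estimated from the number of distinct occupied start positions alone (i.e., from the uniquely-appearing reads). N is then chosen as the smallest k with P(X > k) < alpha. The function names, `alpha`, and the use of 2 * gene_length slots (both strands) are hypothetical choices, not an established method.

```python
import math


def estimate_rate(occupied_positions: int, possible_positions: int) -> float:
    """Estimate mean reads per slot from the count of occupied slots.

    Assumes Poisson placement: E[occupied fraction] = 1 - exp(-lam).
    """
    if possible_positions <= 0 or not 0 <= occupied_positions <= possible_positions:
        raise ValueError("need 0 <= occupied <= possible and possible > 0")
    frac = occupied_positions / possible_positions
    if frac >= 1.0:
        return math.inf  # every slot occupied: rate cannot be estimated
    return -math.log1p(-frac)


def poisson_cap(lam: float, alpha: float = 1e-3, max_cap: int = 10_000) -> int:
    """Smallest k >= 1 with P(X > k) < alpha for X ~ Poisson(lam).

    Sums the pmf directly, so it is intended for moderate lam
    (exp(-lam) underflows to 0 for lam above roughly 700).
    """
    if math.isinf(lam):
        return max_cap
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = pmf
    k = 0
    while 1.0 - cdf >= alpha and k < max_cap:
        k += 1
        pmf *= lam / k  # P(X = k) from P(X = k - 1)
        cdf += pmf
    return max(k, 1)  # always keep at least one read per locus


def per_gene_cap(unique_starts: int, gene_length: int, alpha: float = 1e-3) -> int:
    """Hypothetical per-gene N; each base offers two (strand, start) slots."""
    lam = estimate_rate(unique_starts, 2 * gene_length)
    return poisson_cap(lam, alpha)


print(per_gene_cap(unique_starts=10, gene_length=2000))
print(per_gene_cap(unique_starts=1000, gene_length=1000))
```

The Poisson assumption is the weak point: RNAseq coverage has strong positional and sequence biases, and the occupancy estimate saturates for very highly expressed genes, so this is a starting point rather than a calibrated answer.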
So, I'm just curious if/how you folks are tackling this issue.
Thanks,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact