[Bioc-sig-seq] assess how many duplicated reads
Martin Morgan
mtmorgan at fhcrc.org
Fri Aug 12 05:34:03 CEST 2011
On 08/11/2011 09:50 AM, Kunbin Qu wrote:
> Hi, I have some human single-end RNA-seq runs from a HiSeq. Could I
> get some suggestions on how to assess how many reads in these
> libraries are duplicates? I looked at srFilter() in ShortRead, but I
> am not clear on how to implement it. Should I use IRanges instead,
> to assess the unique start sites after mapping? If so, what function
> do you suggest? I'd like to count reads that map to the same
> location (even with some mismatches) as duplicates. Thanks.
ShortRead::tables() can be used to count exactly identical unaligned reads.
ShortRead::occurrenceFilter() is an implementation for non-gapped, aligned
reads. For aligned reads with gaps I think you're on your own, but maybe
GenomicRanges::readGappedAlignments() or Rsamtools::scanBam() plus the
logic of ShortRead::occurrenceFilter() would be a starting point. Perhaps
your aligner has already flagged duplicate reads, in which case the 'flag'
argument of ScanBamParam() and the 'flag' field returned by scanBam()
would be helpful.
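
As a rough illustration of the "same location" idea, here is a minimal
base-R sketch that counts reads sharing a reference name, strand and
leftmost position. The helper name count_position_duplicates() and the toy
data are made up for this example; it is not what occurrenceFilter() does
internally, and it ignores gaps and the fact that for minus-strand reads
'pos' is the leftmost coordinate rather than the 5' end.

```r
# Toy alignment table (hypothetical data). With a real BAM file, vectors
# like these could be read with something along the lines of
#   Rsamtools::scanBam(file, param = Rsamtools::ScanBamParam(
#       what = c("rname", "strand", "pos")))[[1]]
aln <- data.frame(
    rname  = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
    strand = c("+",    "+",    "-",    "+",    "+",    "+"),
    pos    = c(100L,   100L,   100L,   250L,   100L,   100L),
    stringsAsFactors = FALSE
)

# Hypothetical helper: treat reads with the same (rname, strand, pos) as
# duplicates of one another and count every copy beyond the first.
count_position_duplicates <- function(rname, strand, pos) {
    mapped <- !is.na(pos)                 # unmapped reads have NA positions
    key <- paste(rname[mapped], strand[mapped], pos[mapped], sep = ":")
    n_mapped <- sum(mapped)
    n_sites <- length(unique(key))        # distinct start sites
    n_duplicated <- n_mapped - n_sites    # copies beyond the first per site
    list(
        n_mapped = n_mapped,
        n_sites = n_sites,
        n_duplicated = n_duplicated,
        fraction_duplicated =
            if (n_mapped > 0) n_duplicated / n_mapped else NA_real_
    )
}

dup <- count_position_duplicates(aln$rname, aln$strand, aln$pos)
print(dup$n_duplicated)
```

If the aligner or a later tool has already marked duplicates, then
Rsamtools::countBam() with ScanBamParam(flag = scanBamFlag(isDuplicate =
TRUE)) may give the count directly, without any of the logic above.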
Hope that is of some help.
Martin
>
> -Kunbin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793