[Bioc-sig-seq] removing non-unique sequences and uniqueFilter
Martin Morgan
mtmorgan at fhcrc.org
Thu May 14 17:54:14 CEST 2009
Hi Tobias --
Tobias Straub <tstraub at med.uni-muenchen.de> writes:
> hi
>
> i would like to remove all non-unique sequences from an AlignedRead
> object. i thought that the uniqueFilter would help me to do so. in
> fact, the filter removes a considerable number of reads, but when i
> call tables on the result object i still have lots of sequences
> occurring more than once.
> did i miss something?
The challenge is in defining what 'unique' means. From the help page
?uniqueFilter

    uniqueFilter(withSread=TRUE, .name="UniqueFilter")

and

    withSread: A 'logical(1)' indicating whether uniqueness includes the
               read sequence ('withSread=TRUE') or is based only on
               chromosome, position, and strand ('withSread=FALSE').

So by default uniqueFilter treats two reads as duplicates only when
they have the same sequence and also the same chromosome, position,
and strand of alignment. Reads that share a sequence but align to
different locations are all kept, which is why 'tables', which is
based on the read sequences alone, still reports sequences occurring
more than once. If you want the reads to be unique based only on
sequence identity, you could do something like

    aln[!srduplicated(aln)]
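
To make the two notions of uniqueness concrete, here is a minimal
base-R sketch on a toy data frame. The data and column names are made
up for illustration, and base R's duplicated() only stands in for the
ShortRead functions; this is not how ShortRead implements the filter.

```r
# Toy alignments; the column names are hypothetical, not ShortRead accessors.
aln <- data.frame(
  sread      = c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA"),
  chromosome = c("chr1", "chr1", "chr2", "chr1", "chr1"),
  position   = c(100L,   100L,   250L,   300L,   300L),
  strand     = c("+",    "+",    "+",    "-",    "+"),
  stringsAsFactors = FALSE
)

# Uniqueness on sequence plus alignment coordinates
# (analogous to the withSread=TRUE default described above).
by_all <- aln[!duplicated(aln[, c("sread", "chromosome", "position", "strand")]), ]

# Uniqueness on sequence only
# (analogous to aln[!srduplicated(aln)]).
by_seq <- aln[!duplicated(aln$sread), ]

table(by_all$sread)  # a sequence can still occur more than once
table(by_seq$sread)  # each sequence occurs exactly once
```

Only the second read is dropped by the first definition, because it
matches the first read in sequence, chromosome, position, and strand;
the remaining reads share sequences but differ in where or how they
align.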
Martin
> thanks in advance
> Tobias
>
> ----------------------------------------------------------------------
> Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793