[Bioc-sig-seq] unique reads count
    Martin Morgan 
    mtmorgan at fhcrc.org
       
    Wed Jan 27 22:06:55 CET 2010
    
    
  
Hi Joseph --
On 01/27/2010 12:33 PM, joseph wrote:
> Hello
> I have a ShortReadQ object: 
>> rfq
> class: ShortReadQ
> length: 16115723 reads; width: 34 cycles
> 
> I used the negation of the result from srduplicated to count the unique reads:
>> sum(!srduplicated(sread(rfq)))
> [1] 4545719
> 
> But also I looked at the frequency with which each read occurs using the tables function:
>> head(tables(rfq_s_3_mel)$distribution)
>   nOccurrences  nReads
> 1            1 4022038
> 2            2  255649
srduplicated behaves like base R's 'duplicated': it returns TRUE for an
element only when an identical element has already been seen, so the
first occurrence of each sequence is always FALSE.
> duplicated(c("A", "B", "B", "C"))
[1] FALSE FALSE  TRUE FALSE
There's one duplicate, the second 'B'.
After example(srduplicated) I have
> tables(sread(rfq))$distribution
  nOccurrences nReads
1            1    239
2            2      7
3            3      1
> sum(srduplicated(sread(rfq)))
[1] 9
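There are 7 reads that are the second of two identical reads, and 2 reads
that are the second and third of three identical reads, 9 in all. So
sum(!srduplicated(...)) counts the distinct sequences (239 + 7 + 1 = 247
here), whereas nReads for nOccurrences=1 counts only the sequences that
occur exactly once (239). The two agree only when no read is repeated.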
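To make the difference between the two counts concrete, here is a minimal
sketch in base R only. The toy vector 'reads' is made up for illustration
and stands in for sread(rfq); 'duplicated' and 'table' are used in place of
srduplicated and tables, on the assumption that they count the same way.

```r
# Toy reads (hypothetical data): "A" once, "B" twice, "C" three times.
reads <- c("A", "B", "B", "C", "C", "C")

# Distinct sequences: every first occurrence is not a duplicate.
n_distinct <- sum(!duplicated(reads))

# Occurrences per distinct sequence, then how many sequences occur 1, 2, 3... times.
counts <- table(reads)
distribution <- table(nOccurrences = as.vector(counts))

# Sequences seen exactly once: the nOccurrences = 1 row of the distribution.
n_singletons <- sum(counts == 1)

# Each sequence seen k times contributes k - 1 duplicates.
n_duplicates <- sum(duplicated(reads))
n_duplicates_from_distribution <-
    sum((as.integer(names(distribution)) - 1) * as.vector(distribution))
```

Here n_distinct is the sum of the nReads column, n_singletons is only its
first row, and the two ways of counting duplicates give the same number.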
Martin
> 
> I expected that for nOccurrences=1, the nReads should be the same as what I got with !srduplicated.
> 
> Can anybody explain why I got different counts?
> Thank you
> Joseph Dhahbi
> 
> 
-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
    
    