[Bioc-sig-seq] What to do when single reads make up a large percentage of counts?

Jenny Drnevich drnevich at illinois.edu
Tue Aug 11 21:42:11 CEST 2009


Hi everyone,

I thought I would try this list before the general Bioconductor one 
because my question pertains to NGS counts, although in reality it's 
a general statistical theory question. I hope someone can help me or 
point me in the right direction! Typically, you cannot compare counts 
from different samples directly, but instead you have to adjust by the 
total number of counts obtained for each sample, correct?  This 
assumes that any changes in the counts of particular sequences will 
not substantially affect the total count number... but what if it 
might? I'm helping a colleague with some data where they sequenced 
the 18-30 nt fraction of RNA to look for miRNAs; they got 1.1 to 2.1 
million reads per sample, but these aligned to only 187 miRNAs! Some 
of the miRNAs have up to 30% of all reads, which is a really large 
percentage. Say a miRNA "X" that is 30% of the reads doubles its 
count number in another sample, but the counts for all other miRNAs 
are the same. The new percentage of "X" in the second sample is not 
60%, but instead 46.15%, and the observed ratios of all the other 
miRNAs are decreased by a factor of 0.77 (= 1/1.3). Is there any way 
to correct for this? What do you do when the top 5 miRNAs make up 70% 
of the counts??
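
To make the arithmetic above concrete, here is a minimal Python sketch. 
The absolute counts (300,000 and 700,000) and the pooled "others" 
category are made up for illustration; only the 30% share and the 
doubling of "X" come from the example above.

```python
# Hypothetical sample 1: miRNA "X" holds 30% of reads; all other miRNAs are pooled.
sample1 = {"X": 300_000, "others": 700_000}
# Hypothetical sample 2: X doubles its count, everything else is unchanged.
sample2 = {"X": 600_000, "others": 700_000}


def proportions(counts):
    """Scale raw counts by the sample's total count (library-size normalization)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


p1 = proportions(sample1)
p2 = proportions(sample2)

x_share_2 = p2["X"]                          # share of X in sample 2
x_ratio = p2["X"] / p1["X"]                  # apparent fold change of X
others_ratio = p2["others"] / p1["others"]   # apparent fold change of unchanged miRNAs

print(f"X share in sample 2:        {x_share_2:.2%}")
print(f"Apparent fold change of X:  {x_ratio:.2f}")
print(f"Apparent fold change, rest: {others_ratio:.2f}")
```

Under these assumed counts, the share of "X" comes out at 46.15% and the 
unchanged miRNAs appear scaled by 1/1.3 (about 0.77), even though their 
raw counts are identical. "X" itself shows an apparent fold change of 
about 1.54 rather than 2.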

Thanks,
Jenny

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu


