[BioC] How can I get the normalized read counts from TMM?

Thu Oct 11 10:28:27 CEST 2012

Hi Reanne,

I replied off-list, but I'll reply again on-list.

> I have a question about TMM normalization as used in edgeR.
> 
> Q. How can I get the normalized read counts from TMM?

See the cpm() function ... for example:

counts.per.m <- cpm(d, normalized.lib.sizes=TRUE)

where 'd' is a DGEList object. Also see ?cpm.
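To connect cpm() with the "effective library size" arithmetic in the question, here is a minimal base-R sketch that does not need edgeR. The count matrix and normalization factors are made up; in practice the factors would come from calcNormFactors(). Each column is divided by its effective library size (lib.size * norm.factors) and scaled to per million, which, as I understand it, is what cpm(d, normalized.lib.sizes=TRUE) computes.

```r
# Toy counts: 4 genes (rows) x 2 libraries (columns); values are made up
counts <- matrix(c(10, 20, 30, 40,
                   15, 25, 35, 125),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("A", "B")))

# Raw library sizes are the column totals
lib.size <- colSums(counts)

# Hypothetical TMM factors; in edgeR these would come from calcNormFactors()
norm.factors <- c(A = 1.1, B = 1 / 1.1)

# Effective library size = raw library size * normalization factor
eff.lib.size <- lib.size * norm.factors

# Normalized counts per million: divide each column by its effective size
normalized.cpm <- t(t(counts) / eff.lib.size) * 1e6
print(round(normalized.cpm, 1))
```

These values are useful for exploration and plotting, but they are not what the statistical tests should be given.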

> I understand that calcNormFactors() produces two columns of information: lib.size and norm.factors. From what I have read, multiplying these two columns together gives an effective library size. Should I then divide the raw read counts by the effective library size to get normalized read counts? I am trying to do differential expression analysis on paired data using Fisher's exact test. By paired I mean that I have two sets of data from the same patient, sequenced on different platforms, so I am looking for DEGs caused by platform differences. Originally I was using RPKM values, but I am wondering whether TMM would be better.

Your pairing is not the usual "biological" pairing, but you might consider reading Section 3.4.1 ("Paired Samples") of the edgeR User's Guide:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Do you have multiple pairs and want to look for consistent platform differences? If so, the paired-samples approach above may be what you want.
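For the multiple-pairs case, a paired analysis of this kind rests on an additive design matrix with patient as a blocking factor. A minimal base-R sketch, assuming three hypothetical patients ("p1" to "p3"), each sequenced on two hypothetical platforms labelled "P1" and "P2". The resulting design would then be passed to edgeR's GLM fitting functions (e.g. glmFit()) as the guide describes.

```r
# Hypothetical layout: 3 patients, each sequenced on two platforms
patient  <- factor(c("p1", "p1", "p2", "p2", "p3", "p3"))
platform <- factor(c("P1", "P2", "P1", "P2", "P1", "P2"))

# Additive design: patient acts as a blocking factor, so the platform
# coefficient estimates the within-patient platform difference
design <- model.matrix(~ patient + platform)
print(design)
```

The last column ("platformP2") is the coefficient one would test for a consistent platform effect; the patient columns absorb patient-to-patient differences in baseline expression.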

Alternatively, if you just have a single pair, you might consider using binomTest() (vaguely similar to Fisher's exact test) and manually setting the n1 and n2 arguments to the effective library sizes. This is how "normalization" is achieved: by modifying offsets, not the data. See ?binomTest.
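The idea behind this can be illustrated for a single gene with base R's binom.test(). This is a sketch of the conditional exact binomial test, not edgeR's own implementation: given the total count y1 + y2, under the null of no platform difference y1 follows a Binomial(y1 + y2, n1 / (n1 + n2)) distribution, where n1 and n2 are the effective library sizes. The counts and library sizes below are made up.

```r
# Made-up counts for one gene in the two libraries
y1 <- 30
y2 <- 70

# Hypothetical effective library sizes (lib.size * norm.factors)
n1 <- 1.0e6
n2 <- 1.5e6

# Under the null, y1 given y1 + y2 is binomial with success
# probability equal to library 1's share of the total effective size
p0 <- n1 / (n1 + n2)

res <- binom.test(x = y1, n = y1 + y2, p = p0)
print(res$p.value)
```

Normalization enters only through p0; the counts passed to the test stay raw.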

Note that fisher.test(), binomTest() and all the edgeR testing methods should take raw counts as input, not TMM-normalized values or RPKMs.

Best, Mark

> 
> 
> The article "Normalization methods for Illumina high-throughput RNA sequencing data analysis" describes normalized read counts as follows:
> "To obtain normalized read counts, these normalization factors are re-scaled by the mean of the normalized library sizes. Normalized read counts are obtained by dividing raw read counts by these re-scaled normalization factors."
> 
> I would love some clarification of TMM as well as any opinions on my use of Fisher's exact test. Thanks for the help in advance.
> 
> Reanne Bowlby
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
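The rescaling described in the quoted article can be sketched in base R as follows. This is one reading of that description, assuming made-up counts and normalization factors: the effective library sizes are divided by their mean to give per-library scale factors, and raw counts are divided by those factors, so the normalized counts stay on roughly the scale of the raw counts rather than per million.

```r
# Toy counts: 3 genes x 2 libraries (made-up values)
counts <- matrix(c(10, 50, 40,
                   30, 90, 80),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("A", "B")))

lib.size <- colSums(counts)               # raw library sizes
norm.factors <- c(A = 0.9, B = 1 / 0.9)   # hypothetical TMM factors

# Effective ("normalized") library sizes
eff.lib.size <- lib.size * norm.factors

# Re-scale so the per-library factors average to 1
scale.factors <- eff.lib.size / mean(eff.lib.size)

# Normalized counts: divide each column by its re-scaled factor
normalized.counts <- t(t(counts) / scale.factors)
print(round(normalized.counts, 2))
```

Such values can be handy for plotting, but, as noted in the reply, they should not be fed into fisher.test(), binomTest() or edgeR's tests.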

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012