[Bioc-sig-seq] Questions on the edgeR package

Wed Feb 9 23:43:10 CET 2011

Hi Jason.

Some answers below.

On 2011-02-10, at 5:41 AM, Jason Lu wrote:

> Hi,
> 
> I have two questions on the edgeR package, and hope someone here could help.
> 
> Question-1. The calculation of the effective library size for the
> two-library case has been a mystery to me. In the Genome Biology
> (2010) paper, it was mentioned that "the effective library sizes are
> calculated by multiplying/dividing the square root of the estimated
> normalization factor with the original library size.". When I use the
> edgeR to compute the norm.factors, it actually comes up two
> norm.factors. From what I read from other posts, for the
> multiple-library case the effective library sizes are the
> original.lib.size * norm.factors. My question is how to compute the
> effective lib.sizes for the two-lib case ( used by the sage.test in
> the next step).
>> d = calcNormFactors(d)
>> d
> An object of class "DGEList"
> $samples
>   group lib.size norm.factors
> c1     1  5248721        0.979
> c2     0  6352217        1.021

A few things have changed over time.  Awhile back, the normalization factors were explicitly relative a reference column.  Now, they are still calculated against a reference column, but themselves are normalized to multiply to 1.  If you plan to use sage.test() -- which I should point out will come out roughly equivalent to exactTest() with no replicates -- then yes, you should use the product of lib.size*norm.factors.  This is actually equivalent (on 2 libraries) to the earlier practice of only having 1 normalization factor and partitioning it with square roots to each library.  For your example above, something like:

D <- d$counts
f <- d$samples$lib.size*d$samples$norm.factors
st <- sage.test( D[,1], D[,2], n1=f[1], n2=f[2] )

> Question-2. I encountered a strange error when I try to run
> "estimateTagwiseDisp" (below). It works fine with "estimateCommonDisp"
> though.
> 
>> load("~/ctable.rda")
>> library(edgeR)
>> 
>> grp = rep(c('cancer','normal'),each=3)
>> d = DGEList(counts = ctab[,1:6], group = grp)
> Calculating library sizes from column totals.
>> d = calcNormFactors(d)
>> 
>> setwd("~/samplerun/seqf/comp")
>> 
>> options(digits = 4)
>> d = d[rowSums(d$counts)>5,]
>> d.2 <- estimateTagwiseDisp(d, prior.n = 15)
> Error in t.default(object$counts) : argument is not a matrix

estimateTagwiseDisp() requires estimateCommonDisp() to be called first. You can fix this by calling:

[...]
d = d[rowSums(d$counts)>5,]
d <- estimateCommonDisp(d)
d.2 <- estimateTagwiseDisp(d, prior.n = 15)

Of course, we should have a more informative error.  Thanks for pointing this out, we'll make a change.

Cheers,
Mark

>> class(d$counts)
> [1] "matrix"
>> d$counts[1:2,]
>   c1 c2 c3  n1 n2  n3
> 1  25 32 40 100 96 102
> 10  1  0  0   7  2   0
>> d$samples
>    group lib.size norm.factors
> c1 cancer  5248721       0.9049
> c2 cancer  6352217       0.9388
> c3 cancer 12012184       0.9409
> n1 normal  7477196       1.2217
> n2 normal  8012616       1.1123
> n3 normal  6691637       0.9206
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2011-01-20 r54055)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
> 
> other attached packages:
> [1] edgeR_2.1.10
> 
> loaded via a namespace (and not attached):
> [1] limma_3.7.22
> 
> Thanks for help.
> 
> Jason
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}