[Bioc-sig-seq] Questions on the edgeR package
Mark Robinson
mrobinson at wehi.EDU.AU
Wed Feb 9 23:43:10 CET 2011
Hi Jason.
Some answers below.
On 2011-02-10, at 5:41 AM, Jason Lu wrote:
> Hi,
>
> I have two questions on the edgeR package, and hope someone here could help.
>
> Question-1. The calculation of the effective library size for the
> two-library case has been a mystery to me. In the Genome Biology
> (2010) paper, it was mentioned that "the effective library sizes are
> calculated by multiplying/dividing the square root of the estimated
> normalization factor with the original library size.". When I use the
> edgeR to compute the norm.factors, it actually comes up two
> norm.factors. From what I read from other posts, for the
> multiple-library case the effective library sizes are the
> original.lib.size * norm.factors. My question is how to compute the
> effective lib.sizes for the two-lib case ( used by the sage.test in
> the next step).
>> d = calcNormFactors(d)
>> d
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> c1 1 5248721 0.979
> c2 0 6352217 1.021
A few things have changed over time. Awhile back, the normalization factors were explicitly relative a reference column. Now, they are still calculated against a reference column, but themselves are normalized to multiply to 1. If you plan to use sage.test() -- which I should point out will come out roughly equivalent to exactTest() with no replicates -- then yes, you should use the product of lib.size*norm.factors. This is actually equivalent (on 2 libraries) to the earlier practice of only having 1 normalization factor and partitioning it with square roots to each library. For your example above, something like:
D <- d$counts
f <- d$samples$lib.size*d$samples$norm.factors
st <- sage.test( D[,1], D[,2], n1=f[1], n2=f[2] )
> Question-2. I encountered a strange error when I try to run
> "estimateTagwiseDisp" (below). It works fine with "estimateCommonDisp"
> though.
>
>> load("~/ctable.rda")
>> library(edgeR)
>>
>> grp = rep(c('cancer','normal'),each=3)
>> d = DGEList(counts = ctab[,1:6], group = grp)
> Calculating library sizes from column totals.
>> d = calcNormFactors(d)
>>
>> setwd("~/samplerun/seqf/comp")
>>
>> options(digits = 4)
>> d = d[rowSums(d$counts)>5,]
>> d.2 <- estimateTagwiseDisp(d, prior.n = 15)
> Error in t.default(object$counts) : argument is not a matrix
estimateTagwiseDisp() requires estimateCommonDisp() to be called first. You can fix this by calling:
[...]
d = d[rowSums(d$counts)>5,]
d <- estimateCommonDisp(d)
d.2 <- estimateTagwiseDisp(d, prior.n = 15)
Of course, we should have a more informative error. Thanks for pointing this out, we'll make a change.
Cheers,
Mark
>> class(d$counts)
> [1] "matrix"
>> d$counts[1:2,]
> c1 c2 c3 n1 n2 n3
> 1 25 32 40 100 96 102
> 10 1 0 0 7 2 0
>> d$samples
> group lib.size norm.factors
> c1 cancer 5248721 0.9049
> c2 cancer 6352217 0.9388
> c3 cancer 12012184 0.9409
> n1 normal 7477196 1.2217
> n2 normal 8012616 1.1123
> n3 normal 6691637 0.9206
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2011-01-20 r54055)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] edgeR_2.1.10
>
> loaded via a namespace (and not attached):
> [1] limma_3.7.22
>
> Thanks for help.
>
> Jason
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioc-sig-sequencing
mailing list