[Bioc-sig-seq] edgeR tagwise estimates not converging to common estimate with large prior.n value
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Sep 17 02:47:21 CEST 2011
Dear Sean,
The dispersion estimation functions in edgeR have a lower limit for the
dispersions that they will estimate. For estimateCommonDisp(), the lower
limit is just above 0.0001. For estimateTagwiseDisp() the lower limit is
just above 0.001. For your data, the ideal dispersion estimate appears to
be zero, so the functions are simply returning to you the pre-set lower
limits.
I agree that it was a bit sloppy of us (the edgeR authors) to leave the lower
limits inconsistent between the functions. The reason for
estimateTagwiseDisp() having a higher limit is that it does a grid search,
so we wanted to limit the number of grid points for computational
efficiency.
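
To illustrate why a bounded search simply reports its lower limit, here is a
minimal sketch in base R. It is not the edgeR code: the function names
nb_loglik() and grid_search(), the four counts, and the grid size are all
hypothetical and chosen only for illustration. The counts are deliberately
much less variable than Poisson, so the negative binomial likelihood keeps
increasing as the dispersion shrinks toward zero.

```r
# Hypothetical illustration only; this is NOT the edgeR implementation.

# Four made-up replicate counts with variance well below the mean,
# so the best-fitting dispersion is zero (the boundary).
y <- c(48, 50, 51, 51)

# Negative binomial log-likelihood as a function of the dispersion phi.
# R parameterises the NB by size = 1 / phi; the mean is fixed at mean(y).
nb_loglik <- function(phi, y) {
  sum(dnbinom(y, size = 1 / phi, mu = mean(y), log = TRUE))
}

# Grid search over a log-spaced grid bounded below by `lower`.
grid_search <- function(y, lower, upper = 1, n_points = 50) {
  grid <- exp(seq(log(lower), log(upper), length.out = n_points))
  ll <- vapply(grid, nb_loglik, numeric(1), y = y)
  grid[which.max(ll)]   # best grid point; the first one if the optimum is below `lower`
}

est_a <- grid_search(y, lower = 1e-3)  # search bounded at 0.001
est_b <- grid_search(y, lower = 1e-4)  # search bounded at 0.0001
print(c(est_a, est_b))
```

With these counts each call returns its own lower limit, so the two
"estimates" differ by a factor of 10 even though both are really saying that
the optimum lies below the range searched.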
The new glm functions in edgeR, estimateGLMCommonDisp() etc., have somewhat
less restrictive lower limits than the classic functions that you are
using.
The bottom line is that with technical data such as the yeast data, we do
not view the differences between dispersion estimates of 1e-3 or 1e-4 as
scientifically meaningful. We would simply observe that the dispersion
appears to be at the lower boundary, showing that the data have essentially
no biological variability. We would simply set the dispersions to zero.
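
As a rough check on the scale of these numbers, the sketch below uses the
standard negative binomial variance, var = mu + phi * mu^2, to compare the
coefficient of variation of a count under dispersions of 0 (Poisson), 1e-4
and 1e-3. The helper name nb_cv() and the mean counts are hypothetical and
are not taken from the yeast data.

```r
# Coefficient of variation of a negative binomial count with mean mu and
# dispersion phi: var = mu + phi * mu^2, hence CV = sqrt(1 / mu + phi).
nb_cv <- function(mu, phi) sqrt(1 / mu + phi)

mu <- c(10, 100, 1000)   # illustrative mean counts only

cv <- cbind(poisson  = nb_cv(mu, 0),
            phi_1e.4 = nb_cv(mu, 1e-4),
            phi_1e.3 = nb_cv(mu, 1e-3))
rownames(cv) <- mu
print(round(cv, 4))

# The square root of the dispersion is the extra replicate-to-replicate
# variation on top of Poisson counting noise.
extra_cv <- sqrt(c(1e-4, 1e-3))
print(round(extra_cv, 3))
```

At low and moderate counts the three columns are nearly identical. They
separate somewhat at high counts, but the extra variation implied by either
dispersion is only about 1% to 3% between replicates, which is very small
for sequencing data.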
Best wishes
Gordon
> Date: Thu, 15 Sep 2011 18:03:28 -0700
> From: Sean Ruddy <sruddy17 at gmail.com>
> To: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] edgeR tagwise estimates not converging to
> common estimate with large prior.n value
>
> Hi,
>
> Thanks in advance for any help. I have the latest R software (2.13.1) and
> edgeR software (2.8.4). I'm running into a problem where I estimate a common
> dispersion parameter of 0.0001 and when I subsequently estimate tagwise
> dispersions using the default prior.n = 10, the summary statistics are
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   0.001   0.001   0.001   0.001   0.001   0.022
>
> i.e., all estimates are 10 times larger than the common dispersion estimate.
> Since the method is supposed to shrink toward the common value, this seems a
> little surprising. When I increase prior.n to a large number I expect the
> tagwise estimates to all converge to the common dispersion, but as you might
> guess from the table above they converge to 0.001 = 10*common.
>
> The data come from the Bioconductor package "yeastRNASeq" and it appears
> from the description of the data that the two samples in each group are
> actually from sequencing the same extraction of mRNA, i.e., not biological and
> not even really technical replicates. So the common dispersion should be
> zero, as the counts should follow the Poisson distribution.
>
> I cannot explain the behavior of the estimates but I'm afraid it might be
> something in the code so I'll include that below.
>
> library(yeastRNASeq)
> data( geneLevelData )
> d <- DGEList( geneLevelData , group = c( rep( "Mutant" , 2 ) , rep( "Wild" , 2 ) ) )
> d <- calcNormFactors( d )
> d <- d[rowSums(d$counts) >= 5, ]
> d <- estimateCommonDisp( d )
>
> d$common.dispersion
> [1] 0.000101
>
> d <- estimateTagwiseDisp( d , prior.n = 10 )
>
> summary( d$tagwise.dispersion )
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   0.001   0.001   0.001   0.001   0.001   0.022
>
> d <- estimateTagwiseDisp( d , prior.n = 1000 )
>
> summary( d$tagwise.dispersion )
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   0.001   0.001   0.001   0.001   0.001   0.001
>
>
> It could just be an oddity of the data set itself but I don't have enough
> experience using edgeR across different RNA-Seq experiments to know how
> these methods should behave.
>
>
> Thanks,
> Sean