[BioC] EdgeR condition-specific dispersion

Gordon K Smyth smyth at wehi.EDU.AU
Fri Oct 5 09:41:16 CEST 2012


Dear Thomas,

It does make sense to estimate condition-specific dispersions, but most of 
the time it isn't worthwhile to do so, and the only penalty for not doing 
so when it would have been justified is some loss of statistical power 
(fewer DE genes).  It makes sense when a perturbed condition is more 
variable than a 'normal' condition, for example cancer tumour vs normal 
tissue, or knockout vs wildtype.  For it to be worthwhile, there must be a 
substantial difference in variability between the conditions and a 
relatively large number of replicate samples in each group.  It is almost 
certainly not worthwhile if you only have 2-3 replicates in each condition.

I wonder how you have established that the dispersion varies with the 
combination of cues?  By running edgeR separately on different conditions? 
Otherwise you might be examining standard deviations rather than 
dispersions, and they are not the same thing.
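To illustrate the difference, here is a minimal sketch in Python (not edgeR code). It assumes the negative binomial model that edgeR uses, in which a count with mean mu and dispersion phi has variance mu + phi*mu^2. Two conditions with the same dispersion but different expression levels therefore have very different standard deviations. The helper names are hypothetical, and `moment_dispersion` is a crude method-of-moments estimate that assumes equal library sizes. edgeR's own estimators are likelihood based and more elaborate.

```python
import math


def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Crude method-of-moments dispersion estimate for one gene in one group.

    Solves s^2 = m + phi * m^2 for phi, using the sample mean m and the
    sample variance s^2. Assumes equal library sizes (hypothetical helper,
    not part of edgeR). The result can be negative when the counts are
    less variable than Poisson.
    """
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    return (s2 - m) / m ** 2


# Same dispersion, very different standard deviations.
phi = 0.1
for mu in (10, 1000):
    print(f"mu={mu}: sd={math.sqrt(nb_variance(mu, phi)):.1f}, dispersion={phi}")
# mu=10: sd=4.5, dispersion=0.1
# mu=1000: sd=317.8, dispersion=0.1

print(f"{moment_dispersion([4, 16, 10, 10]):.2f}")  # 0.14
```

If you compare standard deviations across conditions, the high-expression condition looks far more variable even though its dispersion is identical.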

Is the sequencing depth similar between the different conditions?  If the 
library sizes are different, then edgeR will assign different variances to 
different observations, even though the dispersions might be the same.
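Here is a rough sketch of this point, again in Python and ignoring normalization factors. Suppose a gene has the same relative expression in two libraries of different size (here a hypothetical 10 counts per million) and the same dispersion. The expected counts still differ, and so do the variances. The squared coefficient of variation, variance/mu^2 = 1/mu + phi, shows that only the Poisson part 1/mu changes with sequencing depth. `observation_variance` is a hypothetical helper, not an edgeR function.

```python
def observation_variance(lib_size, cpm, phi):
    """NB variance of one gene's count in one library (hypothetical helper).

    The expected count is mu = lib_size / 1e6 * cpm, and the variance is
    mu + phi * mu^2. Normalization factors are ignored in this sketch.
    """
    mu = lib_size / 1e6 * cpm
    return mu + phi * mu ** 2


phi = 0.1  # same dispersion in both libraries (illustrative value)
cpm = 10   # same relative expression in both libraries (illustrative value)
for lib_size in (5_000_000, 20_000_000):
    mu = lib_size / 1e6 * cpm
    var = observation_variance(lib_size, cpm, phi)
    print(f"library size {lib_size:,}: mean={mu:.0f}, variance={var:.0f}, "
          f"CV^2={var / mu ** 2:.3f}")
# library size 5,000,000: mean=50, variance=300, CV^2=0.120
# library size 20,000,000: mean=200, variance=4200, CV^2=0.105
```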

Anyway, edgeR is limited to estimating the dispersion at the gene level. 
It cannot be easily modified to estimate the dispersion on a 
condition-specific basis.

On the other hand, voom (a function in the limma package) estimates 
observation-specific dispersions, and can be easily modified to do so in a 
condition-specific manner.  This is part of the work of Charity Law, who 
is currently writing up her PhD thesis.  If you really need to go in this 
direction, I can show you how to do so using voom.

Best wishes
Gordon

> Date: Tue, 2 Oct 2012 17:15:47 +0000
> From: Thomas Frederick Willems <twillems at mit.edu>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] EdgeR condition-specific dispersion
>
> I'm dealing with a factorial RNA-seq data set in which cells have been 
> stimulated with various combinations of extra-cellular cues. As such, I 
> was interested in applying the GLM framework in edgeR to assess the 
> contribution of each extra-cellular cue to the differential expression 
> of certain genes. My concern, however, is that both the expression level 
> and the dispersion of each gene vary greatly with the combination of 
> cues. EdgeR doesn't seem to estimate condition-specific dispersion but 
> rather one dispersion per gene (if the tagwise option is used). My 
> question is therefore two-fold:

> 1) Does it make sense to want to estimate condition-specific 
> dispersions?

> 2) Is there a way to modify the edgeR framework so that it does this?
>
> Thanks
> Thomas

