[BioC] edgeR-DeSeq - inconsistency between Variance and Coefficient of Variation
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Mar 31 01:45:32 CEST 2012
Dear Miguel,
There is no assumption in edgeR that variances be equal between groups.
The variance depends on the mean, and the mean depends on the library
size, and there is no assumption that library sizes are equal between
groups. So computing the variance is not of interest.
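To illustrate the point above, here is a minimal sketch in Python (not edgeR code), assuming the negative binomial mean-variance relationship V = mu * (1 + dispersion * mu) quoted in the original post below. The relative abundance, dispersion and library sizes are made-up numbers chosen only for illustration.

```python
def nb_variance(mu, dispersion):
    """Negative binomial variance: V = mu * (1 + dispersion * mu)."""
    return mu * (1.0 + dispersion * mu)


def nb_cv(mu, dispersion):
    """Total coefficient of variation of the counts, sd / mean."""
    return nb_variance(mu, dispersion) ** 0.5 / mu


# Hypothetical values for a single gene (illustration only).
relative_abundance = 1e-5   # fraction of all reads that come from this gene
dispersion = 0.04           # same dispersion in both groups, so BCV = 0.2
library_sizes = {"group1": 5_000_000, "group2": 20_000_000}

results = {}
for group, lib_size in library_sizes.items():
    # The expected count scales with the library size.
    mu = relative_abundance * lib_size
    results[group] = (mu, nb_variance(mu, dispersion), nb_cv(mu, dispersion))
    print(group, results[group])
```

With identical dispersion and identical relative abundance, the variance still differs between the two groups simply because the library sizes differ, which is why the raw variance is not a useful quantity to compare.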
The concept of biological coefficient of variation, or coefficient of
biological variation, is due to Robinson, McCarthy and Smyth
(Bioinformatics 2010) and is explained more fully in McCarthy, Chen and
Smyth (Nucleic Acids Research, 2012), so you might find it helpful to read
the description of it in the latter paper.
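As a brief reminder of the idea, here is a sketch of the decomposition under the negative binomial model, writing mu for the mean and phi for the dispersion (symbols chosen here for illustration):

```latex
\operatorname{Var}(y) = \mu + \phi\,\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV}^{2}(y) = \frac{\operatorname{Var}(y)}{\mu^{2}} = \frac{1}{\mu} + \phi,
\qquad
\mathrm{BCV} = \sqrt{\phi}.
```

So sqrt(dispersion) is only the biological part of the CV. It approximates the total s.d./mean of the counts only when the mean is large enough for the 1/mu term to be negligible.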
I can't answer your other questions because you're using your own
personalized analysis pipeline. I can only help posters using the
standard edgeR pipelines. I may be misunderstanding, but your
questions seem to be specific to your own data or to your own analysis
approach.
Best wishes
Gordon
> Date: Fri, 30 Mar 2012 10:30:09 +0200
> From: Miguel Gallach <miguel.gallach at univie.ac.at>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] edgeR-DeSeq - inconsistency between Variance and
> Coefficient of Variation?
>
> Dear list,
>
> I posted a question a few days ago without any success, so I decided to try
> again, explaining my question better and changing the header.
>
> I am analyzing RNA-Seq data with edgeR and DESeq. I have two biological
> groups, two replicates each, and I want to test DE between the two
> biological groups.
>
> For instance, with edgeR, I calculated tagwise dispersion for each gene.
> With this dispersion data, I calculated the variance according to the
> formula V = mu *( 1 + dispersion * mu). I used the definition from
> http://seqanswers.com/forums/showthread.php?t=5591&highlight=edgeR+variance.
>
> When I plot the variances of the two biological groups against each
> other, I find that they are very well correlated. From this result we
> could conclude that for most genes the variance is equal between
> groups. This leads to my first question: Is equal variance between
> groups a prerequisite for performing the DE test?
>
> After this, I calculated sqrt(dispersion) for every gene, i.e.,
> according to the edgeR and DESeq manuals, the coefficient of biological
> variation (i.e., C.V. = s.d./mean = sqrt(dispersion)). When I plot the
> C.V. values of the two biological groups against each other, I find
> that the C.V. for one biological group is systematically higher than
> the C.V. in the other group. In other words, for most genes the C.V.
> in group 1 is higher than that in group 2. This can be seen clearly as
> a regression line that is parallel to and above the expected y = x.
> Indeed, I found something like y = 0.11 + x.
>
> This result worries me a lot. If I understand correctly, C.V.1 > C.V.2
> means sqrt(var1)/mean1 > sqrt(var2)/mean2; since var1 ~ var2, it follows
> that mean1 < mean2 for most of the genes, which is obviously false. What
> am I missing? How is it possible that two groups have similar variance
> but one group has a higher C.V. than the other (for most genes!)?
>
> I have not checked this with DESeq yet, but I assume the results will be
> similar (given that the numbers of DE genes are similar and congruent).
>
> Any help would be much appreciated.
>
> Many thanks,
> Miguel Gallach
>