[BioC] edgeR: tagwise dispersion in 2-factorial vs. 1-factorial design

Wed Apr 25 04:12:35 CEST 2012

Dear Henning,

Whether to analyse a data set as a whole or in pieces depends on the 
specifics of your problem and your data, and there is no universal 
answer.  I can tell you however that I almost always analyse all the data 
from one study together, i.e., I would most often use the 2-factorial 
approach.  Generally it pays to pool information about the dispersion 
from multiple groups.  Of course you should do some exploratory analysis 
using an MDS plot or similar to see if there are any problem libraries 
for any of the three genotypes.
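
For concreteness, here is a minimal sketch of the 2-factorial approach with pooled dispersion, next to the per-genotype alternative for genotype A. It is an illustration only: the counts are simulated, the object names (y, yA, lrt.A, et.A) are made up for the example, and the calls assume a recent edgeR release, so argument lists may differ in older versions.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

# Simulated counts, purely illustrative: 3 genotypes x 2 treatments x 2 replicates
set.seed(1)
genotype  <- factor(rep(c("A", "B", "C"), each = 4))
treatment <- factor(rep(rep(c("control", "stress"), each = 2), times = 3),
                    levels = c("control", "stress"))
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 5), nrow = 1000)
rownames(counts) <- paste0("tag", seq_len(nrow(counts)))

# 2-factorial analysis: one group per genotype/treatment combination
group <- factor(paste(genotype, treatment, sep = "."))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Exploratory check for problem libraries
plotMDS(y, labels = as.character(group))

# Tagwise dispersion estimated from all six groups together
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)

# Treatment effect within genotype A, using the pooled dispersion
con <- makeContrasts(A.stress - A.control, levels = design)
lrt.A <- glmLRT(fit, contrast = con)
topTags(lrt.A)

# 1-factorial alternative: genotype A on its own, with its own dispersion
keep <- genotype == "A"
yA <- DGEList(counts = counts[, keep], group = treatment[keep])
yA <- calcNormFactors(yA)
yA <- estimateCommonDisp(yA)
yA <- estimateTagwiseDisp(yA)
et.A <- exactTest(yA, pair = c("control", "stress"))

# Compare the two sets of tagwise dispersion estimates
summary(y$tagwise.dispersion)
summary(yA$tagwise.dispersion)
```

Comparing the two summaries on real data shows how much the dispersion used for the genotype A test changes when it is estimated from four libraries rather than from all twelve.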

Best wishes
Gordon

> Date: Mon, 23 Apr 2012 14:07:55 +0200
> From: "Henning Wildhagen" <HWildhagen at gmx.de>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR: tagwise dispersion in 2-factorial vs. 1-factorial design
>
> Hi,
>
> I am analysing a two-factorial RNA-seq experiment with edgeR. The design 
> of my study has two factors, genotype and treatment. Genotype has three 
> levels (A, B, C); treatment has two levels ("control", "stress"). The 
> first and most important question that I want to answer is which 
> transcripts are affected by treatment in each of the three genotypes. I 
> did this analysis by specifying a two-factorial model and subsequently 
> selecting coefficients/contrasts to test for the treatment effect 
> genotype-wise. Of course, this type of analysis can also be done in a 
> 1-factorial way, i.e. by defining three separate DGEList objects, one 
> for each genotype, and then performing an exactTest for the treatment 
> effect on each of the three DGEList objects. For one of the 
> genotypes, say "A", the latter analysis gives approximately 60% more DE 
> genes compared to the DE analysis based on the 2-factorial model. For 
> the other two genotypes, the number of DE genes is almost the same in 
> the two analyses. My first guess was that this finding is related to 
> differences in the estimation of the tagwise dispersion. In the 
> two-factorial analysis, one and the same dispersion estimate per 
> transcript is used to test for DE. In the 1-factorial analysis, three 
> dispersion estimates are calculated per transcript, one for each 
> genotype. When comparing the distributions of genotype-wise dispersion 
> estimates of the 1-factorial analysis with the "common" tagwise 
> dispersion of the 2-factorial model, I see that the median is higher and 
> the range of the 95%tiles is wider for genotypes B, C and the "common" 
> dispersion of the 2-factorial model, compared to genotype "A".
>
> Now my question is which analysis is more reliable, the 2-factorial or 
> the 1-factorial?
>
> Thanks for any help or comments on this problem,
>
> Henning
>
> ------------------------------------------------------
> Dr. Henning Wildhagen
> Forest Research Institute Baden-Württemberg
> Freiburg, Germany
> -- 
>
