[BioC] edgeR: Specifying the "coef"-argument in glmLRT in a two factor study

Wed Mar 28 15:11:59 CEST 2012

Hi all,

i am analysing a two-factorial RNA-seq experiment using edgeR. The design of my study has two factors, genotype and treatment. 
Genotype has three levels (A,B,C), "treatment" has two levels ("control", "stress").
I have some questions regarding the usage of the "coef" and "contrast"-argument using the glmLRT()-function as outlined below:

#####################################################################
BEGIN CODE
library(edgeR)
# Generate a dataframe of neg.binom counts 
set.seed(50)
s <- as.data.frame(matrix(rnbinom(3000,mu=20,size=20),ncol=3))
u <- as.data.frame(matrix(rnbinom(3000,mu=30,size=25),ncol=3))
v <- as.data.frame(matrix(rnbinom(3000,mu=40,size=20),ncol=3))
w <- as.data.frame(matrix(rnbinom(3000,mu=50,size=25),ncol=3))
x <- as.data.frame(matrix(rnbinom(3000,mu=60,size=20),ncol=3))
y <- as.data.frame(matrix(rnbinom(3000,mu=70,size=25),ncol=3))
z <- as.data.frame(cbind(s,u,v,w,x,y))
names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3", 
"B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3",
"C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3")

# Specify factors
genotype <- as.factor(c(rep("A",6),rep("B",6),rep("C",6)))
treatment <-as.factor(rep(c(rep("Control",3),rep("Stress",3)),3))

# DGEList-object
groups <- c("A_Con", "A_Con","A_Con","A_Str", "A_Str","A_Str", 
"B_Con","B_Con","B_Con","B_Str","B_Str","B_Str",
"C_Con","C_Con","C_Con","C_Str","C_Str","C_Str")

tmp1 <- DGEList(counts=z,group=groups)
tmp1 <- calcNormFactors(tmp1)

# Design matrix
design <- model.matrix(~genotype*treatment)

# Estimating Cox-Reid tagwise dispersion
tmp2 <- getPriorN(tmp1, design=design,prior.df=20)
tmp1 <- estimateTagwiseDisp(tmp1,prior.n=tmp2,trend="movingave",prop.used=0.3,grid.length=500)
summary(tmp1$tagwise.dispersion)
tmp1$common.disp

# Calling DE genes with the full model
glmfit.tmp1 <- glmFit(tmp1,design=design,dispersion=tmp1$tagwise.dispersion,method="auto")

# Testing for the effect of "treatment"
lrt.tmp1.treat <- glmLRT(tmp1,glmfit.tmp1,coef=4) #
treat <- topTags(lrt.tmp1.treat,n=nrow(lrt.tmp1.treat))
treat1 <- treat[[1]]$adj.P.Val < 0.5
table(treat1)
# 2 DE genes

END CODE
##################################################################

According to the vignette and previous posts on this issue, specifying coef=4 should test for the difference between
control and stress samples, irrespective of the level of the second factor.
However, since in this design matrix, the reference cell is "genotypeA-treatmentControl", i have the impression that specifying
coef=4 in the glmLRT()-function tests for the effect of treatment specifically on genotypeA rather than for the treatment
effect irrespective of "genotype".
>From working through the package "contrast", i conclude that for testing for the effect of treatment averaged over genotype,
i need to specify a contrast as follows:

### BEGIN CODE
lrt.tmp1.trt.av <- glmLRT(tmp1,glmfit.tmp1,contrast=c(0,0,0,1,1/3,1/3))
treat.av <- topTags(lrt.tmp1.trt.av,n=nrow(lrt.tmp1.trt.av))
treat.av1 <- treat.av[[1]]$adj.P.Val < 0.5
table(treat.av1) 
# 9 DE genes
sessionInfo()
#R version 2.14.0 (2011-10-31)
#Platform: x86_64-pc-linux-gnu (64-bit)
#
#locale:
# [1] LC_CTYPE=de_DE.UTF-8          LC_NUMERIC=C                 
# [3] LC_TIME=de_DE.UTF-8           LC_COLLATE=de_DE.UTF-8       
# [5] LC_MONETARY=de_DE.UTF-8       LC_MESSAGES=de_DE.UTF-8      
# [7] LC_PAPER=de_DE.UTF-8          LC_NAME=de_DE.UTF-8          
# [9] LC_ADDRESS=de_DE.UTF-8        LC_TELEPHONE=de_DE.UTF-8     
#[11] LC_MEASUREMENT=de_DE.UTF-8    LC_IDENTIFICATION=de_DE.UTF-8
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base     
#
#other attached packages:
#[1] edgeR_2.4.3  limma_3.10.3 JGR_1.7-9    iplots_1.1-4 JavaGD_0.5-5
#[6] rJava_0.9-3 
#
#loaded via a namespace (and not attached):
#[1] tools_2.14.0
### END CODE

So, my question is, do i really test for the treatment effect irrespective of genotype by specifiying "coef=4" (=contrast=c(0,0,0,1,0,0)) or rather
for the effect of treatment for the reference genotype ("A"). In the latter case, what would then be the correct approach to test
for the effect of treatment irrespective of genotype?
In either case, did i get it right that the obtained p-Values indicate the significance of the effect rather than for the test that the coefficients
differ from zero?

Thanks for your help, Henning

----------------------------------------------
Dr. Henning Wildhagen
Chair of Tree Physiology, University of Freiburg
Germany
--