[BioC] edgeR design matrix, one group vs average of other groups

Tue Mar 11 18:54:22 CET 2014

Dear Bioconductors,

I am working on RNA-seq data with multiple experimental factors and I am
trying to reproduce the GLM approach from chapter 3.2.3 of the edgeR manual.

> design <- model.matrix(~0+group, data=y$samples)
> colnames(design) <- levels(y$samples$group)
> design
         A B C
sample.1 1 0 0
sample.2 1 0 0
sample.3 0 1 0
sample.4 0 1 0
sample.5 0 0 1

> fit <- glmFit(y, design)

I want to know which genes are differentially expressed in C compared to
the other groups, so I chose to compare C to the average of A and B:

> lrt <- glmLRT(fit, contrast=c(-0.5,-0.5,1))
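
To make explicit what this contrast vector encodes, here is a minimal base-R sketch. It assumes the five-sample layout shown above; the `group` vector and the `beta` coefficients are made up for illustration and are not taken from my data. The contrast is the coefficient for C minus the mean of the coefficients for A and B, on the log scale used by the GLM.

```r
# Hypothetical sample annotation mirroring the five samples above
group <- factor(c("A", "A", "B", "B", "C"))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Contrast: C minus the average of A and B, named by design column
contrast <- c(A = -0.5, B = -0.5, C = 1)
stopifnot(identical(names(contrast), colnames(design)))

# Made-up log-scale group coefficients for one illustrative gene
beta <- c(A = 2, B = 6, C = 4)

# Value of the contrast for this gene: 4 - (2 + 6) / 2
contrast_value <- sum(contrast * beta)
contrast_value
```

Naming the contrast entries after the design columns guards against applying the weights in the wrong column order.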

Alternatively, I could put A and B in a single group:

> design
         A.B C
sample.1   1 0
sample.2   1 0
sample.3   1 0
sample.4   1 0
sample.5   0 1

> fit <- glmFit(y, design)

and compare C to A.B:

> lrt <- glmLRT(fit, contrast=c(-1,1))
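
For completeness, a merged design like this one can be built roughly as follows. This is only a sketch in base R, again assuming the five-sample layout above; the names `merged` and `design2` are hypothetical and chosen for illustration.

```r
# Hypothetical sample annotation mirroring the five samples above
group <- factor(c("A", "A", "B", "B", "C"))

# Collapse A and B into a single level named "A.B"
merged <- factor(ifelse(group == "C", "C", "A.B"), levels = c("A.B", "C"))

# Group-means design with one column per merged level
design2 <- model.matrix(~0 + merged)
colnames(design2) <- levels(merged)
design2
```

In this design the A and B samples share a single coefficient, so they are modelled as replicates of one group.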

When I try this with my own data, the first approach gives me many more
differentially expressed genes than the second, and the second gene set
is a subset of the first. I would be very grateful if somebody could
explain the difference between the two approaches and which one is more
appropriate for my purpose (finding genes specific to condition C).

Best wishes,

Georg

> sessionInfo()

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.18.13

loaded via a namespace (and not attached):
[1] compiler_3.0.1 tools_3.0.1