[BioC] nested factors in edgeR [was: something wrong with my design matrix in edgeR]
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Apr 27 04:05:58 CEST 2012
Dear Li,
Thanks for showing the targets file -- that makes your experiment very
clear.
Your design matrix can be seen to be incorrect because it has five columns
for an experiment that has only four distinct groups. The problem is that
the populations are subgroups, nested within the latitudes north and
south. So your two factors are nested rather than crossed.
I assume you want to test for differences between the two north
populations (FBK and NWL), and then for differences between the two south
populations (SOU and CAR), and then later for N vs S. There are many ways
to do this, but the following is perhaps the easiest:
pop <- factor(targets$pop, levels=c("FBK","NWL","SOU","CAR"))
design <- model.matrix(~pop)
Then later
fit <- glmFit(d, design)
so the coefficients 2-3 compare NWL, SOU and CAR respectively back to FBK.
To test NWL vs FBK (within the north group):
lrt <- glmFit(d, fit, coef=2)
topTags(lrt)
To test CAR vs SOU (within the south group):
lrt <- glmFit(d, fit, contrast=c(0,0,-1,1))
topTags(lrt)
To test S vs N overall:
lrt <- glmFit(d, fit, contrast=c(0,-1,1,1)/2)
topTags(lrt)
Best wishes
Gordon
--------------- original message --------------
[BioC] something wrong with my design matrix in edgeR
Wang, Li li.wang at ttu.edu
Thu Apr 26 18:56:46 CEST 2012
Dear List members
My targets include 31 samples representing two groups (south and north)
and also four populations (FBK, NWL, SOU, CAR).
> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> targets
files group pop
1 FBK01.txt N FBK
2 FBK04.txt N FBK
3 FBK05.txt N FBK
4 FBK08.txt N FBK
5 FBK09.txt N FBK
6 FBK10.txt N FBK
7 FBK13.txt N FBK
8 NWL02.txt N NWL
9 NWL04.txt N NWL
10 NWL06.txt N NWL
11 NWL08.txt N NWL
12 NWL09.txt N NWL
13 NWL10.txt N NWL
14 NWL11.txt N NWL
15 NWL14.txt N NWL
16 SOU01.txt S SOU
17 SOU02.txt S SOU
18 SOU03.txt S SOU
19 SOU04.txt S SOU
20 SOU07.txt S SOU
21 SOU09.txt S SOU
22 SOU11.txt S SOU
23 SOU15.txt S SOU
24 CAR01.txt S CAR
25 CAR02.txt S CAR
26 CAR03.txt S CAR
27 CAR04.txt S CAR
28 CAR05.txt S CAR
29 CAR11.txt S CAR
30 CAR12.txt S CAR
31 CAR14.txt S CAR
We are interested with the differential expression between north and south
group by adjusting for differences between populations.
> latitude <- factor(targets$group)
> latitude <- relevel(latitude, ref="N")
> pop <- factor(targets$pop)
> design <- model.matrix(~pop+latitude)
> rownames(design) <- rownames(d$samples)
> design
(Intercept) popFBK popNWL popSOU latitudeS
1 1 1 0 0 0
2 1 1 0 0 0
3 1 1 0 0 0
4 1 1 0 0 0
5 1 1 0 0 0
6 1 1 0 0 0
7 1 1 0 0 0
8 1 0 1 0 0
9 1 0 1 0 0
10 1 0 1 0 0
11 1 0 1 0 0
12 1 0 1 0 0
13 1 0 1 0 0
14 1 0 1 0 0
15 1 0 1 0 0
16 1 0 0 1 1
17 1 0 0 1 1
18 1 0 0 1 1
19 1 0 0 1 1
20 1 0 0 1 1
21 1 0 0 1 1
22 1 0 0 1 1
23 1 0 0 1 1
24 1 0 0 0 1
25 1 0 0 0 1
26 1 0 0 0 1
27 1 0 0 0 1
28 1 0 0 0 1
29 1 0 0 0 1
30 1 0 0 0 1
31 1 0 0 0 1
attr(,"assign")
[1] 0 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$pop
[1] "contr.treatment"
attr(,"contrasts")$latitude
[1] "contr.treatment"
I am wondering maybe there is something wrong with my design matrix. So,
when I tried to estimate the dispersions, I got the following errors.
> d <- estimateGLMCommonDisp(d, design)
Error in solve.default(R, t(beta)) :
system is computationally singular: reciprocal condition number =
5.16368e-17
The following is session information. Any comments are very appreciated.
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] qvalue_1.30.0 edgeR_2.6.0 limma_3.12.0
loaded via a namespace (and not attached):
[1] tcltk_2.15.0 tools_2.15.0
Thanks in advance!
Best wishes
Li
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list