[BioC] design matrix Limma design for paired t-test

Moshe Olshansky olshansky at wehi.EDU.AU
Mon Jun 18 09:26:44 CEST 2012


Hi Ingrid,

If I understand correctly, you would like to find genes which are
differentially expressed (DE) between Treatment and Control at 4 hours and
compare them with those which are DE at 18 hours.
One way to do it is to split your data into two separate sets ( 4 hours
and 18 hours) and find DE genes for each part separately (and then you
omit your Time column). But by doing so you reduce your ability to
estimate the variances.
So a preferable way would be to omit the time and have 4 conditions:
C4,T4,C18 and T18 (Control and Treatment at 4 and 18 hours). The you may
use MakeContrasts function of limma to find DE genes between T4 and C4 and
between T18 and C18.
If x is your targets file, i.e.

> x
         X                                           FileName Treatment
Donor Time
1   DC_4_4 US10463851_252665214446_S01_GE1_1010_Sep10_1_2.txt         T   
 4    4
2   SC_4_4 US10463851_252665214448_S01_GE1_1010_Sep10_1_2.txt         C   
 4    4
3  DC_18_4 US10463851_252665214447_S01_GE1_1010_Sep10_1_2.txt         T   
 4   18
4  SC_18_4 US10463851_252665214444_S01_GE1_1010_Sep10_1_3.txt         C   
 4   18
5   DC_4_5 US10463851_252665214448_S01_GE1_1010_Sep10_1_4.txt         T   
 5    4
6   SC_4_5 US10463851_252665214444_S01_GE1_1010_Sep10_1_1.txt         C   
 5    4
7  DC_18_5 US10463851_252665214446_S01_GE1_1010_Sep10_1_3.txt         T   
 5   18
8  SC_18_5 US10463851_252665214447_S01_GE1_1010_Sep10_1_4.txt         C   
 5   18
9   DC_4_6 US10463851_252665214445_S01_GE1_1010_Sep10_1_4.txt         T   
 6    4
10  SC_4_6 US10463851_252665214447_S01_GE1_1010_Sep10_1_3.txt         C   
 6    4
11 DC_18_6 US10463851_252665214448_S01_GE1_1010_Sep10_1_3.txt         T   
 6   18
12 SC_18_6 US10463851_252665214445_S01_GE1_1010_Sep10_1_3.txt         C   
 6   18
13  DC_4_7 US10463851_252665214444_S01_GE1_1010_Sep10_1_4.txt         T   
 7    4
14  SC_4_7 US10463851_252665214445_S01_GE1_1010_Sep10_1_2.txt         C   
 7    4
15 DC_18_7 US10463851_252665214447_S01_GE1_1010_Sep10_1_1.txt         T   
 7   18
16 SC_18_7 US10463851_252665214446_S01_GE1_1010_Sep10_1_1.txt         C   
 7   18
17  DC_4_8 US10463851_252665214444_S01_GE1_1010_Sep10_1_2.txt         T   
 8    4
18  SC_4_8 US10463851_252665214446_S01_GE1_1010_Sep10_1_4.txt         C   
 8    4
19 DC_18_8 US10463851_252665214445_S01_GE1_1010_Sep10_1_1.txt         T   
 8   18
20 SC_18_8 US10463851_252665214448_S01_GE1_1010_Sep10_1_1.txt         C   
 8   18
>

you can do
> y <- cbind(x$Donor,paste(x$Treatment,x$Time,sep="_"))
> colnames(y) <- c("Donor","Cond_Tim")
> y <- data.frame(y)
> y
   Donor Cond_Tim
1      4      T_4
2      4      C_4
3      4     T_18
4      4     C_18
5      5      T_4
6      5      C_4
7      5     T_18
8      5     C_18
9      6      T_4
10     6      C_4
11     6     T_18
12     6     C_18
13     7      T_4
14     7      C_4
15     7     T_18
16     7     C_18
17     8      T_4
18     8      C_4
19     8     T_18
20     8     C_18
>

Then

> design <- model.matrix(~Donor+Cond_Tim,y)
> colnames(design) <- gsub("Cond_Tim","",colnames(design))
> colnames(design)[1] <- "Intercept"
> design
    Intercept  Donor5 Donor6 Donor7 Donor8 C_4 T_18 T_4
1            1      0      0      0      0   0    0   1
2            1      0      0      0      0   1    0   0
3            1      0      0      0      0   0    1   0
4            1      0      0      0      0   0    0   0
5            1      1      0      0      0   0    0   1
6            1      1      0      0      0   1    0   0
7            1      1      0      0      0   0    1   0
8            1      1      0      0      0   0    0   0
9            1      0      1      0      0   0    0   1
10           1      0      1      0      0   1    0   0
11           1      0      1      0      0   0    1   0
12           1      0      1      0      0   0    0   0
13           1      0      0      1      0   0    0   1
14           1      0      0      1      0   1    0   0
15           1      0      0      1      0   0    1   0
16           1      0      0      1      0   0    0   0
17           1      0      0      0      1   0    0   1
18           1      0      0      0      1   1    0   0
19           1      0      0      0      1   0    1   0
20           1      0      0      0      1   0    0   0
attr(,"assign")
[1] 0 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Donor
[1] "contr.treatment"

attr(,"contrasts")$Cond_Tim
[1] "contr.treatment"

So now your base level is Dono4, Control at 18 hours. I do not know
whether you version of R will produce identical results. Assuming it is
you can proceed as following (if X is your normalized and log-transformed
expression matrix):

> contr <- makeContrasts(h_18=T_18,h_4=T_4-C_4,levels=design)
> fit <- lmFit(X,design=design)
> fit <- contrasts.fit(fit,contr)
> fit <- eBayes(fit)
> tp4 <- topTable(fit,coef="h_4",number=nrow(X))
> tp18 <- topTable(fit,coef="h_18",number=nrow(X))

Now from tp4 and tp18 you can find genes which are DE at 4 hours and 18
hours respectively and then compare the two lists.
Just one warning: if you criterion for DE is logFC of 1 (fold change of 2)
and adjusted p-value of 0.05, it may happen that certain gene has logFC of
1.05 at 4 hours and 0.97 at 18 hours (and good p-value in both cases) and
then it is DE at 4 hours and not DE at 18 hours, but actually it's
behavior is not really different. Or it may happen that it has good logFC
under both conditions but adjusted p-value of 0.04 at 4 hours and of 0.06
at 18 hours, so once again it is DE at 4 hours and not DE at 18 hours, but
once again it's behavior is not very different. So watch out for such
genes - they are not what you are looking for.

Best regards,
Moshe.


> Thanks Moshe for your reply !
> It's very clear ! As you wrote, I want to test  is " if the effect of
> the treatment at 4 hours is different from the one at 18 hours, between
> Control and Treated cells ", but I don't see how change my design.
> Somebody can help me ?
>
> Cheers,
>
> Ingrid
>
> Ingrid MERCIER
> Mycobacterial Interactions with Host Cells Team
> Institute of Pharmacology&  Structural Biology
> CNRS - University of Toulouse
> BP 64182
> F-31077 Toulouse Cedex France
> Tel +33 (0)5 61 17 54 63
>
>
>
>
> Le 14/06/2012 03:44, Moshe Olshansky a écrit :
>> Hi Ingrid,
>>
>> With your design your "base" level is patient 4, Control, 4 hours (let's
>> call it B).
>> The mean for, say, patient 6, Treatment, 18 hours is:
>> B + Donor6 + TreatT + Time18
>> where Donor6 is the difference between Donor4 and Donor6 (same for any
>> treatment and time), TreatT is the difference between Treatment and
>> Control (independent of patient and time) and Time18 is the difference
>> between 18 hours and 4 hours (independent of patient and treatment).
>>
>> If you think that the effect of Treatment versus Control is the same at
>> 4
>> hours and 18 hours, then what you did is all right. If you think that
>> the
>> effect of the treatment at 4 hours may be different from the one at 18
>> hours, you need to change your design.
>>
>> Best regards,
>> Moshe.
>>
>>> Thanks a lot Belinda !!
>>>
>>> I mistaked so I replaced Time=Treat by Time only, and it's good.
>>> So, I have a last question : I 'm confused with the differents coef in
>>> topTable.
>>> I get genes but I tested several coef without understanding their
>>> significance.
>>> Somebody can explain me what mean coef="TreatT", or coef=
>>> "Time18",coef=
>>> " Donor5",coef= " Donor6", coef= "Donor7",coef= " Donor8".
>>> My main objective is to identidy the differential expressed genes
>>> between the Control donors and Treated Donors at 4 hours or 18 hours.
>>> I have no idea, which coef I have to use it.
>>>
>>> Cheers,
>>>
>>> Ingrid
>>>
>>> Ingrid MERCIER
>>> Mycobacterial Interactions with Host Cells Team
>>> Institute of Pharmacology&   Structural Biology
>>> CNRS - University of Toulouse
>>> BP 64182
>>> F-31077 Toulouse Cedex France
>>> Tel +33 (0)5 61 17 54 63
>>>
>>>
>>>
>>>
>>> Le 13/06/2012 08:45, Belinda Phipson a écrit :
>>>> Hi Ingrid
>>>>
>>>> The problem with your code is the following line:
>>>>>    Time=Treat=factor(Targets$Time)
>>>> Where you essentially set the time factor equal to the treat factor.
>>>>
>>>> Cheers,
>>>> Belinda
>>>>
>>>>
>>>> -----Original Message-----
>>>> From:bioconductor-bounces at r-project.org
>>>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Ingrid
>>>> Mercier
>>>> Sent: Wednesday, 13 June 2012 1:02 AM
>>>> To:bioconductor at r-project.org;smyth at wehi.edu.au
>>>> Subject: [BioC] design matrix Limma design for paired t-test
>>>>
>>>> Dear list and Gordon,
>>>>
>>>> I have some troubles to computed a moderated paired t-test in the
>>>> linear
>>>> model.
>>>> Here is my experimental plan :
>>>>
>>>> I used a single channel Agilent microarray.
>>>> 2 types of cells : Control (S) and Treated (T)
>>>> Fives human donors : 4-5-6-7-8
>>>> Two times of treatment : 4 hours and 18 hours
>>>>
>>>> I want to compare teh differential expresed genes between my C versus
>>>> T
>>>> at 4
>>>> hours and then at 18 hours.
>>>>
>>>> Here is my design :
>>>>
>>>>
>>>> My targets frame is :
>>>>>    Targets
>>>>             X                                           FileName
>>>> Treatment
>>>> Donor Time
>>>> 1   DC_4_4 US10463851_252665214446_S01_GE1_1010_Sep10_1_2.txt
>>>> T
>>>> 4    4
>>>> 2   SC_4_4 US10463851_252665214448_S01_GE1_1010_Sep10_1_2.txt
>>>> C
>>>> 4    4
>>>> 3  DC_18_4 US10463851_252665214447_S01_GE1_1010_Sep10_1_2.txt
>>>> T
>>>> 4   18
>>>> 4  SC_18_4 US10463851_252665214444_S01_GE1_1010_Sep10_1_3.txt
>>>> C
>>>> 4   18
>>>> 5   DC_4_5 US10463851_252665214448_S01_GE1_1010_Sep10_1_4.txt
>>>> T
>>>> 5    4
>>>> 6   SC_4_5 US10463851_252665214444_S01_GE1_1010_Sep10_1_1.txt
>>>> C
>>>> 5    4
>>>> 7  DC_18_5 US10463851_252665214446_S01_GE1_1010_Sep10_1_3.txt
>>>> T
>>>> 5   18
>>>> 8  SC_18_5 US10463851_252665214447_S01_GE1_1010_Sep10_1_4.txt
>>>> C
>>>> 5   18
>>>> 9   DC_4_6 US10463851_252665214445_S01_GE1_1010_Sep10_1_4.txt
>>>> T
>>>> 6    4
>>>> 10  SC_4_6 US10463851_252665214447_S01_GE1_1010_Sep10_1_3.txt
>>>> C
>>>> 6    4
>>>> 11 DC_18_6 US10463851_252665214448_S01_GE1_1010_Sep10_1_3.txt
>>>> T
>>>> 6   18
>>>> 12 SC_18_6 US10463851_252665214445_S01_GE1_1010_Sep10_1_3.txt
>>>> C
>>>> 6   18
>>>> 13  DC_4_7 US10463851_252665214444_S01_GE1_1010_Sep10_1_4.txt
>>>> T
>>>> 7    4
>>>> 14  SC_4_7 US10463851_252665214445_S01_GE1_1010_Sep10_1_2.txt
>>>> C
>>>> 7    4
>>>> 15 DC_18_7 US10463851_252665214447_S01_GE1_1010_Sep10_1_1.txt
>>>> T
>>>> 7   18
>>>> 16 SC_18_7 US10463851_252665214446_S01_GE1_1010_Sep10_1_1.txt
>>>> C
>>>> 7   18
>>>> 17  DC_4_8 US10463851_252665214444_S01_GE1_1010_Sep10_1_2.txt
>>>> T
>>>> 8    4
>>>> 18  SC_4_8 US10463851_252665214446_S01_GE1_1010_Sep10_1_4.txt
>>>> C
>>>> 8    4
>>>> 19 DC_18_8 US10463851_252665214445_S01_GE1_1010_Sep10_1_1.txt
>>>> T
>>>> 8   18
>>>> 20 SC_18_8 US10463851_252665214448_S01_GE1_1010_Sep10_1_1.txt
>>>> C
>>>> 8   18
>>>>
>>>>
>>>> then I create my design matrix :
>>>>
>>>>>    Donor
>>>>     [1] 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8
>>>> Levels: 4 5 6 7 8
>>>>>    Treat=factor(Targets$Treatment,levels=c("C","T"))
>>>>>    Treat
>>>>     [1] T C T C T C T C T C T C T C T C T C T C
>>>> Levels: C T
>>>>>    Time=Treat=factor(Targets$Time)
>>>>>    Time
>>>>     [1] 4  4  18 18 4  4  18 18 4  4  18 18 4  4  18 18 4  4  18 18
>>>> Levels: 4 18
>>>>
>>>>>    design=model.matrix(~Donor+Treat+Time)
>>>>>    design
>>>>       (Intercept) Donor5 Donor6 Donor7 Donor8 Treat18 Time18
>>>> 1            1      0      0      0      0       0      0
>>>> 2            1      0      0      0      0       0      0
>>>> 3            1      0      0      0      0       1      1
>>>> 4            1      0      0      0      0       1      1
>>>> 5            1      1      0      0      0       0      0
>>>> 6            1      1      0      0      0       0      0
>>>> 7            1      1      0      0      0       1      1
>>>> 8            1      1      0      0      0       1      1
>>>> 9            1      0      1      0      0       0      0
>>>> 10           1      0      1      0      0       0      0
>>>> 11           1      0      1      0      0       1      1
>>>> 12           1      0      1      0      0       1      1
>>>> 13           1      0      0      1      0       0      0
>>>> 14           1      0      0      1      0       0      0
>>>> 15           1      0      0      1      0       1      1
>>>> 16           1      0      0      1      0       1      1
>>>> 17           1      0      0      0      1       0      0
>>>> 18           1      0      0      0      1       0      0
>>>> 19           1      0      0      0      1       1      1
>>>> 20           1      0      0      0      1       1      1
>>>> attr(,"assign")
>>>> [1] 0 1 1 1 1 2 3
>>>> attr(,"contrasts")
>>>> attr(,"contrasts")$Donor
>>>> [1] "contr.treatment"
>>>>
>>>> attr(,"contrasts")$Treat
>>>> [1] "contr.treatment"
>>>>
>>>> attr(,"contrasts")$Time
>>>> [1] "contr.treatment"
>>>>
>>>>
>>>> In this design matrix I think something is wrong, because of the
>>>> column
>>>> Treat18 is the same as Time18.
>>>> I don't understand why.
>>>> So, the following code failed, and the differential expressed genes
>>>> are
>>>> odds.
>>>>
>>>> Somebody can help me !!! Thanks all.
>>>>
>>>>
>>>>>    fit=lmFit(test_norm,design)
>>>> Coefficients not estimable: Time18
>>>> Message d'avis :
>>>> Partial NA coefficients for 34183 probe(s)
>>>>>    fit2=eBayes(fit)
>>>> Message d'avis :
>>>> In ebayes(fit = fit, proportion = proportion, stdev.coef.lim =
>>>> stdev.coef.lim,  :
>>>>      Estimation of var.prior failed - set to default value
>>>>
>>>>
>>>>>    table = topTable(fit2,1, number=5000,
>>>> p.value=0.05,adjust.method="BH",sort.by="logFC",lfc=2)
>>>>>    head(table)
>>>>                     ID    logFC  AveExpr         t      P.Value
>>>> adj.P.Val
>>>> B
>>>> 6509  A_33_P3396434 18.44159 18.41239 245.14490 1.308161e-31
>>>> 2.353520e-28
>>>> 53.41519
>>>> 22398 A_33_P3223592 18.25824 18.24591 242.75647 1.545005e-31
>>>> 2.514901e-28
>>>> 53.36821
>>>> 10771 A_33_P3244165 18.21029 18.02229  90.76191 2.796577e-24
>>>> 2.467615e-23
>>>> 44.59915
>>>> 6149  A_33_P3346552 18.14780 18.12098 207.18556 2.282464e-30
>>>> 1.147374e-27
>>>> 52.50960
>>>> 23554 A_33_P3210160 18.08158 18.21026 239.64192 1.924175e-31
>>>> 2.560908e-28
>>>> 53.30521
>>>> 20924 A_33_P3286278 18.04425 18.07312 179.72121 2.558128e-29
>>>> 5.025546e-27
>>>> 51.56876
>>>>
>>>>
>>>> Best,
>>>>
>>>> Ingrid
>>>>
>>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for
>> the addressee.
>> You must not disclose, forward, print or use it without the permission
>> of the sender.
>> ______________________________________________________________________
>



______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list