[BioC] Limma design : What is the "best" method to infer biological replicates during differentially expression analysis

Gordon K Smyth smyth at wehi.EDU.AU
Thu Feb 20 03:01:59 CET 2014


Dear Santy,

The first method is twoway anova, a generalization of a paired analysis.

The second method is a random effects approach in which the intra-donor 
correlation is incorporated into the covariance matrix instead of the 
linear predictor.

Both are good methods.  The twoway anova approach makes fewer assumptions 
but the random effects approach is statistically more powerful, 
particularly for unbalanced designs.

For a balanced design in which all donors receive all stimuli, the twoway 
anovao approach is virtually as powerful as the random effects approach 
and hence is preferable.

For an unbalanced design in which each donor receives only a subset of the 
stimula, the random effects approach is more powerful.

Your experiment is almost completely balanced -- there is just one missing 
stimuli for one donor.  Hence I would use the twoway anova approach.

Best wishes
Gordon


> Date: Tue, 18 Feb 2014 12:05:41 +0100
> From: Santy Marques-Ladeira <santy.marquesladeira at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Limma design : What is the "best" method to infer
> 	biological replicates during differentially expression analysis
>
> Dear Limma users,
>
>
>
>
> I'm currently working on Affymetrix microarray (Hu-gene 2.1) with only one
> channel. I got 8 experimental stimuli conducted on 7 donors. During the
> experimental process each stimulus was conducted at the same time for all
> the donors so I assume there are no technical replicates, but I need to
> take in account the "donor effect" (or biological replicates).
>
> In the "Limma" documentation there are 2 ways of including this "effect"
> but it is still not clear to me what is the best choice, and the 2 methods
> gave me 86% of correlation for a pairwise comparison test.
>
>
>
> *1/The first method I used was including the donors in my design model :*
>
> # I build a matrix with 2 columns (Stimulus + Donor)
> # Each row is an array
> sample = matrix(0, nrow=55, ncol=2)
> rownames(sample) <- colnames("normalised data I use to perform the
> analysis")
> colnames(sample) <- c("Stimulus", "Donor")
>
> # In the column Stimulus there is directly the name of the stimulus tested
> sample[1:7,1] = "Stim1"
> sample[8:14,1] = "Stim2"
> sample[15:20,1] = "Stim3"
> sample[21:27,1] = "Stim4"
> sample[28:34,1] = "Stim5"
> sample[35:41,1] = "Stim6"
> sample[42:48,1] = "Stim7"
> sample[49:55,1] = "Stim8"
> # In the column Donor there is the number of the donor (from 2 to 8)
> sample[,2] = c(rep(c(2,3,4,5,6,7,8),2),3,4,5,6,7,8,rep(c(2,3,4,5,6,7,8),5))
> targets = as.data.frame(sample)
>
> Stimulus = factor(targets$Stimulus, levels = c("Stim1", "Stim2", "Stim3",
> "Stim4", "Stim5", "Stim6", "Stim7", "Stim8"))
> Donor = factor(targets$Donor, levels = c("2", "3", "4", "5", "6", "7", "8"))
>
> # I use the model.matrix() function to infer the design based on my matrix
> design <- model.matrix(~-1+Stimulus+Donor)
> rownames(design) <- colnames("normalised data I use to perform the
> analysis")
> colnames(design)[1:8] <- c("Stim1", "Stim2", "Stim3", "Stim4", "Stim5",
> "Stim6", "Stim7", "Stim8")
>
> #This gives me :
> #                      Stim1   Stim2   Stim3   ...   ...   ...   Donor3
> Donor4   ...   ...   Donor8
> #experience1        1          0         0
> 0            0                       0
> #experience2        1          0         0
> 1            0                       0
> #experience3        1          0         0
> 0            1                       0
> #...
> #...
> #...
>
> First thing is that because I remove for a stimulus the donor2, this donor
> is removed from the design model (maybe because of dimension for regression
> ?)
>
> Then I do the usual commands :
> fit <- lmFit("normalised data I use to perform the analysis", design)
> cont.matrix<-makeContrasts(Stim2vsStim4=Stim2-Stim4, levels=design)
> fit2<-contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> anadiff.limma.Stim2<-topTable(fit2, number=1000, adjust="BH")
>
>
> *2/The second method I used is by treating donors as replicates separately:*
>
> I used the exact same "targets" data.frame, except there is no more the
> Donor column but only the Stimulus column
> [...]
> design <- model.matrix(~-1+Stimulus)
> [...]
>
> # List of donors saved as a variable for blocks
> biolrep = c(rep(c(2,3,4,5,6,7,8),2),3,4,5,6,7,8,rep(c(2,3,4,5,6,7,8),5))
>
> # I used the duplicateCorrelation() function to take in account the link
> between donors
> corfit<-duplicateCorrelation(c031, design, ndups=1, block=biolrep)
>
> Then I do the usual commands (incorporating the donor information) :
> fit<-lmFit(c031, design, block=biolrep, cor=corfit$consensus)
> cont.matrix<-makeContrasts(Stim2vsStim4=Stim2-Stim4, levels=design)
> fit2<-contrasts.fit(fit, cont.matrix)
> fit2<-eBayes(fit2)
> anadiff.limma.Stim2<-topTable(fit2, number=1000, adjust="BH")
>
>
>
> When I check the results and compare the list of the 1000 genes given by
> each method, there are 857 genes in common. The fact that I found the use
> of these two methods on this site doesn't helps me much about choosing one.
> Does someone got an idea about what method I should use for biological
> replicates ? (And by the way if someone got an answer more precise about
> why Donor2 is excluded, that should be nice)
>
>
>
>
> Best regards,
> Santy

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list