[BioC] testing triangular setup for finding instances of "all are different"

Mon Oct 12 17:28:11 CEST 2009

Hi Wolfgang,

Maybe I am missing your point entirely, but doesn't decideTests(fit1) do 
exactly what you want? You would have to modify your contrasts matrix 
(as the third column isn't a contrast anyway) to something like this:

 > cont <- cbind(contr.matr1[,-3], c(1,0,-1))
 > colnames(cont) <- c("a-b","b-c","a-c")
 > cont
         a-b b-c a-c
groupsa   1   0   1
groupsb  -1   1   0
groupsc   0  -1  -1

 > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont))
 > rslt <- decideTests(fit1)
 > vennDiagram(rslt)

Best,

Jim

Wolfgang Raffelsberger wrote:
> Dear List,
> 
> I'm trying to find out the best procedure to treat the following 
> "triangular" experimental plan.
> Three groups of samples (prepared as triplicates and run on 3+3+3=9 
> microarrays) are to be compared in the way where those elements/genes 
> that are different in each of the three groups should be identified (ie, 
> (av1 != av2) & (av1 != av3) & (av2 != av3) )
> 
> If I run an ANOVA the F value won't tell me if one or two groups are 
> different.  From the literature I see that people then like to resolves 
> this again as multiple pair-wise comparisons to find out which 
> group-means are actually different ....
> Of course the simples way might be to run three (independent) pair-wise 
> comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However, 
> when looking at the resulting p-values I'm not taking in account the 
> extra multiplicity of testing. (Of course, before testing I'd check the 
> data by QC and run some filtering of the data). A work-around might be 
> to append all p-values to a 3 * length(genes) vector that could be used 
> to estimate FDR or local fdr and finally search elements/genes where all 
> FDRs are very low (I suppose I'd have to use the max(FDR) for a given 
> element/gene)
> On this list I've seen a post for a somehow similar (?) case ("Re: 
> [BioC] result of linear model", 12 june 2009) suggesting to either look 
> at a "p.value/ratio threshold in limma", but I don't think this would 
> address my questions satisfyingly. Looking at the ratio of only 2 
> pair-wise comparisons it might happen that (av1 != av2) and (av1 != 
> av3), but if (av2 = av3) the conclusion that all three are different may 
> be wrong ...
> 
> So, I'm asking myself if there are more direct and elegant approaches to 
> this problem. I remember that in related fields a constellation called 
> "trio" is somehow popular, and I'm sure others may have already thought 
> about solutions for such a setup.
> Finally, the questions remains if it is possible to develop one (single) 
> FDR value for estimating that a given gene might be different in each of 
> the three groups.
> 
> Any hints/suggestions ?
> 
> Thank's in advance,
> Wolfgang Raffelsberger
> 
> 
> # Here an example to illustrate ...
> set.seed(2009)
> dat <- matrix(runif(180),nc=9)
> rownames(dat) <- paste("g",1:nrow(dat),sep="_")
> dat[1,] <- dat[1,] + rep(1:3,each=3)       # true positive
> dat[2,] <- dat[2,] + 2*rep(1:3,each=3)     # another true positive
> 
> groups <- gl(3,3,labels=letters[1:3])      # organize the data in 3 groups
> 
> # basic testing in limma (emprical Bayes)
> require(limma)
> (design1 <- model.matrix( ~ -1 + groups ))
> (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
>   paste(colnames(design1)[2:3],collapse="-"),
>   paste(colnames(design1)[c(1:3)],collapse="-"),
>   levels=design1))
> fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
>   topTable(fit1, coef=1,number=4)        # the best of 1st pairwise 
> comparison
> 
> # append p-values for correcting them all together and then rearrange in 
> origial setup
> combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), 
> nc=3)   # or your favorite FDR estimation method
>   rownames(combFDR) <- rownames(dat)
> combFDR[ order(apply(combFDR,1,min))[1:10],]        # the best FDR results
> # and then the 'old' question of finding/defining a suitable threshold 
> for the FDR values remains to be addressed ...
> 
> # plot the best one ...
> require(lattice)
> stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups)
> 
> 
> # for completeness :
>  >  sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 
> 
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base   
> other attached packages:
> [1] lattice_0.17-25 limma_2.18.2 
> loaded via a namespace (and not attached):
> [1] grid_2.9.1  tools_2.9.1
> 
> 
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Wolfgang Raffelsberger, PhD
> Laboratoire de BioInformatique et Génomique Intégratives
> CNRS UMR7104, IGBMC,  1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  
> France
> Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
> wolfgang.raffelsberger (at) igbmc.fr
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826