[BioC] testing triangular setup for finding instances of "all are different"
James W. MacDonald
jmacdon at med.umich.edu
Mon Oct 12 17:28:11 CEST 2009
Hi Wolfgang,
Maybe I am missing your point entirely, but doesn't decideTests(fit1) do
exactly what you want? You would have to modify your contrasts matrix
(as the third column isn't a contrast anyway) to something like this:
> cont <- cbind(contr.matr1[,-3], c(1,0,-1))
> colnames(cont) <- c("a-b","b-c","a-c")
> cont
a-b b-c a-c
groupsa 1 0 1
groupsb -1 1 0
groupsc 0 -1 -1
> fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont))
> rslt <- decideTests(fit1)
> vennDiagram(rslt)
Best,
Jim
Wolfgang Raffelsberger wrote:
> Dear List,
>
> I'm trying to find out the best procedure to treat the following
> "triangular" experimental plan.
> Three groups of samples (prepared as triplicates and run on 3+3+3=9
> microarrays) are to be compared in the way where those elements/genes
> that are different in each of the three groups should be identified (ie,
> (av1 != av2) & (av1 != av3) & (av2 != av3) )
>
> If I run an ANOVA the F value won't tell me if one or two groups are
> different. From the literature I see that people then like to resolves
> this again as multiple pair-wise comparisons to find out which
> group-means are actually different ....
> Of course the simples way might be to run three (independent) pair-wise
> comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However,
> when looking at the resulting p-values I'm not taking in account the
> extra multiplicity of testing. (Of course, before testing I'd check the
> data by QC and run some filtering of the data). A work-around might be
> to append all p-values to a 3 * length(genes) vector that could be used
> to estimate FDR or local fdr and finally search elements/genes where all
> FDRs are very low (I suppose I'd have to use the max(FDR) for a given
> element/gene)
> On this list I've seen a post for a somehow similar (?) case ("Re:
> [BioC] result of linear model", 12 june 2009) suggesting to either look
> at a "p.value/ratio threshold in limma", but I don't think this would
> address my questions satisfyingly. Looking at the ratio of only 2
> pair-wise comparisons it might happen that (av1 != av2) and (av1 !=
> av3), but if (av2 = av3) the conclusion that all three are different may
> be wrong ...
>
> So, I'm asking myself if there are more direct and elegant approaches to
> this problem. I remember that in related fields a constellation called
> "trio" is somehow popular, and I'm sure others may have already thought
> about solutions for such a setup.
> Finally, the questions remains if it is possible to develop one (single)
> FDR value for estimating that a given gene might be different in each of
> the three groups.
>
> Any hints/suggestions ?
>
> Thank's in advance,
> Wolfgang Raffelsberger
>
>
> # Here an example to illustrate ...
> set.seed(2009)
> dat <- matrix(runif(180),nc=9)
> rownames(dat) <- paste("g",1:nrow(dat),sep="_")
> dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive
> dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive
>
> groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups
>
> # basic testing in limma (emprical Bayes)
> require(limma)
> (design1 <- model.matrix( ~ -1 + groups ))
> (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
> paste(colnames(design1)[2:3],collapse="-"),
> paste(colnames(design1)[c(1:3)],collapse="-"),
> levels=design1))
> fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
> topTable(fit1, coef=1,number=4) # the best of 1st pairwise
> comparison
>
> # append p-values for correcting them all together and then rearrange in
> origial setup
> combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"),
> nc=3) # or your favorite FDR estimation method
> rownames(combFDR) <- rownames(dat)
> combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results
> # and then the 'old' question of finding/defining a suitable threshold
> for the FDR values remains to be addressed ...
>
> # plot the best one ...
> require(lattice)
> stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups)
>
>
> # for completeness :
> > sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] lattice_0.17-25 limma_2.18.2
> loaded via a namespace (and not attached):
> [1] grid_2.9.1 tools_2.9.1
>
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Wolfgang Raffelsberger, PhD
> Laboratoire de BioInformatique et Génomique Intégratives
> CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg,
> France
> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
> wolfgang.raffelsberger (at) igbmc.fr
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
More information about the Bioconductor
mailing list