[BioC] Changing results depending on context
January Weiner
january.weiner at mpiib-berlin.mpg.de
Tue Jun 12 13:15:44 CEST 2012
Dear all,
I am evaluating a collection of two-color arrays. There are several
sets corresponding to different tissues, there is a treatment (common
for each tissue), and the treatment effect is analysed independently
in each of the tissues. I am analysing the data using limma.
The results in terms of the number of significant genes vary widely
depending whether the tissues are analysed separately or jointly.
Since the matter is even a bit more complicated, I include only
pseudocode below. I guess that this depends on the background
variation estimation in eBayes, but the trend is not consistent, e.g.
for one tissue, more (many more!) significant results are found when
analysed separately, and for another tissue the results are much
"better" (ie. more singificant results) if analysed along other data.
targets <- readTargets( "targets.txt" )
# Agilent two-color microarrays
rg.1 <- read.maimages( targets, columns= list( G= "gMedianSignal",
Gb= "gBGMedianSignal", R="rMedianSignal", Rb = "rBGMedianSignal"),
annotation = c("Row", "Col","FeatureNum", "ControlType","ProbeName",
"GeneName", "SystematicName", "Description" ))
rg.2 <- backgroundCorrect( rg.1, method= "normexp", offset=50 )
rg.3 <- normalizeWithinArrays( rg.2, method= "loess" )
rg <- normalizeBetweenArrays( rg.3, method= "quantile" )
rg <- rg[ rg$genes$ControlType == 0, ]
# analysing all data together:
t2 <- targetsA2C( targets )
design <- model.matrix( ~ 0 + t2$Target )
colnames( design ) <- levels( t2$Target )
corfit <- intraspotCorrelation( rg, design )
fit <- lmscFit( rg, design, correlation= corfit$consensus.correlation )
cmtx <- makeContrasts( "A.T1-A.T2", "B.T1-B.T2", levels= design )
fit <- contrasts.fit( fit, cmtx )
fit <- eBayes( fit )
Above, A and B are tissues, T1 and T2 are treatments.
Here, some exemplary results showing the number of significant genes:
adj.P.Val < 0.05:
A.T1-A.T2: 5601
B.T1-B.T2: 3914
adj.P.Val < 1e-5
A.T1-A.T2: 672
B.T1-B.T2: 758
Now, I analyse the data separately:
A.targets <- targets[ 1:16, ]
A.rg <- rg[ 1:16, ]
A.t2 <- targetsA2C( A.targets )
A.design <- model.matrix( ~ 0 + A.t2$Target )
colnames( A.design ) <- levels( A.t2$Target )
A.corfit <- intraspotCorrelation( A.rg, A.design )
A.fit <- lmscFit( A.rg, A.design, correlation= A.corfit$consensus.correlation )
A.cmtx <- makeContrasts( "A.T1-A.T2", levels= A.design )
A.fit <- contrasts.fit( A.fit, cmtx )
A.fit <- eBayes( A.fit )
B.targets <- targets[ 17:32, ]
B.rg <- rg[ 17:32, ]
B.t2 <- targetsA2C( B.targets )
B.design <- model.matrix( ~ 0 + B.t2$Target )
colnames( B.design ) <- levels( B.t2$Target )
B.corfit <- intraspotCorrelation( B.rg, B.design )
B.fit <- lmscFit( B.rg, B.design, correlation= B.corfit$consensus.correlation )
B.cmtx <- makeContrasts( "B.T1-B.T2", levels= B.design )
B.fit <- contrasts.fit( B.fit, cmtx )
B.fit <- eBayes( B.fit )
Numbers of significant data are now much, much different:
adj.P.Val < 0.05:
A.T1-A.T2: 3347
B.T1-B.T2: 5443
adj.P.Val < 1e-5
A.T1-A.T2: 108
B.T1-B.T2: 1102
six times less for tissue A, but almost 50% more for tissue B! How can this be?
The log fold changes are stable (i.e., they do not change between the
various sets), which is to be expected -- changing the context
influences the moderated t statistics, but not the estimation of log
fold change. However, the p-values are not simply smaller or larger,
there is little correlation of the p-values from one context to
another. I'm a bit lost in here.
Kind regards,
January
--
-------- Dr. January Weiner 3 --------------------------------------
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web : www.mpiib-berlin.mpg.de
Tel : +49-30-28460514
Fax : +49-30-28450505
More information about the Bioconductor
mailing list