[BioC] Changing results depending on context

Tue Jun 12 13:15:44 CEST 2012

Dear all,

I am evaluating a collection of two-color arrays. There are several
sets corresponding to different tissues, there is a treatment (common
for each tissue), and the treatment effect is analysed independently
in each of the tissues. I am analysing the data using limma.

The results in terms of the number of significant genes vary widely
depending whether the tissues are analysed separately or jointly.
Since the matter is even a bit more complicated, I include only
pseudocode below. I guess that this depends on the background
variation estimation in eBayes, but the trend is not consistent, e.g.
for one tissue, more (many more!) significant results are found when
analysed separately, and for another tissue the results are much
"better" (ie. more singificant results) if analysed along other data.

  targets <- readTargets( "targets.txt" )
# Agilent two-color microarrays
  rg.1 <- read.maimages( targets, columns= list( G= "gMedianSignal",
Gb=  "gBGMedianSignal", R="rMedianSignal",   Rb = "rBGMedianSignal"),
annotation = c("Row", "Col","FeatureNum", "ControlType","ProbeName",
"GeneName", "SystematicName", "Description" ))
  rg.2 <- backgroundCorrect( rg.1, method= "normexp", offset=50 )
  rg.3 <- normalizeWithinArrays( rg.2, method= "loess" )
  rg <- normalizeBetweenArrays( rg.3, method= "quantile" )
  rg <- rg[ rg$genes$ControlType == 0, ]

# analysing all data together:
  t2 <- targetsA2C( targets )
  design <- model.matrix( ~ 0 + t2$Target )
  colnames( design ) <- levels( t2$Target )
  corfit <- intraspotCorrelation( rg, design )
  fit <- lmscFit( rg, design, correlation= corfit$consensus.correlation )
  cmtx <- makeContrasts( "A.T1-A.T2", "B.T1-B.T2", levels= design )
  fit <- contrasts.fit( fit, cmtx )
  fit <- eBayes( fit )

Above, A and B are tissues, T1 and T2 are treatments.

Here, some exemplary results showing the number of significant genes:

adj.P.Val < 0.05:
A.T1-A.T2: 5601
B.T1-B.T2: 3914

adj.P.Val < 1e-5
A.T1-A.T2: 672
B.T1-B.T2: 758

Now, I analyse the data separately:

A.targets <- targets[ 1:16, ]
A.rg <- rg[ 1:16, ]
A.t2 <- targetsA2C( A.targets )
A.design <- model.matrix( ~ 0 + A.t2$Target )
colnames( A.design ) <- levels( A.t2$Target )
A.corfit <- intraspotCorrelation( A.rg, A.design )
A.fit <- lmscFit( A.rg, A.design, correlation= A.corfit$consensus.correlation )
A.cmtx <- makeContrasts( "A.T1-A.T2", levels= A.design )
A.fit <- contrasts.fit( A.fit, cmtx )
A.fit <- eBayes( A.fit )

B.targets <- targets[ 17:32, ]
B.rg <- rg[ 17:32, ]
B.t2 <- targetsA2C( B.targets )
B.design <- model.matrix( ~ 0 + B.t2$Target )
colnames( B.design ) <- levels( B.t2$Target )
B.corfit <- intraspotCorrelation( B.rg, B.design )
B.fit <- lmscFit( B.rg, B.design, correlation= B.corfit$consensus.correlation )
B.cmtx <- makeContrasts( "B.T1-B.T2", levels= B.design )
B.fit <- contrasts.fit( B.fit, cmtx )
B.fit <- eBayes( B.fit )

Numbers of significant data are now much, much different:

adj.P.Val < 0.05:
A.T1-A.T2: 3347
B.T1-B.T2: 5443

adj.P.Val < 1e-5
A.T1-A.T2: 108
B.T1-B.T2: 1102

six times less for tissue A, but almost 50% more for tissue B! How can this be?

The log fold changes are stable (i.e., they do not change between the
various sets), which is to be expected -- changing the context
influences the moderated t statistics, but not the estimation of log
fold change. However, the p-values are not simply smaller or larger,
there is little correlation of the p-values from one context to
another. I'm a bit lost in here.

Kind regards,

January

-- 
-------- Dr. January Weiner 3 --------------------------------------
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web   : www.mpiib-berlin.mpg.de
Tel     : +49-30-28460514
Fax    : +49-30-28450505