[BioC] Two colour arrays - advice on unconnected design, batch effect

Tue Oct 27 06:00:18 CET 2009

Hi Katrina,

Hello from down the road.

> Date: Mon, 26 Oct 2009 13:42:30 +1100
> From: "Katrina Bell" <katrina.bell at mcri.edu.au>
> Subject: [BioC] Two colour arrays - advice on unconnected design,
> 	batch effect
> To: <bioconductor at stat.math.ethz.ch>
>
>
> Dear All,
>
> I have limited experience with analysis of two colour arrays and would 
> appreciate your thoughts on the follow design matrix I have constructed 
> and ways to deal with batch effects. It is an unconnected design, of 5 
> conditions each with their own reference. I should say, these are 
> agilent 44k mouse arrays.
>
> There are 39 arrays in total. There are technical dye swaps (which I 
> know aren't the best- but its what I have got) and these arrays have 
> been performed in 3 lots (so 3 batches). I have attempted to take care 
> of the technical replicates (same mouse/RNA, just labelled in reverse 
> for a dye swap) using the block function in lmFit
>
> Targets
> SlideNumber  Cy3 Cy5 Batch Bioreps
> 1  WTCL WTCR 1 1
> 2  CoffeeAL CoffeeAR 1 2
> 3  CoffeeCL CoffeeCR 1 3
> 4  WTBL WTBR 1 4
> 5  WTBR WTBL 1 4
> 6  CoffeeAR CoffeeAL 1 2
> 7  WTCR WTCL 1 1
> 8  CoffeeCL CoffeeCR 1 5
> 9  WTBL WTBR 1 6
> 10  WTAL WTAR 1 7
> 11  CoffeeAL CoffeeAR 1 8
> 12  CoffeeAR CoffeeAL 1 8
> 13  WTAR WTAL 1 7
> 14  WTBR WTBL 1 6
> 15  CoffeeCR CoffeeCL 1 5
> 16  WTAL WTAR 1 9
> 17  WTCR WTCL 1 10
> 18  CoffeeCR CoffeeCL 1 3
> 19  WTAR WTAL 1 9
> 20  WTBR WTBL 2 11
> 21  WTBL WTBR 2 11
> 22  WTCR WTCL 2 12
> 23  WTCL WTCR 2 12
> 24  WTAR WTAL 2 13
> 25  WTAL WTAR 2 13
> 26  CoffeeCR CoffeeCL 2 14
> 27  CoffeeCL CoffeeCR 2 14
> 28  CoffeeAR CoffeeAL 2 15
> 29  CoffeeAL CoffeeAR 2 15
> 30  WTBR WTBL 3 16
> 31  WTBL WTBR 3 16
> 32  WTCR WTCL 3 17
> 33  WTCL WTCR 3 17
> 34  WTAR WTAL 3 18
> 35  WTAL WTAR 3 18
> 36  CoffeeCR CoffeeCL 3 19
> 37  CoffeeCL CoffeeCR 3 19
> 38  CoffeeAR CoffeeAL 3 20
> 39  CoffeeAL CoffeeAR 3 20
>
>
>
> RG <- read.maimages(target, source= "agilent", path="ArrayFiles")
> RG <- backgroundCorrect(RG, method="subtract")
>
> I also tried using normexp, offset 50, but a couple of my arrays M 
> values really constricted after this...
>
> RG$genes$Status <-controlStatus(spottypes, RG)
> Matching patterns for: ControlType GeneName
> Found 43379 probe
> Found 604 DarkCorner
> Found 14 GE_BrightCorner
> Found 1486 controls
> Setting attributes: values Color
>> w <-modifyWeights(array(1,dim(RG)), RG$genes$Status, c("BrightCorner", "DarkCorner"), c(0,0))
>
> bioreps<-c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20 )
> MA <-normalizeWithinArrays(RG, weights=w, method='loess')
> MA<-normalizeBetweenArrays(MA, method="Aquantile")
>
> MA.avg <-avereps(MA, ID=MA$genes$ProbeName)
>
> corfit<-duplicateCorrelation(MA.avg, block=biorep)

Need to add the design matrix to the duplicateCorrelation() call.

>> corfit$consensus
> [1] -0.812968
>
> As this is an unconnected design, I followed Gordon's advice in another 
> posting and made my own design matrix.
>
>
>> design
>      Dye WTAR WTBR WTCR CoffeeAR CoffeeCr
> [1,]   1    0    0   -1        0        0
> [2,]   1    0    0    0       -1        0
> [3,]   1    0    0    0        0       -1
> [4,]   1    0   -1    0        0        0
> [5,]   1    0    1    0        0        0
> [6,]   1    0    0    0        1        0
> [7,]   1    0    0    1        0        0
> [8,]   1    0    0    0        0       -1
> [9,]   1    0   -1    0        0        0
> [10,]   1   -1    0    0        0        0
> [11,]   1    0    0    0       -1        0
> [12,]   1    0    0    0        1        0
> [13,]   1    1    0    0        0        0
> [14,]   1    0    1    0        0        0
> [15,]   1    0    0    0        0        1
> [16,]   1   -1    0    0        0        0
> [17,]   1    0    0    1        0        0
> [18,]   1    0    0    0        0       -1
> [19,]   1    1    0    0        0        0
> [20,]   1    0    1    0        0        0
> [21,]   1    0   -1    0        0        0
> [22,]   1    0    0    1        0        0
> [23,]   1    0    0   -1        0        0
> [24,]   1    1    0    0        0        0
> [25,]   1   -1    0    0        0        0
> [26,]   1    0    0    0        0        1
> [27,]   1    0    0    0        0       -1
> [28,]   1    0    0    0        1        0
> [29,]   1    0    0    0       -1        0
> [30,]   1    0    1    0        0        0
> [31,]   1    0   -1    0        0        0
> [32,]   1    0    0    1        0        0
> [33,]   1    0    0   -1        0        0
> [34,]   1    1    0    0        0        0
> [35,]   1   -1    0    0        0        0
> [36,]   1    0    0    0        0        1
> [37,]   1    0    0    0        0       -1
> [38,]   1    0    0    0        1        0
> [39,]   1    0    0    0       -1        0
>
>
> fit<- lmFit(MA.avg,design, block=bioreps, cor=corfit$consensus)
> fit2 <-eBayes(fit)
> WTAR<- topTable(fit2, coef=2, adjust="BH")
>
>
> Is it sensible to make a coefficent for each of the batches in my design 
> with my set of arrays? So three extra columns? I am unsure if I have 
> enough information in my arrays for this, and I would appreciated your 
> advice/ suggestions.  I am especially concerned about how to treat the 
> batch effect as the second batch has some background hybridisation 
> issues from looking at the FE array images.  Although they look OK on 
> the QC in limma- just more constricted M values than the other arrays, I 
> am concerned about them. I did remove the whole batch and ran the 
> analysis with the remaining 29 arrays to gauge what effect they were 
> having on the analysis and found that I got even less statistically 
> significant genes.
>
> So, my questions are;
> 1. is the design matrix I constructed OK ?

Looks ok, although haven't checked every line.  Each coefficient is 
measuring the R vs L effect for a specific combination of treatments. 
Assume that's what you want.

> 2. How can I deal with the batch effect in my set off arrays.

Add it to your design matrix, i.e., an indicator column for batch 2 and 
one for batch 3.

Best wishes
Gordon

> 3. Any other comments welcome!
>
> Thanks for any help you are able to give.
> Cheers
> Katrina