[BioC] Two colour arrays - advice on unconnected design, batch effect
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Oct 27 06:00:18 CET 2009
Hi Katrina,
Hello from down the road.
> Date: Mon, 26 Oct 2009 13:42:30 +1100
> From: "Katrina Bell" <katrina.bell at mcri.edu.au>
> Subject: [BioC] Two colour arrays - advice on unconnected design,
> batch effect
> To: <bioconductor at stat.math.ethz.ch>
>
>
> Dear All,
>
> I have limited experience with analysis of two colour arrays and would
> appreciate your thoughts on the follow design matrix I have constructed
> and ways to deal with batch effects. It is an unconnected design, of 5
> conditions each with their own reference. I should say, these are
> agilent 44k mouse arrays.
>
> There are 39 arrays in total. There are technical dye swaps (which I
> know aren't the best- but its what I have got) and these arrays have
> been performed in 3 lots (so 3 batches). I have attempted to take care
> of the technical replicates (same mouse/RNA, just labelled in reverse
> for a dye swap) using the block function in lmFit
>
> Targets
> SlideNumber Cy3 Cy5 Batch Bioreps
> 1 WTCL WTCR 1 1
> 2 CoffeeAL CoffeeAR 1 2
> 3 CoffeeCL CoffeeCR 1 3
> 4 WTBL WTBR 1 4
> 5 WTBR WTBL 1 4
> 6 CoffeeAR CoffeeAL 1 2
> 7 WTCR WTCL 1 1
> 8 CoffeeCL CoffeeCR 1 5
> 9 WTBL WTBR 1 6
> 10 WTAL WTAR 1 7
> 11 CoffeeAL CoffeeAR 1 8
> 12 CoffeeAR CoffeeAL 1 8
> 13 WTAR WTAL 1 7
> 14 WTBR WTBL 1 6
> 15 CoffeeCR CoffeeCL 1 5
> 16 WTAL WTAR 1 9
> 17 WTCR WTCL 1 10
> 18 CoffeeCR CoffeeCL 1 3
> 19 WTAR WTAL 1 9
> 20 WTBR WTBL 2 11
> 21 WTBL WTBR 2 11
> 22 WTCR WTCL 2 12
> 23 WTCL WTCR 2 12
> 24 WTAR WTAL 2 13
> 25 WTAL WTAR 2 13
> 26 CoffeeCR CoffeeCL 2 14
> 27 CoffeeCL CoffeeCR 2 14
> 28 CoffeeAR CoffeeAL 2 15
> 29 CoffeeAL CoffeeAR 2 15
> 30 WTBR WTBL 3 16
> 31 WTBL WTBR 3 16
> 32 WTCR WTCL 3 17
> 33 WTCL WTCR 3 17
> 34 WTAR WTAL 3 18
> 35 WTAL WTAR 3 18
> 36 CoffeeCR CoffeeCL 3 19
> 37 CoffeeCL CoffeeCR 3 19
> 38 CoffeeAR CoffeeAL 3 20
> 39 CoffeeAL CoffeeAR 3 20
>
>
>
> RG <- read.maimages(target, source= "agilent", path="ArrayFiles")
> RG <- backgroundCorrect(RG, method="subtract")
>
> I also tried using normexp, offset 50, but a couple of my arrays M
> values really constricted after this...
>
> RG$genes$Status <-controlStatus(spottypes, RG)
> Matching patterns for: ControlType GeneName
> Found 43379 probe
> Found 604 DarkCorner
> Found 14 GE_BrightCorner
> Found 1486 controls
> Setting attributes: values Color
>> w <-modifyWeights(array(1,dim(RG)), RG$genes$Status, c("BrightCorner", "DarkCorner"), c(0,0))
>
> bioreps<-c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18,19,19,20,20 )
> MA <-normalizeWithinArrays(RG, weights=w, method='loess')
> MA<-normalizeBetweenArrays(MA, method="Aquantile")
>
> MA.avg <-avereps(MA, ID=MA$genes$ProbeName)
>
> corfit<-duplicateCorrelation(MA.avg, block=biorep)
Need to add the design matrix to the duplicateCorrelation() call.
>> corfit$consensus
> [1] -0.812968
>
> As this is an unconnected design, I followed Gordon's advice in another
> posting and made my own design matrix.
>
>
>> design
> Dye WTAR WTBR WTCR CoffeeAR CoffeeCr
> [1,] 1 0 0 -1 0 0
> [2,] 1 0 0 0 -1 0
> [3,] 1 0 0 0 0 -1
> [4,] 1 0 -1 0 0 0
> [5,] 1 0 1 0 0 0
> [6,] 1 0 0 0 1 0
> [7,] 1 0 0 1 0 0
> [8,] 1 0 0 0 0 -1
> [9,] 1 0 -1 0 0 0
> [10,] 1 -1 0 0 0 0
> [11,] 1 0 0 0 -1 0
> [12,] 1 0 0 0 1 0
> [13,] 1 1 0 0 0 0
> [14,] 1 0 1 0 0 0
> [15,] 1 0 0 0 0 1
> [16,] 1 -1 0 0 0 0
> [17,] 1 0 0 1 0 0
> [18,] 1 0 0 0 0 -1
> [19,] 1 1 0 0 0 0
> [20,] 1 0 1 0 0 0
> [21,] 1 0 -1 0 0 0
> [22,] 1 0 0 1 0 0
> [23,] 1 0 0 -1 0 0
> [24,] 1 1 0 0 0 0
> [25,] 1 -1 0 0 0 0
> [26,] 1 0 0 0 0 1
> [27,] 1 0 0 0 0 -1
> [28,] 1 0 0 0 1 0
> [29,] 1 0 0 0 -1 0
> [30,] 1 0 1 0 0 0
> [31,] 1 0 -1 0 0 0
> [32,] 1 0 0 1 0 0
> [33,] 1 0 0 -1 0 0
> [34,] 1 1 0 0 0 0
> [35,] 1 -1 0 0 0 0
> [36,] 1 0 0 0 0 1
> [37,] 1 0 0 0 0 -1
> [38,] 1 0 0 0 1 0
> [39,] 1 0 0 0 -1 0
>
>
> fit<- lmFit(MA.avg,design, block=bioreps, cor=corfit$consensus)
> fit2 <-eBayes(fit)
> WTAR<- topTable(fit2, coef=2, adjust="BH")
>
>
> Is it sensible to make a coefficent for each of the batches in my design
> with my set of arrays? So three extra columns? I am unsure if I have
> enough information in my arrays for this, and I would appreciated your
> advice/ suggestions. I am especially concerned about how to treat the
> batch effect as the second batch has some background hybridisation
> issues from looking at the FE array images. Although they look OK on
> the QC in limma- just more constricted M values than the other arrays, I
> am concerned about them. I did remove the whole batch and ran the
> analysis with the remaining 29 arrays to gauge what effect they were
> having on the analysis and found that I got even less statistically
> significant genes.
>
> So, my questions are;
> 1. is the design matrix I constructed OK ?
Looks ok, although haven't checked every line. Each coefficient is
measuring the R vs L effect for a specific combination of treatments.
Assume that's what you want.
> 2. How can I deal with the batch effect in my set off arrays.
Add it to your design matrix, i.e., an indicator column for batch 2 and
one for batch 3.
Best wishes
Gordon
> 3. Any other comments welcome!
>
> Thanks for any help you are able to give.
> Cheers
> Katrina
More information about the Bioconductor
mailing list