[BioC] design matrix with technical and biological replicates
James W. MacDonald
jmacdon at uw.edu
Wed Apr 18 15:42:57 CEST 2012
Hi Manuela,
On 4/18/2012 7:52 AM, Manuela Di Russo wrote:
> Dear list,
>
> I'm working with microarray expression data and I am using limma to detect
> differentially expressed genes. I have some questions about the design
> matrix and the handling of biological and technical replicates.
>
> The target file is:
>
> Sample_name  sample_type  sample_replicate  disease_status
> MPM_07       1            1                 1
> MPM_08       1            2                 1
> MPM_09       1            3                 1
> MPM_10_a     1            4                 1
> MPM_10_b     1            4                 1
> MPM_11       1            5                 1
> MPM_12       1            6                 1
> PP_01_a      2            7                 0
> PP_01_b      2            7                 0
> PP_02        2            8                 0
> PP_03        2            9                 0
> PP_04        2            10                0
> PP_05        2            11                0
> PP_06        2            12                0
> PV_02        3            13                0
> PV_03        3            14                0
> PV_04        3            15                0
> PV_05        3            16                0
>
> Each sample is hybridized on an Affymetrix HG-U133-Plus2 array.
>
> So I have 7 mesothelioma samples (sample_type=1), 2 of which were from the
> same patient (MPM_10 a and b); 7 parietal pleural samples (sample_type=2),
> 2 of which were from the same patient (PP_01 a and b); and 4 visceral
> pleural samples (sample_type=3). In addition, 4 parietal pleural samples
> (PP_02, PP_03, PP_04 and PP_05) and 4 visceral pleural samples (PV_02,
> PV_03, PV_04 and PV_05) come from the same patients.
>
> pd<- data.frame(sample_type= c(rep(1,7),rep(2,7),rep(3,4)),
> sample_replicate = c(1:4,4,5,6,7,7,8:12,13:16),
> disease_status=c(rep(1,7),rep(0,11)))
>
> biolrep<-pd$sample_replicate
>
> f<- factor(pd$sample_type)
>
> design<- model.matrix(~0+f)
>
> colnames(design)<- c("MPM", "PP", "PV")
>
> I tried to handle technical replicates using the block argument of the
> function duplicateCorrelation() as follows:
I don't think you can use duplicateCorrelation() here, as you don't have
duplicates for all samples. I believe lmFit() with a cor argument will
fit a block diagonal correlation matrix, which is clearly not applicable
here. I may be in error however, in which case Gordon Smyth will surely
post a correction around 5-6 pm EDT or so.
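
To make clear what I mean by a block diagonal correlation matrix, here is a small base-R sketch: arrays that share a block ID get a common correlation, and all other pairs are treated as uncorrelated. The value of rho is made up for illustration (in practice it would be the consensus correlation), and this only mimics the structure as I understand it, not limma's actual code.

```r
# Block IDs from the original post: 18 arrays, 16 biological samples.
biolrep <- c(1:4, 4, 5, 6, 7, 7, 8:12, 13:16)

# Hypothetical within-block correlation, chosen only for illustration.
rho <- 0.8

# Arrays in the same block get correlation rho, all other pairs get 0.
cormat <- outer(biolrep, biolrep, "==") * rho
diag(cormat) <- 1

# Only the two duplicated pairs (arrays 4/5 and 8/9) are non-zero off the diagonal.
which(cormat == rho & upper.tri(cormat), arr.ind = TRUE)
```

With this design, all but two of the off-diagonal blocks are empty, since only MPM_10 and PP_01 were hybridized twice.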
With a mixture of duplicated and non-duplicated samples, you will
likely have to do one of two less than ideal things. First, you could
simply ignore the duplication, and analyze as if the duplicates were
independent samples. This is less than ideal because the duplicates
will be correlated with each other, which will tend to lower your
estimate of within-group variation.
Second, you could compute means of the duplicates and then use those in
lieu of the original data. Again, this is not ideal, as the means will
have an intrinsically lower variance than individual samples. All things
equal, this is probably the better way to go.
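
A minimal sketch of the averaging approach in base R, in case it helps. The matrix expr is simulated and only stands in for exprs(eset_norm_genes_ff_filtered); the names expr, expr_avg and type_avg are mine, not from your code. I believe avearrays() in limma does much the same thing, but check its help page before relying on that.

```r
# Sample annotation from the original post.
sample_replicate <- c(1:4, 4, 5, 6, 7, 7, 8:12, 13:16)
sample_type <- c(rep(1, 7), rep(2, 7), rep(3, 4))

# Simulated stand-in for the expression matrix: 5 probes x 18 arrays.
set.seed(1)
expr <- matrix(rnorm(5 * 18), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("array", 1:18)))

# Average arrays that share a biological-replicate ID.
# rowsum() sums rows by group, so transpose, sum, divide by the
# number of arrays per group, and transpose back.
n_per_rep <- as.vector(table(sample_replicate))
expr_avg <- t(rowsum(t(expr), group = sample_replicate) / n_per_rep)

# One sample type per biological replicate, in the same (sorted) order.
type_avg <- sample_type[!duplicated(sample_replicate)]
f <- factor(type_avg, labels = c("MPM", "PP", "PV"))

# Same cell-means design as before, now with 16 rows instead of 18.
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)
```

The averaged matrix and the new design could then go into lmFit() without the block and correlation arguments. Note that this does nothing about the PP/PV pairing.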
Best,
Jim
>
> # eset_norm_genes_ff_filtered is an ExpressionSet object containing
> # pre-processed and filtered data
> corfit<- duplicateCorrelation(eset_norm_genes_ff_filtered, design, ndups=1,
> block= biolrep)
>
> I am interested in identifying differentially expressed genes between MPM
> and PP and between PV and PP.
>
> contrast.matrix_all.contrasts<-
> makeContrasts(MPMvsPP=MPM-PP,PVvsPP=PV-PP,levels=design)
>
> fit_ff<-lmFit(eset_norm_genes_ff_filtered, design,block=biolrep,
> ndups=1,cor=corfit$consensus)
>
> fit2_ff<- contrasts.fit(fit_ff, contrast.matrix_all.contrasts)
>
> fit2e_ff<-eBayes(fit2_ff)
>
> I think that my approach is correct for the first contrast (MPM vs PP) but
> not for the second one because biolrep doesn't consider the fact that some
> samples between PP and PV are paired.
>
> Am I correct?
>
> What about defining biolrep<-c(1:4,4,5,6,7,7,8:12,8:11)?
>
> Is there a method to handle such an experimental design?
>
> Sorry for my long post!
>
> Any suggestion/comment is welcome.
>
> Cheers,
>
> Manuela
>
>
>
> ----------------------------------------------------------------------------
> ----------
>
> Manuela Di Russo, Ph.D. Student
> Department of Experimental Pathology, MBIE
> University of Pisa
> Pisa, Italy
> e-mail: manuela.dirusso at for.unipi.it
> mobile: +393208778864
>
> phone: +39050993538
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099