[BioC] some help requested for constructing an appropriate design matrix in LIMMA

Belinda Phipson phipson at wehi.EDU.AU
Sat Jun 23 02:43:00 CEST 2012


Hi Steven

A common problem with small sample sizes! There are some things you can try:
1) You can try using a function called combat() in the sva package to
remove the cell line effect.
2) You can have a look at what fit$df.prior is giving you. A larger value
will result in more significant differential expression, and a small value
will result in less differential expression. If you plot log(fit$sigma) on
the y-axis and fit$Amean on the x-axis you may find some strange highly
variable genes that may need to be filtered out. This can affect the
estimate of the prior degrees of freedom. You may also decide to filter
your genes based on a variance (fit$sigma) cut-off and re-run eBayes().
3) You could try running eBayes with a trend:
> fit <- eBayes(fit,trend=T)
It may not make any difference, but you can try!

Other than that have a closer look at the individual expression values of
the top 10 genes, even if they're not significant, to determine whether
there is a difference between R and S. They may still be worth following
up.
 I find a barplot() helpful to visualise the individual sample values.

Good luck!
Cheers.
Belinda


> Hi Belinda,
>
> Thank you for the suggestion.
> I filtered out the probes with very low expression values and then used
> your design matrix.
> Next I used:
>>myresults<-topTable(fit, coef=4, number=50000)
>
> This list gives me not a single significant differentially expressed probe
> after correction for mult hypothesis testing.
> When I used multi dimensional plotting on my dataset, I noticed that there
> was a very big biological difference between the cell lines (first
> dimension), but that the difference between resistent and sensitive (2nd
> dimension) are quite small.
> Could this be the cause of the relatively high p-values? Is there a way to
> correct for this?
>
> 2012/6/18 Belinda Phipson <phipson at wehi.edu.au>
>
>> Hi Steven
>>
>> You could just include cell line in your linear model rather than using
>> duplicateCorrelation().
>>
>> > design <-
>> model.matrix(~factor(targets$cellline)+factor(targets$fenotype))
>> > fit <- lmFit(eset,design)
>> > fit <- eBayes(fit)
>>
>> This will test R vs S taking into account cell line. You could also
>> filter
>> out lowly expressed genes across all samples to improve your power to
>> detect
>> differentially expressed genes as your sample size is quite small.
>>
>> Cheers,
>> Belinda
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org
>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of steven segbroek
>> Sent: Friday, 15 June 2012 2:13 AM
>> To: bioconductor at r-project.org
>> Subject: [BioC] some help requested for constructing an appropriate
>> design
>> matrix in LIMMA
>>
>> Dear R-users,
>>
>> I want to analyse a single channel micro array experiment which looks
>> like
>> the following:
>>
>> > targets
>>  File cellline fenotype
>> 1    A       1        R
>> 2    B       2        R
>> 3    C       3        R
>> 4    D       1        S
>> 5    E       2        S
>> 6    F       3        S
>>
>> There are three different cell lines, each of which comes in two
>> versions.
>>
>> Every cell line has  a variant that is resistant to a specific drug and
>> another variant that is sensitive to this drug.
>>
>> We treated both variant of the three cell lines with this drug and then
>> extracted RNA which was then hybridised to a micro array.
>>
>> The question we want to resolve is: which genes are differentially
>> regulated between resistant (R) and sensitive (S) versions of these cell
>> lines.
>>
>> There is quite some biological variation between the cell lines, so
>> grouping them by fenotype and then searching for differentially
>> regulated
>> genes would be a bad idea.
>>
>> So, the idea is to to construct a model that accounts for this
>> biological
>> variation between the cell lines and looks which genes are consistently
>> up
>> or down regulated between resistant and sensitive versions of these
>> three
>> cell lines.
>>
>> I am a bit puzzled on how to setup an appropriate design matrix for this
>> particular setup.
>>
>> I have come up with the following code:
>> > design
>>  celline fenotR fenotS
>> 1       1      1      0
>> 2       2      1      0
>> 3       3      1      0
>> 4       1      0      1
>> 5       2      0      1
>> 6       3      0      1
>> attr(,"assign")
>> [1] 1 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$fenot
>> [1] "contr.treatment"
>>
>> >block<-c(1,2,3,1,2,3)
>> >eset<-exprs(BSData.log2.quantile)
>> >cor<-duplicateCorrelation(eset, ndups=1, block=block, design=design)
>> >fit <- lmFit(eset, design, block=block, cor=cor$consensus)
>> >fit<- eBayes(fit)
>> >cont.matrix<-makeContrasts(resvssens= fenotR - fenotS, levels=design)
>> >fit2<-contrasts.fit(fit,cont.matrix)
>> >fit2<-eBayes(fit2)
>> >topTable(fit2)
>>
>> However, this code results in an adj.p-value that is "0.9999541" for
>> every
>> gene.
>>
>> Is there a better way to analyse this?
>>
>> Kind regards,
>> Steven
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for
>> the
>> addressee.
>> You must not disclose, forward, print or use it without the permission
>> of
>> the sender.
>> ______________________________________________________________________
>>
>



______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list