[BioC] EdgeR for proteomics

Wed Oct 10 22:40:48 CEST 2012

Hi Fabricio,

I suggest you check (at least) 2 things:

1.
> disp <- estimateCommonDisp(b)
> disp$common.dispersion = 0.0001004979

> disp$common.dispersion =  3.999943

Your example only makes 1 call to estimateCommonDisp(), but you have 2 drastically different values.  Are you reporting these as the estimated values, or are you actually running this command and *setting* the common dispersion?  It's not clear from your message.

You may also want to study some of the GLM-based case studies in:
http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

For example, the standard GLM work flow would be similar to that on Page 7.

2. 
> lrt <- glmLRT(b,fit,coef=fit$design)

The docs for the 'coef' argument (?glmLRT) say:
----
   coef: integer or character vector indicating which coefficients of
         the linear model are to be tested equal to zero.  Values must
         be columns or column names of ‘design’. Defaults to the last
         coefficient.  Ignored if ‘contrast’ is specified.
----
As you can see, the function is expecting something very different to what give as your 'coef' argument. Maybe you want 'coef=2:6', if you are looking for any difference between your 6 groups.  Of course, maybe you actually want to split your factors into 2 … one of ("La","Lm","MO") and one of ("6h","24h") and construct a design matrix accordingly.  But, this is also not clear from your message.

Hope that helps,
Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012

On 10.10.2012, at 06:41, Fabricio Marchini wrote:

> Hi,
> 
> I'm using EdgeR to analyse a proteomic data with peptide counting. I have
> limited experience on R/EdgeR/Statistics so I appreciate some help.
> Using the follow code:
> 
> a=file[,2:64]
> 
> b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"
> ),c(10,11,10,11,10,11)), lib.size=colSums(a))
> 
> b <- calcNormFactors(b)
> 
> times <- rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11,
> 10,11))
> 
> times <- factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"
> ))
> 
> design <- model.matrix(~factor(times))
> 
> disp <- estimateCommonDisp(b)
> 
> fit <- glmFit(b,design,dispersion=disp$common.dispersion)
> 
> lrt <- glmLRT(b,fit,coef=fit$design)
> disp$common.dispersion = 0.0001004979
> 
> All proteins (3430) had a p.value of 0.
> 
> I tried also with
> 
> fit <- glmFit(b,design,dispersion=disp$common.dispersion)
> 
> lrt <- glmLRT(b,fit,coef=fit$design)
> disp$common.dispersion =  3.999943
> 
> and that gave me all the proteins with p.value lower than 6.29E-05.
> 
> That gave a signal that I'm doing something wrong or because of both common
> dispersions my data is not a appropriate for the analysis.
> 
> Any suggestions or corrections?
> 
> --
> Fabricio K. Marchini
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor