[Bioc-sig-seq] EdgeR bug (stable (1.4.3) and devel (1.5.5)) : the genes "columns" does not get ordered as the "counts" in the topTags results.

Gordon K Smyth smyth at wehi.EDU.AU
Fri Jan 22 03:09:22 CET 2010


Dear Nicolas,

Thanks very much for the bug report.  I have committed a fix to edgeR 
versions 1.4.6 (release) and 1.5.6 (devel).

Best wishes
Gordon

On Thu, 21 Jan 2010, Nicolas Delhomme wrote:

> Good afternoon,
>
> I've had the following bug in edgeR version 1.4.3 (bioC 2.5 stable).
>
> I created my object like this:
>
> dge<-DGEList(
>              counts=counts,
>              group=sub("\\.\\d","",colnames(counts)),
>              lib.size=colSums(counts),
>              genes=transcripts)
>
> Then I estimated the common dispersion:
>
> dge <- estimateCommonDisp(dge)
>
> And did a contrast between two conditions and got the information back in a 
> data.frame:
>
> tags <- topTags(exactTest(dge,pair=c("gal4","twi")),n=.Machine$integer.max)
>
> There, I realized that the genes were not ordered accordingly to the 
> colnames. The colnames are correct. See the results below.
>
>> tags$table[1:10,]
>                 genes   logConc     logFC       PValue          FDR
> FBtr0084431 FBtr0087887 -33.16413 33.703845 9.050816e-50 1.834510e-45
> FBtr0084439 FBtr0087893 -17.89889  5.696000 2.391551e-45 2.423717e-41
> FBtr0071928 FBtr0074873 -19.97871  7.066663 1.814922e-43 1.226222e-39
> FBtr0077809 FBtr0070400 -33.99876 32.034581 1.677380e-29 8.499706e-26
> FBtr0300181 FBtr0074211 -21.99303  7.636927 1.945823e-26 7.887978e-23
> FBtr0301322 FBtr0084735 -34.32748 31.377138 8.141553e-24 2.750352e-20
> FBtr0299839 FBtr0084424 -19.89508  4.299143 9.575473e-23 2.426066e-19
> FBtr0299841 FBtr0084426 -19.89508  4.299143 9.575473e-23 2.426066e-19
> FBtr0299840 FBtr0084425 -19.91022  4.268863 2.075009e-22 4.673152e-19
> FBtr0079322 FBtr0071074 -19.40663  3.933231 1.308765e-21 2.652735e-18
>
> So I downloaded the latest version (1.5.5, bioC 2.6 devel) and I could 
> reproduce the same bug. Actually before I could do that I got yet another 
> error message saying:
>
> "Number of rows in the annotation object 'genes' is not equal to the number 
> of rows in the matrix of counts.
> Must have an annotation for each gene/tag, if annotation is to be used."
>
> whereas I didn't change anything to the above command lines. I tracked it 
> down and what happens is that rows with empty counts are removed from the 
> counts object but not from the genes one. And this obviously has the sanity 
> check failing.
>
> Changing the line 76 in classes.R (today's svn checkout) from
>
> 	else if(nrow(as.data.frame(genes))==nrow(x$counts)) x$genes <- 
> as.data.frame(genes, stringsAsFactors=FALSE)
>
> to
>
> 	else 
> if(nrow(as.data.frame(genes)[!allZeros,,drop=FALSE])==nrow(x$counts)) x$genes 
> <- as.data.frame(genes, stringsAsFactors=FALSE)[!allZeros,,drop=FALSE]
>
> corrects that problem. I haven't investigated the afore mentioned one, as I 
> expect it to be in some deeper functions and that I could work around it by 
> selecting on the colnames.
>
> Regards,
>
> Nicolas Delhomme
>
> PS: sessionInfo()
>
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] doMC_1.2.0      multicore_0.1-3 foreach_1.3.0   codetools_0.2-2
> [5] iterators_1.0.3 edgeR_1.5.5     DESeq_0.7.4     locfit_1.5-5
> [9] lattice_0.18-3  akima_0.5-4     Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.24.1     AnnotationDbi_1.8.1 DBI_0.2-5
> [4] dgenb_0.4.0         genefilter_1.28.2   geneplotter_1.24.0
> [7] grid_2.10.0         limma_3.2.1         MASS_7.3-4
> [10] RColorBrewer_1.0-2  RSQLite_0.8-1       splines_2.10.0
> [13] survival_2.35-8     xtable_1.5-6
>
>
>
>
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioc-sig-sequencing mailing list