[Bioc-sig-seq] EdgeR bug (stable (1.4.3) and devel (1.5.5)) : the genes "columns" does not get ordered as the "counts" in the topTags results.
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Jan 22 03:09:22 CET 2010
Dear Nicolas,
Thanks very much for the bug report. I have committed a fix to edgeR
versions 1.4.6 (release) and 1.5.6 (devel).
Best wishes
Gordon
On Thu, 21 Jan 2010, Nicolas Delhomme wrote:
> Good afternoon,
>
> I've had the following bug in edgeR version 1.4.3 (bioC 2.5 stable).
>
> I created my object like this:
>
> dge<-DGEList(
> counts=counts,
> group=sub("\\.\\d","",colnames(counts)),
> lib.size=colSums(counts),
> genes=transcripts)
>
> Then I estimated the common dispersion:
>
> dge <- estimateCommonDisp(dge)
>
> And did a contrast between two conditions and got the information back in a
> data.frame:
>
> tags <- topTags(exactTest(dge,pair=c("gal4","twi")),n=.Machine$integer.max)
>
> There, I realized that the genes were not ordered accordingly to the
> colnames. The colnames are correct. See the results below.
>
>> tags$table[1:10,]
> genes logConc logFC PValue FDR
> FBtr0084431 FBtr0087887 -33.16413 33.703845 9.050816e-50 1.834510e-45
> FBtr0084439 FBtr0087893 -17.89889 5.696000 2.391551e-45 2.423717e-41
> FBtr0071928 FBtr0074873 -19.97871 7.066663 1.814922e-43 1.226222e-39
> FBtr0077809 FBtr0070400 -33.99876 32.034581 1.677380e-29 8.499706e-26
> FBtr0300181 FBtr0074211 -21.99303 7.636927 1.945823e-26 7.887978e-23
> FBtr0301322 FBtr0084735 -34.32748 31.377138 8.141553e-24 2.750352e-20
> FBtr0299839 FBtr0084424 -19.89508 4.299143 9.575473e-23 2.426066e-19
> FBtr0299841 FBtr0084426 -19.89508 4.299143 9.575473e-23 2.426066e-19
> FBtr0299840 FBtr0084425 -19.91022 4.268863 2.075009e-22 4.673152e-19
> FBtr0079322 FBtr0071074 -19.40663 3.933231 1.308765e-21 2.652735e-18
>
> So I downloaded the latest version (1.5.5, bioC 2.6 devel) and I could
> reproduce the same bug. Actually before I could do that I got yet another
> error message saying:
>
> "Number of rows in the annotation object 'genes' is not equal to the number
> of rows in the matrix of counts.
> Must have an annotation for each gene/tag, if annotation is to be used."
>
> whereas I didn't change anything to the above command lines. I tracked it
> down and what happens is that rows with empty counts are removed from the
> counts object but not from the genes one. And this obviously has the sanity
> check failing.
>
> Changing the line 76 in classes.R (today's svn checkout) from
>
> else if(nrow(as.data.frame(genes))==nrow(x$counts)) x$genes <-
> as.data.frame(genes, stringsAsFactors=FALSE)
>
> to
>
> else
> if(nrow(as.data.frame(genes)[!allZeros,,drop=FALSE])==nrow(x$counts)) x$genes
> <- as.data.frame(genes, stringsAsFactors=FALSE)[!allZeros,,drop=FALSE]
>
> corrects that problem. I haven't investigated the afore mentioned one, as I
> expect it to be in some deeper functions and that I could work around it by
> selecting on the colnames.
>
> Regards,
>
> Nicolas Delhomme
>
> PS: sessionInfo()
>
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] doMC_1.2.0 multicore_0.1-3 foreach_1.3.0 codetools_0.2-2
> [5] iterators_1.0.3 edgeR_1.5.5 DESeq_0.7.4 locfit_1.5-5
> [9] lattice_0.18-3 akima_0.5-4 Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.24.1 AnnotationDbi_1.8.1 DBI_0.2-5
> [4] dgenb_0.4.0 genefilter_1.28.2 geneplotter_1.24.0
> [7] grid_2.10.0 limma_3.2.1 MASS_7.3-4
> [10] RColorBrewer_1.0-2 RSQLite_0.8-1 splines_2.10.0
> [13] survival_2.35-8 xtable_1.5-6
>
>
>
>
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioc-sig-sequencing
mailing list