[Bioc-sig-seq] EdgeR bug (stable (1.4.3) and devel (1.5.5)) : the genes "columns" does not get ordered as the "counts" in the topTags results.
Nicolas Delhomme
delhomme at embl.de
Thu Jan 21 16:03:00 CET 2010
Good afternoon,
I've had the following bug in edgeR version 1.4.3 (bioC 2.5 stable).
I created my object like this:
dge<-DGEList(
counts=counts,
group=sub("\\.\\d","",colnames(counts)),
lib.size=colSums(counts),
genes=transcripts)
Then I estimated the common dispersion:
dge <- estimateCommonDisp(dge)
And did a contrast between two conditions and got the information back
in a data.frame:
tags <- topTags(exactTest(dge,pair=c("gal4","twi")),n=.Machine
$integer.max)
There, I realized that the genes were not ordered accordingly to the
colnames. The colnames are correct. See the results below.
> tags$table[1:10,]
genes logConc logFC PValue FDR
FBtr0084431 FBtr0087887 -33.16413 33.703845 9.050816e-50 1.834510e-45
FBtr0084439 FBtr0087893 -17.89889 5.696000 2.391551e-45 2.423717e-41
FBtr0071928 FBtr0074873 -19.97871 7.066663 1.814922e-43 1.226222e-39
FBtr0077809 FBtr0070400 -33.99876 32.034581 1.677380e-29 8.499706e-26
FBtr0300181 FBtr0074211 -21.99303 7.636927 1.945823e-26 7.887978e-23
FBtr0301322 FBtr0084735 -34.32748 31.377138 8.141553e-24 2.750352e-20
FBtr0299839 FBtr0084424 -19.89508 4.299143 9.575473e-23 2.426066e-19
FBtr0299841 FBtr0084426 -19.89508 4.299143 9.575473e-23 2.426066e-19
FBtr0299840 FBtr0084425 -19.91022 4.268863 2.075009e-22 4.673152e-19
FBtr0079322 FBtr0071074 -19.40663 3.933231 1.308765e-21 2.652735e-18
So I downloaded the latest version (1.5.5, bioC 2.6 devel) and I could
reproduce the same bug. Actually before I could do that I got yet
another error message saying:
"Number of rows in the annotation object 'genes' is not equal to the
number of rows in the matrix of counts.
Must have an annotation for each gene/tag, if annotation is to be
used."
whereas I didn't change anything to the above command lines. I tracked
it down and what happens is that rows with empty counts are removed
from the counts object but not from the genes one. And this obviously
has the sanity check failing.
Changing the line 76 in classes.R (today's svn checkout) from
else if(nrow(as.data.frame(genes))==nrow(x$counts)) x$genes <-
as.data.frame(genes, stringsAsFactors=FALSE)
to
else if(nrow(as.data.frame(genes)[!allZeros,,drop=FALSE])==nrow(x
$counts)) x$genes <- as.data.frame(genes, stringsAsFactors=FALSE)[!
allZeros,,drop=FALSE]
corrects that problem. I haven't investigated the afore mentioned one,
as I expect it to be in some deeper functions and that I could work
around it by selecting on the colnames.
Regards,
Nicolas Delhomme
PS: sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.2.0 multicore_0.1-3 foreach_1.3.0 codetools_0.2-2
[5] iterators_1.0.3 edgeR_1.5.5 DESeq_0.7.4 locfit_1.5-5
[9] lattice_0.18-3 akima_0.5-4 Biobase_2.6.1
loaded via a namespace (and not attached):
[1] annotate_1.24.1 AnnotationDbi_1.8.1 DBI_0.2-5
[4] dgenb_0.4.0 genefilter_1.28.2 geneplotter_1.24.0
[7] grid_2.10.0 limma_3.2.1 MASS_7.3-4
[10] RColorBrewer_1.0-2 RSQLite_0.8-1 splines_2.10.0
[13] survival_2.35-8 xtable_1.5-6
---------------------------------------------------------------
Nicolas Delhomme
High Throughput Functional Genomics Center
European Molecular Biology Laboratory
Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
More information about the Bioc-sig-sequencing
mailing list