[BioC] bug? universeMappedCount for KEGGHyperG tests in GOstats
Cei Abreu-Goodger
cei at ebi.ac.uk
Tue Oct 6 11:50:59 CEST 2009
Hi all,
There seems to be a problem with the universeMappedCount function (and
possibly with the statistics that depend on it?) for a hyperGTest
on a KEGGHyperGParams object. It appears to report the total number of
mapped genes in the _tested_ categories instead of the total number of
mapped genes in the initial universe. This may be intentional, but it is
inconsistent with what happens when using a GOHyperGParams object.
Example code follows, with its output and sessionInfo at the end:
library(GOstats)
library("org.Mm.eg.db")
# Define a fixed universe for the KEGG and GO tests
universeKEGG <- sample(mappedkeys(org.Mm.egPATH), 1000)
universeGO <- sample(mappedkeys(org.Mm.egGO), 1000)
# Perform GO/KEGG hyperG tests with different sample sizes
for (size in c(5, 10, 20)) {
    genesKEGG <- sample(universeKEGG, size)
    genesGO <- sample(universeGO, size)
    paramsKEGG <- new("KEGGHyperGParams", geneIds = genesKEGG,
                      universeGeneIds = universeKEGG,
                      annotation = "org.Mm.eg", pvalueCutoff = 0.05,
                      testDirection = "over")
    paramsGO <- new("GOHyperGParams", geneIds = genesGO,
                    universeGeneIds = universeGO,
                    annotation = "org.Mm.eg", pvalueCutoff = 0.05, ontology = "MF",
                    testDirection = "over")
    resultsKEGG <- hyperGTest(paramsKEGG)
    resultsGO <- hyperGTest(paramsGO)
    # Mapped universe size reported by each result object
    uniSizeKEGG <- universeMappedCount(resultsKEGG)
    uniSizeGO <- universeMappedCount(resultsGO)
    print(paste("Sample size:", size, ", GO mapped universe:", uniSizeGO,
                ", KEGG mapped universe:", uniSizeKEGG))
}
## Code output:
[1] "Sample size: 5 , GO mapped universe: 884 , KEGG mapped universe: 286"
[1] "Sample size: 10 , GO mapped universe: 884 , KEGG mapped universe: 402"
[1] "Sample size: 20 , GO mapped universe: 884 , KEGG mapped universe: 569"
## The GO mapped universe stays constant, but the KEGG count increases with sample size.
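As a rough illustration of why the mapped universe size could matter for the statistics, here is a base-R sketch. It assumes a standard one-sided hypergeometric test for over-representation, and the counts are made up for illustration (they are not taken from the runs above). With the same category and selection counts, the upper-tail p-value changes with the universe size N:

```r
# Hypothetical counts, for illustration only (not from the runs above)
m <- 20   # universe genes annotated to the category
k <- 10   # selected genes (sample size)
x <- 3    # selected genes that fall in the category

# Upper-tail hypergeometric p-value P(X >= x) for a universe of N genes:
# m "white" genes in the category, N - m "black" genes outside it, k draws
overRepP <- function(N) phyper(x - 1, m, N - m, k, lower.tail = FALSE)

pLarge <- overRepP(1000)  # larger universe
pSmall <- overRepP(300)   # smaller universe
print(c(N1000 = pLarge, N300 = pSmall))
```

Since a smaller universe raises the expected overlap (k * m / N), the same observed count looks less surprising, so the p-value grows as N shrinks. Whether hyperGTest actually uses the reduced KEGG count internally is exactly the open question here.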
sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1
locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] GO.db_2.2.11 org.Mm.eg.db_2.2.11 GOstats_2.10.0
[4] RSQLite_0.7-1 DBI_0.2-4 graph_1.22.2
[7] Category_2.10.1 AnnotationDbi_1.6.1 Biobase_2.4.1
loaded via a namespace (and not attached):
[1] annotate_1.22.0 genefilter_1.24.2 GSEABase_1.6.1 RBGL_1.20.0
[5] splines_2.9.2 survival_2.35-4 tools_2.9.2 XML_2.5-3
[9] xtable_1.5-5