[BioC] How to get Gene ontology (GO) terms per probe
James W. MacDonald
jmacdon at uw.edu
Thu Oct 25 18:42:48 CEST 2012
Hi Rafi,
On 10/23/2012 6:59 PM, Rafi [guest] wrote:
> I am new to R/BioC. I am trying to do GO-based clustering of genes. The input (for the package csbl.go) needs to be gene name and GO terms in each row. Example:
Hmm. Weird that this package doesn't have facilities to do this. Anyway,
not that difficult, starting after your line that creates the testid object:
d.f <- select(rat2302.db, testid, c("SYMBOL", "GO"))
## note there is a space between the quotes in collapse = " "
out <- data.frame(tapply(d.f$GO, d.f$SYMBOL, paste, collapse = " "))
write.table(out, "input_for_csbl.txt", col.names = FALSE, quote = FALSE)
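
One thing to watch for: the toTable() output quoted below lists the same GO ID more than once when it carries several evidence codes (GO:0008152 appears with both IEA and ISO), so the collapsed string can contain repeats. The following is a minimal sketch of one way to handle that, not something csbl.go is known to require. It assumes a toy data.frame standing in for the select() result; the PROBEID values, the probe-to-symbol pairing, and the unique() and NA-filtering steps are illustrative additions rather than part of the code above.

```r
# Toy stand-in for the data.frame returned by select(); the PROBEID values
# and the probe-to-symbol pairing are made up for illustration.
d.f <- data.frame(
  PROBEID = c("probe_1", "probe_1", "probe_1", "probe_2", "probe_2", "probe_3"),
  SYMBOL  = c("AP4B1", "AP4B1", "AP4B1", "BCAS2", "BCAS2", NA),
  GO      = c("GO:0005215", "GO:0005215", "GO:0005488",
              "GO:0005515", "GO:0005634", NA),
  stringsAsFactors = FALSE
)

# Drop rows that lack a symbol or a GO term (assumption: such rows are not
# wanted in the clustering input).
d.f <- d.f[!is.na(d.f$SYMBOL) & !is.na(d.f$GO), ]

# Collapse the GO IDs per symbol, keeping each ID only once.
go_by_symbol <- tapply(d.f$GO, d.f$SYMBOL,
                       function(x) paste(unique(x), collapse = " "))

# One row per symbol: the symbol followed by its space-separated GO IDs.
out <- data.frame(symbol = names(go_by_symbol),
                  go = as.vector(go_by_symbol),
                  stringsAsFactors = FALSE)
write.table(out, "input_for_csbl.txt", row.names = FALSE,
            col.names = FALSE, quote = FALSE)
```

With real data, d.f would come from the select() call above instead of the toy data.frame.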
Best,
Jim
>
> AP4B1 GO:0005215 GO:0005488 GO:0005515 GO:0005625
> BCAS2 GO:0005515 GO:0005634 GO:0005681 GO:0008380
>
> I tried using annotate in bioconductor:
>
> library("rat2302.db")
> library(annotate)
> testid<-c("1367462_at","1380262_at", "1392516_a_at", "1396521_at")
> goid1<- rat2302GO[testid]
>
> But I get each GO term in a separate row:
>
> toTable(goid1)
>
>      probe_id      go_id Evidence Ontology
> 1  1367462_at GO:0008152      IEA       BP
> 2  1367462_at GO:0008152      ISO       BP
> 3  1367462_at GO:0006508      IMP       BP
> 4  1367462_at GO:0005886      IEA       CC
> 5  1367462_at GO:0005737      IEA       CC
> 6  1380262_at GO:0005575       ND       CC
> 7  1380262_at GO:0005634      IEA       CC
> 8  1380262_at GO:0005737      IEA       CC
> 9  1367462_at GO:0005509      IEA       MF
> 10 1367462_at GO:0005509      TAS       MF
>
> Is there any easier way to get all GO terms per gene/probe?
>
> Any help is greatly appreciated.
>
> Thanks
> Rafi
>
> -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] csbl.go_1.4.0 RUnit_0.4.26 cluster_1.14.2 GO.db_2.7.1 BiocInstaller_1.4.9
> [6] annotate_1.34.1 rat2302.db_2.7.1 org.Rn.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5
> [11] AnnotationDbi_1.18.1 Biobase_2.16.0 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.14.4 stats4_2.15.0 tools_2.15.0 XML_3.9-4.1 xtable_1.7-0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099