[BioC] Map GO terms to Uniprot from org.Hs.eg
James W. MacDonald
jmacdon at med.umich.edu
Wed Sep 14 15:48:23 CEST 2011
Hi Sandeep,
Here's a start.
> library(org.Hs.eg.db)
> uniprots <- head(Rkeys(org.Hs.egUNIPROT))
> uniprots
[1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"
> egs <- mget(uniprots, revmap(org.Hs.egUNIPROT))
> egs
$A0A183
[1] "448835"
$A0A5E8
[1] "10634"
$A0A962
[1] "55072"
$A0AUX0
[1] "272"
$A0AUZ9
[1] "151050"
$A0AV02
[1] "84561"
> gos <- lapply(egs, get, org.Hs.egGO)
This results in a list of lists, where the names of the outer list are
the UniProt IDs:
> names(gos)
[1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"
And for each UniProt ID you have a list of all GO IDs that map to that
UniProt ID, each with its evidence code and ontology.
> gos$A0A183
$`GO:0031424`
$`GO:0031424`$GOID
[1] "GO:0031424"
$`GO:0031424`$Evidence
[1] "IEA"
$`GO:0031424`$Ontology
[1] "BP"
So for this first one there is only one GO term, GO:0031424, which is a
BP term. It can get much more complicated, with multiple terms (of
multiple types) for each UniProt ID (e.g., you could have 5 MF terms and
3 BP terms for one UniProt ID), which may make putting things into a
nice neat table a bit challenging.
The list can be parsed using some combination of lapply() and sapply(),
but I don't have the time to play around with it. That will have to be
your homework for the day.
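As a rough starting point for that parsing step, here is a minimal sketch
that collapses each ontology into one ";"-separated column per UniProt ID.
It runs on a small mock object with the same shape as gos, so it needs no
Bioconductor packages. The A0A183 entry is copied from the output above;
ABC123 and its GO IDs are only the placeholders from the example table in
the question (plus an extra placeholder, GO:124), and the "IEA" evidence
codes for them are assumed. The helper names go_entry() and
collapse_ontology() are made up for illustration.

```r
# Mock object with the same shape as `gos`: a list (one element per UniProt ID)
# of lists (one element per GO term) holding GOID, Evidence and Ontology.
# ABC123 and its GO IDs are placeholders, not real annotations.
go_entry <- function(id, ontology, evidence = "IEA") {
  list(GOID = id, Evidence = evidence, Ontology = ontology)
}
gos <- list(
  A0A183 = list("GO:0031424" = go_entry("GO:0031424", "BP")),
  ABC123 = list(
    "GO:121" = go_entry("GO:121", "BP"),
    "GO:124" = go_entry("GO:124", "BP"),
    "GO:122" = go_entry("GO:122", "CC"),
    "GO:123" = go_entry("GO:123", "MF")
  )
)

# Collapse all GO IDs of one ontology into a single ";"-separated string.
collapse_ontology <- function(terms, ontology) {
  # Guard: anything that is not a list of terms (e.g. NA) counts as "no annotation".
  if (!is.list(terms)) return(NA_character_)
  ids  <- vapply(terms, function(term) term$GOID, character(1))
  onts <- vapply(terms, function(term) term$Ontology, character(1))
  hits <- unique(ids[onts == ontology])  # a GO ID can recur with another evidence code
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}

# One row per UniProt ID, one column per ontology.
go_table <- data.frame(
  Uniprot = names(gos),
  GO_BP   = vapply(gos, collapse_ontology, character(1), ontology = "BP"),
  GO_CC   = vapply(gos, collapse_ontology, character(1), ontology = "CC"),
  GO_MF   = vapply(gos, collapse_ontology, character(1), ontology = "MF"),
  stringsAsFactors = FALSE,
  row.names = NULL
)
print(go_table)
```

Collapsing with ";" is just one way to keep a single row per UniProt ID;
a long format with one row per (UniProt ID, GO ID) pair would avoid the
string pasting and may be easier to work with downstream.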
Also note that you can query these .db packages with SQL queries, if you
are a database person. This might make things easier. See
http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/AnnotationDbi.pdf,
in particular sections 2.0.9 and 2.0.10.
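For the SQL route, something along these lines might work. This is only a
sketch: it needs org.Hs.eg.db and DBI installed, and the table and column
names used here (uniprot, go_bp, _id, uniprot_id, go_id, evidence) are
assumptions about the package's SQLite schema, so check them with
dbListTables() and dbListFields() before relying on the query.

```r
library(org.Hs.eg.db)
library(DBI)

# Connection to the SQLite database behind the package.
# It is managed by the package, so do not disconnect it yourself.
con <- org.Hs.eg_dbconn()

# Inspect the schema first; the names used below are assumptions.
dbListTables(con)
dbListFields(con, "uniprot")
dbListFields(con, "go_bp")

# BP terms for one UniProt ID, joined on the internal gene key `_id`.
sql <- "SELECT u.uniprot_id, g.go_id, g.evidence
        FROM uniprot AS u
        INNER JOIN go_bp AS g ON u._id = g._id
        WHERE u.uniprot_id = 'A0A183'"
bp <- dbGetQuery(con, sql)
bp
```

If the schema matches, the same join against the CC and MF tables should
give the other two ontologies, and dropping the WHERE clause returns every
UniProt ID at once.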
Best,
Jim
On 9/14/2011 6:53 AM, Sandeep Amberkar wrote:
> Dear All,
>
>
> I have loaded the dataset "org.Hs.eg" into my R session. As I am using it for
> the first time, I am not familiar with its data structure. Can anyone please
> help me build a table that contains an ontology-wise mapping to UniProt
> identifiers? I want the final output table to look something like this:
>
> Uniprot GO_BP GO_CC GO_MF
> ABC123 GO:121 GO:122 GO:123
>
> Thanks in advance for your help.
>
> Warm Regards,
> Sandeep Amberkar
> BioQuant,BQ26,
> Im Neuenheimer Feld 267,
> D-69120,Heidelberg
> Tel: +49-6221-5451354
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826