[BioC] GOSeq with unsupported organism (Arabidopsis) and retrieving gene IDs from enriched GO categories

Wed Mar 12 11:46:59 CET 2014

Hi All,

I'm currently working on a differential gene expression analysis and 
I've used GOSeq to find enriched GO categories, just like what is 
mentioned here 
(https://stat.ethz.ch/pipermail/bioconductor/attachments/20110308/92b27df4/attachment.pl), 
except I am using a non-supported organism (Arabidopsis). I've reached 
the same point in the analysis as Fernando did in the above link, 
where I would like to extract all gene IDs associated with the enriched 
GO terms in my DE analysis.

My question is, how can I do this with a non-supported organism?

For a supported organism, the process looks straightforward, but for 
an unsupported genome and a newbie in R, it isn't so easy.

This is some of the code that got me to where I am now.

# calculate the probability weighting function (PWF)
pwf = nullp(genes, bias.data=overlapLengths)

# read in the GO categories file
tairgo <- read.table("ATH_GO_GOSLIM.txt", header=FALSE, sep="\t", fill=TRUE)

# get ID and GO columns only from tairgo
GO.wall <- goseq(pwf, gene2cat=tairgo[,c(1,6)])
GO.samp <- goseq(pwf, gene2cat=tairgo[,c(1,6)], method="Sampling", repcnt=1000)

enriched.GO = GO.wall$category[p.adjust(GO.wall$over_represented_pvalue, method="BH") < 0.05]
# filter GO.samp by its own p-values; GO.wall's rows may be in a different order
enriched.sampgo = GO.samp$category[p.adjust(GO.samp$over_represented_pvalue, method="BH") < 0.05]

What I've been thinking of doing is looping through my vector of 
enriched GO terms and finding all gene IDs that have matching GO terms 
in "tairgo". However, is there a better way to do this using one of the 
functions built into GOSeq?
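
To make the idea concrete, here is a rough sketch of the lookup in base R, without an explicit loop. It runs on a made-up toy table: the gene and GO IDs are invented, and the names gene2cat, enriched and genes.by.go are hypothetical. With my real data, I assume gene2cat would be tairgo[,c(1,6)] and enriched would be enriched.GO.

```r
# Toy stand-in for tairgo[, c(1, 6)]: gene ID and GO ID (invented values)
gene2cat <- data.frame(
  gene     = c("AT1G01010", "AT1G01020", "AT1G01030", "AT1G01010", "AT1G01040"),
  category = c("GO:0000001", "GO:0000001", "GO:0000002", "GO:0000002", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Hypothetical enriched terms, standing in for enriched.GO
enriched <- c("GO:0000001", "GO:0000002")

# Keep only the rows whose GO term is in the enriched set
hits <- gene2cat[gene2cat$category %in% enriched, ]

# Group gene IDs by GO term, dropping duplicate annotations within a term
genes.by.go <- lapply(split(hits$gene, hits$category), unique)

print(genes.by.go)
```

The result is a named list with one character vector of gene IDs per enriched GO term. To keep only differentially expressed genes, each vector could presumably be intersected with the names of the DE genes in "genes".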

Thanks so much for your valuable input!!

-- 
Dale Richardson, Ph.D.
Laboratory of Plant Molecular Biology
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt
Tel: +351 214 464 647