[BioC] GOSeq with unsupported organism (Arabidopsis) and retrieving gene IDs from enriched GO categories
Dale Richardson
drichardson at igc.gulbenkian.pt
Wed Mar 12 11:46:59 CET 2014
Hi All,
I'm currently working on a differential gene expression analysis and
I've used GOSeq to find enriched GO categories, just as described here
(https://stat.ethz.ch/pipermail/bioconductor/attachments/20110308/92b27df4/attachment.pl),
except that I am using an unsupported organism (Arabidopsis). I've
reached exactly the point in the analysis that Fernando describes in the
link above: I would like to extract all gene IDs associated with the
enriched GO terms from my DE analysis.
My question is: how can I do this with an unsupported organism?
For a supported organism the process looks straightforward, but for an
unsupported genome, and for an R newbie, it isn't so easy.
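This is the code that got me to where I am now:
library(goseq)
# Calculate the probability weighting function (PWF).
# 'genes' is a named 0/1 vector (1 = DE) whose names are TAIR gene IDs;
# 'overlapLengths' holds the matching gene lengths, in the same order.
pwf <- nullp(genes, bias.data = overlapLengths)
# Read in the TAIR GO categories file. quote = "" stops apostrophes in
# GO term names (e.g. "3'-5' exonuclease") from being treated as quotes.
tairgo <- read.table("ATH_GO_GOSLIM.txt", header = FALSE, sep = "\t",
                     fill = TRUE, quote = "")
# Use only the gene ID (column 1) and GO ID (column 6) columns of tairgo
GO.wall <- goseq(pwf, gene2cat = tairgo[, c(1, 6)])
GO.samp <- goseq(pwf, gene2cat = tairgo[, c(1, 6)],
                 method = "Sampling", repcnt = 1000)
enriched.GO <- GO.wall$category[p.adjust(GO.wall$over_represented_pvalue,
                                         method = "BH") < 0.05]
# Take the sampling p-values from GO.samp, not from GO.wall
enriched.sampgo <- GO.samp$category[p.adjust(GO.samp$over_represented_pvalue,
                                             method = "BH") < 0.05]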
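Picking the categories and the p-values from the same goseq() result table is easy to get wrong when the selection line is copied and edited (the sampling version needs GO.samp$over_represented_pvalue, not the GO.wall one). A small helper that takes one result table avoids that. This is only a sketch: enriched_categories() is a hypothetical name, not a goseq function, and the toy table uses made-up p-values with just the two goseq() output columns the helper reads.

```r
# Hypothetical helper (not part of goseq): return the categories of one
# goseq() result table whose BH-adjusted over-representation p-value is
# below 'alpha'. Categories and p-values always come from the same table.
enriched_categories <- function(res, alpha = 0.05) {
  padj <- p.adjust(res$over_represented_pvalue, method = "BH")
  res$category[padj < alpha]
}

# Toy table mimicking the two goseq() columns used here (made-up values)
toy <- data.frame(
  category = c("GO:0009737", "GO:0006952", "GO:0008150"),
  over_represented_pvalue = c(0.001, 0.02, 0.9),
  stringsAsFactors = FALSE
)
enriched_categories(toy)
```

With the real results this would be enriched.GO <- enriched_categories(GO.wall) and enriched.sampgo <- enriched_categories(GO.samp).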
What I've been thinking of doing is looping through my vector of
enriched GO terms and finding all gene IDs in "tairgo" that are
annotated with those terms. However, is there a better way to do this
using one of the functions built into GOSeq?
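The loop can be replaced by a vectorized subset of the same gene-to-GO table that was passed to goseq() as gene2cat. A minimal sketch, assuming (as in the code above) that the gene ID is in column 1 and the GO ID in column 6, that 'genes' is the named 0/1 DE vector given to nullp(), and that the enriched GO IDs are in a character vector. genes_per_category() is a hypothetical helper, not a goseq function, and the toy data are made up.

```r
# Hypothetical helper: for each enriched GO ID, list the DE genes annotated
# to it in the gene-to-GO table (gene ID in column 1, GO ID in column 6).
genes_per_category <- function(tairgo, genes, enriched) {
  # unique() because the TAIR file can list the same gene/GO pair more
  # than once (e.g. with different evidence codes)
  map <- unique(data.frame(gene = as.character(tairgo[, 1]),
                           category = as.character(tairgo[, 6]),
                           stringsAsFactors = FALSE))
  de_genes <- names(genes)[genes == 1]
  hits <- map[map$category %in% enriched & map$gene %in% de_genes, ]
  # One list element per enriched GO ID, in the order given
  split(hits$gene, factor(hits$category, levels = enriched))
}

# Toy gene-to-GO table shaped like tairgo (only columns 1 and 6 are used)
tairgo_toy <- data.frame(
  V1 = c("AT1G01010", "AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040"),
  V2 = "", V3 = "", V4 = "", V5 = "",
  V6 = c("GO:0009737", "GO:0009737", "GO:0009737", "GO:0006952", "GO:0006952"),
  stringsAsFactors = FALSE
)
genes_toy <- c(AT1G01010 = 1, AT1G01020 = 0, AT1G01030 = 1, AT1G01040 = 1)

genes_per_category(tairgo_toy, genes_toy, c("GO:0009737", "GO:0006952"))
```

With the real objects this would be genes_per_category(tairgo, genes, enriched.GO). To list every annotated gene rather than only the DE ones, drop the map$gene %in% de_genes condition.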
Thanks so much for your valuable input!!
--
Dale Richardson, Ph.D.
Laboratory of Plant Molecular Biology
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt
Tel: +351 214 464 647