[BioC] BSgenome or org.Hs.eg.db to find gene length
Marc Carlson
mcarlson at fhcrc.org
Thu Oct 11 19:10:35 CEST 2012
Hi Fatemehsadat,
You could consider doing it this way:
library(Homo.sapiens)
columns(Homo.sapiens) ## shows columns you could use (older AnnotationDbi releases call this cols())
keytypes(Homo.sapiens) ## shows keytypes
k <- keys(Homo.sapiens, keytype="ENTREZID") ## discovers all available keys of this kind
## in older AnnotationDbi releases the columns= argument below is spelled cols=
result <- select(Homo.sapiens, keys=k, columns=c("TXNAME","TXSTART","TXEND",
"TXSTRAND"), keytype="ENTREZID")
Then you could process that result according to your definition of what
you think constitutes the "gene range". Do you think it is the max
range? The average? Maybe the max range plus some buffering sequence
to account for likely transcriptional regulators? It's your call how
you want to do that step, but the data frame in result should give you
the range positions for all the transcripts and their associated gene IDs.
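
As one rough illustration of the "max range" option, here is a minimal sketch in base R. It assumes result has the columns requested above; the small data frame is a made-up stand-in so the snippet runs without the annotation packages, and the names gene_start, gene_end and gene_length are only illustrative. It also assumes all transcripts of a gene sit on one chromosome and strand, which you would want to check on real data.

```r
# Toy stand-in for the data frame returned by select(); the values are invented.
result <- data.frame(
  ENTREZID = c("1", "1", "2"),
  TXNAME   = c("txA", "txB", "txC"),
  TXSTART  = c(100L, 150L, 1000L),
  TXEND    = c(500L, 900L, 1200L),
  TXSTRAND = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Outermost transcript coordinates for each gene.
gene_start <- tapply(result$TXSTART, result$ENTREZID, min)
gene_end   <- tapply(result$TXEND,   result$ENTREZID, max)

# One row per gene; length counts both end positions (1-based, inclusive).
gene_length <- data.frame(
  ENTREZID = names(gene_start),
  start    = as.vector(gene_start),
  end      = as.vector(gene_end[names(gene_start)]),
  stringsAsFactors = FALSE
)
gene_length$length <- gene_length$end - gene_length$start + 1L
```

Swapping min/max for another summary (or adding a fixed buffer to start and end) would give the other definitions mentioned above.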
OR, you might also consider doing it this way:
result2 <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene")
This will give you a list-like object (a GRangesList with one element per
gene) that is also suitable for use in range operations.
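
To give a flavour of such range operations, here is a small sketch that assumes the GenomicRanges package is installed. The hand-built GRangesList tx (with made-up gene names and coordinates) stands in for result2 so the snippet does not need the TxDb package; range() and width() are the GenomicRanges functions, everything else is illustrative.

```r
library(GenomicRanges)

# Toy stand-in for the GRangesList returned by transcriptsBy(); values are invented.
tx <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(100, 150), end = c(500, 900)), strand = "+"),
  geneB = GRanges("chr2", IRanges(start = 1000, end = 1200), strand = "-")
)

# range() collapses each gene's transcripts to the span they cover
# (one range per chromosome/strand combination within the gene).
gene_span <- range(tx)

# width() gives the length of each span; sum() adds them up per gene.
gene_len <- sum(width(gene_span))
```

On the real object the same two calls would be range(result2) and sum(width(...)), with the caveat that a gene annotated on more than one chromosome or strand contributes more than one span to the sum.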
Hope this helps,
Marc
On 10/11/2012 09:42 AM, Fatemehsadat Seyednasrollah wrote:
> Dear list,
>
> As I have read, I can find the chromosome number (using org.Hs.egCHR), chromosome location (using org.Hs.egCHRLOC) and end position (using org.Hs.egCHRLOCEND) for a list of gene symbols. But I did not find a map from gene symbol to gene length. Should I subtract what I get from org.Hs.egCHRLOC from what I get from org.Hs.egCHRLOCEND for each gene symbol to find the gene length, or is there an easier way to find it for a long list of gene symbols?
>
> Thank you