[Bioc-sig-seq] unexpected genes names list using getBM{biomaRt}
James W. MacDonald
jmacdon at med.umich.edu
Mon Dec 7 16:20:48 CET 2009
Hi Ramzi,
Ramzi TEMANNI wrote:
> Hi,
> I want to extract the gene names, given the chromosome and position of
> each gene:
>> t.cpd[1:10,1:2]
> CHR.M1 POS.M1
> [1,] "12" "140059033"
> [2,] "19" "164634640"
> [3,] "10" "32347784"
> [4,] "11" "30576841"
> [5,] "2" "86479831"
> [6,] "12" "237019866"
> [7,] "4" "76487174"
> [8,] "20" "136121868"
> [9,] "2" "6255547"
> [10,] "1" "67658137"
>
> I use the following commands:
> library(biomaRt)
> mart = useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)
> gn.m1<-getBM(attributes= c("hgnc_symbol"),
> filters=c("chromosome_name","start"),
> values=list(t.cpd[1:10,1],t.cpd[1:10,2]), mart=ensembl)
>
> I expected a list of 10 gene names, but instead I get 8652 genes:
> hgnc_symbol
> 1 OR2M1P
> 2 OR2L1P
> 3 HSD17B7P1
> 4 OR14L1P
> 5 OR2W5
> 6 VN1R5
> ......
> 8649 WFS1
> 8650 SNORD73A
> 8651 SNORA24
> 8652 SNORA26
>
> Did I miss something?
Yes. You are giving the start position, but not the end. Without
explicitly telling the BioMart server where to stop looking for genes,
where do you think it will stop by default? It simply keeps going to the
end of the chromosome, which is why you get thousands of genes back.
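A minimal sketch of a bounded query, in R as in the question. `genes_at_positions` is a hypothetical helper, not part of biomaRt. It assumes that passing the same value as both `start` and `end` makes the Ensembl filters return the genes overlapping that single base, and it loops over the positions so that each start is explicitly paired with its own end, rather than relying on how BioMart combines several values per filter. It only needs biomaRt (and network access) when it is actually called.

```r
# Hypothetical helper: query BioMart one position at a time, passing each
# position as both "start" and "end" so the search region is bounded.
# Assumption: the Ensembl start/end filters return genes overlapping [start, end].
genes_at_positions <- function(chr, pos, mart) {
  chr <- as.character(chr)
  # Integer positions print without scientific notation
  # (as.character(1e8) gives "1e+08"; as.character(100000000L) does not).
  pos <- as.integer(pos)
  stopifnot(length(chr) == length(pos), !anyNA(pos))
  hits <- lapply(seq_along(chr), function(i) {
    res <- biomaRt::getBM(attributes = "hgnc_symbol",
                          filters    = c("chromosome_name", "start", "end"),
                          values     = list(chr[i], pos[i], pos[i]),
                          mart       = mart)
    symbols <- as.character(res$hgnc_symbol)
    # Zero, one or several genes may overlap a single base; keep the query
    # coordinates next to each symbol so results can be matched back to rows.
    data.frame(chr = rep(chr[i], length(symbols)),
               pos = rep(pos[i], length(symbols)),
               hgnc_symbol = symbols,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Usage (needs the biomaRt package and network access):
# library(biomaRt)
# mart <- useMart("ensembl")
# ensembl <- useDataset("hsapiens_gene_ensembl", mart = mart)
# genes_at_positions(t.cpd[1:10, 1], t.cpd[1:10, 2], ensembl)
```

Querying one position at a time is slower than a single batched call, but it makes the pairing of chromosome, start and end unambiguous, which is the point here.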
Also, several of your coordinates are nonsensical. For instance, chr12
is only 133851895 bases long and chr20 only 63025520, so rows 1, 6 and 8
of your table point past the end of their chromosomes.
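A quick sanity check worth running before any BioMart query, sketched in base R. It assumes the positions are on GRCh37 (hg19), the assembly Ensembl used at the time; the chromosome lengths below are the GRCh37 ones and should be replaced with those of whatever assembly the data were actually mapped to. It flags every row whose position lies beyond the end of its chromosome.

```r
# Positions from the question (t.cpd stores them as character strings).
t.cpd <- cbind(CHR.M1 = c("12", "19", "10", "11", "2", "12", "4", "20", "2", "1"),
               POS.M1 = c("140059033", "164634640", "32347784", "30576841",
                          "86479831", "237019866", "76487174", "136121868",
                          "6255547", "67658137"))

# Assumed assembly: GRCh37 chromosome lengths in bp, named by chromosome.
chr_len <- c("1" = 249250621, "2" = 243199373, "4" = 191154276,
             "10" = 135534747, "11" = 135006516, "12" = 133851895,
             "19" = 59128983, "20" = 63025520)

pos <- as.numeric(t.cpd[, "POS.M1"])
len <- chr_len[t.cpd[, "CHR.M1"]]   # look up each row's chromosome length

# Row indices whose position is beyond the end of the chromosome.
off_end <- which(pos > len)
print(unname(off_end))   # rows 1, 2, 6 and 8
```

Besides the chr12 and chr20 rows, this also catches row 2, a chr19 position of about 164 Mb on a chromosome of roughly 59 Mb.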
Best,
Jim
>
> Thanks in advance for your help
>
> Best Regards,
> Ramzi
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826