[BioC] makeTranscriptDbFromBiomart error
Marc Carlson
mcarlson at fhcrc.org
Thu Jun 7 19:40:32 CEST 2012
Hi Stefanie,
This is related to a bug with the 5' and 3' starts/ends that was in the
latest version of biomaRt. We reported it to them a couple of weeks ago
because it immediately started to break some of our quality control
tests for GenomicFeatures. At that time, they told us that it had been
fixed, but that it would still take a couple of weeks for their
correction to propagate out. In the meantime, using either
makeTranscriptDbFromUCSC() or the stock annotation packages for human
might be a good work-around for you.
The warning that you saw for makeTranscriptDbFromUCSC() comes from
another quality control check. When an annotation resource gives us the
ranges for a CDS, we expect the cumulative length of that CDS to be a
multiple of three. When it is not, we issue the warning you were seeing
for makeTranscriptDbFromUCSC().
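As a rough illustration of that check, here is a base R sketch. It is not
the actual GenomicFeatures code: the helper names and the coordinates below
are made up, and it assumes 1-based, inclusive exon and CDS positions.

```r
# Hypothetical helper (not part of GenomicFeatures): cumulative CDS length
# computed from exon coordinates and the genomic start/end of the CDS.
cds_cumulative_length <- function(exon_start, exon_end, cds_start, cds_end) {
  # Clip every exon to the CDS range; exons outside it contribute width 0.
  clipped_start <- pmax(exon_start, cds_start)
  clipped_end   <- pmin(exon_end, cds_end)
  sum(pmax(clipped_end - clipped_start + 1L, 0L))
}

# Hypothetical helper: TRUE when the length is a whole number of codons.
is_multiple_of_three <- function(len) len %% 3L == 0L

# Made-up transcript with three exons.
exon_start <- c(100L, 200L, 300L)
exon_end   <- c(150L, 250L, 350L)

# CDS from 120 to 328: 31 + 51 + 29 = 111 bases, a multiple of 3.
ok_len <- cds_cumulative_length(exon_start, exon_end, 120L, 328L)

# CDS from 120 to 329: 112 bases, which would trigger the warning.
bad_len <- cds_cumulative_length(exon_start, exon_end, 120L, 329L)

if (!is_multiple_of_three(bad_len)) {
  message("the cds cumulative length is not a multiple of 3")
}
```

A transcript that fails this test is still stored in the database; the
warning only flags it as an anomaly in the source data.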
Hope that this clarifies things,
Marc
On 06/07/2012 08:50 AM, Stefanie Tauber wrote:
> Hi,
>
> here is my sessionInfo:
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicFeatures_1.8.0 AnnotationDbi_1.18.0 Biobase_2.16.0
> [4] GenomicRanges_1.8.1 IRanges_1.14.2 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.12.0 Biostrings_2.24.0 bitops_1.0-4.1 BSgenome_1.24.0
> [5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.0 RSQLite_0.11.1
> [9] rtracklayer_1.16.0 stats4_2.15.0 tools_2.15.0 XML_3.9-4
> [13] zlibbioc_1.2.0
>
> I updated GenomicFeatures to 1.8.1, but unfortunately did not help.
>
>
> BUT: makeTranscriptDbFromUCSC did work :)
>
>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
> Download the ensGene table ... OK
> Extract the 'transcripts' data frame ... OK
> Extract the 'splicings' data frame ... OK
> Download and preprocess the 'chrominfo' data frame ... OK
> Prepare the 'metadata' data frame ... metadata: OK
> Make the TranscriptDb object ... OK
> There were 50 or more warnings (use warnings() to see the first 50)
>
>> txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: UCSC
> | Genome: hg19
> | Genus and Species: Homo sapiens
> | UCSC Table: ensGene
> | Resource URL: http://genome.ucsc.edu/
> | Type of Gene ID: Ensembl gene ID
> | Full dataset: yes
> | miRBase build ID: NA
> | transcript_nrow: 181648
> | exon_nrow: 541825
> | cds_nrow: 278798
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2012-06-07 17:48:45 +0200 (Thu, 07 Jun 2012)
> | GenomicFeatures version at creation time: 1.8.1
> | RSQLite version at creation time: 0.11.1
> | DBSCHEMAVERSION: 1.0
>
>> warnings()
> Warning messages:
> 1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]], ... :
> UCSC data anomaly in transcript ENST00000513161: the cds cumulative length is not a multiple of 3
> 2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]], ... :
> UCSC data anomaly in transcript ENST00000417833: the cds cumulative length is not a multiple of 3
> 3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i], exon_locs$start[[i]], ... :
> UCSC data anomaly in transcript ENST00000450884: the cds cumulative length is not a multiple of 3
>
>
> Best,
> Stefanie
>
> On 07.06.2012 at 16:25, Steve Lianoglou wrote:
>
>> Hi Stefanie,
>>
>> On Thu, Jun 7, 2012 at 5:16 AM, Stefanie Tauber
>> <stefanie.tauber at univie.ac.at> wrote:
>>> Hi
>>>
>>> I just tried it with R 2.15, I get the same error.
>>>
>>> If I follow your suggestion:
>>>
>>> txdb<- makeTranscriptDbFromUCSC(genome="hg19", tablename="ensGene")
>>>
>>>
>>> I get:
>>>
>>> Download the ensGene table ... OK
>>> Extract the 'transcripts' data frame ... OK
>>> Extract the 'splicings' data frame ... OK
>>> Download and preprocess the 'chrominfo' data frame ... Error in
>>> download.file(url, destfile, quiet = TRUE) :
>>> cannot open URL
>>> 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
>>> In addition: There were 50 or more warnings (use warnings() to see the first
>>> 50)
>> [snip]
>>
>> Strange ... I also get the same warnings you get (the "cds cumulative
>> length is not a multiple of 3") for some transcripts, but I think this
>> is something beyond our control. I don't get any error(s) when
>> downloading and building the TxDB, so it completes fine for me.
>>
>> I'm actually running the *-devel versions of the bioc packages w/
>> R-2.15.x so it's not very easy for me to check the current released
>> GenomicFeatures package, but I'd be a bit surprised if the error is
>> there.
>>
>> Could you paste the output of `sessionInfo()` after you call
>> `library(GenomicFeatures)` when running your new R-2.15.x install?
>>
>> -steve
>>
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
> DI Stefanie Tauber
>
> Center for Integrative Bioinformatics Vienna (CIBIV)
> (CIBIV is a joint institute of Vienna University, Medical University, and University of Veterinary Medicine, Vienna, Austria)
> Max F. Perutz Laboratories (MFPL)
> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
> Dr. Bohr Gasse 9
> A-1030 Wien, Austria
> Phone: ++43 +1 / 42772-4030
> Fax: ++43 +1 / 42772-4098
> email: stefanie.tauber at univie.ac.at
> www.cibiv.at