[BioC] Create transcriptDb using gff3 files? - library GenomicFeatures and rtracklayer
Nicolas Delhomme
delhomme at embl.de
Thu Apr 5 17:51:02 CEST 2012
Hi Malcolm,
Thanks for the clarification,
Nico
On 5 Apr 2012, at 17:41, Cook, Malcolm wrote:
>> Hi all,
>>
>> Sorry I haven't read the whole thread, still I have a few comments that might
>> be off the main topic then.
>>
>> On 5 Apr 2012, at 17:01, Cook, Malcolm wrote:
>>
>>> Supporting both Ensembl's GTF and GFF3 would be ideal.
>>>
>>> Ensembl GTF would open up many genomes, including those in:
>>> ftp://ftp.ensembl.org/pub/release-66/gtf/
>>> ftp://ftp.ensemblgenomes.org/pub/metazoa/release-13/gtf/
>>> ftp://ftp.ensemblgenomes.org/pub/fungi/release-13/gtf/
>>> ftp://ftp.ensemblgenomes.org/pub/protists/release-13/gtf/
>>> ftp://ftp.ensemblgenomes.org/pub/plants/release-13/gtf/
>>>
>>>
>>> Supporting Ensembl GTF would make it easy to distribute/archive the
>>> elements of a transcriptome analysis alongside a project/analysis in a
>>> generally useful format (i.e. IGV and other tools can work with it more or
>>> less directly).
>>
>> In my package easyRNASeq, I already load Ensembl GTF files and convert
>> them into GRanges / RangedData objects. It's pretty straightforward. I guess
>> that adapting the code to create a transcriptDb should be doable.
>>
>>>
>>> Related note, I have learned that the BioMarts produced for
>>> Ensembl Genomes are NOT ARCHIVED, whereas it seems that historic GTF IS
>>> available. Upshot: you'd best not depend upon being able to reproduce
>>> today's TranscriptDbFromBiomart tomorrow.
>>
>> I don't know where you learned that and how you meant it exactly, but using
>> biomaRt, you can still access Ensembl versions as old as March 2009: see
>> http://mar2009.archive.ensembl.org/index.html.
>
> I learned it via an email exchange with Ensembl Genomes support
>
> Hello Malcolm,
> No, I am afraid that for Ensembl Genomes we don't make older versions available through an Archive! site, like we do for Ensembl.
> --
> With kind regards,
> Bert Overduin, Ph.D.
> (Ensembl Helpdesk)
>
> I realize this refers to the Ensembl Genomes web site, not the BioMart per se, however I'm pretty sure it extends.
>
> Note, EnsemblGenomes sites do NOT have the same archive policy as the main Ensembl site.
>
> I would like to be able to more clearly refer to this distinction via an on-line policy document, or some such, and would welcome a reference if there is one to be had.....
>
>> It's not straightforward to
>> figure it out, but on the main Ensembl webpage, you can get the full list by
>> clicking the "view in archive site" link at the bottom left of the page. It
>> redirects to this URL: http://www.ensembl.org/Help/ArchiveList.
>> Then, to use biomaRt on a given archive, you need to change the host
>> argument of useMart to the URL of the corresponding Ensembl archive, as in:
>> useMart("ENSEMBL_MART_ENSEMBL", host="mar2009.archive.ensembl.org")
>> I reckon that the biomaRt archive argument does not work for that. I need
>> to post something about this on the mailing list.
>
>
>
>>
>>>
>>> re: "typical gff3 files"...
>>> Flybase makes gff3 extracts and, if my understanding is correct, has been
>>> diligent in "getting it right"
>>
>> I believe so too. Again, in easyRNASeq, I do parse Flybase gff3 files and
>> convert them to GRanges/RangedData objects, but all the merit goes to the
>> readGff3 function from the genomeIntervals package. Reading a gff3 file
>> with this function is extremely quick, as is accessing the gffAttributes
>> (performed at the C layer).
>>
>> Cheers,
>>
>> Nico
>>
>>>
>>> Also, NCBI historically has tried to provide GFFx extracts, with oodles of
>>> caveats.
>>> But last month they announced progress on improving their GFF3
>>> offerings: http://bio.perl.org/pipermail/bioperl-l/2012-March/036387.html
>>> Example: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/
>>> YMMV.
>>>
>>> I too once hoped to find makeTranscriptDbFromGFF3 capability so as to
>>> allow easy tracking of the head of Flybase's offerings, i.e.
>>> ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.44_FB2012_02/gff/
>>> - alas I too have not followed up.
>>>
>>> ~Malcolm
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
>>>> bounces at r-project.org] On Behalf Of Marc Carlson
>>>> Sent: Wednesday, April 04, 2012 7:44 PM
>>>> To: bioconductor at r-project.org
>>>> Subject: Re: [BioC] Create transcriptDb using gff3 files? - library
>>>> GenomicFeatures and rtracklayer
>>>>
>>>> I was looking at this during the course, and this is on my TODO list for
>>>> the next release cycle. I think it is long overdue and I don't think
>>>> that the community is going to get it done in spite of all the
>>>> enthusiasm. There has not been time to do it before now but I am hoping
>>>> that will now change. It should be simple enough in principle, but it
>>>> might not be exactly trivial as I have discovered (on closer inspection)
>>>> that the gff specification is not as concrete as one would like it to
>>>> be. Also there have been several different versions.
>>>>
>>>> Some things that can help speed me along:
>>>>
>>>> 1) which version is most important? gff3? Or one of the other
>>>> versions? It is likely that with the older versions we may not be able
>>>> to extract as much meaningful information.
>>>>
>>>> 2) where is the best place to find some typical gff3 files for
>>>> examples? This should not be difficult, but when I was looking before I
>>>> was finding that people were surprisingly stingy about sharing these.
>>>>
>>>>
>>>> Marc
>>>>
>>>>
>>>>
>>>> On 04/03/2012 03:57 PM, Michael Lawrence wrote:
>>>>> Marc was working on this during the course in Feb. Not sure what happened
>>>>> to it. He said it was simple. Maybe just waiting for the release to pass.
>>>>>
>>>>> Michael
>>>>>
>>>>> On Tue, Apr 3, 2012 at 3:40 PM, Steve Lianoglou<
>>>>> mailinglist.honeypot at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Apr 3, 2012 at 4:41 PM, Sang Chul Choi<schoi at cornell.edu> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am wondering if I could create a TranscriptDb object (library
>>>>>>> GenomicFeatures) using a gff3 file. I could read a gff3 file using
>>>>>>> import.gff3, but I could not find a way to create a TranscriptDb object
>>>>>>> from the object returned by import.gff3.
>>>>>>> Two arguments for makeTranscriptDb are required: transcripts and
>>>>>>> splicings. It does not seem to be easy to parse this information from
>>>>>>> the object from import.gff3. I will appreciate any help.
>>>>>>
>>>>>> As far as I know, this functionality isn't there yet ...
>>>>>>
>>>>>> I once (early feb, 2012) suggested I might take a crack at making this
>>>>>> happen but haven't actually found the time to do it ... I'm not sure
>>>>>> anyone in bioc-core land (hi, Marc) has found the time to do it
>>>>>> either, so I think you're out of luck.
>>>>>>
>>>>>> Sorry for that. But the good news is that I bet a patch that does this
>>>>>> would be welcome ;-)
>>>>>>
>>>>>> -steve
>>>>>>
>>>>>> --
>>>>>> Steve Lianoglou
>>>>>> Graduate Student: Computational Systems Biology
>>>>>> | Memorial Sloan-Kettering Cancer Center
>>>>>> | Weill Medical College of Cornell University
>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>
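Regarding the original question in the thread (deriving the "transcripts" and "splicings" inputs of makeTranscriptDb from a gff3 file), the idea can be sketched as below. This is only a minimal illustration in Python, not the GenomicFeatures implementation and not an existing Bioconductor function. It assumes that transcripts are "mRNA" features, that exons point to them through the Parent attribute, and that the column names (tx_id, tx_name, tx_chrom, tx_strand, tx_start, tx_end, exon_rank, exon_start, exon_end) match what makeTranscriptDb expects; please check them against the makeTranscriptDb man page. The function names and the example records are made up for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict.

    URL-escaped characters are left untouched in this sketch.
    """
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_tables(lines, tx_types=("mRNA",)):
    """Return (transcripts, splicings) as lists of dicts, one dict per row.

    Assumption: features whose type is in tx_types are transcripts, and
    'exon' features reference them via the Parent attribute.
    """
    transcripts = {}            # transcript ID -> coordinates
    exons = defaultdict(list)   # transcript ID -> [(start, end), ...]

    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue            # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue            # not a feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attr = cols
        attrs = parse_attributes(attr)
        if ftype in tx_types and "ID" in attrs:
            transcripts[attrs["ID"]] = {
                "tx_chrom": seqid,
                "tx_strand": strand,
                "tx_start": int(start),
                "tx_end": int(end),
            }
        elif ftype == "exon":
            # An exon may be shared by several transcripts (comma-separated).
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exons[parent].append((int(start), int(end)))

    tx_rows, splicing_rows = [], []
    for tx_id, (name, tx) in enumerate(transcripts.items(), start=1):
        tx_rows.append(dict(tx_id=tx_id, tx_name=name, **tx))
        # Rank exons 5' to 3': ascending on '+', descending on '-'.
        ordered = sorted(exons.get(name, []), reverse=(tx["tx_strand"] == "-"))
        for rank, (exon_start, exon_end) in enumerate(ordered, start=1):
            splicing_rows.append(dict(tx_id=tx_id, exon_rank=rank,
                                      exon_start=exon_start,
                                      exon_end=exon_end))
    return tx_rows, splicing_rows


# Made-up records, for illustration only.
EXAMPLE = (
    "##gff-version 3\n"
    "2L\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "2L\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1\n"
    "2L\tFlyBase\texon\t600\t900\t.\t+\t.\tID=e2;Parent=t1\n"
    "2L\tFlyBase\texon\t100\t300\t.\t+\t.\tID=e1;Parent=t1\n"
    "2L\tFlyBase\tmRNA\t2000\t2900\t.\t-\t.\tID=t2;Parent=g2\n"
    "2L\tFlyBase\texon\t2000\t2200\t.\t-\t.\tID=e3;Parent=t2\n"
    "2L\tFlyBase\texon\t2600\t2900\t.\t-\t.\tID=e4;Parent=t2\n"
)

if __name__ == "__main__":
    tx_rows, splicing_rows = gff3_to_tables(EXAMPLE.splitlines())
    for row in tx_rows + splicing_rows:
        print(row)
```

The two lists could then be written out as tab-delimited files and read into R as data frames. As Marc notes in the thread, real gff3 files vary in which feature types and attributes they use, so the transcript types and the Parent convention would likely need adjusting per data provider.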