[Bioc-sig-seq] RangedData versus GenomicRanges/GRanges
Ivan Gregoretti
ivangreg at gmail.com
Fri Oct 29 05:54:40 CEST 2010
Hello Janet,
It is a rare pleasure to have the opportunity to enlighten somebody
from the Fred Hutchinson Cancer Research Center about R functionality.
The bottom line is this: GenomicRanges is much more biology-aware
than the generic RangedData class.
GenomicRanges natively stores a strand value per feature. RangedData
does not, unless you create it. GenomicRanges' strand values are very
intuitive: +, -, and *.
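
A minimal sketch of the per-feature strand, assuming a current GenomicRanges installation. The coordinates are made up for illustration.

```r
library(GenomicRanges)

# Toy features: the chromosome names and positions are invented.
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges   = IRanges(start = c(100, 500, 200), width = 50),
              strand   = c("+", "-", "*"))

# strand() returns one value per feature, drawn from the levels +, -, *
strand(gr)
levels(strand(gr))
```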
GenomicRanges "rows" can be ordered by any "column", even if that ends up
scrambling the chromosome order. RangedData can only order features
within each space.
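
As a sketch of ordering by a "column": the example below sorts a toy GRanges by a hypothetical score column, and the result mixes the chromosomes. It assumes a GenomicRanges version that provides mcols().

```r
library(GenomicRanges)

# Toy features with an invented metadata column called "score".
gr <- GRanges(seqnames = c("chr1", "chr2", "chr1", "chr2"),
              ranges   = IRanges(start = c(100, 200, 500, 700), width = 50),
              score    = c(4, 3, 2, 1))

# Reorder the features by increasing score, ignoring chromosome order.
by_score <- gr[order(mcols(gr)$score)]

# The chromosomes are now interleaved rather than grouped.
as.character(seqnames(by_score))
```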
GenomicRanges can store the complete list of chromosomes and their
corresponding sizes for your particular organism. You can create a
GenomicRanges instance out of a RangedData without providing
explicitly the list of chromosomes and their sizes. Just do
library(GenomicRanges)
my_gr <- as(my_rd,"GRanges")
The list of chromosomes is gathered on the fly from the features. The
list of chromosome lengths still has to be assigned manually, which is
fine.
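
Assigning the lengths manually could look like the sketch below. The lengths are placeholders, not real chromosome sizes, and the object is a toy GRanges rather than one converted from a RangedData.

```r
library(GenomicRanges)

# Toy object: seqlevels are gathered from the features themselves.
gr <- GRanges(seqnames = c("chr1", "chr2"),
              ranges   = IRanges(start = c(100, 200), width = 50))

seqlevels(gr)    # chromosome names collected from the features
seqlengths(gr)   # all NA until assigned

# Assign lengths by name. These values are invented for illustration.
seqlengths(gr) <- c(chr1 = 1000L, chr2 = 800L)
seqlengths(gr)
```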
Nowadays you can rtracklayer::import() BED directly as GenomicRanges.
Importing large BED into either GenomicRanges or RangedData is, in my
experience, equally slow. There is no difference there.
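
A small sketch of the BED import, assuming a current rtracklayer in which import() returns a GRanges by default (older versions may need an extra argument to avoid getting a RangedData). The two-line BED file is written to a temporary path just for the example.

```r
library(rtracklayer)

# Write a tiny, made-up BED file. BED starts are 0-based.
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t99\t150\tfeatA\t0\t+",
             "chr2\t199\t250\tfeatB\t0\t-"),
           bed_path)

# Import as a GRanges. Starts are converted to 1-based coordinates.
gr <- import(bed_path, format = "BED")
gr
```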
Why not forget RangedData, then? Its advantage over GenomicRanges
is, also in my experience, that it accepts features mapped beyond the
limits of chromosomes. The most unforgiving example is mitochondrial
DNA. Because it is circular, it naturally gets sequencing reads with
"starts" that are numerically larger than their "ends".
In high throughput sequencing I still use RangedData when
1) I do not care about relatively few misbehaving reads
2) I need my script to run without errors from GenomicRanges sanity checks.
For everyday high throughput sequencing I use GenomicRanges keeping
the chromosome lengths unassigned. It could be called a hybrid.
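
The "hybrid" can be sketched as follows, with made-up coordinates and a made-up chromosome length. With the lengths left unassigned there is nothing to check the features against. Whether assigning a length that is too short produces a warning or an error depends on the GenomicRanges version, so the example only records which one occurs.

```r
library(GenomicRanges)

# Toy read on chrM that runs past an invented chromosome end of 16000.
gr <- GRanges(seqnames = "chrM",
              ranges   = IRanges(start = 15990, end = 16050))

# Lengths unassigned: the object is valid and no bounds check applies.
seqlengths(gr)

# Assigning a length shorter than the feature triggers the sanity check.
# The outcome (warning or error) is version dependent, so just capture it.
check <- tryCatch({
  gr_checked <- gr
  seqlengths(gr_checked) <- c(chrM = 16000L)
  "accepted silently"
}, warning = function(w) "warning",
   error   = function(e) "error")
check
```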
I hope this helps.
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878
On Thu, Oct 28, 2010 at 9:25 PM, Janet Young <jayoung at fhcrc.org> wrote:
> Hi,
>
> I've been on a long long vacation, so I'm a bit more out of the loop than I
> usually am.
>
> I've been using RangedData a lot in my code until now to represent sets of
> genomic regions spread over multiple chromosomes, and I've just realized
> that GenomicRanges has a lot of the same characteristics.
>
> I wanted to ask you all
> - whether RangedData and GenomicRanges are pretty much equivalent, or if
> there are functions that exist for one but not the other?
> - whether I can use pretty much the same code and functions if I switch
> everything over to use GenomicRanges?
> - are there subtle differences I should be careful of if I make the switch?
>
> thanks very much,
>
> Janet Young
>
>
> -------------------------------------------------------------------
>
> Dr. Janet Young (Trask lab)
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung ...at... fhcrc.org
>
> http://www.fhcrc.org/labs/trask/
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>