[Bioc-sig-seq] coordinates 0-indexed or 1-indexed in IRanges?
Hans-Rudolf Hotz
hrh at fmi.ch
Tue Apr 13 15:02:23 CEST 2010
On 04/13/2010 12:35 PM, margherita mutarelli wrote:
> Dear all,
>
> please apologize if I missed this information, but I have looked throughout
> the documentation and vignettes of the IRanges packages and I could not find
> this information:
>
> are the coordinates in IRanges objects considered as "0-indexed" or
> "1-indexed"?
>
> I.e. when importing the refGene.txt table (or any) from UCSC, we know that
> they are 0-indexed, meaning that the first base is not part of the
> gene/transcript/object.
> If IRanges are 1-index this means we have to subtract 1 from the start
> coordinate precedent in the table when creating an IRanges object from them.
>
> Is it correct?
Hi Margherita
This topic always causes problems. As far as I understand the situation,
you have to add 1 to the start coordinates you have downloaded from UCSC
(I assume as a BED file); the end coordinates stay as they are.
Let me try and explain with a simple example:
We have two features ranging from 1 to 5 and 5 to 10. We can create
simple IRanges objects:
> f1 <- IRanges(c(1), c(5))
> f2 <- IRanges(c(5), c(10))
>
> f1
IRanges of length 1
start end width
[1] 1 5 5
> f2
IRanges of length 1
start end width
[1] 5 10 6
>
And of course, they do overlap (both include position 5):
> findOverlaps(f1,f2)
An object of class “RangesMatching”
Slot "matchMatrix":
query subject
[1,] 1 1
Slot "DIM":
[1] 1 1
>
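The same result can be checked by hand in base R. This is only a sketch of the closed-interval arithmetic (start and end both included); closed_width and closed_overlap are illustrative helper names of my own, not IRanges functions.

```r
# Hypothetical helpers (not part of IRanges):
# intervals are 1-based and end-inclusive.
closed_width <- function(start, end) end - start + 1L

# Two closed intervals overlap when each one starts
# no later than the other one ends.
closed_overlap <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

w1 <- closed_width(1L, 5L)    # matches the 'width' column of f1
w2 <- closed_width(5L, 10L)   # matches the 'width' column of f2
ov <- closed_overlap(1L, 5L, 5L, 10L)  # position 5 belongs to both
print(c(w1 = w1, w2 = w2))
print(ov)
```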
Now let's assume we got these numbers from UCSC as part of a BED file
for S. cerevisiae, chromosome 11:
chrXI 1 5
chrXI 5 10
BED files are '0-based' and 'end exclusive' (see:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1).
On the chromosome (with a '0-based' notation) this would look like:
     0 1 2 3 4 5 6 7 8 9 10
     C A C C A C A C C C A
f1     * * * *
f2             * * * * *
=> they don't overlap!
Play with the 'upload custom track' tool on the UCSC genome browser (using
the small BED file from above) in case this is still confusing.
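For comparison, here is a minimal base R sketch of the '0-based, end exclusive' arithmetic applied to the two BED lines. The helper names halfopen_width and halfopen_overlap are made up for illustration and do not come from any package.

```r
# Hypothetical helpers: intervals are 0-based and end-exclusive (BED style),
# so the end coordinate itself is not part of the feature.
halfopen_width <- function(start, end) end - start

# Two half-open intervals overlap only when each one starts
# strictly before the other one ends.
halfopen_overlap <- function(s1, e1, s2, e2) s1 < e2 & s2 < e1

bw1 <- halfopen_width(1L, 5L)    # positions 1, 2, 3, 4
bw2 <- halfopen_width(5L, 10L)   # positions 5, 6, 7, 8, 9
bov <- halfopen_overlap(1L, 5L, 5L, 10L)
print(c(bw1 = bw1, bw2 = bw2))
print(bov)
```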
Now back to IRanges (which are '1-based' and 'end inclusive'):
     1 2 3 4 5 6 7 8 9 10 11
     C A C C A C A C C C  A
f1     * * * *
f2             * * * * *
Our new numbers are: 2 to 5 and 6 to 10 (which corresponds to adding 1
to the start before we create the IRanges object):
> ff1 <- IRanges(c(2), c(5))
> ff2 <- IRanges(c(6), c(10))
> findOverlaps(ff1,ff2)
An object of class “RangesMatching”
Slot "matchMatrix":
query subject
Slot "DIM":
[1] 1 1
>
=> they don't overlap.
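The conversion can be sketched for a whole table at once. This assumes the BED lines have been read into a data frame with the standard BED column names chrom, chromStart and chromEnd; the object names bed, start1 and end1 are only placeholders, and the IRanges call is skipped if the package is not installed.

```r
# Assumed input: the two BED lines from above, held in a data frame.
bed <- data.frame(chrom      = c("chrXI", "chrXI"),
                  chromStart = c(1L, 5L),
                  chromEnd   = c(5L, 10L),
                  stringsAsFactors = FALSE)

# 0-based, end-exclusive -> 1-based, end-inclusive:
# shift the start by one, leave the end untouched.
start1 <- bed$chromStart + 1L
end1   <- bed$chromEnd

# Sanity check: the feature width is the same in both conventions.
stopifnot(all(end1 - start1 + 1L == bed$chromEnd - bed$chromStart))

# Build the IRanges object only if the package is available.
if (requireNamespace("IRanges", quietly = TRUE)) {
  ir <- IRanges::IRanges(start = start1, end = end1)
  print(ir)
}
```

Going the other way (writing IRanges coordinates back to a BED file) would mean subtracting 1 from the start again.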
I hope this helps
Hans
> This can be important to clarify, both when considering overlap of features
> and in junctions, since it can shift the correct exon boundaries.
>
> Cheers,
>
> Margherita