[BioC] nearest() for GRanges
Martin Morgan
mtmorgan at fhcrc.org
Mon Jun 18 21:39:19 CEST 2012
Hi Oleg --
On 06/17/2012 11:11 PM, Oleg Mayba wrote:
> Hi,
>
> I just noticed that a piece of logic I was relying on with GRanges before
> does not seem to work anymore. Namely, I expect the behavior of nearest()
> with a single GRanges object as an argument to be the same as that of
> IRanges (example below), but it's not anymore. I expect nearest(GR1) NOT
> to behave trivially but to return the closest range OTHER than the range
> itself. I could swear that was the case before, but isn't any longer:
I think you're right that there is an inconsistency here; Val will
likely help clarify in a day or so. My two cents...
I think, certainly, that GRanges on a single chromosome on the "+"
strand should behave like an IRanges
library(GenomicRanges)
library(RUnit)
r <- IRanges(c(1,5,10), c(2,7,12))
g <- GRanges("chr1", r, "+")
## first two ok, third should work but fails
checkEquals(precede(r), precede(g))
checkEquals(follow(r), follow(g))
try(checkEquals(nearest(r), nearest(g)))
Also, on the "-" strand I think we're expecting
g <- GRanges("chr1", r, "-")
## first two ok, third should work but fails
checkEquals(follow(r), precede(g))
checkEquals(precede(r), follow(g))
try(checkEquals(nearest(r), nearest(g)))
For "*" (which was your example) I think the situation is (a) different
in devel than in release; and (b) not so clear. In devel, "*" is (from
method?"nearest,GenomicRanges,missing") "x on '*' strand can match to
ranges on any of ''+'', ''-'' or ''*''" and in particular I think these
are always true:
checkEquals(precede(g), follow(g))
checkEquals(nearest(r), follow(g))
I would also expect
try(checkEquals(nearest(g), follow(g)))
though this seems not to be the case. In 'release', "*" is coereced and
behaves as if on the "+" strand (I think).
Martin
>
>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>> z
> IRanges of length 3
> start end width
> [1] 1 2 2
> [2] 5 7 3
> [3] 10 12 3
>> nearest(z)
> [1] 2 1 2
>>
>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
> end=c(2,7,12)))
>> z
> GRanges with 3 ranges and 0 elementMetadata cols:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [ 1, 2] *
> [2] chr1 [ 5, 7] *
> [3] chr1 [10, 12] *
> ---
> seqlengths:
> chr1
> NA
>> nearest(z)
> [1] 1 2 3
>>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] datasets utils grDevices graphics stats methods base
>
> other attached packages:
> [1] GenomicRanges_1.8.6 IRanges_1.14.3 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] stats4_2.15.0
>>
>
>
>
> I want the IRanges behavior and not what seems currently to be the GRanges
> behavior, since I have some code that depends on it. Is there a quick way
> to make nearest() do that for me again?
>
> Thanks!
>
> Oleg.
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list