[BioC] nearest() for GRanges

Martin Morgan mtmorgan at fhcrc.org
Mon Jun 18 21:39:19 CEST 2012


Hi Oleg --

On 06/17/2012 11:11 PM, Oleg Mayba wrote:
> Hi,
>
> I just noticed that a piece of logic I was relying on with GRanges before
> does not seem to work anymore.  Namely, I expect the behavior of nearest()
> with a single GRanges object as an argument to be the same as that of
> IRanges (example below), but it's not anymore.  I expect nearest(GR1) NOT
> to behave trivially but to return the closest range OTHER than the range
> itself.  I could swear that was the case before, but isn't any longer:

I think you're right that there is an inconsistency here; Val will 
likely help clarify in a day or so. My two cents...

I think, certainly, that GRanges on a single chromosome on the "+" 
strand should behave like an IRanges

   library(GenomicRanges)
   library(RUnit)

   r <- IRanges(c(1,5,10), c(2,7,12))
   g <- GRanges("chr1", r, "+")

   ## first two ok, third should work but fails
   checkEquals(precede(r), precede(g))
   checkEquals(follow(r), follow(g))
   try(checkEquals(nearest(r), nearest(g)))

Also, on the "-" strand I think we're expecting

   g <- GRanges("chr1", r, "-")

   ## first two ok, third should work but fails
   checkEquals(follow(r), precede(g))
   checkEquals(precede(r), follow(g))
   try(checkEquals(nearest(r), nearest(g)))

For "*" (which was your example) I think the situation is (a) different 
in devel than in release; and (b) not so clear. In devel, "*" is (from 
method?"nearest,GenomicRanges,missing") "x on '*' strand can match to 
ranges on any of ''+'', ''-'' or ''*''" and in particular I think these 
are always true:

   checkEquals(precede(g), follow(g))
   checkEquals(nearest(r), follow(g))

I would also expect

   try(checkEquals(nearest(g), follow(g)))

though this seems not to be the case. In 'release', "*" is coereced and 
behaves as if on the "+" strand (I think).

Martin

>
>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>> z
> IRanges of length 3
>      start end width
> [1]     1   2     2
> [2]     5   7     3
> [3]    10  12     3
>> nearest(z)
> [1] 2 1 2
>>
>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
> end=c(2,7,12)))
>> z
> GRanges with 3 ranges and 0 elementMetadata cols:
>        seqnames    ranges strand
>           <Rle>  <IRanges>   <Rle>
>    [1]     chr1  [ 1,  2]      *
>    [2]     chr1  [ 5,  7]      *
>    [3]     chr1  [10, 12]      *
>    ---
>    seqlengths:
>     chr1
>       NA
>> nearest(z)
> [1] 1 2 3
>>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] datasets  utils     grDevices graphics  stats     methods   base
>
> other attached packages:
> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] stats4_2.15.0
>>
>
>
>
> I want the IRanges behavior and not what seems currently to be the GRanges
> behavior, since I have some code that depends on it. Is there a quick way
> to make nearest() do that for me again?
>
> Thanks!
>
> Oleg.
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list