[BioC] nearest() for GRanges

Valerie Obenchain vobencha at fhcrc.org
Wed Jun 20 22:13:27 CEST 2012


Hi Oleg, Malcom,

Thanks for the bug report. This is now fixed in devel 1.9.28.  Over the 
past months we've done an overhaul of the precede/follow code in devel. 
The new nearest method is based on the new precede and follow and is 
documented at

?'nearest,GenomicRanges,GenomicRanges-method'

Let me know if you run into problems.

Valerie



On 06/18/2012 02:25 PM, Cook, Malcolm wrote:
> Martin, Oleg, Val, all,
>
> I too have script logic that benefitted from and depends upon what the
> behavior of nearest,GenomicRanges,missing as reported by Oleg.
>
> Thanks for the unit tests Martin.
>
> If it helps in sleuthing, in my hands, the 3rd test used to pass (if my
> memory serves), but does not pass now, as the attached transcript shows.
>
> Hoping it helps find a speedy resolution to this issue,
>
> ~ Malcolm Cook
>
>
>
>>   r<- IRanges(c(1,5,10), c(2,7,12))
>>   g<- GRanges("chr1", r, "+")
>>   checkEquals(precede(r), precede(g))
> [1] TRUE
>>    checkEquals(follow(r), follow(g))
> [1] TRUE
>>   try(checkEquals(nearest(r), nearest(g)))
> Error in checkEquals(nearest(r), nearest(g)) :
>    Mean relative difference: 0.6
>
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
>   [1] tools     splines   parallel  stats     graphics  grDevices utils
> datasets  methods   base
>
> other attached packages:
>   [1] RUnit_0.4.26          log4r_0.1-4           vwr_0.1
> RecordLinkage_0.4-1   ffbase_0.5            ff_2.2-7
> bit_1.1-8             evd_2.2-6             ipred_0.8-13
> prodlim_1.3.1         KernSmooth_2.23-7     nnet_7.3-1
> survival_2.36-14      mlbench_2.1-0         MASS_7.3-18
> ada_2.0-2             rpart_3.1-53          e1071_1.6
> class_7.3-3           XLConnect_0.1-9       XLConnectJars_0.1-4
> rJava_0.9-3           latticeExtra_0.6-19   RColorBrewer_1.0-5
> lattice_0.20-6        doMC_1.2.5            multicore_0.1-7
> [28] BSgenome_1.24.0       rtracklayer_1.16.1    Rsamtools_1.8.5
> Biostrings_2.24.1     GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
> GenomicRanges_1.8.6   IRanges_1.14.3        Biobase_2.16.0
> BiocGenerics_0.2.0    data.table_1.8.0      compare_0.2-3
> svUnit_0.7-10         doParallel_1.0.1      iterators_1.0.6
> foreach_1.4.0         ggplot2_0.9.1         sqldf_0.4-6.4
> RSQLite.extfuns_0.0.1 RSQLite_0.11.1        chron_2.3-42
> gsubfn_0.6-3          proto_0.3-9.2         DBI_0.2-5
> functional_0.1        reshape_0.8.4         plyr_1.7.1
> [55] stringr_0.6           gtools_2.6.2
>
> loaded via a namespace (and not attached):
>   [1] RCurl_1.91-1     XML_3.9-4        biomaRt_2.12.0   bitops_1.0-4.1
> codetools_0.2-8  colorspace_1.1-1 compiler_2.15.0  dichromat_1.2-4
> digest_0.5.2     grid_2.15.0      labeling_0.1     memoise_0.1
> munsell_0.3      reshape2_1.2.1   scales_0.2.1     stats4_2.15.0
> tcltk_2.15.0     zlibbioc_1.2.0
>
>
>
>
>
>
> On 6/18/12 2:39 PM, "Martin Morgan"<mtmorgan at fhcrc.org>  wrote:
>
>> Hi Oleg --
>>
>> On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>>> Hi,
>>>
>>> I just noticed that a piece of logic I was relying on with GRanges
>>> before
>>> does not seem to work anymore.  Namely, I expect the behavior of
>>> nearest()
>>> with a single GRanges object as an argument to be the same as that of
>>> IRanges (example below), but it's not anymore.  I expect nearest(GR1)
>>> NOT
>>> to behave trivially but to return the closest range OTHER than the range
>>> itself.  I could swear that was the case before, but isn't any longer:
>> I think you're right that there is an inconsistency here; Val will
>> likely help clarify in a day or so. My two cents...
>>
>> I think, certainly, that GRanges on a single chromosome on the "+"
>> strand should behave like an IRanges
>>
>>    library(GenomicRanges)
>>    library(RUnit)
>>
>>    r<- IRanges(c(1,5,10), c(2,7,12))
>>    g<- GRanges("chr1", r, "+")
>>
>>    ## first two ok, third should work but fails
>>    checkEquals(precede(r), precede(g))
>>    checkEquals(follow(r), follow(g))
>>    try(checkEquals(nearest(r), nearest(g)))
>>
>> Also, on the "-" strand I think we're expecting
>>
>>    g<- GRanges("chr1", r, "-")
>>
>>    ## first two ok, third should work but fails
>>    checkEquals(follow(r), precede(g))
>>    checkEquals(precede(r), follow(g))
>>    try(checkEquals(nearest(r), nearest(g)))
>>
>> For "*" (which was your example) I think the situation is (a) different
>> in devel than in release; and (b) not so clear. In devel, "*" is (from
>> method?"nearest,GenomicRanges,missing") "x on '*' strand can match to
>> ranges on any of ''+'', ''-'' or ''*''" and in particular I think these
>> are always true:
>>
>>    checkEquals(precede(g), follow(g))
>>    checkEquals(nearest(r), follow(g))
>>
>> I would also expect
>>
>>    try(checkEquals(nearest(g), follow(g)))
>>
>> though this seems not to be the case. In 'release', "*" is coereced and
>> behaves as if on the "+" strand (I think).
>>
>> Martin
>>
>>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>>> z
>>> IRanges of length 3
>>>       start end width
>>> [1]     1   2     2
>>> [2]     5   7     3
>>> [3]    10  12     3
>>>> nearest(z)
>>> [1] 2 1 2
>>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>>> end=c(2,7,12)))
>>>> z
>>> GRanges with 3 ranges and 0 elementMetadata cols:
>>>         seqnames    ranges strand
>>>            <Rle>   <IRanges>    <Rle>
>>>     [1]     chr1  [ 1,  2]      *
>>>     [2]     chr1  [ 5,  7]      *
>>>     [3]     chr1  [10, 12]      *
>>>     ---
>>>     seqlengths:
>>>      chr1
>>>        NA
>>>> nearest(z)
>>> [1] 1 2 3
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>    [7] LC_PAPER=C                 LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] datasets  utils     grDevices graphics  stats     methods   base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] stats4_2.15.0
>>>
>>>
>>> I want the IRanges behavior and not what seems currently to be the
>>> GRanges
>>> behavior, since I have some code that depends on it. Is there a quick
>>> way
>>> to make nearest() do that for me again?
>>>
>>> Thanks!
>>>
>>> Oleg.
>>>
>>> 	[[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> -- 
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list