[BioC] nearest() for GRanges
Cook, Malcolm
MEC at stowers.org
Thu Jun 21 02:20:00 CEST 2012
Hi Valerie,
Very glad you found and fixed the root cause.
I don't know the overhead it would take for you, but, this being a
regression, might you consider fixing in Bioconductor 2.10 as, say
GenomicRanges_1.8.
Thanks for your consideration,
Malcolm
On 6/20/12 3:13 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:
>Hi Oleg, Malcom,
>
>Thanks for the bug report. This is now fixed in devel 1.9.28. Over the
>past months we've done an overhaul of the precede/follow code in devel.
>The new nearest method is based on the new precede and follow and is
>documented at
>
>?'nearest,GenomicRanges,GenomicRanges-method'
>
>Let me know if you run into problems.
>
>Valerie
>
>
>
>On 06/18/2012 02:25 PM, Cook, Malcolm wrote:
>> Martin, Oleg, Val, all,
>>
>> I too have script logic that benefitted from and depends upon what the
>> behavior of nearest,GenomicRanges,missing as reported by Oleg.
>>
>> Thanks for the unit tests Martin.
>>
>> If it helps in sleuthing, in my hands, the 3rd test used to pass (if my
>> memory serves), but does not pass now, as the attached transcript shows.
>>
>> Hoping it helps find a speedy resolution to this issue,
>>
>> ~ Malcolm Cook
>>
>>
>>
>>> r<- IRanges(c(1,5,10), c(2,7,12))
>>> g<- GRanges("chr1", r, "+")
>>> checkEquals(precede(r), precede(g))
>> [1] TRUE
>>> checkEquals(follow(r), follow(g))
>> [1] TRUE
>>> try(checkEquals(nearest(r), nearest(g)))
>> Error in checkEquals(nearest(r), nearest(g)) :
>> Mean relative difference: 0.6
>>
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] tools splines parallel stats graphics grDevices utils
>> datasets methods base
>>
>> other attached packages:
>> [1] RUnit_0.4.26 log4r_0.1-4 vwr_0.1
>> RecordLinkage_0.4-1 ffbase_0.5 ff_2.2-7
>> bit_1.1-8 evd_2.2-6 ipred_0.8-13
>> prodlim_1.3.1 KernSmooth_2.23-7 nnet_7.3-1
>> survival_2.36-14 mlbench_2.1-0 MASS_7.3-18
>> ada_2.0-2 rpart_3.1-53 e1071_1.6
>> class_7.3-3 XLConnect_0.1-9 XLConnectJars_0.1-4
>> rJava_0.9-3 latticeExtra_0.6-19 RColorBrewer_1.0-5
>> lattice_0.20-6 doMC_1.2.5 multicore_0.1-7
>> [28] BSgenome_1.24.0 rtracklayer_1.16.1 Rsamtools_1.8.5
>> Biostrings_2.24.1 GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
>> GenomicRanges_1.8.6 IRanges_1.14.3 Biobase_2.16.0
>> BiocGenerics_0.2.0 data.table_1.8.0 compare_0.2-3
>> svUnit_0.7-10 doParallel_1.0.1 iterators_1.0.6
>> foreach_1.4.0 ggplot2_0.9.1 sqldf_0.4-6.4
>> RSQLite.extfuns_0.0.1 RSQLite_0.11.1 chron_2.3-42
>> gsubfn_0.6-3 proto_0.3-9.2 DBI_0.2-5
>> functional_0.1 reshape_0.8.4 plyr_1.7.1
>> [55] stringr_0.6 gtools_2.6.2
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.91-1 XML_3.9-4 biomaRt_2.12.0 bitops_1.0-4.1
>> codetools_0.2-8 colorspace_1.1-1 compiler_2.15.0 dichromat_1.2-4
>> digest_0.5.2 grid_2.15.0 labeling_0.1 memoise_0.1
>> munsell_0.3 reshape2_1.2.1 scales_0.2.1 stats4_2.15.0
>> tcltk_2.15.0 zlibbioc_1.2.0
>>
>>
>>
>>
>>
>>
>> On 6/18/12 2:39 PM, "Martin Morgan"<mtmorgan at fhcrc.org> wrote:
>>
>>> Hi Oleg --
>>>
>>> On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>>>> Hi,
>>>>
>>>> I just noticed that a piece of logic I was relying on with GRanges
>>>> before
>>>> does not seem to work anymore. Namely, I expect the behavior of
>>>> nearest()
>>>> with a single GRanges object as an argument to be the same as that of
>>>> IRanges (example below), but it's not anymore. I expect nearest(GR1)
>>>> NOT
>>>> to behave trivially but to return the closest range OTHER than the
>>>>range
>>>> itself. I could swear that was the case before, but isn't any longer:
>>> I think you're right that there is an inconsistency here; Val will
>>> likely help clarify in a day or so. My two cents...
>>>
>>> I think, certainly, that GRanges on a single chromosome on the "+"
>>> strand should behave like an IRanges
>>>
>>> library(GenomicRanges)
>>> library(RUnit)
>>>
>>> r<- IRanges(c(1,5,10), c(2,7,12))
>>> g<- GRanges("chr1", r, "+")
>>>
>>> ## first two ok, third should work but fails
>>> checkEquals(precede(r), precede(g))
>>> checkEquals(follow(r), follow(g))
>>> try(checkEquals(nearest(r), nearest(g)))
>>>
>>> Also, on the "-" strand I think we're expecting
>>>
>>> g<- GRanges("chr1", r, "-")
>>>
>>> ## first two ok, third should work but fails
>>> checkEquals(follow(r), precede(g))
>>> checkEquals(precede(r), follow(g))
>>> try(checkEquals(nearest(r), nearest(g)))
>>>
>>> For "*" (which was your example) I think the situation is (a) different
>>> in devel than in release; and (b) not so clear. In devel, "*" is (from
>>> method?"nearest,GenomicRanges,missing") "x on '*' strand can match to
>>> ranges on any of ''+'', ''-'' or ''*''" and in particular I think these
>>> are always true:
>>>
>>> checkEquals(precede(g), follow(g))
>>> checkEquals(nearest(r), follow(g))
>>>
>>> I would also expect
>>>
>>> try(checkEquals(nearest(g), follow(g)))
>>>
>>> though this seems not to be the case. In 'release', "*" is coereced and
>>> behaves as if on the "+" strand (I think).
>>>
>>> Martin
>>>
>>>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>>>> z
>>>> IRanges of length 3
>>>> start end width
>>>> [1] 1 2 2
>>>> [2] 5 7 3
>>>> [3] 10 12 3
>>>>> nearest(z)
>>>> [1] 2 1 2
>>>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>>>> end=c(2,7,12)))
>>>>> z
>>>> GRanges with 3 ranges and 0 elementMetadata cols:
>>>> seqnames ranges strand
>>>> <Rle> <IRanges> <Rle>
>>>> [1] chr1 [ 1, 2] *
>>>> [2] chr1 [ 5, 7] *
>>>> [3] chr1 [10, 12] *
>>>> ---
>>>> seqlengths:
>>>> chr1
>>>> NA
>>>>> nearest(z)
>>>> [1] 1 2 3
>>>>> sessionInfo()
>>>> R version 2.15.0 (2012-03-30)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=C LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] datasets utils grDevices graphics stats methods base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.8.6 IRanges_1.14.3 BiocGenerics_0.2.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] stats4_2.15.0
>>>>
>>>>
>>>> I want the IRanges behavior and not what seems currently to be the
>>>> GRanges
>>>> behavior, since I have some code that depends on it. Is there a quick
>>>> way
>>>> to make nearest() do that for me again?
>>>>
>>>> Thanks!
>>>>
>>>> Oleg.
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> --
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list