[BioC] nearest() for GRanges
Cook, Malcolm
MEC at stowers.org
Mon Jun 25 23:53:39 CEST 2012
Hi Valerie,
Indeed good news.
However, I am finding that this newest version is not yet available view
biocLite from repository at bioconductor.org. I am still picking up 1.8.6
with biocLite('GenomicRanges').
Should I expect to wait, or perhaps is there a 'push' at your end that
needs attending?
Please advise if I'm expecting it to appear before its time ;)
Thanks!
Malcolm
On 6/25/12 1:53 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:
>This is now fixed in release, BiocC 2.10, GenomicRanges 1.8.7.
>
>Note the behavior of '*' is different from previous behavior (i.e., <= v
>1.8.6). Treatment of '*' ranges was one of the aspects we clarified and
>enforced in the the recent update of precede, follows and nearest.
>
>Previously in release '*' was treated as a '+' range,
>
>g <- GRanges("chr1", IRanges(c(1,5,10), c(2,7,12)), "*")
> > g
>GRanges with 3 ranges and 0 elementMetadata cols:
> seqnames ranges strand
><Rle> <IRanges> <Rle>
> [1] chr1 [ 1, 2] *
> [2] chr1 [ 5, 7] *
> [3] chr1 [10, 12] *
> ---
> seqlengths:
> chr1
> NA
> > precede(g)
>[1] 2 3 NA
> > follow(g)
>[1] NA 1 2
> > nearest(g)
>[1] 2 1 2
>
>
>The new behavior of '*' (in both release and devel) considers both '+'
>and '-' possibilities. For details see the 'matching by strand' section
>described in precede() on the man page for ?GRanges.
>
> > precede(g)
>[1] 2 1 2
> > follow(g)
>[1] 2 1 2
> > nearest(g)
>[1] 2 1 2
>
>
>Valerie
>
>On 06/22/2012 03:25 PM, Cook, Malcolm wrote:
>> Great news, Valerie... thanks very much... I will take immediate
>>advantage
>> of this... after re-reading your report of 'an overhaul' I would well
>> understand if back-porting your fix in dev to release would be onerous
>>to
>> impossible.
>>
>> I hope it goes quickly and smoothly....
>>
>> Cheers,
>>
>> Malcolm
>>
>>
>> On 6/22/12 4:00 PM, "Valerie Obenchain"<vobencha at fhcrc.org> wrote:
>>
>>> On 06/20/2012 05:20 PM, Cook, Malcolm wrote:
>>>> Hi Valerie,
>>>>
>>>> Very glad you found and fixed the root cause.
>>>>
>>>> I don't know the overhead it would take for you, but, this being a
>>>> regression, might you consider fixing in Bioconductor 2.10 as, say
>>>> GenomicRanges_1.8.
>>>>
>>> Yes, I will fix this in release too. If not today then first thing next
>>> week.
>>>
>>> Valerie
>>>> Thanks for your consideration,
>>>>
>>>> Malcolm
>>>>
>>>> On 6/20/12 3:13 PM, "Valerie Obenchain"<vobencha at fhcrc.org> wrote:
>>>>
>>>>> Hi Oleg, Malcom,
>>>>>
>>>>> Thanks for the bug report. This is now fixed in devel 1.9.28. Over
>>>>>the
>>>>> past months we've done an overhaul of the precede/follow code in
>>>>>devel.
>>>>> The new nearest method is based on the new precede and follow and is
>>>>> documented at
>>>>>
>>>>> ?'nearest,GenomicRanges,GenomicRanges-method'
>>>>>
>>>>> Let me know if you run into problems.
>>>>>
>>>>> Valerie
>>>>>
>>>>>
>>>>>
>>>>> On 06/18/2012 02:25 PM, Cook, Malcolm wrote:
>>>>>> Martin, Oleg, Val, all,
>>>>>>
>>>>>> I too have script logic that benefitted from and depends upon what
>>>>>>the
>>>>>> behavior of nearest,GenomicRanges,missing as reported by Oleg.
>>>>>>
>>>>>> Thanks for the unit tests Martin.
>>>>>>
>>>>>> If it helps in sleuthing, in my hands, the 3rd test used to pass (if
>>>>>> my
>>>>>> memory serves), but does not pass now, as the attached transcript
>>>>>> shows.
>>>>>>
>>>>>> Hoping it helps find a speedy resolution to this issue,
>>>>>>
>>>>>> ~ Malcolm Cook
>>>>>>
>>>>>>
>>>>>>
>>>>>>> r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>> g<- GRanges("chr1", r, "+")
>>>>>>> checkEquals(precede(r), precede(g))
>>>>>> [1] TRUE
>>>>>>> checkEquals(follow(r), follow(g))
>>>>>> [1] TRUE
>>>>>>> try(checkEquals(nearest(r), nearest(g)))
>>>>>> Error in checkEquals(nearest(r), nearest(g)) :
>>>>>> Mean relative difference: 0.6
>>>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.15.0 (2012-03-30)
>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] tools splines parallel stats graphics grDevices
>>>>>> utils
>>>>>> datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] RUnit_0.4.26 log4r_0.1-4 vwr_0.1
>>>>>> RecordLinkage_0.4-1 ffbase_0.5 ff_2.2-7
>>>>>> bit_1.1-8 evd_2.2-6 ipred_0.8-13
>>>>>> prodlim_1.3.1 KernSmooth_2.23-7 nnet_7.3-1
>>>>>> survival_2.36-14 mlbench_2.1-0 MASS_7.3-18
>>>>>> ada_2.0-2 rpart_3.1-53 e1071_1.6
>>>>>> class_7.3-3 XLConnect_0.1-9 XLConnectJars_0.1-4
>>>>>> rJava_0.9-3 latticeExtra_0.6-19 RColorBrewer_1.0-5
>>>>>> lattice_0.20-6 doMC_1.2.5 multicore_0.1-7
>>>>>> [28] BSgenome_1.24.0 rtracklayer_1.16.1 Rsamtools_1.8.5
>>>>>> Biostrings_2.24.1 GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
>>>>>> GenomicRanges_1.8.6 IRanges_1.14.3 Biobase_2.16.0
>>>>>> BiocGenerics_0.2.0 data.table_1.8.0 compare_0.2-3
>>>>>> svUnit_0.7-10 doParallel_1.0.1 iterators_1.0.6
>>>>>> foreach_1.4.0 ggplot2_0.9.1 sqldf_0.4-6.4
>>>>>> RSQLite.extfuns_0.0.1 RSQLite_0.11.1 chron_2.3-42
>>>>>> gsubfn_0.6-3 proto_0.3-9.2 DBI_0.2-5
>>>>>> functional_0.1 reshape_0.8.4 plyr_1.7.1
>>>>>> [55] stringr_0.6 gtools_2.6.2
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] RCurl_1.91-1 XML_3.9-4 biomaRt_2.12.0
>>>>>> bitops_1.0-4.1
>>>>>> codetools_0.2-8 colorspace_1.1-1 compiler_2.15.0 dichromat_1.2-4
>>>>>> digest_0.5.2 grid_2.15.0 labeling_0.1 memoise_0.1
>>>>>> munsell_0.3 reshape2_1.2.1 scales_0.2.1 stats4_2.15.0
>>>>>> tcltk_2.15.0 zlibbioc_1.2.0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/18/12 2:39 PM, "Martin Morgan"<mtmorgan at fhcrc.org> wrote:
>>>>>>
>>>>>>> Hi Oleg --
>>>>>>>
>>>>>>> On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I just noticed that a piece of logic I was relying on with GRanges
>>>>>>>> before
>>>>>>>> does not seem to work anymore. Namely, I expect the behavior of
>>>>>>>> nearest()
>>>>>>>> with a single GRanges object as an argument to be the same as that
>>>>>>>> of
>>>>>>>> IRanges (example below), but it's not anymore. I expect
>>>>>>>> nearest(GR1)
>>>>>>>> NOT
>>>>>>>> to behave trivially but to return the closest range OTHER than the
>>>>>>>> range
>>>>>>>> itself. I could swear that was the case before, but isn't any
>>>>>>>> longer:
>>>>>>> I think you're right that there is an inconsistency here; Val will
>>>>>>> likely help clarify in a day or so. My two cents...
>>>>>>>
>>>>>>> I think, certainly, that GRanges on a single chromosome on the "+"
>>>>>>> strand should behave like an IRanges
>>>>>>>
>>>>>>> library(GenomicRanges)
>>>>>>> library(RUnit)
>>>>>>>
>>>>>>> r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>> g<- GRanges("chr1", r, "+")
>>>>>>>
>>>>>>> ## first two ok, third should work but fails
>>>>>>> checkEquals(precede(r), precede(g))
>>>>>>> checkEquals(follow(r), follow(g))
>>>>>>> try(checkEquals(nearest(r), nearest(g)))
>>>>>>>
>>>>>>> Also, on the "-" strand I think we're expecting
>>>>>>>
>>>>>>> g<- GRanges("chr1", r, "-")
>>>>>>>
>>>>>>> ## first two ok, third should work but fails
>>>>>>> checkEquals(follow(r), precede(g))
>>>>>>> checkEquals(precede(r), follow(g))
>>>>>>> try(checkEquals(nearest(r), nearest(g)))
>>>>>>>
>>>>>>> For "*" (which was your example) I think the situation is (a)
>>>>>>> different
>>>>>>> in devel than in release; and (b) not so clear. In devel, "*" is
>>>>>>> (from
>>>>>>> method?"nearest,GenomicRanges,missing") "x on '*' strand can match
>>>>>>>to
>>>>>>> ranges on any of ''+'', ''-'' or ''*''" and in particular I think
>>>>>>> these
>>>>>>> are always true:
>>>>>>>
>>>>>>> checkEquals(precede(g), follow(g))
>>>>>>> checkEquals(nearest(r), follow(g))
>>>>>>>
>>>>>>> I would also expect
>>>>>>>
>>>>>>> try(checkEquals(nearest(g), follow(g)))
>>>>>>>
>>>>>>> though this seems not to be the case. In 'release', "*" is coereced
>>>>>>> and
>>>>>>> behaves as if on the "+" strand (I think).
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>>>>>>>> z
>>>>>>>> IRanges of length 3
>>>>>>>> start end width
>>>>>>>> [1] 1 2 2
>>>>>>>> [2] 5 7 3
>>>>>>>> [3] 10 12 3
>>>>>>>>> nearest(z)
>>>>>>>> [1] 2 1 2
>>>>>>>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>>>>>>>> end=c(2,7,12)))
>>>>>>>>> z
>>>>>>>> GRanges with 3 ranges and 0 elementMetadata cols:
>>>>>>>> seqnames ranges strand
>>>>>>>> <Rle> <IRanges> <Rle>
>>>>>>>> [1] chr1 [ 1, 2] *
>>>>>>>> [2] chr1 [ 5, 7] *
>>>>>>>> [3] chr1 [10, 12] *
>>>>>>>> ---
>>>>>>>> seqlengths:
>>>>>>>> chr1
>>>>>>>> NA
>>>>>>>>> nearest(z)
>>>>>>>> [1] 1 2 3
>>>>>>>>> sessionInfo()
>>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>>>>>> [7] LC_PAPER=C LC_NAME=C
>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] datasets utils grDevices graphics stats methods
>>>>>>>>base
>>>>>>>>
>>>>>>>> other attached packages:
>>>>>>>> [1] GenomicRanges_1.8.6 IRanges_1.14.3 BiocGenerics_0.2.0
>>>>>>>>
>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>> [1] stats4_2.15.0
>>>>>>>>
>>>>>>>>
>>>>>>>> I want the IRanges behavior and not what seems currently to be the
>>>>>>>> GRanges
>>>>>>>> behavior, since I have some code that depends on it. Is there a
>>>>>>>> quick
>>>>>>>> way
>>>>>>>> to make nearest() do that for me again?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Oleg.
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>> --
>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>> 1100 Fairview Ave. N.
>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>
>>>>>>> Location: Arnold Building M1 B861
>>>>>>> Phone: (206) 667-2793
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list