[BioC] nearest() for GRanges
Cook, Malcolm
MEC at stowers.org
Mon Jun 18 23:25:18 CEST 2012
Martin, Oleg, Val, all,
I too have script logic that benefitted from and depends upon what the
behavior of nearest,GenomicRanges,missing as reported by Oleg.
Thanks for the unit tests Martin.
If it helps in sleuthing, in my hands, the 3rd test used to pass (if my
memory serves), but does not pass now, as the attached transcript shows.
Hoping it helps find a speedy resolution to this issue,
~ Malcolm Cook
> r <- IRanges(c(1,5,10), c(2,7,12))
> g <- GRanges("chr1", r, "+")
> checkEquals(precede(r), precede(g))
[1] TRUE
> checkEquals(follow(r), follow(g))
[1] TRUE
> try(checkEquals(nearest(r), nearest(g)))
Error in checkEquals(nearest(r), nearest(g)) :
Mean relative difference: 0.6
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] tools splines parallel stats graphics grDevices utils
datasets methods base
other attached packages:
[1] RUnit_0.4.26 log4r_0.1-4 vwr_0.1
RecordLinkage_0.4-1 ffbase_0.5 ff_2.2-7
bit_1.1-8 evd_2.2-6 ipred_0.8-13
prodlim_1.3.1 KernSmooth_2.23-7 nnet_7.3-1
survival_2.36-14 mlbench_2.1-0 MASS_7.3-18
ada_2.0-2 rpart_3.1-53 e1071_1.6
class_7.3-3 XLConnect_0.1-9 XLConnectJars_0.1-4
rJava_0.9-3 latticeExtra_0.6-19 RColorBrewer_1.0-5
lattice_0.20-6 doMC_1.2.5 multicore_0.1-7
[28] BSgenome_1.24.0 rtracklayer_1.16.1 Rsamtools_1.8.5
Biostrings_2.24.1 GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
GenomicRanges_1.8.6 IRanges_1.14.3 Biobase_2.16.0
BiocGenerics_0.2.0 data.table_1.8.0 compare_0.2-3
svUnit_0.7-10 doParallel_1.0.1 iterators_1.0.6
foreach_1.4.0 ggplot2_0.9.1 sqldf_0.4-6.4
RSQLite.extfuns_0.0.1 RSQLite_0.11.1 chron_2.3-42
gsubfn_0.6-3 proto_0.3-9.2 DBI_0.2-5
functional_0.1 reshape_0.8.4 plyr_1.7.1
[55] stringr_0.6 gtools_2.6.2
loaded via a namespace (and not attached):
[1] RCurl_1.91-1 XML_3.9-4 biomaRt_2.12.0 bitops_1.0-4.1
codetools_0.2-8 colorspace_1.1-1 compiler_2.15.0 dichromat_1.2-4
digest_0.5.2 grid_2.15.0 labeling_0.1 memoise_0.1
munsell_0.3 reshape2_1.2.1 scales_0.2.1 stats4_2.15.0
tcltk_2.15.0 zlibbioc_1.2.0
On 6/18/12 2:39 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>Hi Oleg --
>
>On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>> Hi,
>>
>> I just noticed that a piece of logic I was relying on with GRanges
>>before
>> does not seem to work anymore. Namely, I expect the behavior of
>>nearest()
>> with a single GRanges object as an argument to be the same as that of
>> IRanges (example below), but it's not anymore. I expect nearest(GR1)
>>NOT
>> to behave trivially but to return the closest range OTHER than the range
>> itself. I could swear that was the case before, but isn't any longer:
>
>I think you're right that there is an inconsistency here; Val will
>likely help clarify in a day or so. My two cents...
>
>I think, certainly, that GRanges on a single chromosome on the "+"
>strand should behave like an IRanges
>
> library(GenomicRanges)
> library(RUnit)
>
> r <- IRanges(c(1,5,10), c(2,7,12))
> g <- GRanges("chr1", r, "+")
>
> ## first two ok, third should work but fails
> checkEquals(precede(r), precede(g))
> checkEquals(follow(r), follow(g))
> try(checkEquals(nearest(r), nearest(g)))
>
>Also, on the "-" strand I think we're expecting
>
> g <- GRanges("chr1", r, "-")
>
> ## first two ok, third should work but fails
> checkEquals(follow(r), precede(g))
> checkEquals(precede(r), follow(g))
> try(checkEquals(nearest(r), nearest(g)))
>
>For "*" (which was your example) I think the situation is (a) different
>in devel than in release; and (b) not so clear. In devel, "*" is (from
>method?"nearest,GenomicRanges,missing") "x on '*' strand can match to
>ranges on any of ''+'', ''-'' or ''*''" and in particular I think these
>are always true:
>
> checkEquals(precede(g), follow(g))
> checkEquals(nearest(r), follow(g))
>
>I would also expect
>
> try(checkEquals(nearest(g), follow(g)))
>
>though this seems not to be the case. In 'release', "*" is coereced and
>behaves as if on the "+" strand (I think).
>
>Martin
>
>>
>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>> z
>> IRanges of length 3
>> start end width
>> [1] 1 2 2
>> [2] 5 7 3
>> [3] 10 12 3
>>> nearest(z)
>> [1] 2 1 2
>>>
>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>> end=c(2,7,12)))
>>> z
>> GRanges with 3 ranges and 0 elementMetadata cols:
>> seqnames ranges strand
>> <Rle> <IRanges> <Rle>
>> [1] chr1 [ 1, 2] *
>> [2] chr1 [ 5, 7] *
>> [3] chr1 [10, 12] *
>> ---
>> seqlengths:
>> chr1
>> NA
>>> nearest(z)
>> [1] 1 2 3
>>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] datasets utils grDevices graphics stats methods base
>>
>> other attached packages:
>> [1] GenomicRanges_1.8.6 IRanges_1.14.3 BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.0
>>>
>>
>>
>>
>> I want the IRanges behavior and not what seems currently to be the
>>GRanges
>> behavior, since I have some code that depends on it. Is there a quick
>>way
>> to make nearest() do that for me again?
>>
>> Thanks!
>>
>> Oleg.
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>--
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list