[Bioc-sig-seq] seqselect on SimpleRleList and RangesList - bug? and request

Patrick Aboyoun paboyoun at fhcrc.org
Sat Jun 12 09:17:25 CEST 2010


Janet,
Most functions in the IRanges package follow the R convention of 
considering the elements of names to be loosely linked attributes rather 
than rigid keys. For convenience, functions such as $, [, [[ treat a 
list as a hash if it has names, but in most circumstances the names are 
ignored or copied without use. Even when there are names on elements, 
there are some odd corner cases that can cause problems. For example, if 
I wanted to have multiple list elements with the same name, then some 
important operations give unexpected results:

 > list(a = 1, a = 2)["a"]
$a
[1] 1
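
A slightly fuller base-R sketch of the same corner case (nothing here is IRanges-specific; `x` and the result names are made up for illustration). All three name-based accessors return only the first match, while positional or logical indexing still reaches every element:

```r
# Hypothetical list with a duplicated name (base R only).
x <- list(a = 1, a = 2, b = 3)

first_by_bracket <- x["a"]    # list of length 1: only the first "a"
first_by_double  <- x[["a"]]  # 1
first_by_dollar  <- x$a       # 1

# Logical indexing on the names recovers both elements named "a".
all_a <- x[names(x) == "a"]   # list of length 2
```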

If the issue is limited to enhancing the seqselect function to make it 
name-aware, it probably makes sense to go ahead with the enhancement. 
But the scope of this issue can grow quite large. For example, should 
names be used when adding RleList objects? What should the following 
produce?

RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
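
As a point of comparison only, here is how base R arithmetic treats names on plain atomic vectors. This is an analogy, not a statement about what the RleList expression above returns; `x`, `y`, and `s` are illustrative names:

```r
# Base R arithmetic pairs elements by position; names are not used for matching.
x <- c(a = 1, b = 2)
y <- c(b = 10, a = 20)

s <- x + y
# s["a"] is 1 + 10 = 11 and s["b"] is 2 + 20 = 22:
# the names are simply copied from the first operand.
```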

Due to these types of ambiguities, I would rather focus on educating the 
user to be mindful that these are position-oriented rather than 
key-oriented objects and have them ensure that elements are in alignment.
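
One way a user might enforce that alignment before a position-oriented call is sketched below with plain base-R lists standing in for the RleList and RangesList. The data and the names `scores`, `regions`, and `selected` are made up, and the duplicate-name check is an assumption about what a careful user would want. The same reordering idea would presumably carry over to the real objects, but that is not verified here.

```r
# Stand-ins: per-chromosome scores and per-chromosome index sets (hypothetical data).
scores  <- list(chrA = c(2, 2, 7), chrB = c(1, 2, 1, 1, 1))
regions <- list(chrB = 3:5, chrA = 1:2)   # deliberately in a different order

# Refuse ambiguous input: name-based alignment is undefined with duplicated names.
stopifnot(!anyDuplicated(names(scores)), !anyDuplicated(names(regions)))

# Keep only the names present in both lists, in the order used by `scores`.
common <- intersect(names(scores), names(regions))
scores_aligned  <- scores[common]
regions_aligned <- regions[common]

# Now a purely positional operation is safe.
selected <- Map(function(s, idx) s[idx], scores_aligned, regions_aligned)
```

Using `intersect()` also covers the case where one list has no entry for some name, since those elements are dropped rather than mismatched.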

Thoughts?


Patrick



On 6/11/10 4:06 PM, Janet Young wrote:
> Hi,
>
> I've been playing around with seqselect on scores stored in a 
> SimpleRleList object to get subregions defined in a RangesList object.
>
> I found a couple of things:  first an enhancement request - would it 
> be possible to allow seqselect to deal with cases where not every 
> space (name) in the SimpleRleList has a corresponding space/name in 
> the RangesList object?
>
> The second is either a bug or else I've misunderstood the way seqselect 
> is supposed to work, in a dangerous way - it looks like seqselect 
> doesn't use the names of the list items to select scores, it just 
> assumes that in the two lists the elements have the same names in the 
> same order.
>
> The code below should explain both issues much better than 
> those descriptions.
>
> thanks,
>
> Janet
>
>
>
> > library(IRanges)
>
> Attaching package: 'IRanges'
>
> The following object(s) are masked from 'package:base':
>
>     cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, 
> rbind, rep.int, table
>
> >
> > ### generate some arbitrary scores
> > track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), 
> width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) )
> > trackCoverage <- coverage(track, 
> weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
> >
> > ### define subregions
> > exons <- RangesList(chrA = IRanges(start = c(2, 4), width = 
> c(2,2)),chrB = IRanges(start = 3, width = 5))
> >
> > ### seqselect works if all spaces in trackCoverage have an element 
> in exons
> > seqselect(trackCoverage,exons )
> SimpleRleList of length 2
> $chrA
> 'integer' Rle of length 4 with 2 runs
>   Lengths: 2 2
>   Values : 2 7
>
> $chrB
> 'integer' Rle of length 5 with 2 runs
>   Lengths: 1 4
>   Values : 2 1
>
> >
> > ### define subregions only on one chr
> > exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width = 
> c(2, 2)))
> > ### now seqselect doesn't work if some spaces don't have any elements
> > seqselect(trackCoverage,exons_chrAonly )
> Error in seqselect(trackCoverage, exons_chrAonly) :
>   'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
> >
> >
> > ##### also, defining the regions with spaces in a different order 
> seems to cause trouble as seqselect doesn't seem to be using the 
> list's names - just going by order of elements
> > exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width = 
> 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
> > seqselect(trackCoverage,exons_reorderchrs )
> SimpleRleList of length 2
> $chrA
> 'integer' Rle of length 5 with 3 runs
>   Lengths: 1 2 2
>   Values : 2 7 3
>
> $chrB
> 'integer' Rle of length 4 with 3 runs
>   Lengths: 1 1 2
>   Values : 1 2 1
>
> >
> > identical ( seqselect(trackCoverage,exons ) , 
> seqselect(trackCoverage,exons_reorderchrs )  )
> [1] FALSE
> >
> > sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] IRanges_1.6.6
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


