[Bioc-sig-seq] seqselect on SimpleRleList and RangesList - bug? and request
Patrick Aboyoun
paboyoun at fhcrc.org
Sat Jun 12 09:17:25 CEST 2010
Janet,
Most functions in the IRanges package follow the R convention of
considering the elements of names to be loosely linked attributes rather
than rigid keys. For convenience, functions such as $, [, [[ treat a
list as a hash if it has names, but in most circumstances the names are
ignored or copied without use. Even when there are names on elements,
there are some odd corner cases that can cause problems. For example, if
I wanted to have multiple list elements with the same name, then some
important operations give unexpected results:
> list(a = 1, a = 2)["a"]
$a
[1] 1
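To make that corner case concrete, here is a small base R sketch (plain lists only, no IRanges objects; the variable names are made up for illustration). It shows that character subsetting returns only the first match, and one way to detect duplicated names before treating them as keys:

```r
x <- list(a = 1, a = 2, b = 3)

# Character subsetting stops at the first match for each requested name.
first_only <- x["a"]

# Comparing against names() returns every element carrying that name.
all_matches <- x[names(x) == "a"]

# anyDuplicated() is a cheap guard before treating names as keys.
has_dup_names <- anyDuplicated(names(x)) > 0

print(length(first_only))
print(length(all_matches))
print(has_dup_names)
```

A check like the last one could be run before any operation that relies on names being unique.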
If the issue is limited to enhancing the seqselect function to make it
name-aware, it probably makes sense to go ahead with the enhancement.
But the scope of this issue can grow quite large. For example, should
names be used when adding RleList objects? What should the following
produce?
RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
Due to these types of ambiguities, I would rather focus on educating the
user to be mindful that these are position-oriented rather than
key-oriented objects and have them ensure that elements are in alignment.
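As a rough sketch of what "ensuring alignment" might look like on the user side, the snippet below reorders one named list to match another before doing any position-based work. It uses plain base R lists as stand-ins for the SimpleRleList and RangesList objects; align_by_name is a hypothetical helper, not an IRanges function, and whether the same indexing carries over unchanged to IRanges list classes is an assumption that has not been checked here.

```r
# Hypothetical helper (not part of IRanges): reorder `x` so its elements
# line up with the names of `template`, failing loudly on ambiguity.
align_by_name <- function(x, template) {
  nx <- names(x)
  nt <- names(template)
  if (is.null(nx) || is.null(nt))
    stop("both objects must have names")
  if (anyDuplicated(nx) > 0 || anyDuplicated(nt) > 0)
    stop("names must be unique to serve as keys")
  idx <- match(nt, nx)
  if (any(is.na(idx)))
    stop("no element of 'x' named: ", paste(nt[is.na(idx)], collapse = ", "))
  x[idx]
}

# Plain-list stand-ins: per-chromosome scores and positions to extract.
scores  <- list(chrA = c(2, 2, 7), chrB = c(1, 1, 2))
regions <- list(chrB = 2:3, chrA = 1:2)   # same names, different order

# Put `regions` into the order of `scores`, then select by position.
aligned  <- align_by_name(regions, scores)
selected <- Map(function(s, i) s[i], scores, aligned)
print(names(aligned))
```

Stopping on duplicated or missing names, rather than guessing, keeps the ambiguity described above visible to the caller.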
Thoughts?
Patrick
On 6/11/10 4:06 PM, Janet Young wrote:
> Hi,
>
> I've been playing around with seqselect on scores stored in a
> SimpleRleList object to get subregions defined in a RangesList object.
>
> I found a couple of things: first an enhancement request - would it
> be possible to allow seqselect to deal with cases where not every
> space (name) in the SimpleRleList has a corresponding space/name in
> the RangesList object?
>
> The second is either a bug or else I've misunderstood the way seqselect
> is supposed to work, in a dangerous way - it looks like seqselect
> doesn't use the names of the list items to select scores, it just
> assumes that in the two lists the elements have the same names in the
> same order.
>
> The code below should explain both issues much better than
> those descriptions.
>
> thanks,
>
> Janet
>
>
>
> > library(IRanges)
>
> Attaching package: 'IRanges'
>
> The following object(s) are masked from 'package:base':
>
> cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
> rbind, rep.int, table
>
> >
> > ### generate some arbitrary scores
> > track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6),
> width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) )
> > trackCoverage <- coverage(track,
> weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
> >
> > ### define subregions
> > exons <- RangesList(chrA = IRanges(start = c(2, 4), width =
> c(2,2)),chrB = IRanges(start = 3, width = 5))
> >
> > ### seqselect works if all spaces in trackCoverage have an element
> in exons
> > seqselect(trackCoverage,exons )
> SimpleRleList of length 2
> $chrA
> 'integer' Rle of length 4 with 2 runs
> Lengths: 2 2
> Values : 2 7
>
> $chrB
> 'integer' Rle of length 5 with 2 runs
> Lengths: 1 4
> Values : 2 1
>
> >
> > ### define subregions only on one chr
> > exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width =
> c(2, 2)))
> > ### now seqselect doesn't work if some spaces don't have any elements
> > seqselect(trackCoverage,exons_chrAonly )
> Error in seqselect(trackCoverage, exons_chrAonly) :
> 'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
> >
> >
> > ##### also, defining the regions with spaces in a different order
> seems to cause trouble as seqselect doesn't seem to be using the
> list's names - just going by order of elements
> > exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width =
> 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
> > seqselect(trackCoverage,exons_reorderchrs )
> SimpleRleList of length 2
> $chrA
> 'integer' Rle of length 5 with 3 runs
> Lengths: 1 2 2
> Values : 2 7 3
>
> $chrB
> 'integer' Rle of length 4 with 3 runs
> Lengths: 1 1 2
> Values : 1 2 1
>
> >
> > identical ( seqselect(trackCoverage,exons ) ,
> seqselect(trackCoverage,exons_reorderchrs ) )
> [1] FALSE
> >
> > sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.6.6
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing