[Bioc-sig-seq] Add ability for `subset`ing IRanges-like objects based on their elementMetadata?
Charles C. Berry
cberry at tajo.ucsd.edu
Sat Jun 5 18:18:04 CEST 2010
On Fri, 4 Jun 2010, Patrick Aboyoun wrote:
> Great thread on the subset function. It currently has two IRanges-based
> methods:
>
>> showMethods("subset")
> Function: subset (package base)
> x="ANY"
> x="DataTable"
> x="Sequence"
>
> Based on what was being discussed, I see two enhancement requests:
>
> 1) Expanding the scope of subset to allow reference to components of
> non-DataTable objects such as IRanges and GRanges instances:
>
> ## Currently not supported, but could be
> ir <- IRanges(start = 1:10, end = 1:10)
> subset(ir, start < 5)
>
> 2) Add support for subsetting by 'logical' Rle in the subset function.
>
> The second request is straightforward to implement since it can be done
> within the subset methods of the Sequence and DataTable virtual classes. If
> we limit the first to Ranges (virtual class) and GRanges (which doesn't
> inherit from Ranges) objects, then two more subset methods would suffice to
> achieve 1). Sound reasonable?
>
>
> Patrick
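To make request 1) concrete, here is a minimal sketch of how a predicate such as `start < 5` could be evaluated against the components of a ranges object. It is only an illustration under stated assumptions: a plain list of start/end vectors stands in for an IRanges instance, and `subset_ranges` is a hypothetical name, not a function in IRanges.

```r
# Hypothetical sketch: a plain list of start/end vectors stands in for an
# IRanges object; subset_ranges() is NOT part of the IRanges package.
subset_ranges <- function(x, expr) {
  # Expose the range components as variables the predicate can refer to
  vars <- list(start = x$start, end = x$end, width = x$end - x$start + 1L)
  # Evaluate the unevaluated predicate with those components in scope
  keep <- eval(substitute(expr), vars, parent.frame())
  keep <- keep & !is.na(keep)  # treat NA as FALSE, as subset() does
  lapply(x, function(v) v[keep])
}

r <- list(start = 1:10, end = 1:10)
subset_ranges(r, start < 5)
```

A real method would presumably dispatch on the Ranges virtual class and return an object of the same class rather than a list.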
Perhaps this request pertaining to xtabs(..., subset = ...) is related.
Currently (rather, in IRanges_1.6.4)
> library(IRanges)
> ir <- RangedData(IRanges(start=1:10,width=1),space=rep(letters[1:2],5),z=rep(1:3,length=10))
> xtabs(~z,as.data.frame(ir),subset = z > 1)
z
2 3
3 3
> xtabs(~z,subset(ir,z>1))
z
2 3
3 3
>
> xtabs(~z,ir,subset = z > 1)
Error in xj[i] : invalid subscript type 'closure'
>
> xtabs(~z,subset(ir,space=='a'))
z
1 2 3
2 1 2
> xtabs(~z,ir,subset = space=='a')
Error in xj[i] : invalid subscript type 'closure'
>
Can this be changed to allow use of the subset argument when the data
argument is a RangedData (or GRanges) instance?
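In the meantime, a small wrapper that coerces first would do. The following is a minimal sketch, assuming as.data.frame() works on the object as it does for the RangedData above; `xtabs_df` is a hypothetical name (it is not in IRanges or stats), and a plain data.frame stands in for as.data.frame(ir) so the example runs without IRanges.

```r
# Hypothetical helper (not part of IRanges or stats): coerce 'data' to a
# data.frame first, apply the 'subset' predicate to its columns, and only
# then hand the result to xtabs().
xtabs_df <- function(formula, data, subset) {
  df <- as.data.frame(data)
  if (!missing(subset)) {
    keep <- eval(substitute(subset), df, parent.frame())
    df <- df[keep & !is.na(keep), , drop = FALSE]
  }
  stats::xtabs(formula, df)
}

# Plain data.frame standing in for as.data.frame(ir)
d <- data.frame(start = 1:10, width = 1L,
                space = rep(letters[1:2], 5),
                z = rep(1:3, length.out = 10))
xtabs_df(~ z, d, subset = space == "a")
xtabs_df(~ z, d, subset = z > 1)
```

With the stand-in data these calls give the same counts as the subset(ir, ...) calls shown above.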
Thanks,
Chuck
>
>
> On 6/4/10 10:06 PM, Steve Lianoglou wrote:
>> Hi Vincent,
>>
>>
>> > the simplification that Steve
>> > seems to be asking for would
>> > allow implicit references to elementMetadata variables in the predicate.
>> > I
>> > am not in favor of such
>> > an extension of semantics of bracket.
>> >
>> Just to be clear, I'm not suggesting referencing elementMetadata
>> variables implicitly w/in brackets, but rather only when using
>> `subset` (as `subset` does now with columns of a data.frame (when it's
>> used *on* a data.frame))
>>
>> So, using your example gr object:
>>
>> GRanges with 10 ranges and 2 elementMetadata values
>> seqnames ranges strand | score GC
>> <Rle> <IRanges> <Rle> |<integer> <numeric>
>> a Chrom1 [ 1, 10] - | 1 1.0000000
>> b Chrom2 [ 2, 10] + | 2 0.8888889
>> c Chrom2 [ 3, 10] + | 3 0.7777778
>> d Chrom2 [ 4, 10] * | 4 0.6666667
>> e Chrom1 [ 5, 10] * | 5 0.5555556
>> f Chrom1 [ 6, 10] + | 6 0.4444444
>> g Chrom3 [ 7, 10] + | 7 0.3333333
>> h Chrom3 [ 8, 10] + | 8 0.2222222
>> i Chrom3 [ 9, 10] - | 9 0.1111111
>> j Chrom3 [10, 10] - | 10 0.0000000
>>
>> seqlengths
>> Chrom1 Chrom2 Chrom3
>> NA NA NA
>>
>> I was curious if this would be useful:
>>
>> R> subset(gr, strand == "+" & score > 6)
>>
>> but I wasn't trying to propose having something like this:
>>
>> R> gr[strand == "+" & score > 6]
>>
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901