[Bioc-sig-seq] Add ability for `subset`ing IRanges-like objects based on their elementMetadata?

Charles C. Berry cberry at tajo.ucsd.edu
Sat Jun 5 18:18:04 CEST 2010


On Fri, 4 Jun 2010, Patrick Aboyoun wrote:

> Great thread on the subset function. It currently has two IRanges-based 
> methods:
>
>>  showMethods("subset")
> Function: subset (package base)
> x="ANY"
> x="DataTable"
> x="Sequence"
>
> Based on what was being discussed, I see two enhancement requests:
>
> 1) Expanding the scope of subset to allow reference to components of 
> non-DataTable objects such as IRanges and GRanges instances:
>
> ## Currently not supported, but could be
> ir <- IRanges(start = 1:10, end = 1:10)
> subset(ir, start < 5)
>
> 2) Add support for subsetting by 'logical' Rle in the subset function.
>
> The second request is straightforward to implement since it can be done 
> within the subset methods of the Sequence and DataTable virtual classes. If 
> we limit the first to Ranges (virtual class) and GRanges (which doesn't 
> inherit from Ranges) objects, then two more subset methods would suffice to 
> achieve 1). Sound reasonable?
>
>
> Patrick
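
Request 1) amounts to evaluating the predicate with the object's components (start, end, width) visible as variables, the way subset() on a data.frame exposes its columns. Below is a minimal base-R sketch of that idea, assuming nothing from IRanges: toy_ranges and subset.toy_ranges are hypothetical stand-ins, not the IRanges API. The branch that accepts a base R "rle" object only loosely mirrors request 2), since the IRanges Rle class is a different thing.

```r
## Hypothetical minimal ranges class (S3), standing in for IRanges::IRanges
toy_ranges <- function(start, end) {
  stopifnot(length(start) == length(end))
  structure(list(start = as.integer(start), end = as.integer(end)),
            class = "toy_ranges")
}

## Hypothetical subset() method: the predicate may refer to start, end and
## width directly, mirroring how subset.data.frame exposes columns.
subset.toy_ranges <- function(x, subset, ...) {
  vars <- list(start = x$start, end = x$end, width = x$end - x$start + 1L)
  ## Evaluate the unevaluated predicate among `vars`, falling back to the
  ## caller's environment for any other names it uses.
  keep <- eval(substitute(subset), vars, parent.frame())
  ## Also accept a run-length encoded logical (base R "rle" object)
  if (inherits(keep, "rle")) keep <- inverse.rle(keep)
  ## Treat NA as FALSE, as subset.data.frame does
  keep <- as.logical(keep) & !is.na(keep)
  toy_ranges(x$start[keep], x$end[keep])
}

ir <- toy_ranges(start = 1:10, end = 1:10)
sub <- subset(ir, start < 5)
sub$start  # 1 2 3 4
```

Because the method relies on S3 dispatch from base::subset, parent.frame() is the caller's frame, so a threshold stored in a local variable (e.g. `subset(ir, start > cutoff)`) also works.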

Perhaps this request pertaining to xtabs(..., subset = ...) is related.

Currently (more precisely, in IRanges_1.6.4):

> library(IRanges)
> ir <- RangedData(IRanges(start=1:10,width=1),space=rep(letters[1:2],5),z=rep(1:3,length=10))
> xtabs(~z,as.data.frame(ir),subset = z > 1)
z
2 3
3 3
> xtabs(~z,subset(ir,z>1))
z
2 3
3 3
>
> xtabs(~z,ir,subset = z > 1)
Error in xj[i] : invalid subscript type 'closure'
>
> xtabs(~z,subset(ir,space=='a'))
z
1 2 3
2 1 2
> xtabs(~z,ir,subset = space=='a')
Error in xj[i] : invalid subscript type 'closure'
>

Can this be changed to allow use of the subset argument when the data arg 
is a RangedData (or GRanges) instance?
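
The data.frame route in the transcript works because xtabs() evaluates `subset` among the columns of `data`. One hedged way to extend that to other classes is to coerce `data` before xtabs() sees it, while passing the subset expression through unevaluated. In the sketch below, xtabs_coerced is hypothetical (it is not part of stats or IRanges), and a plain data.frame holding the transcript's `space` and `z` values stands in for the RangedData.

```r
## Stand-in for the `space` and `z` values used in the transcript above
df <- data.frame(space = rep(letters[1:2], 5), z = rep(1:3, length.out = 10))

## Hypothetical wrapper: rebuild the call, swap in a coerced `data`, and let
## stats::xtabs() evaluate the still-unevaluated `subset` among its columns.
xtabs_coerced <- function(formula, data, subset, ...) {
  cl <- match.call()
  cl[[1L]] <- quote(stats::xtabs)
  cl$data <- as.data.frame(data)
  eval(cl, parent.frame())
}

tab1 <- xtabs_coerced(~z, df, subset = z > 1)          # z = 2: 3, z = 3: 3
tab2 <- xtabs_coerced(~z, df, subset = space == "a")   # z = 1: 2, 2: 1, 3: 2
```

The counts agree with the working as.data.frame(ir) and subset(ir, ...) calls shown above; whether as.data.frame() is the right coercion for every RangedData or GRanges use is an assumption.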

Thanks,

Chuck

>
>
> On 6/4/10 10:06 PM, Steve Lianoglou wrote:
>>  Hi Vincent,
>>
>>
>> >  the simplification that Steve seems to be asking for would allow
>> >  implicit references to elementMetadata variables in the predicate.
>> >  I am not in favor of such an extension of the semantics of bracket.
>> >
>> >
>>  Just to be clear, I'm not suggesting referencing elementMetadata
>>  variables implicitly w/in brackets, but rather only when using
>>  `subset` (as `subset` does now with the columns of a data.frame
>>  when it's used *on* a data.frame).
>>
>>  So, using your example gr object:
>>
>>  GRanges with 10 ranges and 2 elementMetadata values
>>    seqnames    ranges strand |     score        GC
>>       <Rle>  <IRanges>   <Rle>  |<integer>  <numeric>
>>  a   Chrom1  [ 1, 10]      - |         1 1.0000000
>>  b   Chrom2  [ 2, 10]      + |         2 0.8888889
>>  c   Chrom2  [ 3, 10]      + |         3 0.7777778
>>  d   Chrom2  [ 4, 10]      * |         4 0.6666667
>>  e   Chrom1  [ 5, 10]      * |         5 0.5555556
>>  f   Chrom1  [ 6, 10]      + |         6 0.4444444
>>  g   Chrom3  [ 7, 10]      + |         7 0.3333333
>>  h   Chrom3  [ 8, 10]      + |         8 0.2222222
>>  i   Chrom3  [ 9, 10]      - |         9 0.1111111
>>  j   Chrom3  [10, 10]      - |        10 0.0000000
>>
>>  seqlengths
>>    Chrom1 Chrom2 Chrom3
>>       NA     NA     NA
>>
>>  I was curious if this would be useful:
>> 
>>  R> subset(gr, strand == "+" & score > 6)
>>
>>  but I wasn't trying to propose having something like this:
>> 
>>  R> gr[strand == "+" & score > 6]
>> 
>> 
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
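
The distinction Steve draws is the one base R already makes for data.frames: subset() evaluates its predicate among the columns, while `[` evaluates its index in the calling environment. The sketch below uses a data.frame stand-in for `gr` (the GRanges itself is not built here; the column values are copied from the printout above), so it shows base R semantics only, not any GenomicRanges behaviour.

```r
## data.frame stand-in for the GRanges `gr` printed above
gr_df <- data.frame(
  seqnames  = c("Chrom1", "Chrom2", "Chrom2", "Chrom2", "Chrom1",
                "Chrom1", "Chrom3", "Chrom3", "Chrom3", "Chrom3"),
  start     = 1:10,
  end       = 10L,
  strand    = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
  score     = 1:10,
  GC        = seq(1, 0, length.out = 10),
  row.names = letters[1:10]
)

## subset(): `strand` and `score` are looked up among the columns
hits <- subset(gr_df, strand == "+" & score > 6)
rownames(hits)  # "g" "h"

## `[`: the index is evaluated in the caller's environment, so the bare
## names are not found there (assuming no global `strand` exists) ...
err <- tryCatch(gr_df[strand == "+" & score > 6, ], error = function(e) e)
inherits(err, "error")  # TRUE

## ... and must be spelled out, or brought into scope with with()
hits2 <- gr_df[with(gr_df, strand == "+" & score > 6), ]
identical(rownames(hits2), rownames(hits))  # TRUE
```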

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
