[Bioc-sig-seq] Add ability for `subset`ing IRanges-like objects based on their elementMetadata?
Patrick Aboyoun
paboyoun at fhcrc.org
Sat Jun 5 19:30:06 CEST 2010
There is a lot of meat here that I can't properly address now because I
am heading out to serve as a BioC evangelist in Europe. Michael, I was
looking over the as.env methods that you created, and I agree it would be
useful to expand on them to support Rle's. I probably won't be able to do
much work on this until late June, but feel free to rework this as you
see fit.
Cheers,
Patrick
On 6/5/10 9:18 AM, Charles C. Berry wrote:
> On Fri, 4 Jun 2010, Patrick Aboyoun wrote:
>
>> Great thread on the subset function. It currently has two
>> IRanges-based methods:
>>
>>> showMethods("subset")
>> Function: subset (package base)
>> x="ANY"
>> x="DataTable"
>> x="Sequence"
>>
>> Based on what was being discussed, I see two enhancement requests:
>>
>> 1) Expanding the scope of subset to allow reference to components of
>> non-DataTable objects such as IRanges and GRanges instances:
>>
>> ## Currently not supported, but could be
>> ir <- IRanges(start = 1:10, end = 1:10)
>> subset(ir, start < 5)
>>
>> 2) Add support for subsetting by 'logical' Rle in the subset function.
>>
>> The second request is straight-forward to implement since it can be
>> done within the subset methods of the Sequence and DataTable virtual
>> classes. If we limit the first to Ranges (virtual class) and GRanges
>> (which doesn't inherit from Ranges) objects, then two more subset
>> methods would suffice to achieve 1). Sound reasonable?
>>
>>
>> Patrick
>
> Perhaps this request pertaining to xtabs(..., subset = ...) is related.
>
> Currently (rather, in IRanges_1.6.4)
>
>> library(IRanges)
>> ir <- RangedData(IRanges(start=1:10,width=1),space=rep(letters[1:2],5),z=rep(1:3,length=10))
>>
>> xtabs(~z,as.data.frame(ir),subset = z > 1)
> z
> 2 3
> 3 3
>> xtabs(~z,subset(ir,z>1))
> z
> 2 3
> 3 3
>>
>> xtabs(~z,ir,subset = z > 1)
> Error in xj[i] : invalid subscript type 'closure'
>>
>> xtabs(~z,subset(ir,space=='a'))
> z
> 1 2 3
> 2 1 2
>> xtabs(~z,ir,subset = space=='a')
> Error in xj[i] : invalid subscript type 'closure'
>>
>
> Can this be changed to allow use of the subset argument when the data
> arg is a RangedData (or GRanges) instance?
>
> Thanks,
>
> Chuck
>
>>
>>
>> On 6/4/10 10:06 PM, Steve Lianoglou wrote:
>>> Hi Vincent,
>>>
>>>
>>> > the simplification that Steve seems to be asking for would
>>> > allow implicit references to elementMetadata variables in the
>>> > predicate. I am not in favor of such an extension of semantics
>>> > of bracket.
>>> >
>>> Just to be clear, I'm not suggesting referencing elementMetadata
>>> variables implicitly w/in brackets, but rather only when using
>>> `subset` (as `subset` does now with columns of a data.frame (when it's
>>> used *on* a data.frame))
>>>
>>> So, using your example gr object:
>>>
>>> GRanges with 10 ranges and 2 elementMetadata values
>>>   seqnames    ranges strand |     score        GC
>>>      <Rle> <IRanges>  <Rle> | <integer> <numeric>
>>> a   Chrom1  [ 1, 10]      - |         1 1.0000000
>>> b   Chrom2  [ 2, 10]      + |         2 0.8888889
>>> c   Chrom2  [ 3, 10]      + |         3 0.7777778
>>> d   Chrom2  [ 4, 10]      * |         4 0.6666667
>>> e   Chrom1  [ 5, 10]      * |         5 0.5555556
>>> f   Chrom1  [ 6, 10]      + |         6 0.4444444
>>> g   Chrom3  [ 7, 10]      + |         7 0.3333333
>>> h   Chrom3  [ 8, 10]      + |         8 0.2222222
>>> i   Chrom3  [ 9, 10]      - |         9 0.1111111
>>> j   Chrom3  [10, 10]      - |        10 0.0000000
>>>
>>> seqlengths
>>>  Chrom1 Chrom2 Chrom3
>>>      NA     NA     NA
>>>
>>> I was curious if this would be useful:
>>>
>>> R> subset(gr, strand == "+" & score > 6)
>>>
>>> but I wasn't trying to propose having something like this:
>>>
>>> R> gr[strand == "+" & score > 6]
>>>
>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu             UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
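
The mechanism under discussion in this thread can be sketched in base R alone. The block below is a minimal illustration, not the IRanges implementation: `subset_components` is a hypothetical helper, and a plain named list stands in for the components (start, end, metadata columns) of a ranges object. It captures the predicate unevaluated and evaluates it with the components in scope, which is how `subset` resolves column names for a data.frame. The last lines show the `xtabs(..., subset = ...)` form working when `data` is a data.frame, as in Chuck's `as.data.frame(ir)` call.

```r
# Hypothetical helper (not part of IRanges): evaluate a predicate against
# the named components of a list-like object, then keep matching elements.
subset_components <- function(x, predicate) {
  expr <- substitute(predicate)          # capture the expression unevaluated
  keep <- eval(expr, x, parent.frame())  # names resolve against x first
  if (!is.logical(keep)) {
    stop("'predicate' must evaluate to a logical vector")
  }
  keep <- keep & !is.na(keep)            # treat NA as FALSE
  lapply(x, function(component) component[keep])
}

# Plain list standing in for the components of a ranges object (assumption)
ranges <- list(start = 1:10, end = 1:10, score = 10:1)
picked <- subset_components(ranges, start < 5 & score > 7)

# xtabs() resolves its 'subset' argument the same way when 'data' is a
# data.frame, so converting first lets the argument be used directly.
df <- data.frame(space = rep(letters[1:2], 5),
                 z = rep(1:3, length.out = 10))
tab <- xtabs(~ z, data = df, subset = z > 1)
```

Evaluating against the object's own components is what lets `start < 5` be written without a `ranges$` prefix; supporting this for Ranges or GRanges would need an equivalent way to expose their components to `eval`.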