[Bioc-sig-seq] Rle vs RangedData

Mon Jun 29 20:23:52 CEST 2009

Simon,
Could you provide timings for Rle element extraction? I have been
trying to speed up the bottlenecks. If the need is to extract multiple
elements at once, as Wolfgang suggests, then "[" for Rle should perform
well, since it calculates the start values only once per call:

## from the internals of "[" for Rle
output <- runValue(x)[findInterval(i, start(x))]
if (!drop) output <- Rle(output)
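
The same idea can be sketched in base R without the IRanges classes.
This is only an illustration under my own assumptions, not the actual
Rle method: the names values, lengths, starts and i are made up for the
example, and the runs are stored as two plain vectors.

```r
## Base-R sketch of run-length lookup; not the IRanges implementation.
values  <- c(0, 5, 2)       # value of each run (hypothetical data)
lengths <- c(3L, 4L, 2L)    # length of each run; expanded length is 9

## Start position of each run, computed once: cost grows with the
## number of runs, not with the number of positions extracted.
starts <- cumsum(c(1L, head(lengths, -1L)))

## Positions to extract, here an equi-spaced grid over the sequence.
i <- seq(1L, sum(lengths), by = 2L)

## findInterval() locates, for each position, the run that contains it.
extracted <- values[findInterval(i, starts)]

## Sanity check against the fully expanded vector.
expanded <- rep(values, lengths)
stopifnot(identical(extracted, expanded[i]))
```

Extracting many positions in one call reuses the same starts vector, so
the cumsum is paid once rather than once per position.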

Patrick

Wolfgang Huber wrote:
>
> Hi Simon
>
> just to be sure - what is n? Number of segments, or length of the 
> (expanded) sequence?
>
> And rather than looking at the time needed to access a single value at 
> a certain position, shouldn't you be looking at the time needed to 
> access the values on a complete equi-spaced grid from begin to end of 
> the sequence?
>
>     bw Wolfgang
>
>
> Simon Anders wrote:
>> Hi Michael
>>
>> Michael Lawrence wrote:
>>> An Rle object, even if it only stores the widths, would be better 
>>> than RangedData. Just getting the starts out of a RangedData is an 
>>> O(n) operation, and there is in general a lot of overhead for 
>>> functionality that is not useful in your case.
>>
>> Thanks.
>>
>> But wait a second: Isn't there a slot "starts" in a RangedData object?
>> So why would it be O(n) if this information is already there?
>>
>> My concern was that getting the starts (or even just getting a value at
>> a given position) from an Rle object would be O(n) because the Rle
>> object does not contain the starts, only the lengths of the intervals.
>>
>> So, what information is now stored where?
>>
>> Cheers
>>   Simon
>
> Best wishes
>      Wolfgang
>
> ------------------------------------------------
> Wolfgang Huber, EMBL, http://www.ebi.ac.uk/huber