[Bioc-sig-seq] Rle vs RangedData
Patrick Aboyoun
paboyoun at fhcrc.org
Mon Jun 29 20:23:52 CEST 2009
Simon,
Could you provide timings for Rle element extraction? I have been
trying to speed up bottlenecks. If the need is to extract multiple
elements at once, as Wolfgang suggests, then "[" for Rle should
perform well, since it calculates the start values only once:
## from the internals of "[" for Rle
output <- runValue(x)[findInterval(i, start(x))]
if (!drop) output <- Rle(output)
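
To make the idea concrete, here is a rough base-R sketch of the same lookup. It is only an illustration, not the actual IRanges code: the names vals, lens, starts and i are made up for the example, with vals and lens standing in for runValue(x) and runLength(x).

```r
## Base-R sketch; vals/lens are hypothetical stand-ins for
## runValue(x) and runLength(x) of an Rle
vals <- c("a", "b", "c")
lens <- c(3L, 2L, 4L)

## Run starts are computed once, in time linear in the number of runs
starts <- cumsum(c(1L, lens[-length(lens)]))

## Equi-spaced grid of positions over the expanded sequence
i <- seq(1L, sum(lens), by = 2L)

## One vectorised binary search maps every position to its run
out <- vals[findInterval(i, starts)]

## Cross-check against the fully expanded vector
expanded <- rep(vals, lens)
stopifnot(identical(out, expanded[i]))
```

Under these assumptions the cost is one cumsum over the runs plus one binary search per requested position, so extracting many positions in a single call should be much cheaper than extracting them one at a time.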
Patrick
Wolfgang Huber wrote:
>
> Hi Simon
>
> just to be sure - what is n? Number of segments, or length of the
> (expanded) sequence?
>
> And rather than looking at the time needed to access a single value at
> a certain position, shouldn't you be looking at the time needed to
> access the values on a complete equi-spaced grid from begin to end of
> the sequence?
>
> bw Wolfgang
>
>
> Simon Anders wrote:
>> Hi Michael
>>
>> Michael Lawrence wrote:
>>> An Rle object, even if it only stores the widths, would be better
>>> than RangedData. Just getting the starts out of a RangedData is an
>>> O(n) operation, and there is in general a lot of overhead for
>>> functionality that is not useful in your case.
>>
>> Thanks.
>>
>> But wait a second: Isn't there a slot "starts" in a RangedData object?
>> So why would it be O(n) if this information is already there?
>>
>> My concern was that getting the starts (or even just getting a value at
>> a given position) from an Rle object would be O(n) because the Rle
>> object does not contain the starts, only the lengths of the intervals.
>>
>> So, what information is now stored where?
>>
>> Cheers
>> Simon
>
> Best wishes
> Wolfgang
>
> ------------------------------------------------
> Wolfgang Huber, EMBL, http://www.ebi.ac.uk/huber