[Bioc-sig-seq] Rle vs RangedData

Fri Jun 26 12:32:11 CEST 2009

Dear Michael and Patrick

As you may have noticed, my HilbertVis package requires the input data 
to be presented as an ordinary vector. Obviously, it would be much better 
for performance to use a run-length-encoded vector (and the stand-alone 
version of HilbertVis already does that).

So, I wanted to add the functionality to use either Rle objects or 
RangedData objects as input to the hilbertDisplay function and got 
confused about the two classes.

Rle seems to be simple and lightweight, but I cannot see how I could 
perform fast random access. If I want to access an element of the vector 
with a given position somewhere in the middle, I suppose I cannot avoid 
having to add up all the lengths in order to find the right value. Is 
there any reason why you store lengths of the constant intervals in the 
Rle object rather than their start points? In the latter case one could 
achieve random access in time O(log n) as opposed to O(n). Or are the 
start points cached somewhere internally?
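
To make the comparison concrete, here is a minimal sketch of the two 
lookup strategies. It is plain Python rather than R, and it is not the 
actual Rle or IRanges code: the function names value_at_lengths and 
value_at_starts and the toy data are made up for illustration. Positions 
are 1-based, as in R.

```python
from bisect import bisect_right

def value_at_lengths(lengths, values, pos):
    """Look up a 1-based position by summing run lengths: O(n) per query."""
    if pos < 1:
        raise IndexError("position outside the vector")
    end = 0
    for length, value in zip(lengths, values):
        end += length          # last position covered by this run
        if pos <= end:
            return value
    raise IndexError("position outside the vector")

def value_at_starts(starts, values, total_length, pos):
    """Look up a 1-based position by binary search on run starts: O(log n)."""
    if not 1 <= pos <= total_length:
        raise IndexError("position outside the vector")
    # bisect_right counts the runs that start at or before pos
    return values[bisect_right(starts, pos) - 1]

# Toy vector 0 0 0 5 5 2 2 2 2 in both encodings (hypothetical data)
lengths = [3, 2, 4]
starts = [1, 4, 6]
values = [0, 5, 2]
```

Both functions return the same value for every position; only the cost 
per query differs, which matters when many positions are sampled from a 
long vector.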

RangedData does seem to store the data in the start/value scheme that 
seems more advantageous to me. However, it has a rather heavyweight slot 
structure. Do I understand correctly that the canonical way to get the 
start and data vectors from a RangedData object 'rd' would be 
'start(rd)' and 'rd$score' (or maybe better 'rd[[1]]')?

As the most likely input for hilbertDisplay is the output of the 
'coverage' function, which is an Rle object, it seems to make sense to 
change hilbertDisplay to accept this. However, for performance reasons, 
I should then probably convert it to RangedData.
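
In terms of the underlying data, going from run lengths to start points 
is a single cumulative sum that needs to be done only once, before any 
lookups. A minimal sketch in plain Python (not the IRanges 
implementation; the function name starts_from_lengths is hypothetical), 
using 1-based positions:

```python
from itertools import accumulate

def starts_from_lengths(lengths):
    """Return the 1-based start position of each run, in one O(n) pass."""
    ends = list(accumulate(lengths))      # last position of each run
    return [end - length + 1 for end, length in zip(ends, lengths)]

# Runs of length 3, 2 and 4 start at positions 1, 4 and 6
run_starts = starts_from_lengths([3, 2, 4])
```

The values vector is unchanged by this step, so the conversion costs 
one pass over the runs and no copying of the data itself.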

Would you agree?

Can you shed some light on when you intended "Rle" to be used and 
when "RangedData"?

Thanks.

   Simon

+---
| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute (EMBL-EBI)
| Hinxton, Cambridgeshire, UK
| office phone +44-1223-492680, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de