[Bioc-sig-seq] Rle vs RangedData
Simon Anders
anders at ebi.ac.uk
Fri Jun 26 12:32:11 CEST 2009
Dear Michael and Patrick
As you may have noticed, my HilbertVis package requires the input data
to be presented as an ordinary vector. Obviously, it would be much better
for performance to use a run-length-encoded vector (and the stand-alone
version of HilbertVis already does that).
So I wanted to let the hilbertDisplay function accept either Rle
objects or RangedData objects as input, and got confused about the
two classes.
Rle seems to be simple and lightweight, but I cannot see how I could
perform fast random access. If I want to access an element somewhere in
the middle of the vector, I suppose I cannot avoid adding up all the run
lengths in order to find the right value. Is there any reason why you
store the lengths of the constant runs in the Rle object rather than
their start points? With start points, one could locate the run that
contains a given position by binary search, i.e., achieve random access
in O(log n) time rather than O(n). Or are the start points cached
somewhere internally?
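To make the O(n) vs. O(log n) point concrete, here is a minimal base-R sketch. It uses base R's `rle()` as a stand-in for the IRanges `Rle` class (it says nothing about how `Rle` is actually stored), and the helper names `lookup_linear`, `run_starts` and `lookup_binary` are hypothetical. The linear version walks the run lengths; the binary version computes the run start points once with `cumsum()` and then finds the run with `findInterval()`, which does a binary search.

```r
# Toy coverage vector, run-length encoded with base R's rle()
# (a stand-in for IRanges' Rle, not the Rle class itself)
x <- c(0, 0, 0, 5, 5, 2, 2, 2, 2, 0)
r <- rle(x)   # r$lengths = 3 2 4 1, r$values = 0 5 2 0

# Hypothetical helper, O(n): add up run lengths until we pass 'pos'
lookup_linear <- function(lengths, values, pos) {
  total <- 0
  for (i in seq_along(lengths)) {
    total <- total + lengths[i]
    if (pos <= total) return(values[i])
  }
  stop("position out of range")
}

# Hypothetical helper: start point of each run, computed once in O(n)
run_starts <- function(lengths) cumsum(c(1L, head(lengths, -1L)))

# Hypothetical helper, O(log n) per query: binary search over run starts.
# Assumes 1 <= pos <= sum(lengths); there is no range check here.
lookup_binary <- function(starts, values, pos) {
  values[findInterval(pos, starts)]
}

starts <- run_starts(r$lengths)         # 1 4 6 10
lookup_linear(r$lengths, r$values, 7)   # 2
lookup_binary(starts, r$values, 7)      # 2
```

The one-time `cumsum()` costs O(n), but it pays off as soon as more than a handful of positions are queried.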
RangedData does seem to store the data in the start/value scheme, which
looks more advantageous to me. However, it has a rather heavyweight slot
structure. Do I understand correctly that the canonical way to get the
start and data vectors from a RangedData object 'rd' would be
'start(rd)' and 'rd$score' (or maybe better 'rd[[1]]')?
As the most likely input for hilbertDisplay is the output of the
'coverage' function, which is an Rle object, it seems to make sense to
change hilbertDisplay to accept this. However, for performance reasons,
I would then rather convert it to RangedData.
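The "convert once, then query" idea can be sketched in base R as follows. This is only an illustration under stated assumptions: `rle()` output stands in for the Rle returned by `coverage`, and the result is a plain `data.frame` with start/end/value columns that loosely mirrors the start/value layout, not an actual RangedData object. The names `rle_to_table` and `lookup_many` are hypothetical.

```r
# Hypothetical helper: turn a base-R rle() result into a start/end/value
# table (a plain data.frame, not the RangedData class)
rle_to_table <- function(r) {
  ends <- cumsum(r$lengths)
  data.frame(start = ends - r$lengths + 1L,
             end   = ends,
             value = r$values)
}

# Hypothetical helper: look up many positions at once.
# findInterval() is vectorized over 'pos' and binary-searches 'tab$start'.
lookup_many <- function(tab, pos) {
  tab$value[findInterval(pos, tab$start)]
}

# Toy stand-in for the output of 'coverage'
cov <- rle(c(0, 0, 0, 5, 5, 2, 2, 2, 2, 0))
tab <- rle_to_table(cov)           # starts 1 4 6 10, ends 3 5 9 10
lookup_many(tab, c(1, 4, 7, 10))   # 0 5 2 0
```

The conversion is a single O(n) pass over the runs; after that, each positional lookup costs O(log n).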
Would you agree?
Could you shed some light on when you intended "Rle" to be used
and when "RangedData"?
Thanks.
Simon
+---
| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute (EMBL-EBI)
| Hinxton, Cambridgeshire, UK
| office phone +44-1223-492680, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de