[BioC] matrix like object with Rle columns

Hervé Pagès hpages at fhcrc.org
Wed Jun 27 22:37:12 CEST 2012


Hi Kasper,

On 06/25/2012 08:56 PM, Kasper Daniel Hansen wrote:
[...]
> [ side question which could be relevant in this discussion: for a
> numeric Rle is there some notion of precision - say I have truly
> numeric values with tons of digits, and I want to consider two numbers
> part of the same run if |x1 -x2|<epsilon? ]

The comparison of 2 doubles is done at the C level with ==, which
AFAIK is the same as doing == in R (as long as we deal with non-NA
and non-NaN values). See the _fill_Rle_slots_with_double_vals() helper
function in IRanges/src/Rle_class.c for the details.

Therefore:

   > all.equal(sqrt(3)^2, 3)
   [1] TRUE
   > sqrt(3)^2 == 3
   [1] FALSE
   > Rle(c(sqrt(3)^2, 3))
   'numeric' Rle of length 2 with 2 runs
     Lengths: 1 1
     Values : 3 3

Note that base::rle() does the same:

   > rle(c(sqrt(3)^2, 3))
   Run Length Encoding
     lengths: int [1:2] 1 1
     values : num [1:2] 3 3

I can see that using a "|x1 -x2|<epsilon" criterion would in general
give better compression (fewer runs) but then the compression would not
be lossless as it is right now:

   > x <- c(sqrt(3)^2, 3)
   > identical(as.vector(Rle(x)), x)
   [1] TRUE
   > identical(inverse.rle(rle(x)), x)
   [1] TRUE
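
To make the lossless point concrete, here is a minimal sketch of what a
tolerance-style encoding would give up. It uses round() as a stand-in
for an epsilon criterion; that choice, and the digits = 8 value, are
assumptions made for illustration only and are not something Rle() or
rle() do:

```r
# Illustrative only: round() stands in for a tolerance criterion.
x <- c(sqrt(3)^2, 3)
r <- rle(round(x, digits = 8))   # both values round to 3, so they merge
length(r$lengths)                # number of runs after rounding
identical(inverse.rle(r), x)     # is the round trip still exact?
all.equal(inverse.rle(r), x)     # is it equal up to numeric tolerance?
```

The two values collapse into a single run, but decoding returns 3 twice,
so the original sqrt(3)^2 can only be recovered approximately.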

Also the "|x1 -x2|<epsilon" approach would introduce some subtle
complications due to the fact that the criterion is not transitive,
i.e. you can have |x1 -x2|<epsilon and |x2 -x3|<epsilon
without having |x1 -x3|<epsilon. Because of that, finding the runs
becomes some kind of clustering problem with several possible
strategies, some of them very simple but not necessarily with
the "good properties".
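
As a rough illustration of the non-transitivity issue, the sketch below
compares two simple strategies on the same input. Both helper functions
(runs_vs_previous and runs_vs_run_start) are hypothetical names made up
for this example; neither exists in IRanges or base R, and other
strategies are possible:

```r
# Hypothetical helpers, not part of IRanges or base R.

# Strategy A: start a new run when a value differs from its immediate
# predecessor by eps or more.
runs_vs_previous <- function(x, eps) {
    if (length(x) == 0L) return(integer(0))
    breaks <- abs(diff(x)) >= eps
    run_id <- cumsum(c(TRUE, breaks))
    tabulate(run_id)
}

# Strategy B: start a new run when a value differs from the first value
# of the current run by eps or more.
runs_vs_run_start <- function(x, eps) {
    if (length(x) == 0L) return(integer(0))
    run_id <- integer(length(x))
    id <- 1L
    anchor <- x[1]
    for (i in seq_along(x)) {
        if (abs(x[i] - anchor) >= eps) {
            id <- id + 1L
            anchor <- x[i]
        }
        run_id[i] <- id
    }
    tabulate(run_id)
}

x <- c(0, 0.75, 1.5, 2.25)   # each step is 0.75, i.e. below eps
eps <- 1
runs_vs_previous(x, eps)     # run lengths under strategy A
runs_vs_run_start(x, eps)    # run lengths under strategy B
```

Strategy A chains everything into one run of length 4, even though the
first and last values differ by 2.25. Strategy B keeps every value
within eps of the start of its run and returns two runs of length 2,
but where the runs fall then depends on where the scan starts.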

H.

>
> Kasper
>
>>
>> Michael
>>
>> On Mon, Jun 25, 2012 at 8:27 PM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Do we have a matrix-like object, but where the columns are Rle's?
>>>
>>> Kasper
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


