[BioC] matrix like object with Rle columns
Hervé Pagès
hpages at fhcrc.org
Wed Jun 27 22:37:12 CEST 2012
Hi Kasper,
On 06/25/2012 08:56 PM, Kasper Daniel Hansen wrote:
[...]
> [ side question which could be relevant in this discussion: for a
> numeric Rle is there some notion of precision - say I have truly
> numeric values with tons of digits, and I want to consider two numbers
> part of the same run if |x1 -x2|<epsilon? ]
Runs are detected by comparing doubles at the C level with ==, which
AFAIK gives the same result as == in R (as long as the values are
neither NA nor NaN), so no tolerance is involved. See the
_fill_Rle_slots_with_double_vals() helper function in
IRanges/src/Rle_class.c for the details.
Therefore:
  > all.equal(sqrt(3)^2, 3)
  [1] TRUE
  > sqrt(3)^2 == 3
  [1] FALSE
  > Rle(c(sqrt(3)^2, 3))
  'numeric' Rle of length 2 with 2 runs
  Lengths: 1 1
  Values : 3 3
Note that base::rle() does the same:
  > rle(c(sqrt(3)^2, 3))
  Run Length Encoding
  lengths: int [1:2] 1 1
  values : num [1:2] 3 3
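Both values print as 3 only because of the default print precision. A small sketch (base R only, so it says nothing about IRanges beyond what is stated above) that exposes the difference by asking for more digits:

```r
# Base R only; IRanges is not needed for this check.
x <- c(sqrt(3)^2, 3)

# Default printing rounds to 7 significant digits, so both show as 3.
print(x)

# 17 significant digits are enough to tell any two distinct doubles apart.
print(sprintf("%.17g", x))

# The gap is tiny compared with the default tolerance of all.equal(),
# which is sqrt(.Machine$double.eps), i.e. about 1.5e-8.
gap <- abs(x[1] - x[2])
print(gap)
print(gap < sqrt(.Machine$double.eps))
```

This is why all.equal() reports TRUE while == (and therefore the run detection) treats the two values as different.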
I can see that using a "|x1 - x2| < epsilon" criterion would in general
give better compression (fewer runs), but then the compression would no
longer be lossless, as it is right now:
  > x <- c(sqrt(3)^2, 3)
  > identical(as.vector(Rle(x)), x)
  [1] TRUE
  > identical(inverse.rle(rle(x)), x)
  [1] TRUE
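To make the loss concrete, here is a minimal sketch of what a tolerance-based encoder could look like. The function rle_tol() and its eps argument are hypothetical (they exist neither in base R nor in IRanges), and the sketch assumes the input contains no NA or NaN:

```r
# Hypothetical helper, NOT part of base R or IRanges: a value starts a new
# run only when it differs from the previous value by at least eps.
# Assumes x contains no NA/NaN.
rle_tol <- function(x, eps = 1e-8) {
  n <- length(x)
  if (n == 0L) {
    return(structure(list(lengths = integer(0), values = x), class = "rle"))
  }
  new_run <- c(TRUE, abs(diff(x)) >= eps)  # TRUE where a run starts
  starts <- which(new_run)
  structure(
    list(lengths = diff(c(starts, n + 1L)),  # distance between run starts
         values = x[starts]),                # keep the first value of each run
    class = "rle"
  )
}

x <- c(sqrt(3)^2, 3)
r <- rle_tol(x)
print(r$lengths)                 # a single run covering both elements

# Decoding repeats the first value of the run, so the exact 3 is gone.
y <- inverse.rle(r)
print(identical(y, x))           # exact round trip fails
print(isTRUE(all.equal(y, x)))   # still equal up to tolerance
```

Keeping the first value of each run is just one possible choice; any representative (mean, last value, ...) would make the round trip inexact in the same way.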
Also the "|x1 - x2| < epsilon" approach would introduce some subtle
complications because the criterion is no longer transitive, i.e.
you can have |x1 - x2| < epsilon and |x2 - x3| < epsilon without
having |x1 - x3| < epsilon. Because of that, finding the runs
becomes a kind of clustering problem with several possible
strategies, some of them very simple but not necessarily with
the "good properties".
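The lack of transitivity can be illustrated with two simple strategies that disagree on the same input. Both helpers below are hypothetical names introduced only for this sketch, and they assume a numeric vector without NA or NaN:

```r
# Hypothetical helpers for illustration only; x is assumed free of NA/NaN.

# Strategy 1: compare each value with its immediate predecessor.
run_ids_chain <- function(x, eps) {
  if (length(x) == 0L) return(integer(0))
  cumsum(c(TRUE, abs(diff(x)) >= eps))
}

# Strategy 2: compare each value with the first value of the current run.
run_ids_anchor <- function(x, eps) {
  ids <- integer(length(x))
  id <- 0L
  first_val <- NA_real_
  for (i in seq_along(x)) {
    if (i == 1L || abs(x[i] - first_val) >= eps) {
      id <- id + 1L        # open a new run
      first_val <- x[i]    # and remember its first value
    }
    ids[i] <- id
  }
  ids
}

x <- c(0, 0.6, 1.2, 1.8)
eps <- 1

ids_chain <- run_ids_chain(x, eps)
ids_anchor <- run_ids_anchor(x, eps)
print(ids_chain)    # every neighbour is within eps, so one single run
print(ids_anchor)   # the run is split once a value drifts eps away from its start

# Under strategy 1 the single run spans more than eps end to end.
print(diff(range(x[ids_chain == 1L])) > eps)
```

Strategy 1 compresses more but lets values drift arbitrarily far within a run; strategy 2 bounds the drift but its result depends on where each run happens to start.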
H.
>
> Kasper
>
>>
>> Michael
>>
>> On Mon, Jun 25, 2012 at 8:27 PM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Do we have a matrix-like object, but where the columns are Rle's?
>>>
>>> Kasper
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319