[Bioc-sig-seq] the as.matrix method of the RangesMatchingList

Nicolas Delhomme delhomme at embl.de
Thu May 14 14:07:21 CEST 2009


Hi Michael,

Thanks a lot. This is working very nicely. However, the user has to
make sure that the query and subject are ordered in the same way in
order to use the generated indices properly.

For example, my subject has the names: "2L" "2R" "3L" "3R" "4"  "X"
and the query: "4"  "X"  "3R" "2R" "2L" "3L". The overlap function
takes care of this and compares the right spaces. It returns a
RangesMatchingList with names:
"4"  "X"  "3R" "2R" "2L" "3L". This means that when I export the
result as a matrix, the indices will be corrupted, because the spaces
are not in the same order as in the subject.
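
To make the problem concrete, here is a small base R sketch (no IRanges
objects involved). The per-space range counts are made-up numbers, and
global_offset() is a hypothetical helper. It assumes that a "global"
index is obtained by adding, to each local index, the number of ranges
in the preceding spaces:

```r
# Hypothetical number of ranges per space in the subject (not real data).
subject_n <- c("2L" = 3, "2R" = 2, "3L" = 4, "3R" = 1, "4" = 2, "X" = 5)
query_order <- c("4", "X", "3R", "2R", "2L", "3L")

# Offset of each space within the unlisted object: the number of ranges
# that come before it.
global_offset <- function(n) {
  stats::setNames(c(0, cumsum(n)[-length(n)]), names(n))
}

right <- global_offset(subject_n)               # subject's own order
wrong <- global_offset(subject_n[query_order])  # order of the result's names

right[["4"]] + 1  # local hit 1 in space "4" is element 11 of the unlisted subject
wrong[["4"]] + 1  # with offsets taken in the query's order it becomes element 1
```

With the query's ordering, space "4" comes first, so its offset is 0
instead of 10 and the index points to a range in "2L".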

I can think of two solutions:
Either a warning should be emitted (when doing the overlap, if the
names are not ordered in the same way),
or, and that would be my preferred solution, the RangesMatchingList
could have an additional slot holding the mapping index from the query
names to the subject names. This could then be used by the as.matrix
method to return the "correct" indices. This should make it safe when
the user does not provide a query and a subject ordered in the same
way, and it should be robust to cases where the query and subject
spaces are not entirely identical.
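
As a rough illustration of the mapping index I have in mind, here is a
base R sketch using only match(). The variable names are hypothetical,
and "U" is a made-up space that exists only in the query, to show the
case where the spaces are not identical. This is not how the package
does it, just the idea:

```r
subject_names <- c("2L", "2R", "3L", "3R", "4", "X")
query_names   <- c("4", "X", "3R", "2R", "2L", "3L", "U")  # "U": made-up extra space

# Position of each query space within the subject; NA when the space is
# absent from the subject.
query_to_subject <- match(query_names, subject_names)
names(query_to_subject) <- query_names
query_to_subject

# Spaces present in both, i.e. those for which indices can be translated.
shared <- query_names[!is.na(query_to_subject)]
shared
```

Such a vector could be stored alongside the matches and used to look up
the subject-side offset for each space, whatever the order of the query.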

Well, this is just my two cents' worth as I'm not (yet) so familiar  
with the code.

Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------



On 13 May 2009, at 22:20, Michael Lawrence wrote:

>
>
> On Wed, May 13, 2009 at 7:05 AM, Nicolas Delhomme <delhomme at embl.de>  
> wrote:
> Hi all,
>
> I've got the impression that the as.matrix method of the  
> RangesMatchingList does not work as it should.
>
> I have a RangesMatchingList which I obtained by using the overlap
> function (from the RangesList class), which takes two RangesList
> objects as input. When I apply as.matrix() to the RangesMatchingList,
> it gives me the following error:
>
> Error in .Method(..., deparse.level = deparse.level) :
>  number of rows of matrices must match (see arg 2)
>
> The function is pretty easy:
>
> setMethod("as.matrix", "RangesMatchingList", function(x) {
>  cbind(space = space(x), do.call(cbind, lapply(x, as.matrix)))
> })
>
> When I replace the cbind in the do.call by an rbind, it's already  
> better
>
> Thanks, yes this was a bug. As the documentation states,  
> RangesMatchingList was considered experimental and not something  
> that was really tested. But I should have done a better job.
>
>
> Warning message:
> In .Method(..., deparse.level = deparse.level) :
>  number of rows of result is not a multiple of vector length (arg 1)
>
> This is due to the fact that space(x) returns many more spaces than  
> there are overlaps.
>
> This is a bug in space().
>
>
> I could solve that by changing the function into:
>
> setMethod("as.matrix", "RangesMatchingList", function(x) {
>   # Bind the per-space matrices by row, tagging each row with its space name.
>   do.call(rbind, lapply(seq_along(x), function(i) {
>     mat <- as.matrix(x[[i]])
>     cbind(space = rep(names(x)[[i]], nrow(mat)), mat)
>   }))
> })
>
> Now, I do not know if I might have a particular use-case (having a  
> RangesMatchingList coming from the RangesList overlap function) that  
> you guys did not think of.
>
> It turns out that I had to rethink this method. With the version
> above, the user will receive a character matrix, which probably isn't
> very useful. One could translate the space names into integer IDs, but
> in order to use that, one would have to split the matrix and loop over
> each block. In that case, it would just be easier to loop over the
> RangesMatchingList. Thus, I changed the function to return a doublet
> matrix, just like RangesMatching, where the indices are adjusted so
> that they are aligned with the result of calling 'unlist' on the
> subject and query RangesLists (i.e. the index is global). I think this
> will satisfy more use cases, but I'm not sure.
>
> These changes were applied in both trunk and release.
>
> Thanks for the feedback, and I'd appreciate more if you have any,
> Michael
>
>
> Just let me know,
>
> Best,
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>


