[Bioc-sig-seq] aggregating a RangedData over an IRanges object
Patrick Aboyoun
paboyoun at fhcrc.org
Tue Jul 14 02:09:10 CEST 2009
Simon,
There is a coverage function for RangedData objects you can use. From
there you can calculate the exon AUCs using the viewSums function.
> suppressMessages(library(IRanges))
> v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
+ score = c(2, 7, 3))
> mycoverage <- coverage(v3, weight = lapply(values(v3), "[[", "score"))
> mycoverage
SimpleRleList: 1 element
> mycoverage[[1]]
'integer' Rle instance of length 9 with 3 runs
Lengths: 3 2 4
Values : 2 7 3
> myexons <- IRangesList(IRanges(start = c(2, 6), end = c(4, 8)))
> coverageAUC <- lapply(seq_len(length(mycoverage)),
+ function(i) viewSums(Views(mycoverage[[i]], myexons[[i]])))
> coverageAUC
[[1]]
[1] 11 9
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
i386-apple-darwin9.7.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.3.34
Patrick
Simon Anders wrote:
> Hi Michael
>
> Michael Lawrence wrote:
>
>>> 1. How can I get the vector referring to chr10 in cvg.bl1? Is there a way
>>> to coerce to Rle or numeric? Supposedly, this only make sense if I subset
>>> the RangedData object to only contain one chromosome.
>>>
>>>
>> score(cvg.bl1)
>> or for any arbitrary column:
>> cvg.bl1$score
>>
>
> Sorry, this is not at all what I meant, but this misunderstanding
> highlights the general issue I see with Rle and RangedData. You see, for
> me, there is on piece of data, a very long vector, and different ways to
> store it in memory. Let's say, my vector is
>
> v1 <- c( 2, 2, 2, 7, 7, 3, 3, 3, 3 )
>
> I could store this either as an ordinary vector, as above, or as an Rle
>
> v2 <- Rle( c( 2, 7, 3 ), c( 3, 2, 4 ) )
>
> or as a RangedData object
>
> v3 <- RangedData(
> IRanges( start=c( 1, 4, 6 ), width=c( 3, 2, 4 ) ),
> score = c( 2, 7, 3 ) )
>
> I can get from v2 to v1 with "as.vector"
>
> > as.vector( v2 )
> [1] 2 2 2 7 7 3 3 3 3
>
> > all( as.vector( v2 ) == v1 )
> [1] TRUE
>
> But it escapes me how I am supposed to get from v3 to v1. This is not
> what I want:
>
> > score(v3)
> [1] 2 7 3
>
> This here is overly complicated and works only because there are no gaps
> between the intervals:
>
> > rep( score(v3), each=width(v3) )
> [1] 2 2 2 7 7 7 3 3 3
>
>
>
>>> 2. How can I aggregate over the RangedData object? As it contains only one
>>> chromosome, it should be possible to coerce it to an Rle vector, but there
>>> is no coercion method, and, it should be possible anyway to avoid this.
>>>
>>>
>> Just use Views.
>> Like:
>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>> viewSums(v)
>>
>
> Actually, it does not work:
>
>
>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>>
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "Views",
> for signature "numeric"
>
> The reason is that 'score(cvg.bl1)' is just the numeric vector of
> values, and the information of the widths of the runs is lost here. So,
> even if there were a suitable method, it would not do what I want.
>
> In order to go on with my previous variables, let's define two index ranges:
>
> > r1 <- list( 2:3, 7:8 )
>
> With IRanges, I would rather write
>
> > r2 <- IRanges( start=c(2,7), width=c(2,2) )
>
> to denote the same.
>
> So, what I want is the sum of the indexed entries. With standard R
> vectors, I write
>
> > sapply( r1, function(i) sum( v1[i] ) )
> [1] 4 6
>
> With IRanges and Rle, I do
>
> > aggregate( v2, r2, sum )
> [1] 4 6
>
> to get the same. But what should I do if I have the RangedData v3
> instead of the Rle v2? The aggregate method does not support RangedData:
>
> > aggregate( v3, r2, sum )
> Error in FUN(window(x, start = start[i], end = end[i]), ...) :
> invalid 'type' (S4) of argument
>
> One might argue that aggregate should not support it, because a
> GenomeData object is a list of vectors, and an IRanges is not. Then, it
> would be nice if this here worked:
>
> > r3 <- IRangesList( score = r2 )
> > aggregate( v3, r3, sum )
>
>
> What I mean is that aggregate on the signature c( "RangedData",
> "IRangesList", "FUN" ) could be defined to go through the named (or
> indexed) elements of both its first and seond argument, aggregate these
> and return, in this case, list( score=c(4,6) ), and in general a list
> with one entry per field that both arguments share.
>
> Note that your suggestion with Views works for IRanges and Rle:
>
> > Views( v2, r2 )
> Views on a 9-length Rle subject
>
> views:
> start end width
> [1] 2 3 2 [2 2]
> [2] 7 8 2 [3 3]
> > viewSums( Views( v2, r2 ) )
> [1] 4 6
>
> but again, it cannot handle RangedData objects:
>
> > Views( v3, r2 )
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "Views", for
> signature "RangedData"
>
> > Views( v3, r3 )
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "Views", for
> signature "RangedData"
>
>
>
> Either, I am missing something, or RangedData is missing something.
>
>
> Cheers
> Simon
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
More information about the Bioc-sig-sequencing
mailing list