[Bioc-sig-seq] aggregating a RangedData over an IRanges object
Patrick Aboyoun
paboyoun at fhcrc.org
Tue Jul 14 20:54:36 CEST 2009
Simon,
I've made further changes to the IRanges package to make it easier to
perform the coverage operations across chromosomes/contigs. I've added
RleViewsList (virtual) and SimpleRleViewsList (non-virtual) classes to
IRanges as well as a RleViewsList constructor and viewSums and other
view* summary methods for RleViewList objects. The code below looks much
cleaner now due to the RleViewsList constructor and the viewSums method.
Also, I point out the viewApply function, which serves in the capacity
of the aggregate function that you were experimenting with in your
e-mails. Do these steps fit into your workflow?
> suppressMessages(library(IRanges))
> v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
+ score = c(2, 7, 3))
> mycoverage <- coverage(v3, weight = "score")
> exons <- IRangesList(IRanges(start = c(2, 6), end = c(4, 8)))
> exonViews <- RleViewsList(rleList = mycoverage, rangesList = exons)
> coverageAUC1 <- viewSums(exonViews)
> coverageAUC1
SimpleIntegerList: 1 element
> coverageAUC1[[1]]
[1] 11 9
> coverageAUC2 <- viewApply(exonViews, sum)
> coverageAUC2
SimpleList: 1 element
> coverageAUC2[[1]]
[1] 11 9
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
i386-apple-darwin9.7.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.3.35
Patrick
Patrick Aboyoun wrote:
> I just checked in a change to BioC 2.5 (devel) to make the coverage
> method for RangedData accept character strings as arguments. So now
> the function call looks a bit cleaner.
>
> > suppressMessages(library(IRanges))
> > v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
> + score = c(2, 7, 3))
> > mycoverage <- coverage(v3, weight = "score")
> > mycoverage[[1]]
> 'integer' Rle instance of length 9 with 3 runs
> Lengths: 3 2 4
> Values : 2 7 3
>
> This change should be pushed to bioconductor.org on the July 14th
> build of BioC 2.5 (devel).
>
>
> Patrick
>
>
>
> Patrick Aboyoun wrote:
>> Simon,
>> There is a coverage function for RangedData objects you can use. From
>> there you can calculate the exon AUCs using the viewSums function.
>>
>>
>> > suppressMessages(library(IRanges))
>> > v3 <- RangedData(IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
>> + score = c(2, 7, 3))
>> > mycoverage <- coverage(v3, weight = lapply(values(v3), "[[", "score"))
>> > mycoverage
>> SimpleRleList: 1 element
>> > mycoverage[[1]]
>> 'integer' Rle instance of length 9 with 3 runs
>> Lengths: 3 2 4
>> Values : 2 7 3
>> > myexons <- IRangesList(IRanges(start = c(2, 6), end = c(4, 8)))
>> > coverageAUC <- lapply(seq_len(length(mycoverage)),
>> + function(i) viewSums(Views(mycoverage[[i]], myexons[[i]])))
>> > coverageAUC
>> [[1]]
>> [1] 11 9
>> > sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
>> i386-apple-darwin9.7.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods
>> base other attached packages:
>> [1] IRanges_1.3.34
>>
>>
>>
>> Patrick
>>
>>
>> Simon Anders wrote:
>>> Hi Michael
>>>
>>> Michael Lawrence wrote:
>>>
>>>>> 1. How can I get the vector referring to chr10 in cvg.bl1? Is
>>>>> there a way
>>>>> to coerce to Rle or numeric? Supposedly, this only make sense if I
>>>>> subset
>>>>> the RangedData object to only contain one chromosome.
>>>>>
>>>>>
>>>> score(cvg.bl1)
>>>> or for any arbitrary column:
>>>> cvg.bl1$score
>>>>
>>>
>>> Sorry, this is not at all what I meant, but this misunderstanding
>>> highlights the general issue I see with Rle and RangedData. You see,
>>> for
>>> me, there is on piece of data, a very long vector, and different
>>> ways to
>>> store it in memory. Let's say, my vector is
>>>
>>> v1 <- c( 2, 2, 2, 7, 7, 3, 3, 3, 3 )
>>>
>>> I could store this either as an ordinary vector, as above, or as an Rle
>>>
>>> v2 <- Rle( c( 2, 7, 3 ), c( 3, 2, 4 ) )
>>>
>>> or as a RangedData object
>>>
>>> v3 <- RangedData(
>>> IRanges( start=c( 1, 4, 6 ), width=c( 3, 2, 4 ) ),
>>> score = c( 2, 7, 3 ) )
>>>
>>> I can get from v2 to v1 with "as.vector"
>>>
>>> > as.vector( v2 )
>>> [1] 2 2 2 7 7 3 3 3 3
>>>
>>> > all( as.vector( v2 ) == v1 )
>>> [1] TRUE
>>>
>>> But it escapes me how I am supposed to get from v3 to v1. This is not
>>> what I want:
>>>
>>> > score(v3)
>>> [1] 2 7 3
>>>
>>> This here is overly complicated and works only because there are no
>>> gaps
>>> between the intervals:
>>>
>>> > rep( score(v3), each=width(v3) )
>>> [1] 2 2 2 7 7 7 3 3 3
>>>
>>>
>>>
>>>>> 2. How can I aggregate over the RangedData object? As it contains
>>>>> only one
>>>>> chromosome, it should be possible to coerce it to an Rle vector,
>>>>> but there
>>>>> is no coercion method, and, it should be possible anyway to avoid
>>>>> this.
>>>>>
>>>>>
>>>> Just use Views.
>>>> Like:
>>>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>>>> viewSums(v)
>>>>
>>>
>>> Actually, it does not work:
>>>
>>>
>>>> v <- Views(score(cvg.bl1), ranges(exons)[[1]])
>>>>
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "Views",
>>> for signature "numeric"
>>>
>>> The reason is that 'score(cvg.bl1)' is just the numeric vector of
>>> values, and the information of the widths of the runs is lost here. So,
>>> even if there were a suitable method, it would not do what I want.
>>>
>>> In order to go on with my previous variables, let's define two index
>>> ranges:
>>>
>>> > r1 <- list( 2:3, 7:8 )
>>>
>>> With IRanges, I would rather write
>>>
>>> > r2 <- IRanges( start=c(2,7), width=c(2,2) )
>>>
>>> to denote the same.
>>>
>>> So, what I want is the sum of the indexed entries. With standard R
>>> vectors, I write
>>>
>>> > sapply( r1, function(i) sum( v1[i] ) )
>>> [1] 4 6
>>>
>>> With IRanges and Rle, I do
>>>
>>> > aggregate( v2, r2, sum )
>>> [1] 4 6
>>>
>>> to get the same. But what should I do if I have the RangedData v3
>>> instead of the Rle v2? The aggregate method does not support
>>> RangedData:
>>>
>>> > aggregate( v3, r2, sum )
>>> Error in FUN(window(x, start = start[i], end = end[i]), ...) :
>>> invalid 'type' (S4) of argument
>>>
>>> One might argue that aggregate should not support it, because a
>>> GenomeData object is a list of vectors, and an IRanges is not. Then, it
>>> would be nice if this here worked:
>>>
>>> > r3 <- IRangesList( score = r2 )
>>> > aggregate( v3, r3, sum )
>>>
>>>
>>> What I mean is that aggregate on the signature c( "RangedData",
>>> "IRangesList", "FUN" ) could be defined to go through the named (or
>>> indexed) elements of both its first and seond argument, aggregate these
>>> and return, in this case, list( score=c(4,6) ), and in general a list
>>> with one entry per field that both arguments share.
>>>
>>> Note that your suggestion with Views works for IRanges and Rle:
>>>
>>> > Views( v2, r2 )
>>> Views on a 9-length Rle subject
>>>
>>> views:
>>> start end width
>>> [1] 2 3 2 [2 2]
>>> [2] 7 8 2 [3 3]
>>> > viewSums( Views( v2, r2 ) )
>>> [1] 4 6
>>>
>>> but again, it cannot handle RangedData objects:
>>>
>>> > Views( v3, r2 )
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "Views", for
>>> signature "RangedData"
>>>
>>> > Views( v3, r3 )
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "Views", for
>>> signature "RangedData"
>>>
>>>
>>>
>>> Either, I am missing something, or RangedData is missing something.
>>>
>>>
>>> Cheers
>>> Simon
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list