[Bioc-sig-seq] RangedData objects. Redefining widths with conditions.
Ivan Gregoretti
ivangreg at gmail.com
Fri Apr 23 16:42:49 CEST 2010
Hi Steve,
What you showed works, no question, but I found that resize() is not
set up for convenient use with RangedData objects.
For example, consider a more biological set of data:
Z <- RangedData(
  RangesList(
    chrA = IRanges(start = c(1, 4, 6), width = c(3, 2, 4)),
    chrB = IRanges(start = c(1, 3, 6), width = c(3, 3, 4))),
  score  = c( 2, 7, 3, 1, 1, 1 ),
  strand = c('+','+','-','+','-','-') )
> Z
RangedData with 6 rows and 2 value columns across 2 spaces
space ranges | score strand
<character> <IRanges> | <numeric> <character>
1 chrA [1, 3] | 2 +
2 chrA [4, 5] | 7 +
3 chrA [6, 9] | 3 -
4 chrB [1, 3] | 1 +
5 chrB [3, 5] | 1 -
6 chrB [6, 9] | 1 -
Here is the inconvenience with resize():
resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "resize", for
signature "RangedData"
What does work is ranges(Z) rather than Z itself:
> resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end'))
SimpleRangesList of length 2
$chrA
IRanges of length 3
start end width
[1] 1 200 200
[2] 4 203 200
[3] -190 9 200
$chrB
IRanges of length 3
start end width
[1] 1 200 200
[2] 3 202 200
[3] -190 9 200
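A side note on the output above: the second chrB range, [3, 5], is on the '-' strand, yet it comes back as [3, 202], that is, anchored at its start. My guess (an assumption, I have not checked the resize() source) is that the length-6 fix vector is applied to each space on its own, so chrB sees the first three entries rather than its own. A minimal base R sketch of how the fix values could be split per space; whether resize() accepts such a list for a RangesList is something I have not verified:

```r
# Base R only; no IRanges calls are used here.
space  <- c('chrA', 'chrA', 'chrA', 'chrB', 'chrB', 'chrB')
strand <- c('+', '+', '-', '+', '-', '-')
fix    <- ifelse(strand == '+', 'start', 'end')

# One 'fix' vector per space, aligned with that space's ranges.
fix_by_space <- split(fix, space)
fix_by_space$chrA   # "start" "start" "end"
fix_by_space$chrB   # "start" "end"   "end"

# If the first three entries of 'fix' were reused for chrB instead,
# the second chrB range would be anchored at its start, not its end.
identical(fix[1:3], fix_by_space$chrB)   # FALSE
```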
But the result is a RangesList, not a RangedData object, so you have to coerce it back:
> as(resize(ranges(Z), width=200, fix=ifelse(Z$strand=='+','start','end')), 'RangedData')
RangedData with 6 rows and 0 value columns across 2 spaces
space ranges |
<character> <IRanges> |
1 chrA [ 1, 200] |
2 chrA [ 4, 203] |
3 chrA [-190, 9] |
4 chrB [ 1, 200] |
5 chrB [ 3, 202] |
6 chrB [-190, 9] |
Now I have a RangedData object, but the value columns are lost, so I
have to reconstruct them.
[warning: the following command is obnoxious]
> as(cbind(
    as.data.frame(as(resize(ranges(Z), width=200,
                            fix=ifelse(Z$strand=='+','start','end')),
                     'RangedData')),
    # drop space/start/end/width, keep only the value columns
    as.data.frame(Z)[, -(1:4), drop=FALSE]),
  'RangedData')
RangedData with 6 rows and 2 value columns across 2 spaces
space ranges | score strand
<character> <IRanges> | <numeric> <factor>
1 chrA [ 1, 200] | 2 +
2 chrA [ 4, 203] | 7 +
3 chrA [-190, 9] | 3 -
4 chrB [ 1, 200] | 1 +
5 chrB [ 3, 202] | 1 -
6 chrB [-190, 9] | 1 -
Granted, it works, but wouldn't this be more convenient?
resize(Z, width=200, fix=ifelse(Z$strand=='+','start','end'))
Z is a tiny toy example; biological data sets regularly run to millions
of rows. Mine has over 100 million rows, and as I write this, my 144GB
RAM machine is doing the resizing the 'long way round', as obnoxiously
shown above. It is still working...
I wonder if there is a 'cheaper' way to resize a large RangedData
instance. A better solution would be to upgrade resize(), but I am not
that R-skilled. I hope the developers will consider it.
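In case it helps anyone in the same spot, here is a minimal base R sketch of the arithmetic involved, done directly on start/end vectors so the value columns never leave the table. It deliberately avoids IRanges: resize_by_strand and the plain data frame are illustrative stand-ins of my own, not part of any package, and I have not timed this on a large set.

```r
# Hypothetical helper (not an IRanges function): anchor '+' ranges at
# their start and all other ranges at their end, then set the width.
resize_by_strand <- function(start, end, strand, width) {
  plus <- strand == '+'
  new_start <- ifelse(plus, start, end - width + 1L)
  new_end   <- ifelse(plus, start + width - 1L, end)
  data.frame(start = new_start, end = new_end)
}

# The toy data from above as a plain data frame.
z <- data.frame(
  space  = rep(c('chrA', 'chrB'), each = 3),
  start  = c(1, 4, 6, 1, 3, 6),
  end    = c(3, 5, 9, 3, 5, 9),
  score  = c(2, 7, 3, 1, 1, 1),
  strand = c('+', '+', '-', '+', '-', '-'),
  stringsAsFactors = FALSE
)

# Overwrite the coordinates in place; score and strand are untouched.
z[, c('start', 'end')] <- resize_by_strand(z$start, z$end, z$strand,
                                           width = 200)
z
```

With this arithmetic every row is anchored by its own strand, so the '-' range [3, 5] on chrB becomes [-194, 5], and no coercion or cbind() step is needed to carry score and strand along.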
Thank you,
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
On Thu, Apr 22, 2010 at 5:11 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Thu, Apr 22, 2010 at 4:17 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>> Hello everybody,
>>
>> How do you resize() the ranges of a RangedData object?
>>
>>
>> In the past (IRanges 1.4.11), I could
>>
>> 1) extend forward 200 bases from the start in '+' ranges OR
>> 2) extend backward 200 bases from the end in '-' ranges.
>>
>> The syntax was something like this:
>>
>> resize(ranges(A), width = 200, start = A$strand == "+")
>>
>> In IRanges 1.5.70, the "start" argument of resize() has been
>> deprecated and replaced by "fix".
>>
>> Can somebody show how to get the task accomplished with the new resize()?
>
> I'm pretty sure you use `fix` just like you use start:
>
> R> strands <- c("+", '-', '+', '-', '-')
> R> ir <- IRanges(c(1,10,20,30, 40), width=5)
> R> ir
> IRanges of length 5
> start end width
> [1] 1 5 5
> [2] 10 14 5
> [3] 20 24 5
> [4] 30 34 5
> [5] 40 44 5
>
> R> resize(ir, width=8, fix=ifelse(strands == '+', 'start', 'end'))
> IRanges of length 5
> start end width
> [1] 1 8 8
> [2] 7 14 8
> [3] 20 27 8
> [4] 27 34 8
> [5] 37 44 8
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>