[Bioc-sig-seq] Resizing a GRanges object with elements on the "*" strand
Ivan Gregoretti
ivangreg at gmail.com
Tue Sep 14 23:59:33 CEST 2010
Well, backward compatibility is _the_ priority.
Your proposal is fine with me. It would be very important to update
the manual providing a couple examples. Whatever minimises the
trial-and-error of the new user.
Thanks,
Ivan
On Tue, Sep 14, 2010 at 5:35 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Tue, Sep 14, 2010 at 2:13 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hi Michael, since you are asking for opinions...
>>
>>
>> When specifying the 'fix=' argument:
>>
>> In my view, fix="start" should always resize at the "start" regardless
>> of strandedness.
>>
>>
>
> Well this would be inconsistent of 'flank'. If you do not want strand to
> affect the results, you could set strand temporarily to "*".
>
>>
>> Default behaviour (when not specifying the 'fix=' argument):
>>
>>
>> When the strand is "*", I would expect resize() to default to
>> fix="center" rather than "start".
>
> Why would you expect this? resize() on a Ranges defaults to start. If strand
> is irrelevant, I would expect it to behave like a Ranges.
>
>>
>> When strands are "+" and "-" I would expect resize to default to
>> fix="start" and fix="end' respectively.
>>
>>
>
> But what if the user is interested in the ends, and specified fix = "end",
> but then ended up overriding the strand-dependent default? I think having
> the default dependent on strand and the non-default independent is
> confusing. Better to take one perspective: either fix is relative to the
> beginning of transcription or it isn't, and then we need to fix flank() too.
> We need to worry about back-compatibility, and my solution preserves that,
> except for "*" indicating "center", which would surprise many. I'm trying to
> resist the temptation of a 1-1 mapping between two tri-state variables.
>
> Michael
>
>>
>> Thank you,
>>
>> Ivan
>>
>>
>>
>> On Tue, Sep 14, 2010 at 4:46 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > I just checked in some changes to IRanges, that make this method work:
>> >
>> > setMethod("resize", "GenomicRanges",
>> > function(x, width, fix = "start", use.names = TRUE)
>> > {
>> > revFix <- c(start = "end", end = "start", center = "center")
>> > fix <- ifelse(strand(x) == "-", revFix[fix], fix)
>> > ranges <-
>> > resize(ranges(x), width = width, fix = fix, use.names =
>> > use.names)
>> > if (!IRanges:::anyMissing(seqlengths(x))) {
>> > start(x) <- start(ranges)
>> > end(x) <- end(ranges)
>> > } else {
>> > x <- clone(x, ranges = ranges)
>> > }
>> > x
>> > }
>> > )
>> >
>> > That will accept the fix argument, except start and end are reversed for
>> > negative strand features. '*' is treated just like '+'. If this is
>> > acceptable to the GenomicRanges guys, I will commit this.
>> >
>> > On Tue, Sep 14, 2010 at 9:36 AM, Ivan Gregoretti <ivangreg at gmail.com>
>> > wrote:
>> >>
>> >> Hello Leonardo,
>> >>
>> >> I believe that the issue here is that resize() does not support the
>> >> "fix" argument at all when handling GRanges.
>> >>
>> >> Actually that would be a nice upgrade of functionality for GRanges.
>> >>
>> >> I face the same limitation and I currently resize by hand. :(
>> >>
>> >> This is my work around:
>> >>
>> >> library(GenomicRanges)
>> >>
>> >> # a set of genomic features called C
>> >> C <- GRanges(seqnames=c("chr1","chr2","chr19","chrX"),
>> >> ranges=IRanges(start=c(0,0,5,1),
>> >> end=c(150,150,150,400)),
>> >> strand=c("*","-","*","+"),
>> >> score=c(10,20,30,90))
>> >>
>> >>
>> >> # peek at C
>> >> C
>> >> GRanges with 4 ranges and 1 elementMetadata value
>> >> seqnames ranges strand | score
>> >> <Rle> <IRanges> <Rle> | <numeric>
>> >> [1] chr1 [0, 150] * | 10
>> >> [2] chr2 [0, 150] - | 20
>> >> [3] chr19 [5, 150] * | 30
>> >> [4] chrX [1, 400] + | 90
>> >>
>> >> seqlengths
>> >> chr1 chr19 chr2 chrX
>> >> NA NA NA NA
>> >>
>> >> # this is the workaround
>> >> ranges(C) <- resize(ranges(C),1,fix="start")
>> >>
>> >> # peek at the resized set C
>> >> C
>> >> GRanges with 4 ranges and 1 elementMetadata value
>> >> seqnames ranges strand | score
>> >> <Rle> <IRanges> <Rle> | <numeric>
>> >> [1] chr1 [0, 0] * | 10
>> >> [2] chr2 [0, 0] - | 20
>> >> [3] chr19 [5, 5] * | 30
>> >> [4] chrX [1, 1] + | 90
>> >>
>> >> seqlengths
>> >> chr1 chr19 chr2 chrX
>> >> NA NA NA NA
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> Ivan
>> >>
>> >>
>> >> Ivan Gregoretti, PhD
>> >> National Institute of Diabetes and Digestive and Kidney Diseases
>> >> National Institutes of Health
>> >> 5 Memorial Dr, Building 5, Room 205.
>> >> Bethesda, MD 20892. USA.
>> >> Phone: 1-301-496-1016 and 1-301-496-1592
>> >> Fax: 1-301-496-9878
>> >>
>> >>
>> >>
>> >> On Tue, Sep 14, 2010 at 11:36 AM, Leonardo Collado Torres
>> >> <lcollado at ibt.unam.mx> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a rather simple question that involves GenomicRanges' design.
>> >> >
>> >> > Basically, I have a GRanges object where all the elements are from
>> >> > the
>> >> > undefined "*" strand. I just want to resize them to get the 1st (from
>> >> > left
>> >> > to right) base. However, I'm not able to do so with the "resize"
>> >> > function
>> >> > even when specifying fix = "start" as it uses the fix = "center"
>> >> > method.
>> >> > Is
>> >> > this the desired performance? I have 2 workarounds, but I'm puzzled
>> >> > as
>> >> > the
>> >> > "flank" function actually uses the start (left to right) when
>> >> > elements
>> >> > are
>> >> > from the "*" strand. Is there a quicker way to do this or should I
>> >> > stick
>> >> > to
>> >> > the flank + shift workaround?
>> >> >
>> >> > Thank you and greetings,
>> >> > Leonardo
>> >> >
>> >> >> testGR <- GRanges( seqnames = rep("test", 3), ranges = IRanges (
>> >> >> start
>> >> >> =
>> >> > c(10,100,1000), width = c(10, 100, 1000)), strand =
>> >> > Rle(strand(c("+","-")),
>> >> > c(1,2)) )
>> >> >> testGR
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 10, 19] + |
>> >> > [2] test [ 100, 199] - |
>> >> > [3] test [1000, 1999] - |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >> resize(testGR, 1, fix="start")
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 10, 10] + |
>> >> > [2] test [ 199, 199] - |
>> >> > [3] test [1999, 1999] - |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >> testGR2 <- GRanges( seqnames = rep("test", 3), ranges = IRanges (
>> >> >> start
>> >> >> =
>> >> > c(10,100,1000), width = c(10, 100, 1000)), strand =
>> >> > Rle(strand(c("*")),
>> >> > c(3)) )
>> >> >> testGR2
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 10, 19] * |
>> >> > [2] test [ 100, 199] * |
>> >> > [3] test [1000, 1999] * |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >> resize(testGR2, 1, fix="start")
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 14, 14] * |
>> >> > [2] test [ 149, 149] * |
>> >> > [3] test [1499, 1499] * |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >
>> >> >> testGR3 <- GRanges ( seqnames = seqnames(testGR2), ranges = IRanges(
>> >> >> start
>> >> > = start(testGR2), width = 1), strand = strand(testGR2) )
>> >> >>
>> >> >> testGR3
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 10, 10] * |
>> >> > [2] test [ 100, 100] * |
>> >> > [3] test [1000, 1000] * |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >>
>> >> >
>> >> >> testGR4 <- shift(flank( testGR2, 1), 1)
>> >> >> testGR4
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> > seqnames ranges strand |
>> >> > <Rle> <IRanges> <Rle> |
>> >> > [1] test [ 10, 10] * |
>> >> > [2] test [ 100, 100] * |
>> >> > [3] test [1000, 1000] * |
>> >> >
>> >> > seqlengths
>> >> > test
>> >> > NA
>> >> >
>> >> >> sessionInfo()
>> >> > R version 2.12.0 Under development (unstable) (2010-09-08 r52880)
>> >> > Platform: x86_64-unknown-linux-gnu (64-bit)
>> >> >
>> >> > locale:
>> >> > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>> >> > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>> >> > [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>> >> > [7] LC_PAPER=en_US.utf8 LC_NAME=C
>> >> > [9] LC_ADDRESS=C LC_TELEPHONE=C
>> >> > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>> >> >
>> >> > attached base packages:
>> >> > [1] stats graphics grDevices utils datasets methods base
>> >> >
>> >> > other attached packages:
>> >> > [1] GenomicRanges_1.1.25 IRanges_1.7.33
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> > [1] tools_2.12.0
>> >> >
>> >> > [[alternative HTML version deleted]]
>> >> >
>> >> > _______________________________________________
>> >> > Bioc-sig-sequencing mailing list
>> >> > Bioc-sig-sequencing at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> >> >
>> >>
>> >> _______________________________________________
>> >> Bioc-sig-sequencing mailing list
>> >> Bioc-sig-sequencing at r-project.org
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> >
>> >
>
>
More information about the Bioc-sig-sequencing
mailing list