[Bioc-sig-seq] Resizing a GRanges object with elements on the "*" strand

Ivan Gregoretti ivangreg at gmail.com
Tue Sep 14 23:59:33 CEST 2010


Well, backward compatibility is _the_ priority.

Your proposal is fine with me. It would be very important to update
the manual providing a couple examples. Whatever minimises the
trial-and-error of the new user.

Thanks,

Ivan



On Tue, Sep 14, 2010 at 5:35 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
>
>
> On Tue, Sep 14, 2010 at 2:13 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hi Michael, since you are asking for opinions...
>>
>>
>> When specifying the 'fix=' argument:
>>
>> In my view, fix="start" should always resize at the "start" regardless
>> of strandedness.
>>
>>
>
> Well this would be inconsistent of 'flank'. If you do not want strand to
> affect the results, you could set strand temporarily to "*".
>
>>
>> Default behaviour (when not specifying the 'fix=' argument):
>>
>>
>> When the strand is "*", I would expect resize() to default to
>> fix="center" rather than "start".
>
> Why would you expect this? resize() on a Ranges defaults to start. If strand
> is irrelevant, I would expect it to behave like a Ranges.
>
>>
>> When strands are "+" and "-" I would expect resize to default to
>> fix="start" and fix="end' respectively.
>>
>>
>
> But what if the user is interested in the ends, and specified fix = "end",
> but then ended up overriding the strand-dependent default? I think having
> the default dependent on strand and the non-default independent is
> confusing. Better to take one perspective: either fix is relative to the
> beginning of transcription or it isn't, and then we need to fix flank() too.
> We need to worry about back-compatibility, and my solution preserves that,
> except for "*" indicating "center", which would surprise many. I'm trying to
> resist the temptation of a 1-1 mapping between two tri-state variables.
>
> Michael
>
>>
>> Thank you,
>>
>> Ivan
>>
>>
>>
>> On Tue, Sep 14, 2010 at 4:46 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > I just checked in some changes to IRanges, that make this method work:
>> >
>> > setMethod("resize", "GenomicRanges",
>> >           function(x, width, fix = "start", use.names = TRUE)
>> >           {
>> >             revFix <- c(start = "end", end = "start", center = "center")
>> >             fix <- ifelse(strand(x) == "-", revFix[fix], fix)
>> >             ranges <-
>> >               resize(ranges(x), width = width, fix = fix, use.names =
>> > use.names)
>> >             if (!IRanges:::anyMissing(seqlengths(x))) {
>> >               start(x) <- start(ranges)
>> >               end(x) <- end(ranges)
>> >             } else {
>> >               x <- clone(x, ranges = ranges)
>> >             }
>> >             x
>> >           }
>> >           )
>> >
>> > That will accept the fix argument, except start and end are reversed for
>> > negative strand features. '*' is treated just like '+'. If this is
>> > acceptable to the GenomicRanges guys, I will commit this.
>> >
>> > On Tue, Sep 14, 2010 at 9:36 AM, Ivan Gregoretti <ivangreg at gmail.com>
>> > wrote:
>> >>
>> >> Hello Leonardo,
>> >>
>> >> I believe that the issue here is that resize() does not support the
>> >> "fix" argument at all when handling GRanges.
>> >>
>> >> Actually that would be a nice upgrade of functionality for GRanges.
>> >>
>> >> I face the same limitation and I currently resize by hand. :(
>> >>
>> >> This is my work around:
>> >>
>> >> library(GenomicRanges)
>> >>
>> >> # a set of genomic features called C
>> >> C <- GRanges(seqnames=c("chr1","chr2","chr19","chrX"),
>> >>             ranges=IRanges(start=c(0,0,5,1),
>> >>                            end=c(150,150,150,400)),
>> >>             strand=c("*","-","*","+"),
>> >>             score=c(10,20,30,90))
>> >>
>> >>
>> >> # peek at C
>> >> C
>> >> GRanges with 4 ranges and 1 elementMetadata value
>> >>    seqnames    ranges strand |     score
>> >>       <Rle> <IRanges>  <Rle> | <numeric>
>> >> [1]     chr1  [0, 150]      * |        10
>> >> [2]     chr2  [0, 150]      - |        20
>> >> [3]    chr19  [5, 150]      * |        30
>> >> [4]     chrX  [1, 400]      + |        90
>> >>
>> >> seqlengths
>> >>  chr1 chr19  chr2  chrX
>> >>    NA    NA    NA    NA
>> >>
>> >> # this is the workaround
>> >> ranges(C) <- resize(ranges(C),1,fix="start")
>> >>
>> >> # peek at the resized set C
>> >> C
>> >> GRanges with 4 ranges and 1 elementMetadata value
>> >>    seqnames    ranges strand |     score
>> >>       <Rle> <IRanges>  <Rle> | <numeric>
>> >> [1]     chr1    [0, 0]      * |        10
>> >> [2]     chr2    [0, 0]      - |        20
>> >> [3]    chr19    [5, 5]      * |        30
>> >> [4]     chrX    [1, 1]      + |        90
>> >>
>> >> seqlengths
>> >>  chr1 chr19  chr2  chrX
>> >>    NA    NA    NA    NA
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> Ivan
>> >>
>> >>
>> >> Ivan Gregoretti, PhD
>> >> National Institute of Diabetes and Digestive and Kidney Diseases
>> >> National Institutes of Health
>> >> 5 Memorial Dr, Building 5, Room 205.
>> >> Bethesda, MD 20892. USA.
>> >> Phone: 1-301-496-1016 and 1-301-496-1592
>> >> Fax: 1-301-496-9878
>> >>
>> >>
>> >>
>> >> On Tue, Sep 14, 2010 at 11:36 AM, Leonardo Collado Torres
>> >> <lcollado at ibt.unam.mx> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a rather simple question that involves GenomicRanges' design.
>> >> >
>> >> > Basically, I have a GRanges object where all the elements are from
>> >> > the
>> >> > undefined "*" strand. I just want to resize them to get the 1st (from
>> >> > left
>> >> > to right) base. However, I'm not able to do so with the "resize"
>> >> > function
>> >> > even when specifying fix = "start" as it uses the fix = "center"
>> >> > method.
>> >> > Is
>> >> > this the desired performance? I have 2 workarounds, but I'm puzzled
>> >> > as
>> >> > the
>> >> > "flank" function actually uses the start (left to right) when
>> >> > elements
>> >> > are
>> >> > from the "*" strand. Is there a quicker way to do this or should I
>> >> > stick
>> >> > to
>> >> > the flank + shift workaround?
>> >> >
>> >> > Thank you and greetings,
>> >> > Leonardo
>> >> >
>> >> >> testGR <- GRanges( seqnames = rep("test", 3), ranges = IRanges (
>> >> >> start
>> >> >> =
>> >> > c(10,100,1000), width = c(10, 100, 1000)), strand =
>> >> > Rle(strand(c("+","-")),
>> >> > c(1,2)) )
>> >> >> testGR
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  10,   19]      + |
>> >> > [2]     test [ 100,  199]      - |
>> >> > [3]     test [1000, 1999]      - |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >> resize(testGR, 1, fix="start")
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  10,   10]      + |
>> >> > [2]     test [ 199,  199]      - |
>> >> > [3]     test [1999, 1999]      - |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >> testGR2 <- GRanges( seqnames = rep("test", 3), ranges = IRanges (
>> >> >> start
>> >> >> =
>> >> > c(10,100,1000), width = c(10, 100, 1000)), strand =
>> >> > Rle(strand(c("*")),
>> >> > c(3)) )
>> >> >> testGR2
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  10,   19]      * |
>> >> > [2]     test [ 100,  199]      * |
>> >> > [3]     test [1000, 1999]      * |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >> resize(testGR2, 1, fix="start")
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  14,   14]      * |
>> >> > [2]     test [ 149,  149]      * |
>> >> > [3]     test [1499, 1499]      * |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >
>> >> >> testGR3 <- GRanges ( seqnames = seqnames(testGR2), ranges = IRanges(
>> >> >> start
>> >> > = start(testGR2), width = 1), strand = strand(testGR2) )
>> >> >>
>> >> >> testGR3
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  10,   10]      * |
>> >> > [2]     test [ 100,  100]      * |
>> >> > [3]     test [1000, 1000]      * |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >>
>> >> >
>> >> >> testGR4 <- shift(flank( testGR2, 1), 1)
>> >> >> testGR4
>> >> > GRanges with 3 ranges and 0 elementMetadata values
>> >> >    seqnames       ranges strand |
>> >> >       <Rle>    <IRanges>  <Rle> |
>> >> > [1]     test [  10,   10]      * |
>> >> > [2]     test [ 100,  100]      * |
>> >> > [3]     test [1000, 1000]      * |
>> >> >
>> >> > seqlengths
>> >> >  test
>> >> >   NA
>> >> >
>> >> >> sessionInfo()
>> >> > R version 2.12.0 Under development (unstable) (2010-09-08 r52880)
>> >> > Platform: x86_64-unknown-linux-gnu (64-bit)
>> >> >
>> >> > locale:
>> >> >  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>> >> >  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>> >> >  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>> >> >  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>> >> >  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> >> > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>> >> >
>> >> > attached base packages:
>> >> > [1] stats     graphics  grDevices utils     datasets  methods   base
>> >> >
>> >> > other attached packages:
>> >> > [1] GenomicRanges_1.1.25 IRanges_1.7.33
>> >> >
>> >> > loaded via a namespace (and not attached):
>> >> > [1] tools_2.12.0
>> >> >
>> >> >        [[alternative HTML version deleted]]
>> >> >
>> >> > _______________________________________________
>> >> > Bioc-sig-sequencing mailing list
>> >> > Bioc-sig-sequencing at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> >> >
>> >>
>> >> _______________________________________________
>> >> Bioc-sig-sequencing mailing list
>> >> Bioc-sig-sequencing at r-project.org
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> >
>> >
>
>



More information about the Bioc-sig-sequencing mailing list