[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges

Mon Jun 28 01:18:11 CEST 2010

Thanks for this suggestion. It is super fast!
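
For the archive, here is a minimal sketch of the combined approach on toy data. The 'regions', 'sites' and 'scoreVect' values are made up for illustration. It assumes a recent IRanges in which findOverlaps() returns hits sorted by query, so that each run of 'qrle' corresponds to exactly one region.

```r
library(IRanges)

# Toy stand-ins (assumptions) for the real 'regions', 'sites' and scores.
regions   <- IRanges(start = c(1, 20, 50), end = c(10, 30, 60))
sites     <- IRanges(start = c(2, 5, 8, 22, 29), width = 1)
scoreVect <- c(1, 2, 3, 10, 20)

# One hit per overlapping (region, site) pair, sorted by region index.
ol   <- findOverlaps(regions, sites)
srle <- Rle(scoreVect[subjectHits(ol)])
qrle <- Rle(queryHits(ol))

# Each run of qrle covers the hits of one region, so consecutive ranges
# with those run lengths partition srle by region.
v     <- Views(srle, successiveIRanges(runLength(qrle)))
means <- viewMeans(v)

# Regions without any overlapping site have no view; fill them with NA.
regionMeans <- rep(NA_real_, length(regions))
regionMeans[runValue(qrle)] <- means
# regionMeans is c(2, 15, NA) for this toy data
```

Regions with no overlapping site do not appear in 'means', hence the last two lines, which map the means back onto all regions.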

- Dario.

---- Original message ----
>Date: Fri, 25 Jun 2010 23:11:25 -0700
>From: Hervé Pagès <hpages at fhcrc.org>  
>Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges  
>To: D.Strbenac at garvan.org.au
>Cc: Michael Lawrence <lawrence.michael at gene.com>, bioc-sig-sequencing at r-project.org
>
>Hi Dario,
>
>You can try to use 'successiveIRanges(runLength(qrle))' instead
>of 'as(qrle, "IRanges")'.
>
>Cheers,
>H.
>
>
>On 06/25/2010 01:05 AM, Dario Strbenac wrote:
>> That's a neat and elegant idea, but it's not actually possible to do the following part
>>
>> as(qrle, "IRanges")
>>
>> Error in asMethod(object) :
>>    cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
>>
>> Thanks,
>>         Dario.
>>
>>
>> ---- Original message ----
>>> Date: Thu, 24 Jun 2010 23:53:08 -0700
>>> From: Michael Lawrence <lawrence.michael at gene.com>
>>> Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>>> To: D.Strbenac at garvan.org.au
>>> Cc: bioc-sig-sequencing at r-project.org
>>>
>>>    On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>>>    <D.Strbenac at garvan.org.au>  wrote:
>>>
>>>      Hello,
>>>
>>>      I have a question about what is the most efficient
>>>      way to perform my use case.
>>>
>>>      What I have done is obtain a matchMatrix from
>>>      findOverlaps(), then split it:
>>>
>>>      regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
>>>      indexList <- split(regionSiteMap[, "subject"],
>>>                         regionSiteMap[, "query"])
>>>
>>>    Instead of splitting, get the scores and query hits
>>>    into an Rle:
>>>
>>>    ol <- findOverlaps(regions, sites)
>>>    srle <- Rle(scoreVect[subjectHits(ol)])
>>>    qrle <- Rle(queryHits(ol))
>>>
>>>    The Rle compression may not be appropriate for your
>>>    scores, but now you can use the query Rle to define
>>>    Views on the score Rle:
>>>
>>>    v <- Views(srle, as(qrle, "IRanges"))
>>>
>>>    Now all the view methods are at your disposal, like
>>>    viewMeans():
>>>
>>>    means <- viewMeans(v)
>>>
>>>    Michael
>>>
>>>
>>>      Now I'd like to, for each region, use the indices
>>>      of the sites to get the sites' scores from a
>>>      vector and take the mean, like:
>>>
>>>      means <- sapply(indexList, function(indices)
>>>                      mean(scoreVect[indices]))
>>>
>>>      The problem with this is that I have ~ 8 million
>>>      'regions' and ~ 28 million 'sites'. So indexList
>>>      is a list of ~ 8 million elements with a few
>>>      indices in each one, and scoreVect is a numeric
>>>      vector of scores of length ~ 28 million.
>>>
>>>      Can anyone suggest the fastest way to do this
>>>      task?
>>>
>>>      --------------------------------------
>>>      Dario Strbenac
>>>      Research Assistant
>>>      Cancer Epigenetics
>>>      Garvan Institute of Medical Research
>>>      Darlinghurst NSW 2010
>>>      Australia
>>>
>>>      _______________________________________________
>>>      Bioc-sig-sequencing mailing list
>>>      Bioc-sig-sequencing at r-project.org
>>>      https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia