[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges

Sat Jun 26 08:11:25 CEST 2010

Hi Dario,

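Try using 'successiveIRanges(runLength(qrle))' instead of
'as(qrle, "IRanges")'. The coercion only accepts a logical Rle
(without NAs), whereas successiveIRanges() builds one adjacent
range per run directly from the run lengths.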
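
For what it's worth, here is a minimal sketch of the whole pipeline with
that substitution, on made-up toy data. The 'regions', 'sites' and
'scoreVect' values are invented for illustration, and ordering the hits
by query is a defensive assumption on my part in case they do not
already come back sorted:

```r
library(IRanges)

# Toy data (hypothetical): 3 regions, 5 single-base sites, one score per site
regions   <- IRanges(start = c(1, 20, 40), end = c(10, 30, 50))
sites     <- IRanges(start = c(2, 5, 22, 41, 45), width = 1)
scoreVect <- c(1, 3, 10, 4, 6)

ol <- findOverlaps(regions, sites)
q  <- queryHits(ol)
s  <- subjectHits(ol)

# Make sure hits belonging to the same region are contiguous
ord <- order(q)

qrle <- Rle(q[ord])             # one run per region that has hits
srle <- Rle(scoreVect[s[ord]])  # site scores in the same order

# One view per run of qrle, i.e. one view per region with hits
v <- Views(srle, successiveIRanges(runLength(qrle)))

means <- viewMeans(v)
names(means) <- runValue(qrle)  # label each mean with its region index
```

Note that regions with no overlapping sites do not appear in 'means',
so runValue(qrle) is what maps the result back onto 'regions'.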

Cheers,
H.

On 06/25/2010 01:05 AM, Dario Strbenac wrote:
> That's a neat and elegant idea, but the following step fails:
>
> as(qrle, "IRanges")
>
> Error in asMethod(object) :
>    cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
>
> Thanks,
>         Dario.
>
>
> ---- Original message ----
>> Date: Thu, 24 Jun 2010 23:53:08 -0700
>> From: Michael Lawrence <lawrence.michael at gene.com>
>> Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>> To: D.Strbenac at garvan.org.au
>> Cc: bioc-sig-sequencing at r-project.org
>>
>>    On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>>    <D.Strbenac at garvan.org.au>  wrote:
>>
>>      Hello,
>>
>>      I have a question about the most efficient way to
>>      handle my use case.
>>
>>      I have taken the matchMatrix from a findOverlaps()
>>      call and split it:
>>
>>      regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
>>      indexList <- split(regionSiteMap[, "subject"],
>>                         regionSiteMap[, "query"])
>>
>>    Instead of splitting, get the scores and query hits
>>    into an Rle:
>>
>>    ol <- findOverlaps(regions, sites)
>>    srle <- Rle(scoreVect[subjectHits(ol)])
>>    qrle <- Rle(queryHits(ol))
>>
>>    The Rle compression may not be appropriate for your
>>    scores, but now you can use the query Rle to define
>>    Views on the score Rle:
>>
>>    v<- Views(srle, as(qrle, "IRanges"))
>>
>>    Now all the view methods are at your disposal, like
>>    viewMeans():
>>
>>    means<- viewMeans(v)
>>
>>    Michael
>>
>>
>>      Now I'd like to, for each region, use the site
>>      indices to look up the sites' scores in a vector
>>      and take the mean, like:
>>
>>      means <- sapply(indexList, function(indices)
>>                      mean(scoreVect[indices]))
>>
>>      The problem with this is that I have ~ 8 million
>>      'regions' and ~ 28 million 'sites'. So indexList
>>      is a list of ~ 8 million elements with a few
>>      indices in each one, and scoreVect is a numeric
>>      vector of scores of length ~ 28 million.
>>
>>      Can anyone suggest the fastest way to go about
>>      this task?
>>
>>      --------------------------------------
>>      Dario Strbenac
>>      Research Assistant
>>      Cancer Epigenetics
>>      Garvan Institute of Medical Research
>>      Darlinghurst NSW 2010
>>      Australia
>>
>>      _______________________________________________
>>      Bioc-sig-sequencing mailing list
>>      Bioc-sig-sequencing at r-project.org
>>      https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing