[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
Hervé Pagès
hpages at fhcrc.org
Sat Jun 26 08:11:25 CEST 2010
Hi Dario,
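Try 'successiveIRanges(runLength(qrle))' instead of
'as(qrle, "IRanges")'. The coercion only works for a logical Rle,
whereas successiveIRanges() builds one range per run from the run
lengths.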
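
For reference, here is a minimal, self-contained sketch of the
Views-based approach with this substitution applied. The toy 'regions',
'sites' and 'scoreVec' values are made up for illustration, and it
assumes the hits come back ordered by query, so that each run in 'qrle'
corresponds to one region:

```r
library(IRanges)

# Hypothetical toy data: 3 regions, 5 single-base sites, one score per site
regions  <- IRanges(start = c(1, 20, 40), end = c(10, 30, 50))
sites    <- IRanges(start = c(2, 5, 22, 41, 45), width = 1)
scoreVec <- c(1, 3, 10, 4, 8)

# Scores and query (region) indices of the hits, as Rle objects
ol   <- findOverlaps(regions, sites)
srle <- Rle(scoreVec[subjectHits(ol)])
qrle <- Rle(queryHits(ol))

# One view per run of identical query indices, i.e. one view per region hit
v     <- Views(srle, successiveIRanges(runLength(qrle)))
means <- viewMeans(v)

# Label each mean with the index of the region it belongs to
names(means) <- runValue(qrle)
means  # regions 1, 2, 3 -> 2, 10, 6
```

Regions without any overlapping site produce no run in 'qrle', so
'means' has one entry per region that was hit, and runValue(qrle) gives
the matching region indices.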
Cheers,
H.
On 06/25/2010 01:05 AM, Dario Strbenac wrote:
> That's a neat and elegant idea, but the following step doesn't actually work:
>
> as(qrle, "IRanges")
>
> Error in asMethod(object) :
> cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
>
> Thanks,
> Dario.
>
>
> ---- Original message ----
>> Date: Thu, 24 Jun 2010 23:53:08 -0700
>> From: Michael Lawrence<lawrence.michael at gene.com>
>> Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>> To: D.Strbenac at garvan.org.au
>> Cc: bioc-sig-sequencing at r-project.org
>>
>> On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>> <D.Strbenac at garvan.org.au> wrote:
>>
>> Hello,
>>
>> I have a question about the most efficient way to
>> handle my use case.
>>
>> So far I have taken the matchMatrix from a
>> findOverlaps() call and split it:
>>
>> regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
>> indexList <- split(regionSiteMap[, "subject"],
>>                    regionSiteMap[, "query"])
>>
>> Instead of splitting, get the scores and query hits
>> into an Rle:
>>
>> ol <- findOverlaps(regions, sites)
>> srle <- Rle(scoreVec[subjectHits(ol)])
>> qrle <- Rle(queryHits(ol))
>>
>> The Rle compression may not be appropriate for your
>> scores, but now you can use the query Rle to define
>> Views on the score Rle:
>>
>> v <- Views(srle, as(qrle, "IRanges"))
>>
>> Now all the view methods are at your disposal, like
>> viewMeans():
>>
>> means <- viewMeans(v)
>>
>> Michael
>>
>>
>> Now, for each region, I'd like to use the site
>> indices to look up the sites' scores in a vector
>> and take the mean, like:
>>
>> means <- sapply(indexList, function(indices)
>>                 mean(scoreVec[indices]))
>>
>> The problem with this is that I have ~ 8 million
>> 'regions' and ~ 28 million 'sites'. So indexList
>> is a list of ~ 8 million elements with a few
>> indices in each one, and scoreVec is a numeric
>> vector of scores of length ~ 28 million.
>>
>> Can anyone suggest the fastest way to do this?
>>
>> --------------------------------------
>> Dario Strbenac
>> Research Assistant
>> Cancer Epigenetics
>> Garvan Institute of Medical Research
>> Darlinghurst NSW 2010
>> Australia
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>