[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
Dario Strbenac
D.Strbenac at garvan.org.au
Mon Jun 28 01:18:11 CEST 2010
Thanks for this suggestion. It is super fast!
- Dario.
---- Original message ----
>Date: Fri, 25 Jun 2010 23:11:25 -0700
>From: Hervé Pagès <hpages at fhcrc.org>
>Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>To: D.Strbenac at garvan.org.au
>Cc: Michael Lawrence <lawrence.michael at gene.com>, bioc-sig-sequencing at r-project.org
>
>Hi Dario,
>
>You can try to use 'successiveIRanges(runLength(qrle))' instead
>of 'as(qrle, "IRanges")'.
>
>Cheers,
>H.
>
>
>On 06/25/2010 01:05 AM, Dario Strbenac wrote:
>> That's a neat and elegant idea, but it's not actually possible to do the following part
>>
>> as(qrle, "IRanges")
>>
>> Error in asMethod(object) :
>> cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
>>
>> Thanks,
>> Dario.
>>
>>
>> ---- Original message ----
>>> Date: Thu, 24 Jun 2010 23:53:08 -0700
>>> From: Michael Lawrence <lawrence.michael at gene.com>
>>> Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>>> To: D.Strbenac at garvan.org.au
>>> Cc: bioc-sig-sequencing at r-project.org
>>>
>>> On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>>> <D.Strbenac at garvan.org.au> wrote:
>>>
>>> Hello,
>>>
>>> I have a question about the most efficient way to
>>> handle my use case.
>>>
>>> What I have done is get a matchMatrix from an
>>> overlap query, then split it:
>>>
>>> regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
>>> indexList <- split(regionSiteMap[, "subject"],
>>>                    regionSiteMap[, "query"])
>>>
>>> Instead of splitting, get the scores and query hits
>>> into an Rle:
>>>
>>> ol <- findOverlaps(regions, sites)
>>> srle <- Rle(scoreVect[subjectHits(ol)])
>>> qrle <- Rle(queryHits(ol))
>>>
>>> The Rle compression may not be appropriate for your
>>> scores, but now you can use the query Rle to define
>>> Views on the score Rle:
>>>
>>> v <- Views(srle, as(qrle, "IRanges"))
>>>
>>> Now all the view methods are at your disposal, like
>>> viewMeans():
>>>
>>> means <- viewMeans(v)
>>>
>>> Michael
>>>
>>>
>>> Now I'd like to, for each region, use the indices
>>> of the sites to get the sites' scores from a
>>> vector and take the mean, like:
>>>
>>> means <- sapply(indexList, function(indices)
>>> mean(scoreVect[indices]))
>>>
>>> The problem with this is that I have ~ 8 million
>>> 'regions' and ~ 28 million 'sites'. So
>>> indexList is a list of ~ 8 million elements with a
>>> few indices in each one, and scoreVect is a
>>> numeric vector of scores of length ~ 28 million.
>>>
>>> Can anyone suggest the fastest way to go about
>>> this task?
>>>
>>> --------------------------------------
>>> Dario Strbenac
>>> Research Assistant
>>> Cancer Epigenetics
>>> Garvan Institute of Medical Research
>>> Darlinghurst NSW 2010
>>> Australia
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
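
For readers arriving from the archive, the approach discussed in this thread can be sketched end to end as follows. This is a minimal sketch, not code posted by anyone in the thread: the toy regions, sites and scores are made up for illustration, and it assumes a recent IRanges in which findOverlaps() returns hits sorted by query index, so that each region's hits form a single run in qrle. Regions with no overlapping site get no view at all, so the sketch maps the means back through runValue(qrle) and leaves the remaining regions as NA.

```r
library(IRanges)

# Toy data (hypothetical): three regions, five sites, one score per site.
regions   <- IRanges(start = c(1, 20, 50), end = c(10, 30, 60))
sites     <- IRanges(start = c(2, 5, 22, 25, 28), width = 1)
scoreVect <- c(1, 3, 10, 20, 30)

# Assumption: hits are sorted by query, so each region's hits are contiguous.
ol   <- findOverlaps(regions, sites)
srle <- Rle(scoreVect[subjectHits(ol)])  # scores in hit order
qrle <- Rle(queryHits(ol))               # one run per region that has hits

# One view per run of identical query indices, as suggested by Herve,
# instead of as(qrle, "IRanges"), which only works for logical Rle objects.
v <- Views(srle, successiveIRanges(runLength(qrle)))

# viewMeans() returns one value per view; place each at its region index.
means <- rep(NA_real_, length(regions))
means[runValue(qrle)] <- viewMeans(v)
```

With this toy input, the third region overlaps no site and so keeps its NA. The same mapping step is what keeps the result aligned with 'regions' when some of the ~ 8 million regions have no overlapping sites.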