[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
Dario Strbenac
D.Strbenac at garvan.org.au
Fri Jun 25 10:05:42 CEST 2010
That's a neat and elegant idea, but the following step doesn't actually work:
as(qrle, "IRanges")
Error in asMethod(object) :
cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
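
One possible way around the coercion error, offered only as a sketch: if the hits are sorted by query (an assumption here, as in the matchMatrix), each run in qrle corresponds to one region, so the run lengths can be laid end to end as ranges with successiveIRanges() instead of coercing the Rle. The toy vectors qhits, shits and scoreVec below are made-up stand-ins for queryHits(ol), subjectHits(ol) and the real score vector.

```r
library(IRanges)  # also attaches S4Vectors, which provides Rle

# Hypothetical toy data standing in for queryHits(ol), subjectHits(ol), scoreVec.
qhits    <- c(1L, 1L, 2L, 4L, 4L, 4L)   # assumed sorted by query index
shits    <- c(1L, 2L, 3L, 4L, 5L, 6L)
scoreVec <- c(0.5, 1.5, 2.0, 1.0, 2.0, 3.0)

srle <- Rle(scoreVec[shits])
qrle <- Rle(qhits)

# One range per run of identical query indices, placed back to back.
runs <- successiveIRanges(runLength(qrle))
v    <- Views(srle, runs)

means <- viewMeans(v)
# Label each mean with its region index; regions with no hits (3 here) are absent.
names(means) <- runValue(qrle)
```

With this toy input the means are 1, 2 and 2 for regions 1, 2 and 4. Regions without any overlapping site get no entry, so the result would still need to be mapped back onto the full set of regions.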
Thanks,
Dario.
---- Original message ----
>Date: Thu, 24 Jun 2010 23:53:08 -0700
>From: Michael Lawrence <lawrence.michael at gene.com>
>Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges
>To: D.Strbenac at garvan.org.au
>Cc: bioc-sig-sequencing at r-project.org
>
> On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
> <D.Strbenac at garvan.org.au> wrote:
>
> > Hello,
> >
> > I have a question about the most efficient way to
> > handle my use case.
> >
> > So far I have taken the matchMatrix from an overlap
> > query and split it:
> >
> > regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
> > indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])
>
> Instead of splitting, get the scores and query hits
> into an Rle:
>
> ol <- findOverlaps(regions, sites)
> srle <- Rle(scoreVec[subjectHits(ol)])
> qrle <- Rle(queryHits(ol))
>
> The Rle compression may not be appropriate for your
> scores, but now you can use the query Rle to define
> Views on the score Rle:
>
> v <- Views(srle, as(qrle, "IRanges"))
>
> Now all the view methods are at your disposal, like
> viewMeans():
>
> means <- viewMeans(v)
>
> Michael
>
> > Now I'd like to, for each region, use the indices
> > of its sites to get the sites' scores from a
> > vector and take the mean, like:
> >
> > means <- sapply(indexList, function(indices) mean(scoreVec[indices]))
> >
> > The problem is that I have ~8 million 'regions' and
> > ~28 million 'sites'. So indexList is a list of
> > ~8 million elements with a few indices in each one,
> > and scoreVec is a numeric vector of scores of
> > length ~28 million.
> >
> > Can anyone suggest the fastest way to do this
> > task?
> >
> > --------------------------------------
> > Dario Strbenac
> > Research Assistant
> > Cancer Epigenetics
> > Garvan Institute of Medical Research
> > Darlinghurst NSW 2010
> > Australia
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia