[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges

Fri Jun 25 07:31:15 CEST 2010

Hello,

I have a question about the most efficient way to handle my use case.

I have taken the matchMatrix from a findOverlaps() result and then split it by region:

regionSiteMap <- findOverlaps(regions, sites)@matchMatrix
indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])

Now, for each region, I'd like to use the site indices to look up the sites' scores in a vector and take their mean, like this:

means <- sapply(indexList, function(indices) mean(scoreVect[indices]))
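To make the data layout concrete, here is a tiny self-contained version of the two steps above. The hand-built matrix is a hypothetical stand-in for the findOverlaps() match matrix (an integer matrix with "query" and "subject" columns), so no IRanges objects are needed, and the scores are made up. vapply() is used in place of sapply() only to pin the result to one number per region.

```r
# Hypothetical toy stand-in for findOverlaps(regions, sites)@matchMatrix:
# each row pairs a region ("query") with one overlapping site ("subject").
regionSiteMap <- cbind(query   = c(1L, 1L, 2L, 3L, 3L, 3L),
                       subject = c(2L, 5L, 1L, 3L, 4L, 5L))
# Hypothetical scores, one per site
scoreVect <- c(0.5, 1.0, 2.0, 3.0, 4.0)

# One list element per region, holding the indices of its sites
indexList <- split(regionSiteMap[, "subject"], regionSiteMap[, "query"])

# Mean score of the sites overlapping each region (one R call per region)
means <- vapply(indexList, function(indices) mean(scoreVect[indices]),
                numeric(1))
print(means)
```

With millions of regions, the cost is dominated by the millions of small R function calls made by vapply()/sapply(), not by the arithmetic itself.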

The problem is that I have about 8 million regions and about 28 million sites. So indexList is a list of about 8 million elements, each holding a few indices, and scoreVect is a numeric vector of about 28 million scores.

Can anyone suggest the fastest way to do this?
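One possible way to avoid calling an R function once per region (a sketch under assumptions, not benchmarked on data of this size) is to skip split() entirely and let base R's rowsum(), which computes grouped sums in compiled code, total the scores and the hit counts per region in single passes over the match matrix. The toy matrix and scores below are hypothetical stand-ins for the real findOverlaps() result and score vector.

```r
# Hypothetical toy stand-ins for the real match matrix and site scores
regionSiteMap <- cbind(query   = c(1L, 1L, 2L, 3L, 3L, 3L),
                       subject = c(2L, 5L, 1L, 3L, 4L, 5L))
scoreVect <- c(0.5, 1.0, 2.0, 3.0, 4.0)

query   <- regionSiteMap[, "query"]
subject <- regionSiteMap[, "subject"]

# Sum of site scores per region; with the default reorder = TRUE the rows
# are sorted by region index, the same order split() would produce
sums <- rowsum(scoreVect[subject], query)
# Number of overlapping sites per region, grouped the same way
counts <- rowsum(rep(1, length(query)), query)

# Named vector of per-region means (names are region indices)
means <- sums[, 1] / counts[, 1]
print(means)
```

As with the split() version, regions with no overlapping sites do not appear in the result.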

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia