[Bioc-sig-seq] Mean coverage of capture regions

Sat Oct 10 02:23:30 CEST 2009

Karl,
I just added a viewsMeans method to IRanges 1.3.90 within BioC 2.5 to 
make this task easier. (This is in svn now and will be available via 
biocLite tomorrow.) So assuming you are using R-devel (2.10)

## load the packages
suppressMessages(library(ShortRead))

## create lane1 AlignedRead object here

## generate the coverage
## this will produce a SimpleRleList
curCov <- coverage(lane1)

## get the exon boundaries
## this will create a CompressedIRangesList
curExons <-
  split(IRanges(start=capturedExonsOnChrom$exonStart,
                end=capturedExonsOnChrom$exonEnd),
        capturedExons$chrom)

## now get the chroms you are looking for
curCov <- curCov[c(1:22, "X", "Y")]
curExons <- curExons[c(1:22, "X", "Y")]

## create views on your chromosomes
curViews <- RleViewsList(rleList = curCov, rangesList = curExons)

## generate the exon means
viewMeans(curViews)

> sessionInfo()
R version 2.10.0 alpha (2009-10-08 r49995) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ShortRead_1.3.36   lattice_0.17-26    BSgenome_1.13.16   Biostrings_2.13.50
[5] IRanges_1.3.90    

loaded via a namespace (and not attached):
[1] Biobase_2.5.8 grid_2.10.0   hwriter_1.1  

Sean Davis wrote:
> On Fri, Oct 9, 2009 at 4:55 PM, Dykema, Karl <Karl.Dykema at vai.org> wrote:
>
>   
>> Hello All,
>>
>> I am attempting to calculate the mean number of reads for every exon that
>> was represented on our custom Nimblegen capture chip. The capture regions
>> don't line up exactly with the exon coordinates so I identified all exons
>> that have an overlap, about 9500. I have previously done some crude sampling
>> of the pileup output and my results were a lot better than what I seem to be
>> calculating now. Currently I am running coverage() on my alignedRead object
>> and then looping through the capture regions something like this:
>>
>> curCov <- coverage(lane1)
>> for(q in c(1:22,"X","Y")){
>>        capturedExonsOnChrom <-
>> capturedExons[which(capturedExons$chrom==q),]
>>        for(i in 1:length(capturedExonsOnChrom)) {
>>                exonCoordinates <-
>> capturedExonsOnChrom[i]$exonStart:capturedExonsOnChrom[i]$exonEnd
>>                capturedExonMeanCoverage[] <-
>> mean(as.vector(curCov[[q]])[exonCoordinates])
>>        }
>> }
>>
>> First off, I am assuming that there is a more efficient method to do this.
>> Is that the case? Also, I have also been using GenomeGraphs to create plots
>> of the coverage and those plots also seem to show plenty of reads where my
>> calculations above indicate none. I am hoping there is an error in my
>> current method because the previous coverage numbers were quite encouraging.
>> Thanks in advance for your help.
>>
>>
>>     
> Hi, Karl.
>
> I put together a little vignette that includes, among other things, a little
> vignette that includes a short example of calculation of given a capture
> method (Agilent, but the idea is the same).  The reads are read in using the
> ShortRead package and the capture regions are in the form of a simple bed
> file.  The vignette is here:
>
> http://watson.nci.nih.gov/~sdavis/sequencing_example.pdf
>
> Feel free to take a look at it for a slightly more efficient way of doing
> things.  I'm not guaranteeing total correctness in the code and not all
> parts will apply to nimblegen, I suppose, but it might get you started.
>
> Sean
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>