[Bioc-sig-seq] Mean coverage of capture regions
Patrick Aboyoun
paboyoun at fhcrc.org
Sat Oct 10 02:23:30 CEST 2009
Karl,
I just added a viewsMeans method to IRanges 1.3.90 within BioC 2.5 to
make this task easier. (This is in svn now and will be available via
biocLite tomorrow.) So assuming you are using R-devel (2.10)
## load the packages
suppressMessages(library(ShortRead))
## create lane1 AlignedRead object here
## generate the coverage
## this will produce a SimpleRleList
curCov <- coverage(lane1)
## get the exon boundaries
## this will create a CompressedIRangesList
curExons <-
split(IRanges(start=capturedExonsOnChrom$exonStart,
end=capturedExonsOnChrom$exonEnd),
capturedExons$chrom)
## now get the chroms you are looking for
curCov <- curCov[c(1:22, "X", "Y")]
curExons <- curExons[c(1:22, "X", "Y")]
## create views on your chromosomes
curViews <- RleViewsList(rleList = curCov, rangesList = curExons)
## generate the exon means
viewMeans(curViews)
> sessionInfo()
R version 2.10.0 alpha (2009-10-08 r49995)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.3.36 lattice_0.17-26 BSgenome_1.13.16 Biostrings_2.13.50
[5] IRanges_1.3.90
loaded via a namespace (and not attached):
[1] Biobase_2.5.8 grid_2.10.0 hwriter_1.1
Sean Davis wrote:
> On Fri, Oct 9, 2009 at 4:55 PM, Dykema, Karl <Karl.Dykema at vai.org> wrote:
>
>
>> Hello All,
>>
>> I am attempting to calculate the mean number of reads for every exon that
>> was represented on our custom Nimblegen capture chip. The capture regions
>> don't line up exactly with the exon coordinates so I identified all exons
>> that have an overlap, about 9500. I have previously done some crude sampling
>> of the pileup output and my results were a lot better than what I seem to be
>> calculating now. Currently I am running coverage() on my alignedRead object
>> and then looping through the capture regions something like this:
>>
>> curCov <- coverage(lane1)
>> for(q in c(1:22,"X","Y")){
>> capturedExonsOnChrom <-
>> capturedExons[which(capturedExons$chrom==q),]
>> for(i in 1:length(capturedExonsOnChrom)) {
>> exonCoordinates <-
>> capturedExonsOnChrom[i]$exonStart:capturedExonsOnChrom[i]$exonEnd
>> capturedExonMeanCoverage[] <-
>> mean(as.vector(curCov[[q]])[exonCoordinates])
>> }
>> }
>>
>> First off, I am assuming that there is a more efficient method to do this.
>> Is that the case? Also, I have also been using GenomeGraphs to create plots
>> of the coverage and those plots also seem to show plenty of reads where my
>> calculations above indicate none. I am hoping there is an error in my
>> current method because the previous coverage numbers were quite encouraging.
>> Thanks in advance for your help.
>>
>>
>>
> Hi, Karl.
>
> I put together a little vignette that includes, among other things, a little
> vignette that includes a short example of calculation of given a capture
> method (Agilent, but the idea is the same). The reads are read in using the
> ShortRead package and the capture regions are in the form of a simple bed
> file. The vignette is here:
>
> http://watson.nci.nih.gov/~sdavis/sequencing_example.pdf
>
> Feel free to take a look at it for a slightly more efficient way of doing
> things. I'm not guaranteeing total correctness in the code and not all
> parts will apply to nimblegen, I suppose, but it might get you started.
>
> Sean
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
More information about the Bioc-sig-sequencing
mailing list