[BioC] Amplicon and exon level read counts and GC content
Yu Chuan Tai
yuchuan at stat.berkeley.edu
Thu Jun 7 06:53:38 CEST 2012
Hi Martin,
More questions on your approaches below. If my BAM files are generated by
Bowtie2 on pair-end fastq files, for scanBamFlag(), should I set
isPaired=TRUE? Do I need to worry about other input arguments for
scanBamFlag() or ScanBamParam(), if I want to calculate coverage properly?
Also, summarizeOverlaps() doesn't seem to handle paired-end reads. How to
get around this, or it won't affect coverage calculation?
Finally, is there any way to calculate base-specific coverage at any
genomic locus or interval in Rsamtools?
Thanks!
Best,
Yu Chuan
> More specifically, after
>
> library(Rsamtools)
> example(scanBam) # defines 'fl', a path to a bam file
>
> for a _single_ genomic range
>
> param = ScanBamParam(what="seq",
> which=GRanges("seq1", IRanges(100, 500)))
> dna = scanBam(fl, param=param)[[1]][["seq"]]
> length(dna) # 365 reads overlap region
> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>
> though you'd likely want to specify several regions (vector arguments to
> GRanges) and think about flags (scanBamFlag() and the flag argument to
> ScanBamParam), read mapping quality, reads overlapping more than one region,
> etc. (summarizeOverlaps implements several counting strategies, but it is
> 'easy' to implement arbitrary approaches).
>
>>
>> Martin
>>
>>>
>>> Thanks for any input!
>>>
>>> Best,
>>> Yu Chuan
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
More information about the Bioconductor
mailing list