[Bioc-sig-seq] Rsamtools -getting allele frequencies
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 4 13:56:56 CET 2010
On 11/03/2010 10:32 PM, Paul Leo wrote:
> Hi
>
> What I want to do is calculate the minor allele frequency of a SNP/
> small indel from bam files
>
>
> I am not sure what is the PREFERRED way to do this
>
> I have been running samtools and just do pileup in those regions of
> interest... and read the pileup file with Rsamtools...
>
>
> With scanBam on a single position I get back all the reads that span
> that location. So I've tried using the start position and the cigar and
> the DNAStringSet in the seq slot to get the alleles at the position I
> want (works ok)
>
> Is there a function written that already does that? seems like something
> of general utility , the only complication is the cigar
The GenomicRanges::GappedAlignments class has some facilities for
manipulating cigars; the idea would be to refine the queries with
GappedAlignments, and then to separately extract the relevant sequence;
this might be no better than what you are doing currently.
>
> Would also be nice to have a existing function with writes the output
> of scanBam back into sam format (assuming necessary attributes are
> available).
noted.
>
> ALSO perhaps I am being daft:
> but in scanBamFlag
>
> isValidVendorRead: A logical(1) indicating whether invalid (FALSE),
> valid (TRUE), or any (NA) read should be returned. A 'valid'
> read is one flagged by the vendor as passing quality control
> criteria.
>
> so isValidVendorRead=TRUE would return VALID vendor reads?
> at the moment it is returning the INVALID vendor reads
> isValidVendorRead=FALSE returns the valid ones. (Solid reads)
yes, the logic here is reversed. Fixed in devel, in 1.2.1 (release) /
1.3.4 (devel) when they appear.
Thanks,
Martin
>
> Thanks
> Paul
>
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-11-03 r53517)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
> LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
> LC_MONETARY=C
> [6] LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
> LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] ShortRead_1.9.2 lattice_0.19-13 Biobase_2.11.2
> Rsamtools_1.3.3 Biostrings_2.19.0 GenomicFeatures_1.3.5
> [7] GenomicRanges_1.3.2 IRanges_1.9.6
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.7.0 BSgenome_1.19.0 DBI_0.2-5
> grid_2.13.0 hwriter_1.2 RCurl_1.4-4
> RSQLite_0.9-2
> [8] rtracklayer_1.11.3 tcltk_2.13.0 tools_2.13.0
> XML_3.2-0
>>
>
>
>
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioc-sig-sequencing
mailing list