[Bioc-sig-seq] Quality Value Analysis from a BStringSet
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Jun 3 22:04:49 CEST 2010
Hi,
On Thu, Jun 3, 2010 at 3:39 PM, Pratap, Abhishek
<APratap at som.umaryland.edu> wrote:
> Hi All
>
> I would like to extract and count the last 5 quality values from the FASTQ file. I have read the file using "readFastq" and have stored the quality values as a BStringSet.
>
> Eg :
>   A BStringSet instance of length 5119916
>     width seq
> [1]    75 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [2]    75 bbbbbbbbbbbbabbbbbb`bbbbbbab`b_...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [3]    75 aaaaaaa_aaaaO`aa^aaa_a_T_``^[`S...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [4]    75 bbbbbbbbbbbbaabbbb`bbb_Uaa___BB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [5]    75 ``a`aa`aaYaTaaaBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>
> What I would like to do is take the last 5 quality values of each read (e.g. with subseq) and count the number of 'B' characters. We suspect that, despite a good average quality, we still have a high number of bad bases at the ends of reads.
>
> Any other ideas welcome.
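For the literal count you describe, here is a minimal sketch of the logic in plain Python (not Bioconductor code): take the last 5 characters of each quality string and count the 'B's. The shortened sample strings and the names quals, TAIL and count_b_in_tail are made up for illustration; your real strings would come from the FASTQ file.

```python
from collections import Counter

# Hypothetical, shortened stand-ins for the quality strings in the FASTQ file.
quals = [
    "BBBBBBBBBBBB",
    "bbbbbbbbabBB",
    "aaaa_aaaO`aa",
    "``a`aaBBBBBB",
]

TAIL = 5  # number of trailing quality values to inspect


def count_b_in_tail(qual, tail=TAIL):
    """Return how many of the last `tail` characters of `qual` are 'B'."""
    return qual[-tail:].count("B")


# One count per read, in input order.
per_read = [count_b_in_tail(q) for q in quals]

# How many reads have 0, 1, ..., TAIL 'B's in their last TAIL positions.
distribution = Counter(per_read)

# Reads whose whole tail is 'B'.
reads_all_b = sum(1 for n in per_read if n == TAIL)

for n_b in sorted(distribution):
    print(f"{n_b} 'B' in last {TAIL}: {distribution[n_b]} read(s)")
print(f"reads with an all-'B' tail: {reads_all_b}")
```

With about 5 million reads you would stream the file rather than hold a list in memory, but the per-read logic stays the same.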
How about plotting the average quality score at each base position?
A drop over the last few positions would confirm your suspicion.
You could do something like:
1. Convert your Phred score BStringSet into a matrix of its numeric values (one row per read, one column per position).
2. Plot the colMeans(...) of that matrix.
Maybe?
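
To make the two steps concrete, here is a rough sketch in plain Python rather than Bioconductor code. It assumes the quality characters are Phred+64 encoded (which the 'B', 'a' and 'b' characters in your example suggest, but please check), that all reads have the same length, and it uses made-up, shortened strings. The names OFFSET, to_matrix and col_means are illustrative only.

```python
# Assumption: quality = ord(char) - 64 (Phred+64), so 'B' decodes to 2.
# Change OFFSET (e.g. to 33) if your data use a different encoding.
OFFSET = 64

# Hypothetical, shortened quality strings; all reads must have equal length.
quals = [
    "bbbbBB",
    "aaaaBB",
    "bbaaab",
]


def to_matrix(quals, offset=OFFSET):
    """Step 1: decode each quality string into a row of integer scores."""
    return [[ord(c) - offset for c in q] for q in quals]


def col_means(matrix):
    """Step 2: mean of each column, analogous to R's colMeans()."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


matrix = to_matrix(quals)
means = col_means(matrix)

# Crude text "plot": one line per read position, bar length ~ mean quality.
for pos, m in enumerate(means, start=1):
    print(f"{pos:3d} {m:6.2f} {'#' * round(m)}")
```

If the bars shrink sharply toward the last positions, the read ends are dragging quality down even when the overall average looks fine.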
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact