[Bioc-sig-seq] Calculating/plotting Avg Base Quality
Martin Morgan
mtmorgan at fhcrc.org
Mon Aug 3 22:40:58 CEST 2009
Pratap, Abhishek wrote:
> Thanks for the quick reply. Kind of an expected question: when you talk
> about quality format, is it Phred by default? I will be importing
> FASTQ from Illumina Eland output, which uses a 0-40 scoring window.
> Does that get converted automatically?
FASTQ files don't contain information on how qualities were encoded;
readFastq isn't smart about this and assumes they're Solexa. If that's
not correct, you can change to Phred with (rfq being the result of
readFastq)
FastqQuality(quality(quality(rfq)))
and vice versa (using SFastqQuality). Also, recent Solexa pipelines have
changed their calculation (not their encoding) to use something closer
to Phred rather than log-odds. This has consequences for low quality
scores, which is where the two scales differ most.
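To make the log-odds versus Phred distinction concrete, here is a
minimal base R sketch. It is not ShortRead code, and the function names
are made up for illustration. It assumes the usual definitions: a Phred
score is -10*log10(p) and the older Solexa score is -10*log10(p/(1-p)),
where p is the probability that the base call is wrong.

```r
# Illustrative helpers (hypothetical names, not part of ShortRead).

# Solexa (log-odds) score -> error probability
solexa_to_prob <- function(qs) {
  odds <- 10^(-qs / 10)      # odds = p / (1 - p)
  odds / (1 + odds)
}

# Phred score -> error probability
phred_to_prob <- function(q) 10^(-q / 10)

# Solexa (log-odds) score -> Phred score
solexa_to_phred <- function(qs) 10 * log10(10^(qs / 10) + 1)

# Compare the two scales over a range of scores
qs <- c(-5, 0, 5, 10, 20, 40)
comparison <- data.frame(
  solexa  = qs,
  phred   = round(solexa_to_phred(qs), 2),
  p_error = signif(solexa_to_prob(qs), 3)
)
print(comparison)
```

The two scores are nearly identical for high values and drift apart as
the score approaches zero or goes negative.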
You might also look at qa() and report() in the ShortRead package, which
generate a quality report that contains this information. The idea is to
do this in two stages:
qastats <- qa(dirPath, pattern) # slow!
report(qastats) # fast
See ?qa and ?report.
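If all you need is the average quality per cycle, the calculation
itself is small. The following base R sketch uses made-up quality
strings and assumes a Phred+33 ASCII offset (use 64 for Solexa-style
files). With real data the matrix would instead come from coercing the
quality object with as(..., "matrix"), as in the code quoted below.

```r
# Hypothetical example data: three reads of equal length
qual_strings <- c("IIIIHHGG", "IIHHGGFF", "IHGFEDCB")

# Assumption: Phred+33 encoding; use 64L for Solexa-style files
offset <- 33L

# Decode each string to integer scores: one row per read,
# one column per cycle
n_cycles <- nchar(qual_strings[1])
qmat <- t(vapply(
  qual_strings,
  function(s) utf8ToInt(s) - offset,
  integer(n_cycles),
  USE.NAMES = FALSE
))

# Mean quality at each cycle across all reads
avg_by_cycle <- colMeans(qmat)
print(avg_by_cycle)
```

From there, plot(avg_by_cycle, type = "b", xlab = "Cycle",
ylab = "Mean quality") draws the per-cycle averages, and lines() can
overlay the same vector computed for other lanes or runs.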
Martin
>
>
>
> Thanks,
>
> -Abhi
>
>
>
> From: johannes.waage at gmail.com [mailto:johannes.waage at gmail.com] On
> Behalf Of Johannes Waage
> Sent: Monday, August 03, 2009 4:04 PM
> To: Pratap, Abhishek; Bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Calculating/plotting Avg Base Quality
>
>
>
> Try this:
>
> library(ShortRead)
> my_reads <- readFastq("~/", pattern="my_reads.fastq")
> # quality(quality(x)) extracts the raw quality strings (a BStringSet)
> qualities <- FastqQuality(quality(quality(my_reads)))
> qualities <- as(qualities, "matrix")
> boxplot(as.data.frame(qualities), outline = FALSE, xlab="Cycle",
>         ylab="Quality")
>
> Depending on quality format, you might have to correct your numeric
> values in the matrix, or even convert to base call probabilities.
>
> Regards,
> Johannes Waage,
> Uni of copenhagen
>
> On Mon, Aug 3, 2009 at 9:41 PM, Pratap, Abhishek
> <APratap at som.umaryland.edu> wrote:
>
> Hi All
>
>
>
> Just wondering if there is an R function in any package to do the
> following. I want to make sure before I write something.
>
>
>
> We would like to calculate the avg base quality score for each base
> called per cycle / lane. It would also be nice to plot these avg
> quality scores per lane for a couple of different runs.
>
>
>
> Thanks,
>
> -Abhi
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing