[Bioc-sig-seq] Calculating/plotting Avg Base Quality

Martin Morgan mtmorgan at fhcrc.org
Mon Aug 3 22:40:58 CEST 2009


Pratap, Abhishek wrote:
> Thanks for the quick reply. An expected question, perhaps: when you talk
> about quality format, is it Phred by default? I will be
> importing FASTQ from Illumina Eland output, which uses a 0-40 scoring
> window. Does that get converted automatically?

FASTQ files don't record how their qualities were encoded;
readFastq isn't smart about this and assumes they're Solexa-encoded. If
that's not correct, you can change to phred with

  FastqQuality(quality(quality(rfq)))

and vice-versa with SFastqQuality(). Also, recent Solexa versions have
changed the calculation (not the encoding) to use something closer to
phred rather than log-odds. This has consequences for low quality scores.
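
To make the low-score difference concrete, here is a small base-R sketch.
It assumes the older log-odds definition Q_solexa = -10 * log10(p / (1 - p))
and the phred definition Q_phred = -10 * log10(p); the helper name
solexa_to_phred is made up for this example and is not a ShortRead function.

```r
# Hypothetical helper (not part of ShortRead): convert a log-odds
# (older Solexa) score to the phred scale.
# From p = 1 / (10^(Q_solexa / 10) + 1) and Q_phred = -10 * log10(p).
solexa_to_phred <- function(q_solexa) {
  10 * log10(10^(q_solexa / 10) + 1)
}

q_solexa <- c(-5, 0, 10, 20, 40)
q_phred  <- solexa_to_phred(q_solexa)

# Side by side: the scales differ most for small scores and
# become nearly identical as the score grows.
print(data.frame(solexa = q_solexa, phred = round(q_phred, 2)))
```

A log-odds score of 0 corresponds to p = 0.5, which is about 3 on the phred
scale, whereas at 40 the two are practically the same number.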

You might also look at qa() and report() in the ShortRead package, which
generate a quality report that contains this information. The idea is to
do this in two stages:

  qastats <- qa(dirPath, pattern) # slow!
  report(qastats) # fast

see ?qa and ?report.
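
If only the per-cycle averages from the original question are needed, the
underlying calculation is short. A rough base-R sketch follows; the quality
strings and the offset of 33 are invented for illustration (use the offset
that matches your encoding), and all reads are assumed to have the same
length. With ShortRead, the numeric matrix would instead come from coercing
the quality object with as(..., "matrix"), as in the quoted message below.

```r
# Toy quality strings, one per read (made up for illustration).
quals <- c("IIIIHHGG", "IIIHHGFE", "IIHGFEDC")

# Decode each string to numeric scores: ASCII code minus the offset.
# ASSUMPTION: offset 33; change this to match your files.
offset <- 33L
m <- t(sapply(quals, function(s) utf8ToInt(s) - offset, USE.NAMES = FALSE))

# Rows are reads, columns are cycles; colMeans gives the per-cycle mean.
avg_per_cycle <- colMeans(m)

plot(avg_per_cycle, type = "b", xlab = "Cycle", ylab = "Mean quality")
```

Repeating this per lane and overlaying the results with lines() would give
the run-to-run comparison asked about.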

Martin


>
> Thanks,
> -Abhi
>
> From: johannes.waage at gmail.com [mailto:johannes.waage at gmail.com] On
> Behalf Of Johannes Waage
> Sent: Monday, August 03, 2009 4:04 PM
> To: Pratap, Abhishek; Bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] Calculating/plotting Avg Base Quality
> 
>  
> 
> Try this:
> 
> library(ShortRead)
> my_reads<- readFastq("~/", pattern="my_reads.fastq")
> qualities <- FastqQuality(quality(my_reads))
> qualities <- as(qualities, "matrix")
> boxplot(as.data.frame(qualities), outline = FALSE, xlab = "Cycle",
>         ylab = "Quality")
> 
> Depending on quality format, you might have to correct your numeric
> values in the matrix, or even convert to base call probabilities.
> 
> Regards,
> Johannes Waage,
> Uni of copenhagen
> 
> On Mon, Aug 3, 2009 at 9:41 PM, Pratap, Abhishek
> <APratap at som.umaryland.edu> wrote:
> 
> Hi All
> 
> 
> 
> Just wondering if there is an R function in any package to do the
> following. I want to make sure before I write something.
> 
> 
> 
> We would like to calculate the average base quality score for each
> cycle, per lane. It would also be nice to plot these per-lane average
> quality scores for a couple of different runs.
> 
> 
> 
> Thanks,
> 
> -Abhi
> 
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing