[Bioc-sig-seq] filtering using solexa quality scores

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 16 00:24:19 CEST 2009


Vince -- a different possibility is that the 'fastq' scores are encoded
on a different scale from the one you are expecting, e.g., 'sanger'
rather than 'solexa'. The two use ASCII offsets of 33 and 64, so the
decoded values differ by 31. The fastq files themselves do not record
which encoding was used.
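
To illustrate the point about encodings, here is a minimal Python sketch
(not ShortRead code) that decodes one quality string under both
assumptions. The offsets 33 ('sanger') and 64 ('solexa') are the
conventional ones; the quality string and the helper names
decode_quality and count_below are made up for illustration.

```python
# Conventional ASCII offsets for the two fastq quality encodings.
SANGER_OFFSET = 33   # 'sanger' (Phred+33)
SOLEXA_OFFSET = 64   # 'solexa' (+64)


def decode_quality(qual, offset):
    """Turn a fastq quality string into integer scores for a given offset."""
    return [ord(ch) - offset for ch in qual]


def count_below(scores, threshold=-4):
    """Count the scores strictly below the threshold."""
    return sum(1 for s in scores if s < threshold)


# Hypothetical quality string, not taken from any real lane.
qual = "IIIIII5555++"

as_sanger = decode_quality(qual, SANGER_OFFSET)
as_solexa = decode_quality(qual, SOLEXA_OFFSET)

print("as sanger:", as_sanger, "scores < -4:", count_below(as_sanger))
print("as solexa:", as_solexa, "scores < -4:", count_below(as_solexa))
```

If a file that is really 'sanger' encoded is read as 'solexa', every
score comes out 31 lower, so many bases can appear to fall below -4 even
though none of them do under the correct encoding.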

Martin

Vincent Carey <stvjc at channing.harvard.edu> writes:

> i have scoured our archives and found little regarding role of solexa
> quality scores as reported in fastq outputs in short read filtering.
>
> my understanding is that a numerical score of -4 or greater indicates
> more probability mass on the called base than on any other.  in checking
> 1e6 reads on each of two lanes i found the frequency of the event "fewer
> than three bases have score less than -4" to be 4e-3 in one lane and
> 2e-3 in another.  in other words, filtering by requiring no more than
> two < -4 scores would take you from a million reads to about 2000-4000,
> assuming i have not taken a biased sample (i may have, just took the
> first 1e6 in fastq).
>
> is there any reason to regard a call with score < -4 to be much
> different from an 'N'?

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


