[Bioc-sig-seq] filtering using solexa quality scores
Martin Morgan
mtmorgan at fhcrc.org
Thu Apr 16 00:24:19 CEST 2009
Vince -- a different possibility is that 'fastq' scores are encoded on
a different scale from that which you are expecting, e.g., 'sanger'
rather than 'solexa', whose ASCII offsets (33 and 64) differ by 31.
There is no way to know this
from the fastq files themselves.
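To make the offset point concrete, here is a minimal Python sketch (not ShortRead code) that decodes the same fastq quality string under two encodings. It assumes the conventional ASCII offsets of 33 for 'sanger'-style and 64 for 'solexa'-style fastq. The quality string and the function name `decode_quality` are hypothetical.

```python
from typing import List

# Assumed conventional ASCII offsets for the two fastq flavours.
SANGER_OFFSET = 33  # Phred scores
SOLEXA_OFFSET = 64  # Solexa log-odds scores


def decode_quality(qual: str, offset: int) -> List[int]:
    """Map each ASCII character to an integer score by subtracting offset."""
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    qual = "IIII;;@@"  # hypothetical quality string for an 8-base read
    print(decode_quality(qual, SANGER_OFFSET))  # [40, 40, 40, 40, 26, 26, 31, 31]
    print(decode_quality(qual, SOLEXA_OFFSET))  # [9, 9, 9, 9, -5, -5, 0, 0]
```

Nothing in the characters themselves says which offset applies. Read with the wrong one, every score shifts by the same constant, and that shift is large enough to change how many bases fall below a cutoff such as -4.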
Martin
Vincent Carey <stvjc at channing.harvard.edu> writes:
> I have scoured our archives and found little regarding the role of
> Solexa quality scores, as reported in fastq output, in short-read
> filtering.
>
> My understanding is that a numerical score of -4 or greater indicates
> more probability mass on the called base than on any other. In checking
> 1e6 reads on each of two lanes, I found the frequency of the event
> "fewer than three bases have score less than -4" to be 4e-3 in one lane
> and 2e-3 in the other. In other words, filtering by requiring no more
> than two scores below -4 would take you from a million reads to about
> 2000-4000, assuming I have not taken a biased sample (I may have; I just
> took the first 1e6 reads in the fastq file).
>
> Is there any reason to regard a call with score < -4 as much different
> from an 'N'?
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
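The arithmetic in the question can be sketched in Python as follows. This is an illustration, not ShortRead code. It assumes the Solexa definition q = -10*log10(p / (1 - p)), where p is the error probability. For the "more mass than any other base" check, it also assumes p is split evenly over the three alternative bases. The cutoff of -4, the limit of two low-scoring bases, and masking to 'N' follow the question. All function names and the example read are hypothetical.

```python
import math
from typing import List

# Cutoff taken from the question: Solexa scores below -4 count as "low".
LOW_SCORE = -4


def solexa_error_prob(q: float) -> float:
    """Error probability p implied by a Solexa score q = -10*log10(p / (1 - p))."""
    return 1.0 / (1.0 + 10.0 ** (q / 10.0))


def called_base_favoured(q: float) -> bool:
    """True if the called base is more probable than each alternative base.

    Assumes the error mass p is split evenly over the 3 other bases.
    """
    p = solexa_error_prob(q)
    return (1.0 - p) > p / 3.0


def count_low(scores: List[int], threshold: int = LOW_SCORE) -> int:
    """Number of positions in one read whose score is below threshold."""
    return sum(1 for s in scores if s < threshold)


def keep_read(scores: List[int], max_low: int = 2, threshold: int = LOW_SCORE) -> bool:
    """Filter rule from the question: keep reads with at most max_low low scores."""
    return count_low(scores, threshold) <= max_low


def mask_low(seq: str, scores: List[int], threshold: int = LOW_SCORE) -> str:
    """Replace low-scoring calls with 'N' (one way to treat them as no-calls)."""
    return "".join("N" if s < threshold else b for b, s in zip(seq, scores))


if __name__ == "__main__":
    # Break-even score under the even-split assumption: -10*log10(3).
    print(round(-10 * math.log10(3), 2))  # -4.77
    seq = "ACGTACGT"  # hypothetical read
    scores = [30, 25, -5, 10, -5, -5, 20, 15]  # hypothetical Solexa scores
    print(count_low(scores), keep_read(scores))  # 3 False
    print(mask_low(seq, scores))  # ACNTNNGT
```

Under these assumptions the break-even score is -10*log10(3), about -4.77. On the integer scale, a score of -4 favours the called base and a score of -5 does not, which matches the reading in the question. Masking to 'N' is shown only as one possible way to act on the final question. It does not settle whether such calls carry more information than a no-call.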
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793