[Bioc-sig-seq] ShortRead and calibrated qualities
Martin Morgan
mtmorgan at fhcrc.org
Tue Nov 4 21:46:59 CET 2008
Hi Victor --
I cc'd the bioc-sig-sequencing list, in case this is interesting to
other people.
Victor Ruotti wrote:
> Hello Martin,
> We are using ShortRead and are very happy with it.
> I was just looking at the graph in the QA report "per-cycle quality
> score" given by ShortRead. I noticed that the quality scores are
> "calibrated" using the alignment information. How easy it is to plot the
> same graph using raw qualities scores, i.e. uncalibrated quality scores?
>
> Is there a way to do this using ShortReads?
There is not automatic way, but here's what you can do.
The uncalibrated scores are in _prb.txt files, and can be read with
> prb = readPrb(dirPath, regex)
where regex is a regular expression defining which _prb files you want
to read in to a single 'prb' object. E.g., the first two tiles of lane 1
of a solexa run
> prb <- readPrb(sp, "s_1_000[1:2]_prb.txt")
> prb
class: SFastqQuality
quality:
A BStringSet instance of length 58018
width seq
[1] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[2] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[3] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhSh
[4] 36 hhhhh]ChUhhhhhhhhhhBhhhhG`Jhh_hhWhMN
[5] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhM
[6] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[7] 36 hhhhhhhhhhhhhhhhhhhhhRhhhhhhhhhMXhSA
[8] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[9] 36 hhhEhhhhhhhahhhhhOhEIAAKChhhhhhhUGhH
... ... ...
[58010] 36 hhhhhhhhhhPhch`hhRQhWKCGP?BGPIDPSNJL
[58011] 36 GVQhE_E at U`VULMFKIUHPQ>KFH>ACGGH??BH=
[58012] 36 JNeSPWTJI]NUPHIEJMMEHHTECJNIHIKFB>HA
[58013] 36 I;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[58014] 36 IGJLEG?GO@?AJ?L>??KIBBIBAI?D?>=F@?BE
[58015] 36 hheJShSh_cThhh[ghh\MZKQJI@[SPLC at REJJ
[58016] 36 Bhc^eAR>a>hhLY@=EDDCCVEJM=HT?@D>BF>R
[58017] 36 Shh at hFQhhKhhh`ZJhhhh]hbRh[hfh]dIhhAh
[58018] 36 [hhhhhhh[hhhhhh`hghNbh[=hd?I`hheHhTP
From here you can, for instance,
> m = as(prb, "matrix")
> colMeans(m)
> dim(m)
[1] 58018 36
> colMeans(m)
[1] 36.68253 34.77074 34.76543 33.89381 33.70597 33.06138 32.16550
32.02453
[9] 32.00359 31.28284 31.17655 30.73650 30.98292 30.23918 29.65099
28.58637
[17] 27.54683 27.70426 27.19865 25.85327 25.37835 24.94302 24.83817 23.99030
[25] 23.41432 23.42025 22.13992 21.79825 20.64564 19.83484 19.90560 18.64320
[33] 17.63711 17.35210 16.81506 16.62260
Martin
> Thanks in advance.
> Victor Ruotti
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
More information about the Bioc-sig-sequencing
mailing list