[Bioc-sig-seq] ShortRead and calibrated qualities
Victor Ruotti
ruotti at wisc.edu
Tue Nov 4 22:31:18 CET 2008
Yes! Thank you.
That is exactly what I was looking for.
I'd like to compare the calibrated versus the raw quality scores.
Great help.
Victor
On Nov 4, 2008, at 2:46 PM, Martin Morgan wrote:
> Hi Victor --
>
> I cc'd the bioc-sig-sequencing list, in case this is interesting to
> other people.
>
> Victor Ruotti wrote:
>> Hello Martin,
>> We are using ShortRead and are very happy with it.
>> I was just looking at the graph in the QA report "per-cycle quality
>> score" given by ShortRead. I noticed that the quality scores are
>> "calibrated" using the alignment information. How easy is it to
>> plot the same graph using raw quality scores, i.e. uncalibrated
>> quality scores?
>> Is there a way to do this using ShortRead?
>
> There is no automatic way, but here's what you can do.
>
> The uncalibrated scores are in _prb.txt files, and can be read with
>
> > prb = readPrb(dirPath, regex)
>
> where regex is a regular expression defining which _prb files you
> want to read into a single 'prb' object. E.g., the first two tiles
> of lane 1 of a Solexa run:
>
> > prb <- readPrb(sp, "s_1_000[12]_prb.txt")
> > prb
> class: SFastqQuality
> quality:
> A BStringSet instance of length 58018
> width seq
> [1] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
> [2] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
> [3] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhSh
> [4] 36 hhhhh]ChUhhhhhhhhhhBhhhhG`Jhh_hhWhMN
> [5] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhM
> [6] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
> [7] 36 hhhhhhhhhhhhhhhhhhhhhRhhhhhhhhhMXhSA
> [8] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
> [9] 36 hhhEhhhhhhhahhhhhOhEIAAKChhhhhhhUGhH
> ... ... ...
> [58010] 36 hhhhhhhhhhPhch`hhRQhWKCGP?BGPIDPSNJL
> [58011] 36 GVQhE_E@U`VULMFKIUHPQ>KFH>ACGGH??BH=
> [58012] 36 JNeSPWTJI]NUPHIEJMMEHHTECJNIHIKFB>HA
> [58013] 36 I;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> [58014] 36 IGJLEG?GO@?AJ?L>??KIBBIBAI?D?>=F@?BE
> [58015] 36 hheJShSh_cThhh[ghh\MZKQJI@[SPLC@REJJ
> [58016] 36 Bhc^eAR>a>hhLY@=EDDCCVEJM=HT?@D>BF>R
> [58017] 36 Shh@hFQhhKhhh`ZJhhhh]hbRh[hfh]dIhhAh
> [58018] 36 [hhhhhhh[hhhhhh`hghNbh[=hd?I`hheHhTP
>
> From here you can, for instance,
>
> > m = as(prb, "matrix")
> > colMeans(m)
>
> > dim(m)
> [1] 58018 36
> > colMeans(m)
> [1] 36.68253 34.77074 34.76543 33.89381 33.70597 33.06138 32.16550 32.02453
> [9] 32.00359 31.28284 31.17655 30.73650 30.98292 30.23918 29.65099 28.58637
> [17] 27.54683 27.70426 27.19865 25.85327 25.37835 24.94302 24.83817 23.99030
> [25] 23.41432 23.42025 22.13992 21.79825 20.64564 19.83484 19.90560 18.64320
> [33] 17.63711 17.35210 16.81506 16.62260
>
> Martin
>
>> Thanks in advance.
>> Victor Ruotti
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
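
Note on the example above: the per-cycle means come from turning each quality character into an integer score and averaging down the columns. The sketch below mimics that step in base R only, without ShortRead, so the arithmetic can be checked by hand. It assumes the Solexa convention of subtracting 64 from the ASCII code, which matches the values shown (for instance 'h' gives 40). The helper name decode_quality is hypothetical and is not part of ShortRead; the three strings are rebuilt from reads 1, 3 and 58013 in the output above.

```r
# Hypothetical helper (not a ShortRead function): decode quality strings
# into a reads x cycles integer matrix, assuming an ASCII offset of 64.
decode_quality <- function(quals, offset = 64L) {
  # All reads must have the same width to form a matrix.
  stopifnot(length(unique(nchar(quals))) == 1L)
  width <- nchar(quals[1])
  # vapply returns a cycles x reads matrix, so transpose it.
  t(vapply(quals, function(q) utf8ToInt(q) - offset,
           integer(width), USE.NAMES = FALSE))
}

# Reads 1, 3 and 58013 from the output above, each 36 characters wide.
quals <- c(strrep("h", 36),
           paste0(strrep("h", 34), "Sh"),
           paste0("I", strrep(";", 35)))

m <- decode_quality(quals)
dim(m)                       # reads by cycles
cycle_means <- colMeans(m)   # one mean score per cycle
cycle_means
```

With a real matrix from as(prb, "matrix"), something like plot(colMeans(m), type = "l") would draw the raw per-cycle curve to set beside the calibrated one in the QA report. Scores can be negative under this assumed encoding (';' decodes to -5), which is expected for Solexa-scaled qualities.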