[Bioc-sig-seq] ShortRead and calibrated qualities

Victor Ruotti ruotti at wisc.edu
Tue Nov 4 22:31:18 CET 2008


Yes! Thank you.
That is exactly what I was looking for.
I'd like to compare the calibrated versus the raw quality scores.

Great help.
Victor

On Nov 4, 2008, at 2:46 PM, Martin Morgan wrote:

> Hi Victor --
>
> I cc'd the bioc-sig-sequencing list, in case this is interesting to  
> other people.
>
> Victor Ruotti wrote:
>> Hello Martin,
>> We are using ShortRead and are very happy with it.
>> I was just looking at the graph in the QA report "per-cycle quality  
>> score" given by ShortRead. I noticed that the quality scores are  
>> "calibrated" using the alignment information. How easy it is to  
>> plot the same graph using raw qualities scores, i.e. uncalibrated  
>> quality scores?
>> Is there a way to do this using ShortReads?
>
> There is not automatic way, but here's what you can do.
>
> The uncalibrated scores are in _prb.txt files, and can be read with
>
> > prb = readPrb(dirPath, regex)
>
> where regex is a regular expression defining which _prb files you  
> want to read in to a single 'prb' object. E.g., the first two tiles  
> of lane 1 of a solexa run
>
> > prb <- readPrb(sp, "s_1_000[1:2]_prb.txt")
> > prb
> class: SFastqQuality
> quality:
>  A BStringSet instance of length 58018
>        width seq
>    [1]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>    [2]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>    [3]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhSh
>    [4]    36 hhhhh]ChUhhhhhhhhhhBhhhhG`Jhh_hhWhMN
>    [5]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhM
>    [6]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>    [7]    36 hhhhhhhhhhhhhhhhhhhhhRhhhhhhhhhMXhSA
>    [8]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>    [9]    36 hhhEhhhhhhhahhhhhOhEIAAKChhhhhhhUGhH
>    ...   ... ...
> [58010]    36 hhhhhhhhhhPhch`hhRQhWKCGP?BGPIDPSNJL
> [58011]    36 GVQhE_E at U`VULMFKIUHPQ>KFH>ACGGH??BH=
> [58012]    36 JNeSPWTJI]NUPHIEJMMEHHTECJNIHIKFB>HA
> [58013]    36 I;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> [58014]    36 IGJLEG?GO@?AJ?L>??KIBBIBAI?D?>=F@?BE
> [58015]    36 hheJShSh_cThhh[ghh\MZKQJI@[SPLC at REJJ
> [58016]    36 Bhc^eAR>a>hhLY@=EDDCCVEJM=HT?@D>BF>R
> [58017]    36 Shh at hFQhhKhhh`ZJhhhh]hbRh[hfh]dIhhAh
> [58018]    36 [hhhhhhh[hhhhhh`hghNbh[=hd?I`hheHhTP
>
> From here you can, for instance,
>
> > m = as(prb, "matrix")
> > colMeans(m)
>
> > dim(m)
> [1] 58018    36
> > colMeans(m)
> [1] 36.68253 34.77074 34.76543 33.89381 33.70597 33.06138 32.16550  
> 32.02453
> [9] 32.00359 31.28284 31.17655 30.73650 30.98292 30.23918 29.65099  
> 28.58637
> [17] 27.54683 27.70426 27.19865 25.85327 25.37835 24.94302 24.83817  
> 23.99030
> [25] 23.41432 23.42025 22.13992 21.79825 20.64564 19.83484 19.90560  
> 18.64320
> [33] 17.63711 17.35210 16.81506 16.62260
>
> Martin
>
>> Thanks in advance.
>> Victor Ruotti
>
>
> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list