[BioC] shortread quality
David martin
vilanew at gmail.com
Thu Jun 7 09:50:31 CEST 2012
thanks Martin,
That's exactly what i wanted to get !!!
On 06/06/2012 10:34 PM, Martin Morgan wrote:
> On 06/06/2012 12:11 PM, Martin Morgan wrote:
>> Hi,
>>
>> On 06/06/2012 08:00 AM, David martin wrote:
>>> Hi,
>>> I'm reading a fastq file from the solexa sequencer.
>>> I would like to know how many reads have a phred score (>Q29). The thing
>>
>> If you mean the average base quality score, then
>>
>> fq <- readFastq(sp, fqpattern)
>> score <- alphabetScore(fq)
>>
>> gives the sum of the base quality scores for each read, so is a vector
>> as long as the length of the reads. The average is
>>
>> aveScore <- score / width(fq)
>>
>> and then you're in the realm of familiar R again, e.g.,
>>
>> hist(aveScore)
>> table(aveScore > 29)
>>
>> etc.
>>
>> Hope that heps,
>
> I guess the qa object already gets you further, as you've indicated
>
> df <- qaSummary[["readQualityScore"]]
>
> the 'density' column (apparently not really a density) could be turned
> into a cumulative density
>
> cdensity <- cumsum(df$density) / sum(df$density)
>
> and then look up the cumulative density nearest the quality that you're
> interested in
>
> cdensity[findInterval(29, df$quality)]
>
> You'd want to do these steps separately for each lane, if there were
> several in df.
>
> Martin
>
>
>
>>
>> Martin
>>
>>
>>
>>> is that i get the densities so i don't really know how many reads from
>>> the total pass that filter. It's probaly easy for you so any hint would
>>> be helpful
>>>
>>> library("ShortRead")
>>> fqpattern <- "1102sdd_SN148_A_s_3_seq_GJH-85.txt"
>>>
>>> path = getwd()
>>> sp <- SolexaPath(path,dataPath=path,analysisPath=path)
>>>
>>> # Read fastq File and save report
>>> fq <- readFastq(sp, fqpattern)
>>> qaSummary <- qa(fq,fqpattern)
>>> save(qaSummary, file=file.path("./", paste(fqpattern,".rda",sep="" )))
>>> report(qaSummary,dest="report")
>>>
>>> #Quality
>>>
>>> idx = which(qaSummary[["readQualityScore"]]["quality"] > 29)
>>> a = cbind( qaSummary[["readQualityScore"]][idx,"quality"] ,
>>> qaSummary[["readQualityScore"]][idx,"density"])
>>> a #reads with a quality >Q29
>>>
>>> #How to get the total number ? or percent compared to the total number
>>> of reads ?
>>>
>>> thanks
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
More information about the Bioconductor
mailing list