[Bioc-sig-seq] perTile QA element in the FastqQA class
Sirisha Sunkara
ssunkara at lbl.gov
Tue Apr 13 20:20:00 CEST 2010
Hi Martin,
I am using the sequence.txt files generated by the Illumina pipeline
(OLB1.6/RTA1.6) as is, which seem to have the tile coordinates.
Just so I can focus on the read IDs for now (and I am sure this is
not exactly what you asked for), I parsed the read IDs out of the
fastq file and am working with those.
This is what my fastqs look like:
@ILLUMINA06:8:1:6:849#0/1
GCTCTTTTTGATTCTCAAATCCGGCGTCAACCATA
+ILLUMINA06:8:1:6:849#0/1
a`abaa_aaa]_a`_a_[]`]a_`aa_`_aa`aaa
@ILLUMINA06:8:1:6:1169#0/1
TAATGCCACTCCTCTCCCGACTGTTAACACTGCTG
+ILLUMINA06:8:1:6:1169#0/1
ab`_Z_aXa`bbababbbaabaaaaababaaa`V`
My very basic attempt at this:
> fqhead <- read.table("./Contam_Screening/Run703/sequence_8_1_hdrs.txt",
+                      sep=":")
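The same split can be sketched with named columns, which makes the later lane/tile subsetting easier to read. This is only a sketch: it assumes the IDs follow the instrument:lane:tile:x:y#index/read layout seen in the excerpt above, and the `ids` vector and column names are illustrative. Note also that read.table() treats "#" as a comment character by default, so the "#0/1" suffix is dropped from the last column when reading the header file that way.

```r
# Illustrative read IDs copied from the fastq excerpt above
ids <- c("@ILLUMINA06:8:1:6:849#0/1",
         "@ILLUMINA06:8:1:6:1169#0/1")

# Drop the leading "@" and the "#index/read" suffix, then split on ":"
core  <- sub("#.*$", "", sub("^@", "", ids))
parts <- do.call(rbind, strsplit(core, ":", fixed = TRUE))

# Column names are assumptions based on the ID layout, not pipeline output
hdr <- data.frame(
    instrument = parts[, 1],
    lane       = as.integer(parts[, 2]),
    tile       = as.integer(parts[, 3]),
    x          = as.integer(parts[, 4]),
    y          = as.integer(parts[, 5]),
    stringsAsFactors = FALSE
)
print(hdr)
```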
To extract all entries for instance in lane 8, tile 120:
> fqhead[fqhead$V2 == 8 & fqhead$V3 == 120,]
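From per-read lane and tile values, the two per-tile summaries Martin describes in the quoted message below (a 'count' and a median 'score' per lane and tile) could be built roughly as follows. This is an untested-against-ShortRead sketch in base R: the `reads` data frame is toy data, the per-read score is a plain stand-in for alphabetScore(quality) / width(quality), and the offset of 64 assumes the Illumina 1.3+ quality encoding.

```r
# Toy per-read records (made up for illustration): lane, tile, quality string
reads <- data.frame(
    lane    = c(8L, 8L, 8L, 8L),
    tile    = c(1L, 1L, 120L, 120L),
    quality = c("aaaa", "hhhh", "bbbb", "dddd"),
    stringsAsFactors = FALSE
)

# Mean per-base quality of each read, assuming a Phred+64 encoding
reads$score <- vapply(reads$quality,
                      function(q) mean(utf8ToInt(q) - 64L),
                      numeric(1), USE.NAMES = FALSE)

# Number of reads per lane and tile
counts <- aggregate(score ~ lane + tile, data = reads, FUN = length)
names(counts)[names(counts) == "score"] <- "count"

# Median per-read score per lane and tile
scores <- aggregate(score ~ lane + tile, data = reads, FUN = median)

# Placeholder 'type' column; its expected value is an assumption
counts$type <- "read"
scores$type <- "read"

print(counts)
print(scores)
```

Whether data frames shaped like this are accepted as-is by the internal plotting functions has not been checked here.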
I hope I am somewhat closer to what you asked for...
Thanks a lot!
Sirisha
Martin Morgan wrote:
> On 04/12/2010 02:49 PM, Sirisha Sunkara wrote:
>
>> Hi Martin,
>>
>> The qa function that reads in fastq format files doesn't seem to
>> populate the perTile QA element with row information...
>> The row counts are zero for both the readCounts and
>> medianReadQualityScore list elements of perTile.
>>
>> Is this feature still a work in progress? Essentially, I am trying to
>> get the TileQC plots for lanes where there was no reference genome to
>> align (no export.txt files)
>>
>
> Hi Sirisha -- fastq files can't be guaranteed to have tile info so
> ShortRead doesn't try to guess these, even if some software adopts
> conventions for embedding the information in the read ids.
>
> The tile images are generated by
>
> ShortRead:::.plotTileCounts
>
> and
>
> ShortRead:::.plotTileQualityScore
>
> both take a regular data.frame. For .plotTileCounts, the columns are
> 'type' (safe to ignore, I think), 'tile' (integer tile index), 'lane'
> (integer lane index), and 'count' (number of reads in this particular
> lane & tile). As an untested work-around, you could create a data frame
> like this by parsing your read IDs using standard R commands; provide an
> example of what the read IDs look like and I'll help you. For the
> .plotTileQualityScore, the columns are 'type', 'tile', 'lane', and
> 'score', where 'score' is the median 'qualityScore'
> (alphabetScore(quality(srq)) / width(quality(srq)) for some ShortReadQ
> object srq obtained by readFastq) over all reads in the tile.
>
> Martin
>
>
>> > qafq <- qa("./Contam_Screening/Run703/", "s_8_1_sequence.txt",
>> +            type="fastq")
>> > qafq
>> class: FastqQA(9)
>> QA elements (access with qa[["elt"]]):
>> readCounts: data.frame(1 3)
>> baseCalls: data.frame(1 5)
>> readQualityScore: data.frame(512 4)
>> baseQuality: data.frame(94 3)
>> alignQuality: data.frame(1 3)
>> frequentSequences: data.frame(50 4)
>> sequenceDistribution: data.frame(1663 4)
>> perCycle: list(2)
>> baseCall: data.frame(150 4)
>> quality: data.frame(1081 5)
>> perTile: list(2)
>> readCounts: data.frame(0 4)
>> medianReadQualityScore: data.frame(0 4)
>>
>> Thank You,
>> Sirisha
>>
>>
>> > sessionInfo()
>> R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] ShortRead_1.5.21 lattice_0.18-3 Biostrings_2.15.22
>> [4] GenomicRanges_0.1.0 IRanges_1.5.74 Rmpi_0.5-8
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.7.5 grid_2.11.0 hwriter_1.2 tools_2.11.0
>>