[Bioc-sig-seq] perTile QA element in the FastqQA class
Sirisha Sunkara
ssunkara at lbl.gov
Tue Apr 13 20:39:03 CEST 2010
I am sorry, this line should read as:
> fqhead[fqhead$V2 == "8" & fqhead$V3 == "120",]
My data frame looks like this, where V2 is the lane number, V3 is the
tile number, and V4 and V5 are the x and y coordinates of the cluster
position:
> head(fqhead)
            V1 V2 V3 V4   V5
1  @ILLUMINA06  8  1  6  849
2  @ILLUMINA06  8  1  6 1169
3  @ILLUMINA06  8  1  6 1163
4  @ILLUMINA06  8  1  6 1512
5  @ILLUMINA06  8  1  6 1251
6  @ILLUMINA06  8  1  6  372
7  @ILLUMINA06  8  1  6 1555
8  @ILLUMINA06  8  1  6 1644
9  @ILLUMINA06  8  1  6 2011
10 @ILLUMINA06  8  1  7 1835
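
A minimal base R sketch of the per-tile read counts table that Martin
describes (columns 'type', 'tile', 'lane', 'count'). It assumes V2 and V3
hold lane and tile as above; the small fqhead here is a made-up stand-in,
the "read" value for 'type' is only a placeholder, and whether the result
is accepted as-is by ShortRead:::.plotTileCounts has not been checked.

```r
# Toy stand-in for fqhead (hypothetical values); the real one comes from
# read.table(..., sep=":") on the parsed read IDs.
fqhead <- data.frame(
    V1 = "@ILLUMINA06",
    V2 = c(8L, 8L, 8L, 8L, 8L),          # lane
    V3 = c(1L, 1L, 1L, 2L, 2L),          # tile
    V4 = c(6L, 6L, 7L, 5L, 9L),          # x
    V5 = c(849L, 1169L, 1835L, 100L, 200L)  # y
)

# Cross-tabulate lane by tile, then flatten to one row per combination.
tab <- as.data.frame(table(lane = fqhead$V2, tile = fqhead$V3),
                     responseName = "count", stringsAsFactors = FALSE)
tab <- tab[tab$count > 0, ]              # drop lane/tile pairs with no reads

tileCounts <- data.frame(
    type  = "read",                      # placeholder label (assumption)
    tile  = as.integer(tab$tile),
    lane  = as.integer(tab$lane),
    count = as.integer(tab$count)
)
tileCounts
```

Using table() keeps this to standard R commands; aggregate() or tapply()
would work equally well.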
Sirisha
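
The per-tile median quality score that Martin describes below (columns
'type', 'tile', 'lane', 'score') could be sketched in base R as follows.
This assumes a per-read score has already been computed as
alphabetScore(quality(srq)) / width(quality(srq)) and paired with the lane
and tile parsed from each read ID; the numbers are invented for
illustration and the "read" value for 'type' is only a placeholder.

```r
# Hypothetical per-read table: lane and tile from the read IDs, score from
# alphabetScore(quality(srq)) / width(quality(srq)). Values are made up.
reads <- data.frame(
    lane  = c(8L, 8L, 8L, 8L, 8L),
    tile  = c(1L, 1L, 1L, 2L, 2L),
    score = c(30, 32, 34, 20, 24)
)

# Median score over all reads in each lane/tile combination.
med <- aggregate(score ~ lane + tile, data = reads, FUN = median)

tileScores <- data.frame(
    type  = "read",                      # placeholder label (assumption)
    tile  = med$tile,
    lane  = med$lane,
    score = med$score
)
tileScores
```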
Sirisha Sunkara wrote:
> Hi Martin,
>
> I am using the sequence.txt files generated by the Illumina pipeline
> (OLB1.6/RTA1.6) as is, which seem to have the tile coordinates.
>
> Just so I can focus on the ReadIDs part for now (and I am sure this is
> not exactly what you asked for), I parsed out the readIDs from the
> fastq, and am working with those.
>
> This is what my fastqs look like:
>
> @ILLUMINA06:8:1:6:849#0/1
> GCTCTTTTTGATTCTCAAATCCGGCGTCAACCATA
> +ILLUMINA06:8:1:6:849#0/1
> a`abaa_aaa]_a`_a_[]`]a_`aa_`_aa`aaa
> @ILLUMINA06:8:1:6:1169#0/1
> TAATGCCACTCCTCTCCCGACTGTTAACACTGCTG
> +ILLUMINA06:8:1:6:1169#0/1
> ab`_Z_aXa`bbababbbaabaaaaababaaa`V`
>
> My very basic attempt at this:
>
> > fqhead <-
> read.table("./Contam_Screening/Run703/sequence_8_1_hdrs.txt", sep=":")
>
> To extract all entries for instance in lane 8, tile 120:
> > fqhead[fqhead$V2 == "3" & fq$V3 == "120",]
>
> I hope I am somewhat closer to what you asked for...
>
> Thanks a lot!
> Sirisha
>
>
> Martin Morgan wrote:
>> On 04/12/2010 02:49 PM, Sirisha Sunkara wrote:
>>
>>> Hi Martin,
>>>
>>> The qa function, when reading fastq format files, doesn't seem to
>>> populate the perTile QA element with any rows.
>>> The row counts are zero for both the readCounts and
>>> medianReadQualityScore list elements of perTile.
>>>
>>> Is this feature still a work in progress? Essentially, I am trying to
>>> get the TileQC plots for lanes where there was no reference genome to
>>> align against (no export.txt files).
>>>
>>
>> Hi Sirisha -- fastq files can't be guaranteed to have tile info so
>> ShortRead doesn't try to guess these, even if some software adopts
>> conventions for embedding the information in the read ids.
>>
>> The tile images are generated by
>>
>> ShortRead:::.plotTileCounts
>>
>> and
>>
>> ShortRead:::.plotTileQualityScore
>>
>> both take a regular data.frame. For .plotTileCounts, the columns are
>> 'type' (safe to ignore, I think), 'tile' (integer tile index), 'lane'
>> (integer lane index), and 'count' (number of reads in this particular
>> lane & tile). As an untested work-around, you could create a data frame
>> like this by parsing your read IDs using standard R commands; provide an
>> example of what the read IDs look like and I'll help you. For the
>> .plotTileQualityScore, the columns are 'type', 'tile', 'lane', and
>> 'score', where 'score' is the median 'qualityScore'
>> (alphabetScore(quality(srq)) / width(quality(srq)) for some ShortReadQ
>> object srq obtained by readFastq) over all reads in the tile.
>>
>> Martin
>>
>>
>>> > qafq <- qa("./Contam_Screening/Run703/", "s_8_1_sequence.txt",
>>> +            type="fastq")
>>> > qafq
>>> class: FastqQA(9)
>>> QA elements (access with qa[["elt"]]):
>>>   readCounts: data.frame(1 3)
>>>   baseCalls: data.frame(1 5)
>>>   readQualityScore: data.frame(512 4)
>>>   baseQuality: data.frame(94 3)
>>>   alignQuality: data.frame(1 3)
>>>   frequentSequences: data.frame(50 4)
>>>   sequenceDistribution: data.frame(1663 4)
>>>   perCycle: list(2)
>>>     baseCall: data.frame(150 4)
>>>     quality: data.frame(1081 5)
>>>   perTile: list(2)
>>>     readCounts: data.frame(0 4)
>>>     medianReadQualityScore: data.frame(0 4)
>>>
>>> Thank You,
>>> Sirisha
>>>
>>>
>>> > sessionInfo()
>>> R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] ShortRead_1.5.21    lattice_0.18-3      Biostrings_2.15.22
>>> [4] GenomicRanges_0.1.0 IRanges_1.5.74      Rmpi_0.5-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.7.5 grid_2.11.0   hwriter_1.2   tools_2.11.0
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing