[Bioc-sig-seq] shortRead qa() minor bug
Martin Morgan
mtmorgan at fhcrc.org
Wed May 18 06:18:58 CEST 2011
On 05/17/2011 05:34 PM, Marcus Davy wrote:
> This might also work;
>
> dfOccur <- with(df, {
> nOccur <- tapply(nOccurrences, lane, c)
> cumulative <- tapply(nOccurrences * nReads, lane, function(elt) {
> cs <- cumsum(elt)
> (cs - cs[1] + 1)/max(1, (diff(range(cs))+1)) ## Add 1 to
> difference
> })
> data.frame(nOccurrences = unlist(nOccur), cumulative =
> unlist(cumulative),
> lane =lane, row.names=NULL) ## Removing row names
> })
Thanks Marcus I made the diff()+1 and row.names change to the devel
branch; does it have more than minor aesthetic consequences? Martin
>
> Marcus
>
>
> On Wed, May 18, 2011 at 12:25 PM, Marcus Davy <mdavy86 at gmail.com
> <mailto:mdavy86 at gmail.com>> wrote:
>
> Hi Martin,
> there appears to be a minor bug in one of the qa() plots in the code;
>
> ShortRead:::.plotReadOccurrences
>
> If the code chunk which generates the dataset for plotting the read
> occurences is run manually on an example dataset, the cumulative
> proportion of reads can be slightly greater than 1. It looks like a
> slight error in the denominator of the calculation.
>
> library(ShortRead)
>
> exptPath <- system.file("extdata", package = "ShortRead")
> sp <- SolexaPath(exptPath)
> qa <- qa(sp)
>
> df <- qa[["sequenceDistribution"]]
>
> tmp <- with(df, {
> nOccur <- tapply(nOccurrences, lane, c)
> cumulative <- tapply(nOccurrences * nReads, lane,
> function(elt) {
> cs <- cumsum(elt)
> (cs - cs[1] + 1)/max(1, diff(range(cs))) ## denominator
> under estimates by 1
> })
> lane <- tapply(lane, lane, c)
> data.frame(nOccurrences = unlist(nOccur), cumulative =
> unlist(cumulative),
> lane = unlist(lane))
> })
>
> print(tmp)
> nOccurrences cumulative lane
> s_2_export.txt1 1 0.0008347245 1
> s_2_export.txt2 2 0.0091819699 1
> s_2_export.txt3 3 0.0116861436 1
> s_2_export.txt4 5 0.0158597663 1
> s_2_export.txt5 10 0.0242070117 1
> s_2_export.txt6 1 0.6435726210 1
> s_2_export.txt7 2 0.6519198664 1
> s_2_export.txt8 3 0.6544240401 1
> s_2_export.txt9 9 0.6619365609 1
> s_2_export.txt10 1 0.9991652755 1
> s_2_export.txt11 2 1.0008347245 1
>
> tmp[which(tmp$cumulative>1),]
> nOccurrences cumulative lane
> s_2_export.txt11 2 1.000835 1
>
> tmp <- with(df, {
> nOccur <- tapply(nOccurrences, lane, c)
> cumulative <- tapply(nOccurrences * nReads, lane,
> function(elt) {
> cs <- cumsum(elt)
> (cs - cs[1] + 1)/max(1, (diff(range(cs))+1)) ## Add 1
> to difference
> })
> lane <- tapply(as.character(lane), lane, c) ## as.character
> to stop factor coercing
> data.frame(nOccurrences = unlist(nOccur), cumulative =
> unlist(cumulative),
> lane = unlist(lane), row.names=NULL) ## Removing row names
> })
>
> print(tmp)
> nOccurrences cumulative lane
> 1 1 0.0008340284 s_2_export.txt
> 2 2 0.0091743119 s_2_export.txt
> 3 3 0.0116763970 s_2_export.txt
> 4 5 0.0158465388 s_2_export.txt
> 5 10 0.0241868224 s_2_export.txt
> 6 1 0.6430358632 s_2_export.txt
> 7 2 0.6513761468 <tel:0.6513761468> s_2_export.txt
> 8 3 0.6538782319 s_2_export.txt
> 9 9 0.6613844871 <tel:0.6613844871> s_2_export.txt
> 10 1 0.9983319433 s_2_export.txt
> 11 2 1.0000000000 s_2_export.txt
>
>
>
> One suggestion is that the first part of this code (or similar)
> would be useful in the
> help so example(qa) could be run and testing with reporoducible
> examples can be quicker.
>
> cheers,
>
> Marcus
>
>
> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New
> Zealand.1252
> [3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C
> [5] LC_TIME=English_New Zealand.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.10.3 Rsamtools_1.4.1 lattice_0.19-23
> [4] Biostrings_2.20.1 GenomicRanges_1.4.3 IRanges_1.10.3
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.12.1 grid_2.13.0 hwriter_1.3
>
>
>
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
More information about the Bioc-sig-sequencing
mailing list