[BioC] a possible bug of trimLRPatterns
Martin Morgan
mtmorgan at fhcrc.org
Fri Mar 9 12:10:54 CET 2012
On 03/08/2012 04:40 PM, wang peter wrote:
> reads<- readFastq(fastqfile);
> seqs<- sread(reads);
> max.mismatchs<- mismatch_rate*1:nchar(DNAString(PCR2rc))
> trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject = seqs,
> max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T)
>
>> end(trimmedCoords)[1:20]
> [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34
>> start(trimmedCoords)[1:20]
> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
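> There is a "0" among the ends of trimmedCoords, so I cannot get the trimmed sequences.
The sequence has been trimmed entirely. A range with start 1 and end 0 is a valid zero-width range, and it selects the empty sequence: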
> as.character(Views(DNAString("AA"), IRanges(1, 1)))
[1] "A"
> as.character(Views(DNAString("AA"), IRanges(1, 0)))
[1] ""
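A minimal base R sketch of the same idea, using the 20 end coordinates quoted above (all starts are 1). The variable names are illustrative only, and substring() stands in for Views() here so that the snippet runs without Biostrings; it finds the fully trimmed read by looking for a width of zero.

```r
# End coordinates copied from the output quoted above; every start is 1.
ends   <- c(22, 18, 20, 33, 14, 22, 22, 20, 22, 22,
            22, 15, 20, 37, 19, 13, 20, 22, 0, 34)
starts <- rep(1, length(ends))

# Width of a closed range [start, end]; an end of start - 1 gives width 0.
widths <- ends - starts + 1

# Index of the reads that were trimmed away completely.
fully_trimmed <- which(widths == 0)

# Base R analogue of the two Views() calls: an empty range yields "".
one_base <- substring("AA", 1, 1)
no_bases <- substring("AA", 1, 0)
```

With the real objects, width(trimmedCoords) == 0 should flag the same reads, so they can be dropped or counted before any further processing.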
Martin
>
> trimmed3End<- narrow(reads, start=end(trimmedCoords), end=width(reads))
>
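A note on the narrow() call quoted just above: it passes end(trimmedCoords) as the start. For the fully trimmed read that start is 0, which is presumably what triggers the failure, and for every other read the extracted piece would include the last base that was kept. If the goal is the removed 3' part, the start would be end + 1. The sketch below shows that arithmetic in base R on made-up toy reads (hypothetical data, not from the original post), with substring() standing in for narrow().

```r
# Hypothetical toy reads and kept-range ends (not from the original post).
reads <- c("ACGTACGT", "ACGT", "ACGTAC")
ends  <- c(5, 0, 6)   # 0 = trimmed entirely, 6 = nothing trimmed

# Kept part of each read: positions 1..end (empty when end is 0).
kept <- substring(reads, 1, ends)

# Removed 3' part: starts one position after the kept range ends.
removed <- substring(reads, ends + 1, nchar(reads))
```

Under that assumption, the Biostrings equivalent would be narrow(reads, start = end(trimmedCoords) + 1L, end = width(reads)); this has not been run against the poster's data.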
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5
> [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0
> [7] GenomicRanges_1.6.7 IRanges_1.12.6
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.1
> [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.1
> [9] XML_3.9-4 zlibbioc_1.0.0
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793