[BioC] bug in Biostrings mismatchTable?
Janet Young
jayoung at fhcrc.org
Thu Oct 11 02:13:11 CEST 2012
Hi there,
I think I've found a bug in mismatchTable (Biostrings): it reports a mismatch at a position beyond the end of the reported alignment. The code below reproduces the problem.
thanks, as usual!
Janet
#####
library(Biostrings)
### Two sequences: the middle portion aligns, but the last few bases don't. I'm not interested in those last few bases, so I do a local alignment.
seq1 <- DNAString("GCTGAAGTAGTTCTCCAGAA")
seq2 <- DNAString("GTAGTTCTCCAAAGT")
aln1 <- pairwiseAlignment(seq1, seq2, type = "local")
aln1
# Local PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [7] GTAGTTCTCCA
# subject: [1] GTAGTTCTCCA
# score: 21.79932
end(pattern(aln1))
# [1] 17
mismatchTable(aln1)
#   PatternId PatternStart PatternEnd PatternSubstring PatternQuality
# 1         1           18         18                G              7
#   SubjectStart SubjectEnd SubjectSubstring SubjectQuality
# 1           12         12                A              7
#### The one mismatch that's reported (pattern position 18) lies after the end of the alignment reported above (pattern position 17). There's another mismatch after the end of the alignment that wasn't reported.
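### A minimal base-R cross-check (a sketch, not Biostrings output). It assumes
### the ungapped pairing of the local alignment (pattern position 7 with subject
### position 1) simply continues to the end of seq1. The names s1, s2, pat_pos,
### sub_pos, mism and mism_tab are made up for this illustration.
s1 <- strsplit("GCTGAAGTAGTTCTCCAGAA", "")[[1]]   # seq1 as single characters
s2 <- strsplit("GTAGTTCTCCAAAGT", "")[[1]]        # seq2 as single characters
pat_pos <- 7:20            # pattern positions from the alignment start to the end of seq1
sub_pos <- pat_pos - 6L    # paired subject positions (1 to 14)
mism <- s1[pat_pos] != s2[sub_pos]                # TRUE where the paired bases differ
mism_tab <- data.frame(PatternPos = pat_pos[mism], Pattern = s1[pat_pos][mism],
                       SubjectPos = sub_pos[mism], Subject = s2[sub_pos][mism])
mism_tab
### Under that assumption the only mismatches are at pattern positions 18 (G vs A)
### and 20 (A vs G), both past end(pattern(aln1)) = 17.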
sessionInfo()
R Under development (unstable) (2012-10-03 r60868)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.27.2 IRanges_1.17.0 BiocGenerics_0.5.0
loaded via a namespace (and not attached):
[1] parallel_2.16.0 stats4_2.16.0