[Bioc-sig-seq] stringDist; hamming
Patrick Aboyoun
paboyoun at fhcrc.org
Mon Jun 21 19:19:18 CEST 2010
Ludo,
Thanks for your bug report. As Harris mentioned in a private e-mail,
there was an issue at the C-level that resulted in the Hamming distance
being inappropriately capped at 1. I just fixed this in BioC 2.6
(Biostrings 2.16.6) and BioC 2.7 (Biostrings 2.17.8). You can obtain
these new versions from svn directly now, or wait approximately 24-36
hours to download them via bioconductor.org and biocLite.
> words <- c("lazy", "hazy", "dasy")
> stringDist(words, method='hamming')
  1 2
2 1
3 2 2
> as.matrix(stringDist(words, method='hamming'))
  1 2 3
1 0 1 2
2 1 0 2
3 2 2 0
> sessionInfo()
R version 2.11.1 Patched (2010-05-31 r52167)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.16.5 IRanges_1.6.8
loaded via a namespace (and not attached):
[1] Biobase_2.8.0
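The corrected values can also be cross-checked without Biostrings. The following is only a minimal base-R sketch, not the Biostrings implementation; the helper names `hamming` and `hamming_dist` are made up for illustration, and it assumes plain character strings of equal length.

```r
# Hypothetical helper: Hamming distance between two equal-length strings,
# i.e. the number of positions at which the characters differ.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: all pairwise Hamming distances for a character
# vector, returned as a "dist" object like stringDist() returns.
hamming_dist <- function(x) {
  idx <- seq_along(x)
  m <- outer(idx, idx, Vectorize(function(i, j) hamming(x[i], x[j])))
  as.dist(m)
}

words <- c("lazy", "hazy", "dasy")
as.matrix(hamming_dist(words))
```

For these three words the sketch gives 1 for lazy/hazy and 2 for both lazy/dasy and hazy/dasy, which agrees with the fixed stringDist output and with neditStartingAt.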
Patrick
On 6/21/10 7:15 AM, Ludo Pagie wrote:
> Hi all,
>
> I want to calculate the Hamming distance between equal-length
> strings, i.e., the number of substitution differences between two
> strings.
>
> From the help page of 'stringDist' I think the following should
> return the same results, but they don't. What am I doing/seeing
> wrong?
>
> words<- c("lazy", "hazy", "dasy")
> sapply(words, neditStartingAt,'lazy',starting.at=1)
> lazy hazy dasy
>    0    1    2
> stringDist(words,method='hamming')
>   1 2
> 2 1
> 3 1 1
>
> I want the result as returned by neditStartingAt, clearly.
>
>
>> sessionInfo()
>>
> R version 2.12.0 Under development (unstable) (2010-06-17
> r52313)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] Biostrings_2.17.7 IRanges_1.7.7
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.9.0 tools_2.12.0