[R] counting sequence mismatches
Martin Morgan
mtmorgan at fhcrc.org
Sat Feb 23 03:41:41 CET 2008
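One kind of ugly solution, which assumes all the sequences have the same length (so that mapply can simplify its result to a matrix):
> ## keep the sequences as character vectors rather than factors
> d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
> d.f[["nMismatch"]] <- with(d.f, {
+     ## one column per sequence pair, TRUE wherever the letters differ
+     m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
+     colSums(m)
+ })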
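If the sequences might differ in length, a variant that counts one pair at a time may be easier to reason about. What follows is only a minimal sketch in base R: the helper name countMismatches is made up for illustration, and returning NA for pairs of unequal length is an assumption about what is wanted, not something the original question specifies.

```r
# Hypothetical helper (not part of the original post): count position-wise
# mismatches between paired strings, one pair at a time.
countMismatches <- function(a, b) {
    a <- as.character(a)   # also copes with factor columns
    b <- as.character(b)
    stopifnot(length(a) == length(b))
    mapply(function(x, y) {
        # Assumption: pairs of unequal length are treated as not comparable.
        if (length(x) != length(y))
            return(NA_integer_)
        sum(x != y)        # number of positions where the letters differ
    }, strsplit(a, ""), strsplit(b, ""), USE.NAMES = FALSE)
}

# The example data from the question.
seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)
d.f[["nMismatch"]] <- countMismatches(d.f$seq1, d.f$seq2)
d.f[["nMismatch"]]   # 1 2 0
```

Summing inside the per-pair function avoids relying on mapply simplifying to a matrix, at the cost of being a little more verbose.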
Check out the Bioconductor Biostrings package, especially the version
available with the development version of R, for DNA string algorithms.
Martin
joseph wrote:
> Hello
> I have 2 columns of short sequences that I would like to compare, counting the number of mismatches in each pair and recording that count in a new column. The sequences are part of a data frame that looks like this:
> seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> d.f=data.frame(seq1, seq2)
> thank you for your help
> Joseph