[Bioc-sig-seq] a wired problem
Harris A. Jaffee
hj at jhu.edu
Wed Sep 28 23:45:19 CEST 2011
I forgot to say that in this code,
> result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
> with.indels=TRUE, algorithm="indels")
> reads <- reads[result]
you do NOT want to select from 'reads' using 'result' as is, since
its values are all 0 or 1:
> str(result)
int [1:25000] 0 0 0 0 0 0 0 0 0 0 ...
> table(result)
result
0 1
24959 41
I suppose you are getting 41 instances of your first read, namely
40-letter "DNAString" instance
seq: GTTTTCTCATTNGAAATTTTGTTACCGCAAAANNNCCACC
when instead, you want something like
reads[result == 1]
On Sep 28, 2011, at 5:27 PM, Harris A. Jaffee wrote:
> I don't understand where your 'subject1' sequence:
>
>> subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"
>
> came from.
>
> Modulo that, everything looks fine.
>
> In my hands, your vcountPattern call finds 41 40-letter elements
> of 'seqs', all equal to GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA,
> that match your 41-letter pattern, subject to your
>
> max.mismatch=1, with.indels=TRUE'
>
> They are all equal to your pattern with its first letter deleted.
>
> I haven't looked closely, but I don't see a reason to doubt this
> result.
>
> The following calls would seem to be the sanity checks you want.
> They at least confirm the 41 fuzzy matches.
>
> > countPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
> max.mismatch=1, with.indels=TRUE)
> [1] 1
>
> > matchPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
> max.mismatch=1, with.indels=TRUE)
>
> Views on a 40-letter BString subject
> subject: GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA
> views:
> start end width
> [1] 1 40 40 [GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA]
>
> On Sep 28, 2011, at 4:14 PM, wang peter wrote:
>
>> dear all:
>> Using vcountPattern, i found some matched sequences.
>> but those are not similar to the pattern.
>> see such coding
>>
>> rm(list=ls())
>> reads <- readFastq(fastqfile);#downloaded from
>> http://biocluster.ucr.edu/~tbackman/query.fastq
>> seqs <- sread(reads);
>> PCR2rc<-DNAString("AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA")
>> result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
>> with.indels=TRUE, algorithm="indels")
>> reads <- reads[result]
>> seqs <- sread(reads)
>> sum(result)
>> then using countPattern, i found they are really not match
>>
>> subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"
>> result <- countPattern(PCR2rc, subject1, max.mismatch=1,
>> min.mismatch=0,
>> with.indels=TRUE)
>> [1] 0
>>
>> shan gao
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list