[Bioc-sig-seq] remove adaptor before remove barcode
Harris A. Jaffee
hj at jhu.edu
Tue Sep 27 00:45:17 CEST 2011
As I understand the question, you have first removed a 5' barcode
("GCATT"),
and you're wondering if you should have actually left it there to
facilitate
trimming by your 3' adaptor (which might not occur at the 3' end). I
don't
believe so.
Now, if your barcode-trimmed read ("ATCGAGAT...") actually contained
your 3'
adaptor ("AATCGAGAT...") or at least a prefix of it, things would be
easier.
Then we could try to proceed somewhat normally as follows:
> subject
>
> ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
> AAAAAAATATT
> Rpattern
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
NNN = paste(rep("N", nchar(subject)-nchar(Rpattern)), collapse="")
new_pattern = paste(Rpattern, NNN, sep="")
trimLRPatterns(Rpattern=new_pattern, subject=subject,
Rfixed="subject")
But since this subject only contains the substring of your 3' adaptor
starting
at letter 2, this trimLRPatterns call won't accomplish anything. So,
it seems
that we have to take one of these two approaches:
1) a 'new_pattern' like above but with at least 1 more N on the right,
making it 1 letter longer than the subject, "on the left", if you will
or
2) we have to allow for some edits (with indels)
Possibly, 1) can be made to work, but I get an error:
> new2
[1]
"AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGNNNNN
NNNNNNNNNNN"
> nchar(new2)
[1] 82
> nchar(subject)
[1] 81
> trimLRPatterns(Rpattern=new2, subject=subject, Rfixed="subject",
max.Rmismatch=1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width,
translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied
start (0) is < 1
If this is just a bug that can be fixed, then we might be done.
For 2), I would like to do this:
> trimLRPatterns(Rpattern=new_pattern, subject=subject,
Rfixed="subject",
with.Rindels=TRUE, max.Rmismatch=2)
0-letter "DNAString" instance
seq:
I think this is the result you want.
Note that max.Rmismatch >= 2 is needed because of this:
> neditEndingAt(new_pattern, subject, ending.at=nchar(subject),
with.indels=TRUE,
fixed="subject")
[1] 2
However (!), what I've just done in the last 2 calls is NOT publicly
available.
You would get:
when 'with.indels' is TRUE, only 'fixed=TRUE' is supported
The ability to find adaptors which do not "flank" where they normally
should
is in the works, by me. Hopefully in finite time, but even so I fear
it will
be a little short of your question.
On Sep 26, 2011, at 9:37 AM, wang peter wrote:
> dear all:
> i found a problem trimLRPatterns cannot allow this situation
> that is the Rpattern has a left shift from subject sequence, see below
>
> subject
>
> ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
> AAAAAAATATT
>
> Rpattern
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>
> so i change the code to keep the 5' barcode GCATT, so the pattern
> can be
> recognized
>
>
> subject
>
> GCATTATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
> AAAAAAAAAAAATATT
>
> Rpattern
>
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>
> after remove 3' adaptor, i can remove the 5' barcode
>
> any ideas?
>
> shan gao
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list