[Bioc-sig-seq] remove adaptor before remove barcode

Harris A. Jaffee hj at jhu.edu
Tue Sep 27 00:45:17 CEST 2011


As I understand the question, you have first removed a 5' barcode ("GCATT"),
and you're wondering whether you should have left it in place to make it
easier to trim your 3' adaptor (which might not occur at the 3' end). I don't
believe so.

Now, if your barcode-trimmed read ("ATCGAGAT...") actually contained your 3'
adaptor ("AATCGAGAT...") or at least a prefix of it, things would be easier.
Then we could proceed more or less normally, as follows:

> subject
> ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT

> Rpattern
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

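   library(Biostrings)
   ## pad the adaptor on the right with N's (wildcards under Rfixed="subject")
   ## so that it is as long as the read
   NNN = paste(rep("N", nchar(subject)-nchar(Rpattern)), collapse="")
   new_pattern = paste(Rpattern, NNN, sep="")
   trimLRPatterns(Rpattern=new_pattern, subject=subject, Rfixed="subject")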
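To see why that call trims nothing here, one can compare the padded pattern with the read letter by letter. This is a minimal base-R sketch (no Biostrings needed). The helper wildcard_mismatches() is hypothetical. It assumes that an N in the pattern matches any letter while the read is taken literally, which is the intent of Rfixed="subject".

```r
## Hypothetical helper (not a Biostrings function): count mismatches between
## two equal-length strings, treating "N" in the pattern as a wildcard.
wildcard_mismatches <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  stopifnot(length(p) == length(s))
  sum(p != "N" & p != s)
}

subject  <- "ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT"
Rpattern <- "AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"

## Pad the adaptor with N's up to the read length (81 letters)
new_pattern <- paste0(Rpattern, strrep("N", nchar(subject) - nchar(Rpattern)))

## The read is the adaptor shifted left by one letter, so a position-by-position
## comparison disagrees at many positions
wildcard_mismatches(new_pattern, subject)

## A read that began with the complete adaptor would match with no mismatches
wildcard_mismatches(new_pattern, paste0(Rpattern, strrep("A", 15)))   # 0
```

Because the comparison is position by position, a single-letter shift is enough to break the match. That is why the two approaches below either change the alignment or allow indels.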

But since this read contains your 3' adaptor only from its 2nd letter onward,
this trimLRPatterns call won't trim anything. So it seems that we have to take
one of these two approaches:

	1) use a 'new_pattern' like the one above but with at least 1 more N on
	the right, making it 1 letter longer than the subject (overhanging "on
	the left", if you will), or
	2) allow for some edits (including indels)
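Approach 1) amounts to the following count, sketched here in base R. The sketch assumes that each pattern letter hanging off the left end of the read costs one mismatch, and that N in the pattern matches anything. right_aligned_mismatches() is a hypothetical helper, not a Biostrings function.

```r
## Hypothetical helper (not a Biostrings function): right-align `pattern` with
## `subject`, charge one mismatch per pattern letter hanging off the left end
## of the read, and count mismatches in the overlap ("N" matches anything).
right_aligned_mismatches <- function(pattern, subject) {
  overhang <- max(0L, nchar(pattern) - nchar(subject))
  p <- strsplit(substr(pattern, overhang + 1L, nchar(pattern)), "")[[1]]
  s <- tail(strsplit(subject, "")[[1]], length(p))  # last length(p) read letters
  overhang + sum(p != "N" & p != s)
}

subject  <- "ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT"
Rpattern <- "AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"

## One more N than needed, so the pattern is 1 letter longer than the read
new2 <- paste0(Rpattern, strrep("N", nchar(subject) - nchar(Rpattern) + 1L))
nchar(new2)                               # 82
right_aligned_mismatches(new2, subject)   # 1 (only the overhanging "A")
```

With the extra N, the overlapping 81 letters agree. The only cost left is the leading "A" of the adaptor, which falls outside the read, so this is the case max.Rmismatch=1 is meant to accept.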

Possibly, 1) can be made to work, but I get an error:

> new2
[1] "AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGNNNNNNNNNNNNNNNN"
> nchar(new2)
[1] 82
> nchar(subject)
[1] 81
> trimLRPatterns(Rpattern=new2, subject=subject, Rfixed="subject", max.Rmismatch=1)
Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
   solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is < 1

If this is just a bug that can be fixed, then we might be done.

For 2), I would like to do this:

> trimLRPatterns(Rpattern=new_pattern, subject=subject, Rfixed="subject",
	with.Rindels=TRUE, max.Rmismatch=2)
  0-letter "DNAString" instance
seq:

I think this is the result you want.

Note that max.Rmismatch >= 2 is needed because of this:

> neditEndingAt(new_pattern, subject, ending.at=nchar(subject), with.indels=TRUE,
	fixed="subject")
[1] 2
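The value 2 can be reproduced with a small dynamic-programming sketch in base R. The sketch assumes that neditEndingAt() with with.indels=TRUE returns the smallest number of substitutions, insertions and deletions needed to align the whole pattern to some stretch of the read ending at 'ending.at', with N in the pattern matching anything. nedit_ending_at_end() is a hypothetical stand-in, not the Biostrings implementation.

```r
## Hypothetical helper (not a Biostrings function): edit distance between all
## of `pattern` and a substring of `subject` ending at the read's last letter;
## "N" in the pattern matches any letter.
nedit_ending_at_end <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p); n <- length(s)
  ## D[i + 1, j + 1]: cheapest alignment of p[1..i] to a substring of s ending at j
  D <- matrix(0L, nrow = m + 1L, ncol = n + 1L)
  D[, 1] <- 0:m   # no read letters used: every pattern letter is deleted
  ## row 1 stays 0: the substring may start anywhere in the read at no cost
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (p[i] == "N" || p[i] == s[j]) 0L else 1L
      D[i + 1L, j + 1L] <- min(D[i, j] + cost,      # match or substitution
                               D[i, j + 1L] + 1L,   # pattern letter deleted
                               D[i + 1L, j] + 1L)   # extra read letter inserted
    }
  }
  D[m + 1L, n + 1L]
}

subject  <- "ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT"
Rpattern <- "AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"
new_pattern <- paste0(Rpattern, strrep("N", nchar(subject) - nchar(Rpattern)))
new2 <- paste0(new_pattern, "N")   # the 82-letter pattern of approach 1)

nedit_ending_at_end(new_pattern, subject)   # 2: drop the leading "A", insert 1 read letter
nedit_ending_at_end(new2, subject)          # 1: drop the leading "A" only
```

Under this model, the 81-letter pattern must give up its leading "A" and absorb one extra read letter, which costs 2 edits. With one more N, only the deletion remains.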

However (!), the last 2 calls rely on code that is NOT publicly available.
With the released version you would get:

	when 'with.indels' is TRUE, only 'fixed=TRUE' is supported

I am working on the ability to find adaptors that do not "flank" the read
where they normally should. Hopefully it will be done in finite time, but even
then I fear it will fall a little short of what your question needs.

On Sep 26, 2011, at 9:37 AM, wang peter wrote:
> Dear all,
> I found a problem: trimLRPatterns cannot handle the situation where the
> Rpattern is shifted to the left relative to the subject sequence; see below:
> subject
> ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT
>
> Rpattern
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>
> So I changed the code to keep the 5' barcode GCATT, so that the pattern can
> be recognized:
>
>
> subject
> GCATTATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT
>
> Rpattern
> AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>
> After removing the 3' adaptor, I can remove the 5' barcode.
>
> Any ideas?
>
> shan gao
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


