[Bioc-sig-seq] using subseq() in Biostrings to insert bases
Hervé Pagès
hpages at fhcrc.org
Tue Jul 20 22:45:34 CEST 2010
Hi Andrew,
On 07/19/2010 08:11 PM, Andrew Yee wrote:
> Is there a way to use subseq() to insert nucleotides?
>
> Take for example foo:
>
> foo<- DNAStringSet('ACTTA')
>
> I'd like to insert e.g. a G between positions 2 and 3 so that foo looks like
> 'ACGTTA' Is there a way to do this using subseq()? Or is an alternative
> function recommended?
Just do:
> foo <- DNAString('ACTTA')
> subseq(foo, start=3, end=2) <- DNAString("G")
> foo
6-letter "DNAString" instance
seq: ACGTTA
Generally speaking, if performance is important, you should get better
results by *not* switching back and forth between DNAString/DNAStringSet
objects and regular character vectors. For example, if you want to
replace the nucleotides starting at pos 101 in Human chrY by those in
chrM:
o Using subseq<-():
> library(BSgenome.Hsapiens.UCSC.hg19)
> chrY <- unmasked(Hsapiens$chrY)
> chrM <- unmasked(Hsapiens$chrM)
> system.time(subseq(chrY, start=101, width=length(chrM)) <- chrM)
user system elapsed
0.190 0.010 0.193
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1158959 61.9 1835812 98.1 1394696 74.5
Vcells 15445470 117.9 17364147 132.5 15568274 118.8
o Using substr():
> library(BSgenome.Hsapiens.UCSC.hg19)
> chrY <- unmasked(Hsapiens$chrY)
> chrM <- unmasked(Hsapiens$chrM)
> system.time({tmp <- as.character(chrY);
tmp2 <- paste(substr(tmp, start=1, stop=100),
as.character(chrM),
substr(tmp, start=101+length(chrM),
stop=length(chrY)),
sep="");
chrY <- DNAString(tmp2)})
user system elapsed
1.860 0.230 2.088
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1128874 60.3 1835812 98.1 1368491 73.1
Vcells 30276832 231.0 35483245 270.8 30284765 231.1
Using subseq<-() is faster and more memory efficient. It's also
more convenient.
Cheers,
H.
>
> Thanks,
> Andrew
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-sig-sequencing
mailing list