[Bioc-sig-seq] write.XStringSet() terribly slow

Steffen Neumann sneumann at ipb-halle.de
Fri Apr 16 15:17:16 CEST 2010


Hi,

I have some major performance problems writing FASTA files
with Biostrings. I have the full Arabidopsis Chr1 (30 MB) in one DNAString,
and writing it to a file takes ages. As the strace output below shows,
I get only ~5 lines (80 characters each) per second. The runtime
of each system call (shown in <angle brackets>) is negligible.

library(Biostrings)
chromosome <- read.DNAStringSet("Chr1_TAIR9.fasta", "fasta")
write.XStringSet(chromosome, file="/tmp/test.fasta", format="fasta")

Is there a fundamental flaw in my thinking?
Is there an alternative to write.XStringSet()?
This happens both on my laptop and a beefy server.
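
One possible workaround, sketched here in base R only: build the whole
line-wrapped file in memory as a raw vector and write it out in a few
writeBin() calls rather than one call per 80-character line. The function
write_fasta_raw() and its arguments (x, name, path, width) are hypothetical
names made up for this sketch; they are not part of Biostrings, and the
sketch has not been timed on the 30 MB chromosome.

```r
# Hypothetical helper (not a Biostrings function): write a single sequence
# as FASTA, wrapping at `width` characters per line.
write_fasta_raw <- function(x, name, path, width = 80L) {
  bytes <- charToRaw(as.character(x))   # sequence as one raw vector
  n <- length(bytes)
  n_full <- n %/% width                 # number of complete lines

  con <- file(path, open = "wb")        # binary mode: bytes written as-is
  on.exit(close(con))

  # Header line
  writeBin(charToRaw(paste0(">", name, "\n")), con)

  if (n_full > 0L) {
    # One column per output line; the appended row holds the newline byte.
    m <- matrix(bytes[seq_len(n_full * width)], nrow = width)
    m <- rbind(m, as.raw(10L))
    writeBin(as.vector(m), con)         # column-major order = line order
  }

  if (n > n_full * width) {
    # Trailing partial line, if any
    rest <- bytes[(n_full * width + 1L):n]
    writeBin(c(rest, as.raw(10L)), con)
  }
  invisible(n)
}
```

Assuming as.character() works on the sequence object, a call might look
like write_fasta_raw(chromosome[[1]], names(chromosome)[1], "/tmp/test.fasta").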

I also tried the (ancient) IRanges_1.0.16 and Biostrings_2.10.22,
and got ~11 lines per second.

Yours,
Steffen

13:06:09.949290 write(4, "TAGGAGTTGATGAAGACATCTAACGAAAATTC"..., 80) = 80 <0.000137>
13:06:10.138835 write(4, "GTGCTCAGGCTTCATTGATAAGGAAAGAAACA"..., 80) = 80 <0.000142>
13:06:10.328395 write(4, "AAAGCAGAAACCGACGTGAAATATTACAGAGA"..., 80) = 80 <0.000133>
13:06:10.537475 write(4, "AGACTACTCGAGAATCATTGCACTGAAGAAAG"..., 80) = 80 <0.000159>
13:06:10.727281 write(4, "AAGTGAAAAGAGAAAGAGAATGTGTGATGTGT"..., 80) = 80 <0.000133>
13:06:10.916854 write(4, "CTTTGCTTTAAATGCAATCAGCTTCACGAGAA"..., 80) = 80 <0.000136>
13:06:11.105687 write(4, "GATTCAAGCTCGTTTCGCTCGCTCCGGGTGAA"..., 80) = 80 <0.000594>
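
As a rough cross-check of the ~5 lines per second figure, the timestamps
in the trace above can be averaged. This is only a sketch: the sequence
length of 30e6 bases is an assumption based on the "30 MB" mentioned
earlier, and the variable names are illustrative.

```r
# Seconds part of the seven strace timestamps above (13:06:SS.ffffff)
ts <- c(9.949290, 10.138835, 10.328395, 10.537475,
        10.727281, 10.916854, 11.105687)

gap <- mean(diff(ts))            # average time between consecutive write() calls
lines_per_sec <- 1 / gap

n_lines <- ceiling(30e6 / 80)    # assumption: ~30e6 bases, 80 per line
est_hours <- n_lines * gap / 3600

cat(sprintf("%.2f lines/s, about %.1f hours for the whole file\n",
            lines_per_sec, est_hours))
```

With these numbers the gap between writes is about 0.19 s, i.e. a little
over 5 lines per second, which would put the full chromosome at roughly
20 hours.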

sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.16

loaded via a namespace (and not attached):
[1] Biobase_2.6.0

-- 
IPB Halle                    AG Massenspektrometrie & Bioinformatik
Dr. Steffen Neumann          http://www.IPB-Halle.DE
Weinberg 3                   http://msbi.bic-gh.de
06120 Halle                  Tel. +49 (0) 345 5582 - 1470
                                  +49 (0) 345 5582 - 0
sneumann(at)IPB-Halle.DE     Fax. +49 (0) 345 5582 - 1409


