[Bioc-sig-seq] Size of Illumina fastq files to be read in ShortRead

Anastasia Gioti anastasia.gioti at ebc.uu.se
Wed Jun 24 20:37:33 CEST 2009


Dear list,
I have just started using the ShortRead package to read fastq
files from the Illumina analyzer, and I have run into some issues.
The most important is that readFastq crashes, presumably because it
runs out of memory, when I try to read files larger than 1 GB. For example:
 > fqpattern='s_3_1_sequence.txt'
 > afrN=file.path(analysisPath(sp), fqpattern)
 > afrN
[1] "/Users/nat/Data/Illumina/Solexa_disk_modforR/Data/HJSN_FC1_280409_3//Data/C1-C55Firecrest/Bustard1.3.2_06-05-2009_rdixon/GERALD_06-05-2009_rdixon/s_3_1_sequence.txt"
 > afrNq=readFastq(sp, fqpattern)
Error: cannot allocate vector of size 27.0 Mb
R(1337,0xa07a2720) malloc: *** mmap(size=28340224) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(1337,0xa07a2720) malloc: *** mmap(size=28340224) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I only succeeded in reading a file smaller than 1 GB, but I suppose
the ShortRead classes are designed for big files ;-).
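To illustrate what reading such a file without holding all of it in memory at once could look like, here is a minimal base-R sketch that walks a FASTQ file in fixed-size chunks of lines and counts the records. It assumes plain 4-line FASTQ records (no wrapped sequence or quality lines); count_fastq_records and its chunk_lines parameter are hypothetical names, not part of ShortRead, and the sketch only counts reads rather than building a ShortRead object.

```r
# Hypothetical helper (not part of ShortRead): counts FASTQ records by
# streaming the file, so at most `chunk_lines` lines are in memory at once.
# Assumes the simple 4-line FASTQ layout (header, sequence, '+', quality).
count_fastq_records <- function(path, chunk_lines = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))          # always release the connection
  n_lines <- 0
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break   # end of file reached
    n_lines <- n_lines + length(lines)
  }
  n_lines %/% 4                # four lines per record
}
```

The chunk size trades memory for the number of passes through the loop; the same chunked pattern could in principle be used to process reads piecewise instead of just counting them.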
Another, minor, issue concerns the names of the folders in the Illumina
output directory that I have to specify in exptPath so that
p=SolexaPath(exptPath) is parsed correctly. I eventually worked out
the logic behind this, but I would like to confirm that the path
must contain the string Data/C1-C(readlength)Firecrest. At least in
my hands it does not work with other names, such as those the current
Illumina software produces (for example, IPAR instead of Firecrest).
Is that correct? Is this parser perhaps hard-coded for earlier versions
of the Illumina output? If so, are there any plans to update it? This is
not very important, though.
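The naming rule I inferred can be written as a simple pattern check. Below is a minimal base-R sketch, assuming the rule really is just "the path contains Data/C1-C<read length>Firecrest"; has_firecrest_dir and firecrest_read_length are hypothetical helpers for illustration and are not how SolexaPath is actually implemented.

```r
# Hypothetical helper: TRUE if the path contains a folder named
# "C1-C<number>Firecrest" directly under a "Data" folder.
has_firecrest_dir <- function(path) {
  grepl("Data/C1-C[0-9]+Firecrest", path)
}

# Hypothetical helper: extracts the read length encoded in the
# Firecrest folder name, or NA if no such folder is present.
firecrest_read_length <- function(path) {
  m <- regmatches(path, regexpr("C1-C[0-9]+Firecrest", path))
  if (length(m) == 0) return(NA_integer_)
  as.integer(sub("C1-C([0-9]+)Firecrest", "\\1", m))
}
```

Under this rule a folder named with IPAR instead of Firecrest would simply not match, which would be consistent with what I see.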

I am using R 2.8 on Mac OS X Leopard with 8 GB of memory, so I do not
think my problem with fastq files comes from my computer...
Any help or suggestions are welcome!
Thank you,

Anastasia Gioti
Post-Doc, Evolutionary Biology Department
Uppsala University
Norbyvagen 18D
SE-752 36  UPPSALA
anastasia.gioti at ebc.uu.se
Tel: +46-18-471 6465
Fax: +46-18-471 6310
