[BioC] qrqc with variable-length short reads? - readSeqFile could not handle a 2GB gzipped file.
Sang Chul Choi
schoi at cornell.edu
Tue Jun 5 21:01:09 CEST 2012
I have tried turning off the hashing option when reading sequences of variable length from a gzipped FASTQ file (2 GB) using readSeqFile. The computer has 16 GB of memory, and readSeqFile still used all of it, leaving R in a "Dead" (not running) state. Is there a way of sidestepping this problem?
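For reference, a minimal sketch of the kind of call involved (the file name is a placeholder; hash=FALSE is the option mentioned in the reply quoted below):

    library(qrqc)

    ## Read a gzipped FASTQ file of variable-length reads, with hash=FALSE
    ## to skip the read-uniqueness hashing that readSeqFile does by default.
    ## The file name is a placeholder.
    trimmed <- readSeqFile("trimmed.fastq.gz", type = "fastq", hash = FALSE)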
Thank you,
SangChul
On Jun 1, 2012, at 4:55 PM, Vince Buffalo wrote:
> Hi SangChul,
>
> By default readSeqFile hashes a proportion of the reads to check how many are non-unique. Specify hash=FALSE to turn this off and your memory usage will decrease.
>
> Best,
> Vince
>
> Sent from my iPhone
>
> On Jun 1, 2012, at 1:23 PM, Sang Chul Choi <schoi at cornell.edu> wrote:
>
>> Hi,
>>
>> I am using qrqc to plot the base quality of a short-read FASTQ file. When the FASTQ file contains reads of the same length, readSeqFile could read it (25 million 100-bp reads) with a couple of GB of memory. I then trimmed the 3' ends of the reads, which produces reads of variable length because base quality differs at the 3' end. When I tried to read in this second FASTQ file of variable-length reads, it used up all 16 GB of memory while barely using the CPU. The readSeqFile help page mentions that the underlying code is efficient, but it seems to fall apart when the reads have different lengths.
>>
>> I wish to see how the trimming changes the base-quality plots, so this is a problem. Is there a way of sidestepping it? (A rough sketch of the calls involved appears after the quoted message below.)
>>
>> Thank you,
>>
>> SangChul
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
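For the before/after comparison described in the quoted message above, something along these lines should do it (a rough sketch; qualPlot is qrqc's base-quality plotting function, and the file names are placeholders):

    library(qrqc)

    ## Read both FASTQ files with hashing disabled to keep memory use down
    ## (file names are placeholders).
    before <- readSeqFile("original.fastq.gz", type = "fastq", hash = FALSE)
    after  <- readSeqFile("trimmed.fastq.gz",  type = "fastq", hash = FALSE)

    ## Base-quality plots before and after 3' trimming.
    qualPlot(before)
    qualPlot(after)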