[Bioc-sig-seq] About ShortRead Package
Pratap, Abhishek
APratap at som.umaryland.edu
Fri Jul 10 18:12:10 CEST 2009
Hi
I understand the error now. I had used the split command based on file
size, so the last line of the chunk (the one indicated by the line
number) was incomplete. Everything works now.
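In case it helps someone else, here is a minimal sketch of the difference. It assumes a POSIX-style `split` and `awk`, plus `mktemp`. The tiny three-field file, the chunk sizes and the `demo_` names are made up for illustration and are not my real data.

```shell
# Build a small stand-in for an export file: 10 records, 3 tab-separated fields each.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf 'read%s\tACGTACGT\t%s\n' "$i" "$i"
done > "$tmpdir/demo_export.txt"

# Splitting by size (-b) cuts at a byte offset, so a chunk can end mid-record.
split -b 40 "$tmpdir/demo_export.txt" "$tmpdir/demo_bytes_"

# Splitting by line count (-l) cuts only at newlines, so every record stays whole.
split -l 4 "$tmpdir/demo_export.txt" "$tmpdir/demo_lines_"

# Count records that do not have exactly 3 fields:
# first in the first byte-based chunk, then across all line-based chunks.
bad_bytes=$(awk -F'\t' 'NF != 3 {n++} END {print n+0}' "$tmpdir/demo_bytes_aa")
bad_lines=$(awk -F'\t' 'NF != 3 {n++} END {print n+0}' "$tmpdir"/demo_lines_*)

echo "byte-split first chunk: $bad_bytes broken record(s)"
echo "line-split chunks:      $bad_lines broken record(s)"

rm -r "$tmpdir"
```

With the real export file the same idea would be `split -l <lines per chunk>`, picking the line count that gives chunks of roughly the size wanted.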
Thanks,
-Abhi
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Friday, July 10, 2009 12:06 PM
To: Pratap, Abhishek
Cc: bioc-sig-sequencing at r-project.org
Subject: Re: [Bioc-sig-seq] About ShortRead Package
Pratap, Abhishek wrote:
> Hi Martin
>
> Sorry to bother you again. Maybe I am a bit excited to get this
> running and see the functionality.
> I split the export file I want to use into small chunks (250 MB each).
>
> Loading them gives me the following error.
>
>
> aln=readAligned("/home/apratap/dev/R-workspace/Illumina-test",type="SolexaExport","s_6_export_part1.txt")
> Error: Input/Output
> 'readAligned' failed to parse files
> dirPath: '/home/apratap/dev/R-workspace/Illumina-test'
> pattern: 's_6_export_part1.txt'
> type: 'SolexaExport'
> error: too few fields,
> /home/apratap/dev/R-workspace/Illumina-test/s_6_export_part1.txt:1576504
The :1576504 indicates the line number that is causing the problem;
since it's so far into the file I guess that it's the place where split
did its work, i.e., the end of part1. Did you split based on number of
lines, so that each file contains only complete lines? What does
tail s_6_export_part1.txt look like?
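A rough way to check this from the shell, sketched on a made-up three-field file rather than your export file. The file contents and variable names are hypothetical, and it assumes `awk` and `mktemp` are available.

```shell
# Stand-in for a chunk whose last record was cut short (hypothetical data).
chunk=$(mktemp)
printf 'read1\tACGT\t1\nread2\tACGT\t2\nread3\tAC' > "$chunk"

# Show the final line, as 'tail' would on the real chunk.
tail -n 1 "$chunk"
echo

# Take the field count of the first line as the reference, then report
# "line number:field count" for every line that differs from it.
short=$(awk -F'\t' 'NR == 1 {want = NF} NF != want {print NR ":" NF}' "$chunk")
echo "lines with an unexpected field count: $short"

rm "$chunk"
```

On a real chunk, a line number reported this way should agree with the number after the colon in the readAligned error if a truncated record is the cause.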
Also, so that we are on the same page, what is the output of the R
command sessionInfo()?
Martin
>
> PS: It is a standard Illumina export file.
>
> Thanks,
> -A
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Friday, July 10, 2009 11:25 AM
> To: Pratap, Abhishek
> Cc: bioc-sig-sequencing at r-project.org
> Subject: Re: [Bioc-sig-seq] About ShortRead Package
>
> Hi Abhi --
>
> Pratap, Abhishek wrote:
>> Hi All
>>
>>
>>
>> I have recently started to acquaint myself with the new R packages for
>> NGS data processing/analysis. I think the community has done great work
>> in developing these packages. I must say I am amazed to see some of
>> the capabilities.
>>
>>
>>
>> I have a quick comment. While playing with the ShortRead package I am
>> not able to successfully load an export file, not even a single one. I
>> am using a single PC to do this. I believe it has sufficient memory
>> (4 GB) to handle one lane of data. I waited for 15-18 minutes before
>> my PC started to show signs of trouble. I eventually had to kill the
>> process.
>>
>>
>>
>> Here is what I did.
>>
>>
>>
>> library(ShortRead)
>>
>> sp=SolexaPath("/local/seq_archive/solexa/090309_HWI-EAS397_0006")
>>
>> path=analysisPath(sp)[4]  ### I just wanted to look at one GERALD folder.
>>
>> aln=readAligned(path,type="SolexaExport","s_6_export.txt")
>
> One tricky point is the number of files matched by your pattern. Does
>
> list.files(path, "s_6_export.txt")
>
> return just a single file? If not (e.g., because there are both .txt
> and .txt.gz versions of s_6_export) then specify the pattern more
> precisely, e.g., "^s_6_export.txt$"
>
> It might be that your computer does not have enough memory. Very
> roughly, for this initial stage, you might expect R to require 3-5
> times as much memory as the file occupies on disk. If your reads are
> relatively short, your files might be 500MB or so and you might be
> fine, but if your reads are longer the files could be > 1GB and you'd
> be in trouble.
>
> There are several options, the best being to use a computer with more
> memory. You could also split the export file (using the Unix 'split'
> command, for instance). We have also been working on making input more
> space- and time-efficient, so the version of ShortRead available with
> the development version of R will do a better job (but will still
> require considerable memory).
>
> Martin
>
>> GOT STUCK HERE
>>
>>
>>
>> Is there anything I am doing wrong? Please let me know.
>>
>>
>>
>> Cheers,
>>
>> -Abhi
>>
>> -----------------------------
>> Abhishek Pratap
>> Bioinformatics Software Engineer
>> Institute for Genome Sciences <http://www.igs.umaryland.edu/>
>> School of Medicine, Univ of Maryland
>> 801, W. Baltimore Street, Baltimore, MD 21209
>> Ph: (+1)-410-706-2296
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>