[Bioc-sig-seq] limit to character length for read.DNAStringSet()
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Sep 22 17:05:30 CEST 2010
Hi,
On Wed, Sep 22, 2010 at 10:51 AM, Andrew Yee <yee at post.harvard.edu> wrote:
> Is there a limit to the number of characters in a line for read.DNAStringSet()?
<snip>
>> bar <- read.DNAStringSet(filepath='~/sandbox/foo.fasta', format='fasta')
> Error in .read.fasta.in.XStringSet(filepath, set.names, elementType, lkup) :
> reading FASTA file : cannot read line 2, line is too long
Apparently so :-)
Assuming you're on a *nix-type machine, you can use the `fold` command
(from the terminal) to fix your problem pretty easily. Since `fold`
wraps every line longer than the given width, you would have to assume
a max length for your header lines (the ones in your fasta file that
start with ">") so they don't get split. Since you've already shown
that the read.DNAStringSet function can handle line lengths of 2000,
maybe you can use that (or some smaller number), if you like.
From the terminal:
$ fold -w 2000 foo.fasta > foo.folded.fasta
Then fire up R and do your reading as usual.
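If you'd rather not guess at a max header length, here is a rough sketch of an alternative that wraps only the sequence lines and passes lines starting with ">" through untouched. It uses plain `awk` instead of `fold`. The temp directory, the demo file contents, and the tiny width of 10 are made up for illustration; for a real file you would point it at foo.fasta and use 2000 (or whatever width works for you).

```shell
# Hypothetical demo input: one header line plus one 25-character sequence line.
tmpdir=$(mktemp -d)
printf '>seq1 demo header\nACGTACGTACGTACGTACGTACGTA\n' > "$tmpdir/foo.fasta"

# Width is tiny here so the wrapping is visible; 2000 matches the value above.
width=10

# Header lines (starting with ">") are printed as-is; every other line is
# cut into chunks of at most $width characters. Empty lines are dropped.
awk -v w="$width" '
  /^>/ { print; next }
  { for (i = 1; i <= length($0); i += w) print substr($0, i, w) }
' "$tmpdir/foo.fasta" > "$tmpdir/foo.folded.fasta"

cat "$tmpdir/foo.folded.fasta"
```

The folded file can then be read with read.DNAStringSet() the same way as before.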
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact