[R] How to import ENSEMBL text data using R

Charles C. Berry cberry at tajo.ucsd.edu
Tue Jan 1 00:58:58 CET 2008


On Mon, 31 Dec 2007, mohamed nur anisah wrote:

> Dear all,
>  I have data in a text file and I would like to import it into R.
>  From the manual, I found that the read.table function is the most
>  appropriate, but when I ran the command an error occurred. It says
>  'Error in read.table("C:/Users/user/Documents/cfa-1.txt", header
>  = T, sep = "\t",skip=10) : more columns than column names'. Please help
>  me with this as I'm a first-time user of R.

First, did you read

        R Data Import/Export
        2 Spreadsheet-like data

especially 2.1 Variations on read.table ??

If not, go there and study up - there are many useful hints.
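
To give a flavor of the kind of variation that section covers, here is a
minimal sketch. The sample file is made up for illustration (it is not
your file), and it only handles rows that are shorter than the header.
It does not by itself cure 'more columns than column names', which means
the header line has fewer fields than some data line.

```r
# Hypothetical sample standing in for a real file: tab-separated,
# with one ragged record (the second data row is one field short).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\talpha\t8.4",
             "2\tbeta",
             "3\tgamma\t7.1"), tmp)

# fill = TRUE pads short rows with NA instead of stopping with an error.
# quote = "" and comment.char = "" stop stray quotes and '#' characters
# from being treated specially.
dat <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE,
                  quote = "", comment.char = "",
                  stringsAsFactors = FALSE)
print(dat)
```

Always inspect the result after using fill = TRUE, since padding can hide
records whose fields have shifted into the wrong columns.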

Looking at your file I see complications. Consider these questions:

 	What is the field separator?

 	     looks like '\t', but ...

 	Do you have the same number of field separators (and fields)
 	in every row?

 	   apparently not, and there seem to be unusual variations on
 	   the record structure - like tabs missing where I would have
 	   expected to see them (starting at the line with CSO 8.4, following
 	   the first ']') and text fields in some but not all records
 	   - and the use of square brackets to enclose some fields and
 	   reverse square brackets for others is new to me!

 	Did some gremlin edit this file in WORD or EXCEL or otherwise
 	corrupt it?

 	    If so all bets are off. Tell whoever did this to you to
 	    go memorize
 	    http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
 	    and make him/her/them promise never to do it again.


If you do not have the same number of field separators in each row,
you will quickly run aground without resorting to more programmerish
tricks.
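
One way to check this before calling read.table is count.fields(), which
reports the number of fields on each line. A minimal sketch follows. The
sample file is hypothetical, and treating the most common field count as
the "usual" one is my assumption, not a rule.

```r
# Hypothetical sample: three well-formed lines and one short record.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\talpha\t8.4",
             "2\tbeta",
             "3\tgamma\t7.1"), tmp)

# Number of tab-separated fields on each line of the file.
n <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

# Tabulate the counts: more than one distinct value means ragged rows.
print(table(n))

# Take the most common count as the usual one and list the odd lines.
usual <- as.integer(names(which.max(table(n))))
odd <- which(n != usual)
print(odd)
```

With a real file, add skip = 10 (or whatever applies) to count.fields()
and look at the offending lines with readLines() to see what went wrong.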

If it were me, I'd swallow all of the data (or a few thousand lines
for exploratory purposes) with readLines() and then use string
processing and regular expression trickery to decipher the
records. But if you are not skilled in that art it may take you a
while to catch on.
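
A minimal sketch of that approach is below. The record layout (an id, an
optional note in square brackets, then a number) is invented for
illustration and is surely not the layout of your file. The column names
id, value and note are likewise hypothetical.

```r
# Hypothetical records: the bracketed text field is present in some
# lines but not in others.
tmp <- tempfile(fileext = ".txt")
writeLines(c("A1\t[first note]\t8.4",
             "A2\t7.9",
             "A3\t[third note]\t6.5"), tmp)

# Read at most 5000 lines as plain character strings.
lines <- readLines(tmp, n = 5000)

# Extract the text inside the first [...] on each line, NA where absent.
has_note <- grepl("\\[[^]]*\\]", lines)
note <- ifelse(has_note,
               sub(".*\\[([^]]*)\\].*", "\\1", lines),
               NA_character_)

# Remove the bracketed field (and the tab after it), then split the
# remainder on tabs so every record has the same two fields.
rest <- sub("\\[[^]]*\\]\t?", "", lines)
parts <- strsplit(rest, "\t", fixed = TRUE)

dat <- data.frame(id = sapply(parts, `[`, 1),
                  value = as.numeric(sapply(parts, `[`, 2)),
                  note = note,
                  stringsAsFactors = FALSE)
print(dat)
```

The point is the workflow rather than these particular patterns: read raw
lines, peel off the irregular pieces with regular expressions, and only
then assemble a data frame.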

Also, I might see if there is another output format available (XML?)
that R might parse, and/or I'd see if there is an annotation or
package on BioConductor that can give what is needed (consider posting
to that list, but state the problem you want to solve broadly rather
than just posting the same troublesome line of code as here).

HTH,

Chuck

p.s. Did you read the Posting Guide (as requested)? There have been
lots of read.table questions posted to this list and plenty of
guidance on getting past read.table hiccups.


>
>  Thanks in advance.
>
>  Cheers,
>  Anisah

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



