[R] How to import ENSEMBL text data using R

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 1 01:14:13 CET 2008


The biomaRt package might be what you're interested in. See the
Bioconductor web site

http://bioconductor.org

for details, in particular

http://bioconductor.org/download

for instructions on downloading Bioconductor packages and

http://www.bioconductor.org/packages/release/Software.html

for current links to package descriptions. The biomaRt package has a
vignette (biomaRt.pdf) that illustrates its use extensively and can be
read before you download the package.

As Charles suggests, if you choose the biomaRt option then follow-up
questions will most effectively be answered on the Bioconductor
mailing list. You'll want to include the output of sessionInfo(), too.

Congratulations, Charles, on the first posts of the new year!

Martin

"Charles C. Berry" <cberry at tajo.ucsd.edu> writes:

> On Mon, 31 Dec 2007, mohamed nur anisah wrote:
>
>> Dear all,
>>  I have data in a text file that I would like to import into R. From
>>  the manual, I found that the read.table function is the most
>>  appropriate, but when I ran the command an error occurred. It says
>>  'Error in read.table("C:/Users/user/Documents/cfa-1.txt", header = T,
>>  sep = "\t", skip = 10) : more columns than column names'. Please help
>>  me with this, as I'm a first-time R user.
>
> First, did you read
>
>         R Data Import/Export
>         2 Spreadsheet-like data
>
> especially 2.1 Variations on read.table ??
>
> If not, go there and study up; there are many useful hints.
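A minimal sketch of two of those read.table variations, assuming a small made-up tab-separated file written to a temporary path (the real cfa-1.txt is not available here, and the column names id, value, extra1 and extra2 are hypothetical). It first reproduces the 'more columns than column names' error, then reads the same file with header = FALSE, skip = 1, an explicit col.names wide enough for the longest row, and fill = TRUE so that short rows are padded.

```r
# Hypothetical file: the header names two columns, but some data rows
# carry four tab-separated fields.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue",
             "g1\t1.5\tx\ty",
             "g2\t2.0",
             "g3\t3.5\tz\tw"),
           path)

# Reading with header = TRUE fails: the data rows have more fields
# than the header supplies names for.
err <- tryCatch(read.table(path, header = TRUE, sep = "\t"),
                error = function(e) conditionMessage(e))
print(err)

# Workaround: skip the header line, name every column yourself
# (hypothetical names), and let fill = TRUE pad the short rows.
dat <- read.table(path, header = FALSE, sep = "\t", skip = 1,
                  fill = TRUE,
                  col.names = c("id", "value", "extra1", "extra2"),
                  stringsAsFactors = FALSE)
print(dat)
```

According to ?read.table, the number of columns is taken from the first five lines of input unless col.names is longer, which is why the sketch spells out all four names; with fill = TRUE alone, a longer row further down the file can wrap its extra fields onto a new row.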
>
> Looking at your file I see complications. Consider these questions:
>
>  	What is the field separator?
>
>  	     looks like '\t', but ...
>
>  	Do you have the same number of field separators (and fields)
>  	in every row?
>
>  	   apparently not, and there seem to be unusual variations on
>  	   the record structure - like tabs missing where I would have
>  	   expected to see them (starting on the line CSO 8.4 following
>  	   the first ']') and text fields in some but not all records
>  	   - and the use of square brackets to enclose some fields and
>  	   reverse square brackets for others is new to me!
>
>  	Did some gremlin edit this file in WORD or EXCEL or otherwise
>  	corrupt it?
>
>  	    If so all bets are off. Tell whoever did this to you to
>  	    go memorize
>  	    http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
>  	    and make him/her/them promise never to do it again.
>
>
> If you do not have the same number of field separators in each row,
> you will quickly run aground without resorting to more programmerish
> tricks.
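One quick way to answer the "same number of field separators" question before going back to read.table is count.fields(), which reports how many fields it finds on each line. The sketch below assumes a hypothetical file with one preamble line (standing in for the ten lines the original call skips) and a single over-long row; the file contents are made up for illustration.

```r
# Hypothetical stand-in for cfa-1.txt: one preamble line, a header,
# and one data row with an extra tab-separated field.
path <- tempfile(fileext = ".txt")
writeLines(c("exported by some tool",
             "id\tvalue",
             "g1\t1.5",
             "g2\t2.0\textra",
             "g3\t3.5"),
           path)

n_skip <- 1  # plays the role of skip = 10 in the original call

# Count tab-separated fields per line; quote = "" and comment.char = ""
# keep stray quotes, apostrophes or '#' from distorting the count.
n <- count.fields(path, sep = "\t", skip = n_skip,
                  quote = "", comment.char = "")

print(table(n))                   # how many lines have 2, 3, ... fields
bad <- which(n != n[1]) + n_skip  # file line numbers that disagree with the header
print(bad)
print(readLines(path)[bad])       # look at the offending lines themselves
```

If table(n) shows more than one field count, read.table alone is unlikely to get the structure right.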
>
> If it were me, I'd swallow all of the data (or a few thousand lines
> for exploratory purposes) with readLines() and then use string
> processing and regular expression trickery to decipher the
> records. But if you are not skilled in that art it may take you a
> while to catch on.
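A minimal sketch of that readLines() route, assuming tabs are the only separator and using made-up records that echo the layout described above (the CSO/XYZ values and the bracketed labels are hypothetical). It pads ragged records to a common width and strips square brackets of either orientation; it does not repair records whose tabs are missing altogether, which would need record-specific regular expressions.

```r
# Hypothetical records: tab-separated, a varying number of fields, and
# some fields wrapped in [..] or ]..[.
path <- tempfile(fileext = ".txt")
writeLines(c("CSO\t8.4\t[alpha]",
             "CSO\t9.1",
             "XYZ\t7.7\t]beta[\tnote"),
           path)

# 1. Swallow the lines (n = 5000 caps the read for exploration).
raw <- readLines(path, n = 5000)

# 2. Split each line on literal tabs; the result is a list of character vectors.
fields <- strsplit(raw, "\t", fixed = TRUE)

# 3. Pad every record with NA up to the widest record.
width  <- max(lengths(fields))
padded <- lapply(fields, function(f) c(f, rep(NA_character_, width - length(f))))

# 4. Stack the records into a character data frame (columns V1, V2, ...).
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)

# 5. Remove square brackets of either orientation, then convert numbers.
df[] <- lapply(df, function(col) gsub("\\[|\\]", "", col))
df$V2 <- as.numeric(df$V2)
print(df)
```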
>
> Also, I might see if there is another output format available (XML?)
> that R might parse, and/or I'd see if there is an annotation or
> package on Bioconductor that can give what is needed (consider posting
> to that list, but state the problem you want to solve broadly rather
> than just posting the same troublesome line of code as here).
>
> HTH,
>
> Chuck
>
> p.s. Did you read the Posting Guide (as requested)? There have been
> lots of read.table questions posted to this list and plenty of
> guidance on getting past read.table hiccups.
>
>
>>
>>  Thanks in advance.
>>
>>  Cheers,
>>  Anisah
>>
>
> Charles C. Berry                            (858) 534-2098
>                                              Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793



