[R] numerical data frame

Richard.Cotton at hsl.gov.uk Richard.Cotton at hsl.gov.uk
Mon Jan 7 11:07:19 CET 2008


>   I've successfully imported my synteny data into R using the scan 
> command. My results are shown below. My major problem is how to 
> combine the column names with the data (splt); I tried cbind, but a 
> warning message occurred. I have realized that the splt data has only 
> 5 columns instead of 6. Please help me with this!
> 
>   I want my data to be numeric, with proper columns and column 
> names, with CS replaced by 1 and CSO by 0, and with all the 
> punctuation and the characters removed from the data.

>   1)for col names
> 
>   nms<-scan("C:/Users/user/Documents/cfa-1.txt",sep="\t",nlines=1,
> skip=10,what=character(0))
> Read 6 items
> > nms
> [1] "CS(O) id (number of marker/anchor) " 
> [2] " Location(s) on reference " 
> [3] "CS(O) size" 
> [4] "CS(O) density on reference chromosome" 
> [5] "Location(s) on tested  " 
> [6] "Breakpoints CS(O) locations (denstiy of marker/anchor)"
> 
> 2) my data
> 
>   x<-scan("C:/Users/user/Documents/cfa-1.txt",sep="\n",skip=12,
> what=character(0))
> Read 21 items
> > splt<-strsplit(x,"\t")
> > splt
> [[1]]
> [1] "CS 1 (73): "                       " cfa1: [ 3251712 - 24126920 ] "
> [3] "  20875208 "                       " 3 "
> [5] " hsa18: [ 132170848 - 50139168 ] " "] 24126920, 24153560 [(8 ) "
>
> [[2]]
> [1] "CS 2 (3): "                       " cfa1: [ 24153560 - 24265894 ] "
> [3] "  112334 "                        " 27 "
> [5] " hsa18: [ 50105060 - 49934572 ] " "] 24265894, 24823786 [(7 ) "
>
> [[3]]
> [1] "CSO 3.1 (6): "
> [2] " cfa1: [ 24823786 - 27113036 ] "
> [3] "  2289250 "
> [4] " 3  "
> [5] " hsa18: [ 48121156 - 46579500 ]- Decreasing order - ] 27113036, 27418228 [ (13)"
> ...

You are probably better off using read.table or read.delim to get your 
data into R, since you most likely want it in the form of a data frame 
rather than a list.
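
As a rough sketch of the read.delim route (not tested against your actual 
file): the lines below are copied from your output and fed in through the 
text argument so the example runs on its own, and the short column names 
are made up for illustration. With your file you would pass the path 
instead, plus a skip value that matches your header position.

```r
# Stand-in for the file: a made-up header line plus two data lines from the post.
# The second data line has only 5 tab-separated fields.
sample_lines <- c(
  "id\tref_location\tsize\tdensity\ttested_location\tbreakpoints",
  "CS 1 (73): \t cfa1: [ 3251712 - 24126920 ] \t  20875208 \t 3 \t hsa18: [ 132170848 - 50139168 ] \t] 24126920, 24153560 [(8 ) ",
  "CSO 3.1 (6): \t cfa1: [ 24823786 - 27113036 ] \t  2289250 \t 3  \t hsa18: [ 48121156 - 46579500 ]- Decreasing order - ] 27113036, 27418228 [ (13)"
)

# read.delim splits on tabs and fills short rows by default (fill = TRUE);
# strip.white trims the padding around each field.
synteny <- read.delim(text = sample_lines, header = TRUE, quote = "",
                      strip.white = TRUE, stringsAsFactors = FALSE)

str(synteny)
```

The size and density columns come back as numbers straight away, and the 
short row is padded instead of shifting the remaining values along.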

Otherwise, try this.
#Convert to matrix; rows with fewer than 6 fields are padded with NA
datamat <- t(sapply(splt, "[", 1:6))

#This will remove punctuation, but it looks like you want to do something 
#more with some of the columns; I'm just not sure what it is.
nopunct <- gsub("[[:punct:]]", "", datamat) 

#Convert to a data frame, keeping the columns as character rather than factor
df <- as.data.frame(nopunct, stringsAsFactors = FALSE)

#Make column 3 numeric (you will probably want to do something like this 
#for each one)
df[,3] <- as.numeric(df[,3])

# Set column names
names(df) <- nms
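
For the CS/CSO recoding you asked about, something along these lines might 
do. It is only a sketch: the sample strings are copied from your output, 
and the variable names (id, ref, is_cs, ref_start, ref_end) are made up 
for illustration. It works on the raw strings rather than the 
punctuation-stripped ones, since stripping punctuation also turns "3.1" 
into "31" and removes the "-" between the coordinates.

```r
# Sample values copied from the first two fields of the output in the question
id  <- c("CS 1 (73): ", "CS 2 (3): ", "CSO 3.1 (6): ")
ref <- c(" cfa1: [ 3251712 - 24126920 ] ",
         " cfa1: [ 24153560 - 24265894 ] ",
         " cfa1: [ 24823786 - 27113036 ] ")

# 1 for CS, 0 for CSO: test for the longer prefix "CSO"
is_cs <- ifelse(grepl("^CSO", id), 0, 1)

# Keep only the text inside the square brackets, so the digit in "cfa1"
# is not mistaken for a coordinate
inside <- sub(".*\\[(.*)\\].*", "\\1", ref)

# Extract the runs of digits: the first is the start, the second the end
coords    <- regmatches(inside, gregexpr("[0-9]+", inside))
ref_start <- as.numeric(sapply(coords, "[", 1))
ref_end   <- as.numeric(sapply(coords, "[", 2))

data.frame(is_cs, ref_start, ref_end)
```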

Regards,
Richie.

Mathematical Sciences Unit
HSL

