[R] regular expression for na.strings / read.table
jessica.gervais at tudor.lu
Tue Feb 12 15:30:30 CET 2008
Dear all,
I am working with a csv file.
Some of the data in the file are not valid, and they are marked with a star '*'.
For example: *789.
I have attached to this email an example file (test.txt) that looks like
the data I have to work with.
I see 2 possibilities, neither of which I can get to work in R:
1- First and easiest solution:
Read the data with read.csv in R, and define as NA strings all cells
containing a star (*).
Something which would look like this...
> DATA<-read.csv("test.txt",na.strings=list(length(grep("\\*",DATA,value=T))==0))
> DATA
  X1 X.789 LNM. X78 X56  X89 X56.1 X100
1  2   700  AUW  78  56   89    56  100
2  3   400  TOC  78  56   89    56   10
3  4   389  RMN  78  56   89    56  *89
4  5   400  LNM  78  56 *452    56  100
5  6   200  UTC  78 *40   89    56  100
6  7   100  GAT  78  56    8    56 *100
7  8    79 *LNM  78  56    9    56  100
8  9    89  TCG  78  56  800    56 *100
9 10   78*  LNM  78  56   89    56  100
...but which would actually work (the stars are still there)! Does anyone
know how to do that?
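A note on this first approach: as far as I understand it, na.strings is compared against whole field values and is not treated as a regular expression, so a pattern such as "\\*" cannot be passed directly. The sketch below is only one possible workaround, not a definitive answer: it reads the file once as plain text to collect the exact starred values, then reads it again with those values as na.strings. The inline csv_text and its column names are hypothetical stand-ins for test.txt.

```r
# Hypothetical stand-in for test.txt (the real attachment is not reproduced here).
csv_text <- "id,count,code,value
2,700,AUW,100
3,400,TOC,*89
4,78*,*LNM,100"

# First pass: read every field as character so nothing is converted yet.
raw <- read.csv(text = csv_text, colClasses = "character")

# Collect the distinct field values that contain a literal star.
starred <- unique(grep("*", unlist(raw), fixed = TRUE, value = TRUE))

# Second pass: na.strings is matched against whole fields,
# so pass the exact starred values collected above.
DATA <- read.csv(text = csv_text, na.strings = c("NA", starred))
```

With a real file, text = csv_text would be replaced by the file name in both calls. Reading the file twice is wasteful for large inputs, but it keeps the conversion of the clean columns to numbers in the hands of read.csv.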
2- Second solution:
- first read the file with DATA<-read.csv("test.txt")
- then replace all fields containing a * with NA by applying the following
function to the object DATA:
DATA_cleaned<-apply(DATA,c(1,2),function(x){if(length(grep("\\*",x,value=TRUE))==1){x<-NA}})
DATA_cleaned
     X1   X.789 LNM. X78  X56  X89  X56.1 X100
[1,] NULL NULL  NULL NULL NULL NULL NULL  NULL
[2,] NULL NULL  NULL NULL NULL NULL NULL  NULL
[3,] NULL NULL  NULL NULL NULL NULL NULL  NA
[4,] NULL NULL  NULL NULL NULL NA   NULL  NULL
[5,] NULL NULL  NULL NULL NA   NULL NULL  NULL
[6,] NULL NULL  NULL NULL NULL NULL NULL  NA
[7,] NULL NULL  NA   NULL NULL NULL NULL  NULL
[8,] NULL NULL  NULL NULL NULL NULL NULL  NA
[9,] NULL NA    NULL NULL NULL NULL NULL  NULL
The stars have disappeared, but so has everything else!
The problem comes from the fact that, if a field does not contain any *, the
command
if(length(grep("\\*",x,value=T))==1) returns NULL instead of FALSE!
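One possible reading of the NULL values: in R, an if without an else evaluates to NULL whenever its condition is FALSE, so the function gives back NULL for every cell without a star instead of the original value. The sketch below illustrates that behaviour and then shows one way, among others, to mask only the starred cells with a logical matrix built by grepl. The small DATA frame and the helper no_else are hypothetical and only mimic a few columns of the output above.

```r
# Hypothetical stand-in for the data frame read from test.txt.
DATA <- data.frame(
  X1   = c("2", "3", "4"),
  X89  = c("89", "*452", "8"),
  X100 = c("100", "10", "*100"),
  stringsAsFactors = FALSE
)

# An `if` without `else` evaluates to NULL when its condition is FALSE.
no_else <- function(x) if (grepl("*", x, fixed = TRUE)) NA
is.null(no_else("89"))

# Build a logical matrix marking every cell that contains a literal star ...
has_star <- sapply(DATA, function(col) grepl("*", as.character(col), fixed = TRUE))

# ... and set only those cells to NA, leaving all other values untouched.
DATA_cleaned <- DATA
DATA_cleaned[has_star] <- NA

# Re-guess the column types now that the stars are gone.
DATA_cleaned[] <- lapply(DATA_cleaned,
                         function(col) type.convert(as.character(col), as.is = TRUE))
```

Working column by column with grepl avoids apply(), which first coerces the whole data frame to a character matrix and returns a matrix rather than a data frame.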
If you have any idea, please let me know!
Many thanks,
Jessica
____________________________________
Jessica Gervais
Mail: jessica.gervais at tudor.lu
Resource Centre for Environmental Technologies,
Public Research Centre Henri Tudor,
Technoport Schlassgoart,
66 rue de Luxembourg,
P.O. BOX 144,
L-4002 Esch-sur-Alzette, Luxembourg
(See attached file: test.txt)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080212/b67d1cbd/attachment.txt