[R] Regular Expression
Henrik Singmann
henrik.singmann at psychologie.uni-freiburg.de
Tue Jul 24 19:52:06 CEST 2012
Hi,
one problem, many solutions. Only one of them uses a regular expression, but all work equally well.
dat1<-read.table(text="
MONTH QUARTER YEAR
2012-07 2012-3 2012
2001-07 2001-3 2001
2002-01 2002-1 2002
",sep="",as.is = TRUE, header=TRUE)
# using substr:
substr(dat1$MONTH, 6,7)
substr(dat1$QUARTER, 6,7)
# using strsplit:
vapply(strsplit(dat1$MONTH, "-"), "[", i = 2, "")
vapply(strsplit(dat1$QUARTER, "-"), "[", i = 2, "")
# using sub:
sub("[[:digit:]]*-", "", dat1$MONTH)
sub("[[:digit:]]*-", "", dat1$QUARTER)
All three approaches produce the desired outcome:
[1] "07" "07" "01"
and
[1] "3" "3" "1"
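To get the whole table from the original question, the extracted pieces can be written back into a copy of the data frame. This is only a minimal sketch using the sub() variant; the name curated is borrowed from the question, and the data is the example from above.

```r
# Example data from above
dat1 <- read.table(text = "
MONTH QUARTER YEAR
2012-07 2012-3 2012
2001-07 2001-3 2001
2002-01 2002-1 2002
", as.is = TRUE, header = TRUE)

# Work on a copy so the original columns stay available
curated <- dat1

# Remove the leading digits and the hyphen, keep what follows
curated$MONTH   <- sub("[[:digit:]]*-", "", dat1$MONTH)
curated$QUARTER <- sub("[[:digit:]]*-", "", dat1$QUARTER)

print(curated)
```

Note that MONTH and QUARTER stay character vectors here; wrap them in as.integer() if numbers are needed, which would turn "07" into 7.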
If the data always has this fixed layout, I personally would prefer substr.
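The reason for the "if": substr relies on fixed character positions, so it only fits when the part before the hyphen always has four characters. A small sketch with made-up values (the vector x is hypothetical, not taken from the data above) shows the difference; the "^" anchor in the pattern is my addition to make sure only the leading digits are removed.

```r
# Hypothetical input where the part before the hyphen varies in width
x <- c("2012-07", "12-3", "2002-1")

# Fixed positions 6 to 7: fails for "12-3", which has only 4 characters
fixed <- substr(x, 6, 7)

# Anchored pattern: drop leading digits up to and including the hyphen
flexible <- sub("^[[:digit:]]*-", "", x)

print(fixed)
print(flexible)
```

The strsplit variant would cope with the varying widths just as well as sub does.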
Cheers,
Henrik
On 24.07.2012 19:36, Fred G wrote:
> Hi--
>
> I have three columns in an input file:
> MONTH QUARTER YEAR
> 2012-07 2012-3 2012
> 2001-07 2001-3 2001
> 2002-01 2002-1 2002
>
> I want to make output like so:
> MONTH QUARTER YEAR
> 07 3 2012
> 07 3 2001
> 01 1 2002
>
> I was having some trouble getting the regular expression to work. I think
> it should be something like the following:
> tmp <- uncurated$MONTH
> tmp <- gsub("[^-\\d\\d]","",tmp,perl=TRUE)
> tmp[tmp=="-"] <- ""
> curated$MONTH <- tmp
>
> tmp <- uncurated$QUARTER
> tmp <- gsub("[^-\\d]","",tmp,perl=TRUE)
> tmp[tmp=="-"] <- ""
> curated$QUARTER <- tmp
>
> but it's not quite working. I want to be able to isolate any digits that
> occur after the hyphen and to delete everything before and including the
> hyphen. Would greatly appreciate any clarification anyone can provide.
>
--
Dipl. Psych. Henrik Singmann
PhD Student
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann