[R] String manipulation
    Gabor Grothendieck 
    ggrothendieck at gmail.com
       
    Sun Feb 13 18:40:13 CET 2011
    
    
  
On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700004 at gmail.com> wrote:
> Please consider following string:
>
> MyString <- "ABCFR34564IJVEOJC3434"
>
> Here you see that, there are 4 groups in above string. 1st and 3rd groups
> are for english letters and 2nd and 4th for numeric. Given a string, how can
> I separate out those 4 groups?
>
Try this.  "\\D+" and "\\d+" match runs of non-digits and digits
respectively.  The portions of the pattern within parentheses are
captures, and strapply passes them to the c function.  Like R's
strsplit, strapply returns a list with one component per element of
MyString, but MyString has only one element so we extract its
contents using [[1]].
> library(gsubfn)
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"
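If installing gsubfn is not an option, roughly the same capture-based split can be sketched in base R.  This is a swapped-in alternative, not the strapply approach above: it assumes regexec and regmatches from base R, and the name parts is just an illustrative variable.

```r
MyString <- "ABCFR34564IJVEOJC3434"

# regexec records the position of the whole match and of each
# parenthesised capture; regmatches extracts the corresponding text.
m <- regexec("(\\D+)(\\d+)(\\D+)(\\d+)", MyString)

# One list component per element of MyString, so take [[1]].
# The first entry is the whole match; drop it to keep the 4 captures.
parts <- regmatches(MyString, m)[[1]][-1]
print(parts)
```

As with strapply, the result is a character vector of the four groups, so the digit groups are still strings at this point.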
Alternately we could convert the digit portions to numbers at the
same time.  The formula ~ list(...) is interpreted as a function whose
body is the right hand side of the ~ and whose arguments are the free
variables, i.e. s1, s2, s3 and s4.  The result is a list because it
mixes character and numeric values.
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)",
+     ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]
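To make the formula shorthand concrete, here is a minimal sketch of the ordinary function it stands for, written out by hand.  The name convert is hypothetical, and the function is called directly on the four pieces so the snippet runs without gsubfn.

```r
# Hand-written equivalent of ~ list(s1, as.numeric(s2), s3, as.numeric(s4)):
# the free variables s1..s4 become the arguments, the right hand side
# of the ~ becomes the body.
convert <- function(s1, s2, s3, s4) {
  list(s1, as.numeric(s2), s3, as.numeric(s4))
}

# Called on the four captured strings it returns a list that keeps the
# letter groups as character and turns the digit groups into numbers.
res <- convert("ABCFR", "34564", "IJVEOJC", "3434")
str(res)
```

Going by the description above, passing convert to strapply in place of the formula should behave the same way, since strapply hands each capture to the function as a separate argument.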
See http://gsubfn.googlecode.com for more.
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
    
    