[R] strapply and characters adjacent to the matched pattern
mdvaan
mathijsdevaan at gmail.com
Wed Jul 25 22:34:44 CEST 2012
Thanks Gabor. That worked really well. I have been reading about POSIX
character classes and regular expressions, and I tried to adapt your example
to ignore all matches in which the character preceding (rather than
following) the match is one of [:alpha:]. So far, I have been unsuccessful.
Could anyone help me out here or direct me to a source that explains how to
combine POSIX character classes with regular expressions? Thanks!
require(gsubfn)
# this only checks the character following the match and therefore
# also matches the third element;
# however, I want it to match only the 2nd, 5th and 6th elements
strapply(c("abc", "ab", "abdef", "defc", "def", " def "),
"(def|ab)($|[^[[:alpha:]])")
The outcome should look like this:
[[1]]
NULL
[[2]]
[1] "ab"
[[3]]
NULL
[[4]]
NULL
[[5]]
[1] "def"
[[6]]
[1] "def"
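One possible way to get this outcome is sketched below. It is an assumption-laden sketch, not the approach from the quoted reply: it uses base R (regmatches and gregexpr with perl = TRUE) instead of strapply, and it relies on PCRE lookarounds, which test the characters next to the match without making them part of it. The names x, pattern and hits are only illustrative.

```r
# Base R sketch (not gsubfn): PCRE lookarounds check the neighbouring
# characters without including them in the match.
x <- c("abc", "ab", "abdef", "defc", "def", " def ")

# (?<![[:alpha:]])  the match must not be preceded by a letter
# (?![[:alpha:]])   the match must not be followed by a letter
pattern <- "(?<![[:alpha:]])(def|ab)(?![[:alpha:]])"

# gregexpr() locates the matches; regmatches() extracts them.
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Elements 2, 5 and 6 contain a match; the others come back as
# character(0) rather than NULL.
print(hits)
```

Note that [:alpha:] is only valid inside a bracket expression, so the class is written [[:alpha:]] and its negation [^[:alpha:]]. If case-insensitive matching is needed, gregexpr also accepts ignore.case = TRUE.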
Gabor Grothendieck wrote
>
> On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan@> wrote:
>> Hi,
>>
>> In the example below, one of the searched patterns "SE" is matched in the
>> word "second". I would like to ignore all matches in which the character
>> following the match is one of [:alpha:]. How do I do this without removing
>> the "ignore.case = T" argument of the strapply function? Thank you very
>> much!
>>
>> # load library
>> require(gsubfn)
>> # read in data
>> data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
>> # define the object to be searched
>> text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
>> # match
>> strapply(text, data, ignore.case = T)
>>
>> The preferred outcome would be:
>>
>> [[1]]
>> [1] "Santa Fe Gold Corp"
>>
>> [[2]]
>> [1] "Starpharma Holdings"
>>
>> instead of:
>>
>> [[1]]
>> [1] "Santa Fe Gold Corp"
>>
>> [[2]]
>> [1] "se" "Starpharma Holdings"
>>
>>
>
> Try this:
>
>> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[[:alpha:]])")
> [[1]]
> NULL
>
> [[2]]
> [1] "ab"
>
> [[3]]
> [1] "ab"
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>