[R] Maximum number of patterns and speed in grep
mdvaan
mathijsdevaan at gmail.com
Fri Jul 6 16:45:48 CEST 2012
Hi,
I am using R's grep function to find patterns in a vector of strings. I
would like to match 7,700 patterns (of varying lengths). I get an error
message when I combine all of them into a single regular expression:
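# one count per element of x
data <- integer(length(x))
for (j in seq_along(x))
{
# join all 7,700 patterns into one alternation and match it against x[j]
data[j] <- length(grep(paste(patterns[1:7700], collapse = "|"), x[j],
value = TRUE))
}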
When I break the patterns up into 4 chunks it works:
# one vector of counts per chunk of patterns
data <- list()
for (j in seq_along(x))
{
data$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
x[j], value = TRUE))
data$chunk2[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
x[j], value = TRUE))
data$chunk3[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
x[j], value = TRUE))
data$chunk4[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
x[j], value = TRUE))
}
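To make the chunked version concrete, here is a minimal sketch with toy data. The values of `patterns` and `x` and the name `chunk_size` are made up for illustration and are not my real data. It relies on grepl() being vectorised over its second argument, which removes the loop over x[j]. I have not checked whether this avoids the error or is fast enough at full size.

```r
# Toy stand-ins (hypothetical) for the real `patterns` and `x`
patterns <- c("apple", "banana", "cherry", "date", "elder")
x <- c("apple pie", "plain bread", "cherry and date cake")

chunk_size <- 2  # hypothetical; roughly 2500 with the real data

# Split the patterns into consecutive chunks of at most chunk_size
chunks <- split(patterns, ceiling(seq_along(patterns) / chunk_size))

# grepl() is vectorised over x, so each chunk yields one logical
# vector with an entry per string; cbind them into a matrix
hits <- do.call(cbind, lapply(chunks, function(p) {
  grepl(paste(p, collapse = "|"), x)
}))

# A string counts as matched if at least one chunk matched it
matched <- rowSums(hits) > 0
```

Here `matched` has one entry per element of x, and `as.integer(matched)` gives the same 0/1 values as length(grep(...)) applied to a single string.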
My questions: what is the maximum size of the pattern argument in grep? Is
there a faster way to do this? The loop above is very slow.
Thanks.
Math
Sorry for not providing a reproducible example. It's a size issue which
makes it difficult to provide an example.