[R] slow with indexing with is.na
    arun 
    smartpink111 at yahoo.com
       
    Thu May 15 12:58:56 CEST 2014
    
    
  
Hi,
If you can convert the data.frame to a matrix, the replacement runs roughly 3 to 6 times faster in the timings below.
For example:
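# Baseline: logical indexing directly on the data.frame
fun1 <- function(data) {
  data[is.na(data)] <- 0
  data
}
# Convert to a matrix first, then use logical indexing
fun2 <- function(data) {
  mat <- as.matrix(data)
  mat[is.na(mat)] <- 0
  mat
}
# Matrix plus which(): index with a two-column (row, col) matrix
fun3 <- function(data) {
  mat <- as.matrix(data)
  indx <- which(is.na(mat), arr.ind = TRUE)
  mat[indx] <- 0
  mat
}
# Matrix plus a stored logical index (same approach as fun2)
fun4 <- function(data) {
  mat <- as.matrix(data)
  indx <- is.na(mat)
  mat[indx] <- 0
  mat
}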
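
The matrix-based functions above rely on as.matrix(), which only keeps the values numeric when every column is numeric. A minimal sketch of that caveat follows; the toy data frames and the all_numeric() helper are hypothetical names used for illustration, not part of the original code.

```r
# Hypothetical toy data frames, for illustration only.
num_df   <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
mixed_df <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"),
                       stringsAsFactors = FALSE)

# All-numeric columns: as.matrix() returns a numeric matrix.
num_mat <- as.matrix(num_df)

# Mixed columns: as.matrix() coerces everything to character,
# so assigning 0 afterwards would store the string "0".
mixed_mat <- as.matrix(mixed_df)

# A simple guard to run before taking the matrix route (hypothetical helper).
all_numeric <- function(df) all(vapply(df, is.numeric, logical(1)))

all_numeric(num_df)
all_numeric(mixed_df)
```

If the guard returns FALSE, the matrix conversion changes the column types, so the result would no longer match the data.frame version.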
 
set.seed(4853)
dat1 <- as.data.frame(matrix(sample(c(NA,1:20),3e3*3e3,replace=TRUE),ncol=3e3))
system.time(res1 <- fun1(dat1))
#   user  system elapsed
#  1.224   0.040   1.267
system.time(res2 <- fun2(dat1))
#   user  system elapsed
#  0.368   0.052   0.420
system.time(res3 <- fun3(dat1))
#   user  system elapsed
#  0.170   0.052   0.223
system.time(res4 <- fun4(dat1))
#   user  system elapsed
#  0.277   0.075   0.354
identical(res1, as.data.frame(res2))
#[1] TRUE
identical(res1, as.data.frame(res3))
#[1] TRUE
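
If the result has to stay a data.frame, a column-wise replacement is another option that might be worth timing. This is only a sketch under the assumption that all columns are numeric; fun5 and the small test data are hypothetical names, and it was not part of the timings above.

```r
# Column-wise replacement: keeps the data.frame class and treats each
# column as a plain vector, avoiding the as.matrix() round trip.
# Assumes numeric columns; fun5 is a hypothetical name.
fun5 <- function(data) {
  data[] <- lapply(data, function(col) {
    col[is.na(col)] <- 0
    col
  })
  data
}

# Small illustrative input (not the 3000 x 3000 data from the timings).
small <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))
res5 <- fun5(small)
```

Assigning into data[] rather than data keeps the original attributes, so the result is still a data.frame with the same column names.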
A.K.

> Hi,
> I am new to R (with experience in Matlab). I'm still exploring the syntax and learning to think the R way.
> I have some data (3000 x 3000) stored as a data.frame, and the following code runs very slowly:
> data[is.na(data)] = 0
> It would be good to get some comments on this from experienced users. Thanks.
    
    