[R] speeding up regressions using ddply
    Alison Macalady 
    ali at kmhome.org
       
    Wed Sep 22 13:05:12 CEST 2010
    
    
  
Hi,
I have a data set that I'd like to run logistic regressions on, using
ddply to speed up the computation of many models with different
combinations of variables.  I would like to run a regression on every
unique two-variable combination in a portion of my data set, but I
can't quite figure out how to do this using ddply.  The data set looks
like this, with "status" as the binary dependent variable and V1:V8 as
potential independent variables in the logistic regression:
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame( status = factor(rep(rep(c('D','L'), each = 6), 3)),
                as.data.frame(m))
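To make "every unique two-variable combination" concrete: with V1:V8 there are choose(8, 2) = 28 candidate models. A minimal sketch of how those pairs could be listed with base R is below; pair.list and forms are names made up for this illustration, and this only enumerates the model formulas, it does not fit anything.

```r
# Names of the eight candidate predictors, matching the columns of x
vars <- paste('V', 1:8, sep = '')

# All unique unordered pairs, as a list of length-2 character vectors
pair.list <- combn(vars, 2, simplify = FALSE)
length(pair.list)   # 28 pairs, i.e. choose(8, 2)

# One model formula per pair, e.g. status ~ V1 + V2
forms <- lapply(pair.list, reformulate, response = 'status')
forms[[1]]
```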
I used melt to put my data frame into a more workable format:
require(reshape)
xm <- melt(x, id = 'status')
Here is the basic shape of the function I'd like to apply to every  
combination of variables in the dataset:
h <- function(df)
{
  # value1 and value2 are placeholders: what I can't figure out is how to
  # specify two different variables from xm to include in the model
  log.glm <- glm(status ~ value1 + value2, data = df,
                 family = binomial(link = logit), na.action = na.omit)
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coef <- coef(glm.summary)
  # or whatever other output here
  list(Est1 = coef[1, 2], Est2 = coef[3, 2], AIC = aic[2])
}
And then I'd like to use ddply to speed up the computations.
require(plyr)
output <- ddply(xm, .(variable), as.data.frame.function(h))
output
I can easily do this using ddply when I only want to use 1 variable in  
the model, but can't figure out how to do it with two variables.
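For reference, the one-variable case that does work looks roughly like the sketch below. It rebuilds the example data so it runs on its own; h1 and output1 are names made up for this illustration, and the set.seed call is an assumption added only to make the run reproducible.

```r
library(reshape)  # melt()
library(plyr)     # ddply()

set.seed(1)  # assumption: not in my real script, only for reproducibility
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D', 'L'), each = 6), 3)),
                as.data.frame(m))
xm <- melt(x, id = 'status')  # long format: status, variable, value

# h1 (hypothetical name): fit status ~ value for one variable's rows
h1 <- function(df) {
  fit <- glm(status ~ value, data = df,
             family = binomial(link = logit), na.action = na.omit)
  data.frame(Est = coef(fit)[["value"]],   # slope estimate
             AIC = extractAIC(fit)[2])     # second element is the AIC
}

# One model per level of 'variable', i.e. one row per V1..V8
output1 <- ddply(xm, .(variable), h1)
output1
```

This works because each piece that ddply hands to h1 holds exactly one predictor in the "value" column; the two-variable case needs two predictors side by side in the same piece, which is where I get stuck.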
Many thanks for any hints!
Ali
--------------------
Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
& Laboratory of Tree Ring Research
    
    