[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)
Andrew Crane-Droesch
andrewcd at gmail.com
Wed May 8 16:12:03 CEST 2013
Dear All,
I'm using gam for a project that involves multiple imputation (MI), and it
has led me to a question about how F-statistics and p-values work in gam.
Specifically, how do the values reported by summary(gam) get generated? As
the dumb example below makes clear, I'm manipulating gam objects to
reflect the MI procedure that I'm using. I don't trust the
F-statistics/p-values that I'm getting, but I'd like to know how to
further manipulate these objects to get trustworthy values. Part of
that involves knowing how the output of summary(gam) is generated, and
from which elements of a gam object.
Here is the example:
library(mgcv)
par(mfrow = c(2, 2))

#replace missing values of a with the corresponding values of a.impute
impute <- function(a, a.impute){
  ifelse(is.na(a), a.impute, a)
}
#make some dumb fake data
x1.knots = c(-1,-.5,0,.5,1)
x2.knots = c(-1,-.5,0,.5,1)
K=5
x1 = rnorm(100)
x2 = rnorm(100)
y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
#some of it is missing
x1[81:100] = NA
x2[71:90] = NA
#do a few dumb imputations, and fit models to the partially-imputed data
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m1 = gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
         knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m1,plot.type="contour",scheme=2,main="Imp 1")
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m2 = gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
         knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m2,plot.type="contour",scheme=2,main="Imp 2")
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m3 = gam(y ~ te(x1.imp, x2.imp, k = c(K, K)),
         knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m3,plot.type="contour",scheme=2,main="Imp 3")
results = list(m1,m2,m3)
#Combine the results according to Rubin's rules
reps = 3

#pooled coefficient estimate: average of the coefficients across imputations
bhat = results[[1]]$coefficients
for (i in 2:reps){
  bhat = bhat + results[[i]]$coefficients
}
bhat = bhat/reps

#within-imputation covariance: average of the Bayesian covariance matrices (Vp)
W = results[[1]]$Vp
for (i in 2:reps){
  W = W + results[[i]]$Vp
}
W = W/reps

#between-imputation covariance: outer products of deviations from the pooled estimate
B = (results[[1]]$coefficients - bhat) %*% t(results[[1]]$coefficients - bhat)
for (i in 2:reps){
  B = B + (results[[i]]$coefficients - bhat) %*% t(results[[i]]$coefficients - bhat)
}
B = B/(reps - 1)

#total covariance of the pooled coefficients
Vb = W + (1 + 1/reps)*B
#Put the results into a convenient gam object
MI = results[[1]]
MI$coefficients=bhat
MI$Vp = Vb
#and summarize
summary(MI)
plot(MI,plot.type="contour",scheme=2,main="MI")
When I do something similar with real (non-fake) data I get wacky F-statistics
and p-values: for example, F can be greater than 100 while p equals 1. This
probably has something to do with degrees of freedom.
My easy question: what goes on under the hood when summary(gam) generates
these values? Which parts of a gam object does it draw on?
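(The only digging I've done so far is to print the summary method and poke
at the pieces of the fitted object that I'd guess are involved; this is just
a sketch of how I've been looking, not a claim about what summary.gam
actually uses.)

getAnywhere("summary.gam")  #print the source of the summary method
names(m1)                   #components of the fitted gam object
m1$edf                      #effective degrees of freedom per coefficient
m1$Vp                       #Bayesian covariance matrix of the coefficients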
My harder question: how might one construct principled analogs of these
statistics in an MI context, where the degrees of freedom will vary across
models depending on the imputed data? Has anybody thought about this,
or do I have some serious pencil-and-paper work ahead of me?
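(The closest thing I can think of is the usual scalar Rubin correction
applied coefficient-by-coefficient, roughly as below. This is only a sketch
with made-up variable names, and I don't see how it would carry over to a
Wald-type test on a whole smooth term.)

r = (1 + 1/reps)*diag(B)/diag(W)  #relative increase in variance, per coefficient
nu = (reps - 1)*(1 + 1/r)^2       #Rubin-style degrees of freedom, per coefficient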
Thanks,
Andrew