[R-sig-ME] Calculating upper and lower confidence limits on a population estimate derived from multiple point estimates
Lenth, Russell V
russell-lenth at uiowa.edu
Thu Jul 31 02:59:35 CEST 2014
As I understand it, you want a confidence interval for the sum of all 150000 true predictions - is that right?
You can't get that by summing the confidence limits. You should sum the rows of your mm matrix FIRST:
summ <- apply(mm, 2, sum)
Then compute
pred <- sum(summ * fixef(glmmadmb_object))
predvar <- sum(summ * (vcov(glmmadmb_object) %*% summ))
(This is just what you did before, specialized to the 1-row case. The result is a single number (aka a 1x1 matrix). Then your CI is
pred + c(LCL=-1, est=0, UCL=1) * 1.96 * sqrt{predvar)
There's a potential issue with the fact that the vcov() result is a 150000 x 150000 matrix. But I suppose it's rather sparse and stored that way?
Russ
Russell V. Lenth - Professor Emeritus
Department of Statistics and Actuarial Science
The University of Iowa - Iowa City, IA 52242 USA
Voice (319)335-0712 (Dept. office) - FAX (319)335-3017
Thank you for your response everyone. I should provide more information on
the method I used.
1.) I fit the model using the glmmadmb package with a negative binomial
distribution.
2.) I created a model matrix (mm) of fixed effects with new data
(n=~150,000), and multiplied this matrix with the vcov(glmmadmb_model)
matrix, according to some excellent documentation and code examples
developed online (r-sig-wikidot-FAQ-GLMM, etc.):
predvar<-diag(mm%*%vcov(glmmadmb_object)%*%t(mm))
3.) Then I took the square root of predvar to get the SE of each prediction
point:
newdata$SE<-sqrt(predvar)
4.) I made predictions with the fixed effects of the glmmadmb model, and
transformed that response with the link (exp(prediction)).
5.) To compute the upper and lower ranges of population estimates at each
point, I used the following:
(exp transformed prediction) +/- 1.96(SE)
The result is three columns of ~150,000 observations in my new dataset:
upper CI, lower CI, and transformed prediction.
As my question states I was hoping to simply sum the columns for a
reach-wide, minimum and maximum population estimate. However, I understand
the caveats of using glmms in this way for predictions. Have I missed
something? I'm hoping to use this method for a manuscript intended for
publication, so it needs to be sound, if that is even possible with all
things considered.
More information about the R-sig-mixed-models
mailing list