[R] Interpretation of residual variance components and scale parameters in GLMMs

Mon Oct 1 18:09:59 CEST 2007

Dear R-listers,

I am working with generalized linear mixed models to quantify the
variance due to two nested random factors, but have hit a snag in the
interpretation of variance components.  Despite my best efforts with
Venables & Ripley 2002, Fahrmeir & Tutz 2001, R-help archives, Google,
and other eminent sources (i.e. local R gurus), I have not been able
to find a definitive answer explaining when the value reported for the
residual error represents a scale parameter, and when it represents
the actual variance (or standard error) and can be interpreted
directly.

As I understand it, the residual variance of a glm or glmm with
logarithm link is defined as (pi^2)/3, and so most GL(M)M programs and
modules do not report this value, but just report the scale parameter
(if estimated, otherwise it is 1). For example, this is the case in
lmer, where the output statement gives 'Estimated scale (compare to
1)'. This reported scale parameter (but not the variance components of
the random terms) then needs to be multiplied by (pi^2)/3 to give the
actual residual variance.

However, it remains unclear to me whether this is necessary in
glmmPQL, or whether the reported 'residual standard error' is
(similarly to the standard error attributed to the other random terms
in the model) the exact value of sigma, and should simply be squared
to get the actual residual variance. In other words, and I am unsure
if glmmPQL actually estimates the residual variance component instead
of just scaling the expected value.

Here's a simplified example dataset (d) to explain what I mean:

d = data.frame( "maleID" = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
"successes" = c(5,8,4,6,0,0,4,2,0,0,8,10,0,0,2,0), "status" =
c('a','a','a','a','b','b','a','a','b','b','a','a','b','b','b','b') )

d$maleID = factor(d$maleID)

Essentially:  males are found in different status categories; there is
variance in success within each status group; and the variance in
success between status groups is considerably greater than that within
status groups.

> model1 = lmer(successes~1 + (1|status) + (1|status:maleID),family = poisson, data = d, method = "PQL")
> summary(model1)

Generalized linear mixed model fit using PQL
Formula: successes ~ 1 + (1 | status) + (1 | status:maleID)
   Data: d
 Family: poisson(log link)
   AIC   BIC logLik deviance
 29.97 32.29 -11.99    23.97
Random effects:
 Groups        Name        Variance Std.Dev.
 status:maleID (Intercept) 0.15423  0.39273
 status        (Intercept) 2.12881  1.45904
number of obs: 16, groups: status:maleID, 6; status, 2

Estimated scale (compare to  1 )  1.010882

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.295      1.089  0.2709    0.786

Question 1:
So, in lmer, am I right to multiply the "estimated scale"(here,
1.010483) by the expected variance of a glm with logarithmic link
function (1.010483*((3.14159^2)/3) to get the actual residual
variance?  And is it also correct that no change would be made to the
variance components of the random terms, so that variance due to
status = 2.12881, and variance of individuals within status categories
= 0.15423?

Question 2:
At risk of sparking comments about the problems of comparing glmmPQL
and lmer (NOT the point of this question): running the same data in
glmmPQL gives essentially the same values for random variance
components, with 1.010882 listed as the "residual StDev" (model2 =
glmmPQL(successes~1, random =(~1|status/maleID),family = poisson, data
= d).  Does that mean that the residual variance is actually
1.010882^2?  Or should it be interpreted as a scale parameter, as in
lmer?

Thanks for any insight you can offer!

Emily DuVal
Postdoctoral Fellow, Max Planck Institute for Ornithology