[R-sig-ME] Plans for lme4-1.0

Sat Dec 13 20:23:21 CET 2008

The discussion of the quasi families in the thread "Re: [R-sig-ME]
single argument anova for GLMMs (really, glmer, or dispersion?)" has
helped to clarify my thinking about the types of models that I plan to
be able to fit in the 1.0 release of lme4.  I know this won't include
every model that users would like to be able to fit but I hope it will
include some useful models.  (Ben mentioned George Box's famous quote
about models.  The complete quote is "All models are wrong; some
models are useful.")

The model is defined by the conditional distribution of the response
given the random effects,  (Y|B=b) in my notation, and the
unconditional distribution of the random effects.  The
lmer/glmer/nlmer functions are based on a multivariate Gaussian
unconditional distribution of the random effects with E[B] = 0 and
Var(B) = \Sigma(\theta).  That is, the covariance matrix of the random
effects is a parameterized, positive semidefinite, symmetric matrix.
The parameters, \theta, determine the left Cholesky factor,
\Lambda(\theta), of \Sigma(\theta), for conditional distributions
without a common scale parameter, or \Sigma(\theta)/\sigma^2, for
conditional distributions, such as the Gaussian, with a common scale
parameter, \sigma.

The conditional distribution, (Y|B=b), must satisfy

 1) The scalar conditional distributions are independent and from the
same distribution family.

 2) The scalar conditional distributions are completely determined by
the conditional mean and, at most, one additional parameter which can
be expressed as a common scale parameter.

 3) The vector of conditional means, \mu, depends on b only through
the linear predictor \gamma = Z\beta+Xb

 4) The map from \gamma to \mu is multi-diagonal (explained below) and
can be expressed as

    \gamma -> \eta -> \mu

where \eta -> \mu is the vectorized inverse link function for a glm
family.  This is a diagonal map in the sense that the Jacobian matrix
for the map is diagonal because the i'th element of \mu depends only
on the i'th element of \eta.  Furthermore, the vector-valued link and
inverse link functions are based on a common scalar link or inverse
link.  (In other words, if you have a logit link for one element of
\mu, you must have the logit link for all elements of \mu.)

The map from \gamma to \eta has a similar property except that the
dimension of \gamma can be a multiple, s, of n, the dimension of \eta,
so the Jacobian matrix d\eta/d\gamma will be of dimension n by ns.
This matrix must be the horizontal concatenation of s diagonal
matrices and they must be generated by a single model function, m,
mapping an s-dimensional vector to a scalar.

The nonlinear model in nlmer defines the map from \gamma to \eta.  The
glm family defines the map from \eta to \mu and also defines the
conditional variance (up to a constant).  The conditional variance
determines the weights in the iteratively reweighted least squares
calculation.

By factoring the model this way it is clear what information must be
evaluated and retained and how one would define a generalized
nonlinear mixed model, which I think will have widespread application
in psychometrics and other areas.

The map shown above takes b to \mu through \gamma and \eta and allows
us to evaluate the Jacobian of this map.  However, that is not the
full story.  We introduce another vector of random effects called U
with a spherical Gaussian marginal distribution.  That is, E[U] = 0
and Var(U) = \sigma^2 I (the name "spherical" comes from the fact that
the contours of constant density are spheres) and define B =
\Lambda(\theta) P' U where P is a fixed permutation matrix derived
from the pattern of nonzeros in Z.  P has no effect on the statistical
theory but is very important in the computation.

Now the effects of all the parameters in the model are incorporated
into the map from u to \mu and we can evaluate the conditional modes
of U and the conditional estimates of \beta and \sigma, given \theta
through a penalized iteratively reweighted least squares (PIRLS)
algorithm.  This, linear or nonlinear, weighted least squares problem
may involve a huge number of variables but the penalty part makes it a
very well conditioned problem.   Furthermore, the Jacobian matrix
d\mu/d u is very sparse.  This is where the sparse matrix methods come
in.  The sparse Cholesky factor of a matrix derived from d\mu/d u
provides the conditional covariance matrix for U given Y from which
the Laplace approximation to the deviance is derived.

Although this may seem like a very long-winded explanation (I seem to
be prone to offering long-winded explanations) the point that I want
to highlight is that the restrictions on the model form, in particular
the assumption of a Gaussian unconditional distribution for the random
effects, B, are not made frivolously.  This assumption is central to
the computational method using PIRLS.  In models with many random
effects in potentially complex structures you must be able to deal
with the random effects efficiently and the PIRLS algorithm combined
with sparse Cholesky factorization does that.

The code in the allcoef branch makes the central role of PIRLS more
explicit.  The name "allcoef" comes from the fact that all the
coefficients (u and \beta) in the linear predictor, \gamma = Z
\Lambda(\theta)P'u + X\beta, are optimized in the PIRLS step.  In the
currently released version of lme4 u and \beta are jointly optimized
for linear mixed models but only u is optimized in this step for
generalized linear mixed models and nonlinear mixed models.

It is still the case that only the brave or foolhardy should try the
code in that branch.  I'm gradually cleaning it up but there are still
many parts that are not working.  In particular mcmcsamp is not
currently working in that branch.  My goal is to get mcmcsamp stable
and to get all the other parts working in this branch then release it
for testing and finally release lme4-1.0.  The subtext in that message
is that only the models I describe above will be fit by the lme4-1.0
package.