[R-sig-ME] cAIC

Sat Mar 16 19:02:36 CET 2013

Oliver Soong <osoong+r at ...> writes:

> 
> Hi,
> 
> I'm using a linear mixed effects model on estimates of plant cover in
> different years in plots situated along transects within different
> zones.  The full set of random effects is (1 | year) + (1 |
> zone/transect/plot), where each term is treated categorically.  I'm
> interested in determining whether the plot-level random effect is
> worth including in the model, and of course, that's where the trouble
> begins.
> 
> I'm thinking in terms of AIC.  Of course, the problem with AIC is
> determining the d.f. of random effects.  As I've read a number of
> times, the appropriate d.f. lies somewhere between 1 and the number of
> random-effect groups/clusters, depending on where the random-effect
> variance lies in (0, Inf).
> 
> In my naive view of things, then, the true AIC lies somewhere between
> AIC1 based on 1 d.f. per random effect and AICn based on the number of
> random effect groups.  Dangerously pursuing this naive train of
> thought, given two nested models in which the additional random
> effect is itself nested (e.g., zone/transect vs. zone/transect/plot),
> there is some d.f. per random effect at which the AICs differ by 2.
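The two pieces of reasoning above can be written out. This is a sketch under illustrative assumptions that are not Oliver's actual setup. The first part uses a balanced one-way random-intercept model with m groups of n observations and variance ratio lambda = sigma_b^2 / sigma^2, and takes the effective d.f. to be the trace rho of the hat matrix, as Vaida & Blanchard do. It shows rho moving from 1 to m as lambda goes from 0 to Inf. The second part calls the d.f. charged to the extra plot-level term d and the maximized log-likelihoods l-hat. It gives the break-even d at which the two AICs differ by exactly 2:

```latex
\hat y_{ij} = \bar y + s\,(\bar y_i - \bar y), \qquad s = \frac{n\lambda}{1 + n\lambda}
\quad\Longrightarrow\quad
\rho(\lambda) = \operatorname{tr}(H) = 1 + (m-1)\,\frac{n\lambda}{1+n\lambda},
\qquad \rho(0) = 1, \quad \lim_{\lambda\to\infty}\rho(\lambda) = m

\Delta\mathrm{AIC}(d)
  = \mathrm{AIC}_{\text{plot}} - \mathrm{AIC}_{\text{transect}}
  = -2\,\bigl(\hat\ell_{\text{plot}} - \hat\ell_{\text{transect}}\bigr) + 2d

\Delta\mathrm{AIC}(d^{*}) = 2
  \iff d^{*} = 1 + \bigl(\hat\ell_{\text{plot}} - \hat\ell_{\text{transect}}\bigr)
```

For any d above d*, the simpler zone/transect model is ahead by more than 2 AIC units.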

 [snip]

> I looked into Vaida & Blanchard (2005) and Greven & Kneib (2010).  I
> won't pretend to understand everything they're doing, but I did dig up
> the code used by G&K.  I think I managed to adapt it to work with lme4
> in addition to nlme (through RLRsim) ... [snip]

> While trying to use this code, I noticed something
> surprising.  I'm comparing my zone/transect model against my
> zone/transect/plot model.  The difference in log-likelihood is ~0.2.
> Even using AIC1, I would conclude that adding the plot-level random
> effects does not significantly improve the model fit, although I can
> only claim that the simpler model with only the transect-level random
> effect is significantly better if the plot-level random effect
> represents at least ~1.1 degrees of freedom.  cAIC favors the model
> with plot-level random effects (the difference in cAIC is ~1.5), which
> is the bias G&K address.  However, their ccAIC also favors the model
> with plot-level random effects, and the difference in ccAIC is even
> larger (~4.8).  

[snip]

> I have 1863 observations and 21 fixed effects (including intercept).
> Except for the additional plot-level random effect, the two fitted
> models are essentially identical (differences in coefficients
> for terms shared between the zone/transect and zone/transect/plot
> models are less than 1% of the corresponding estimated variances).
> I'm using R 2.15.2 and lme4 0.999999.0.

  I'm not going to review the code (sorry), although it seems
potentially very useful.  I have one general comment and one code
comment.

  General comment: the log-likelihood that you're comparing is the
marginal log-likelihood (i.e. the likelihood at the whole-population
level), whereas [c]cAIC is trying to assess predictive accuracy
at the plot level, so it may sometimes surprise you by choosing what
seem to be overly complex models ... this is closely related to the
issue of "level of focus" that's frequently discussed in the context
of DIC (the deviance information criterion, for multilevel Bayesian
models).
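To make the "level of focus" distinction concrete, here are the two criteria written side by side. These are the standard definitions following Vaida & Blanchard (2005). The notation is ours: p fixed effects, s variance-covariance parameters (including sigma^2), b-hat the BLUPs, and H the matrix that maps y to the fitted values. The marginal likelihood integrates the random effects out, so its focus is the population. The conditional likelihood plugs in b-hat, so it rewards fitting the plots actually sampled, and pays rho effective parameters for doing so:

```latex
\ell_{\mathrm{m}}(\beta, \theta, \sigma^2)
  = \log \int f(y \mid b;\, \beta, \sigma^2)\, f(b;\, \theta)\, db,
\qquad
\mathrm{mAIC} = -2\,\ell_{\mathrm{m}}(\hat\beta, \hat\theta, \hat\sigma^2) + 2\,(p + s)

\mathrm{cAIC} = -2 \log f(y \mid \hat b;\, \hat\beta, \hat\sigma^2) + 2\rho,
\qquad
\rho = \operatorname{tr}(H), \qquad \hat y = X\hat\beta + Z\hat b = H y
```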

  Code comment: I think you can get a lot of what you need
directly from lme4 via the getME() function: getME(m,"X"),
getME(m,"Z"), getME(m,"y"), lme4::sigma(m)^2; you can get
the random-effects variance with VarCorr(m) ... of course, the
RLRsim approach does let you handle nlme and lme4 together more
easily ...
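The pieces listed above (X, Z, y, sigma^2 and the random-effect variances) are exactly the inputs of the naive conditional AIC. The sketch below is self-contained Python. It is not lme4's, RLRsim's or Greven & Kneib's code, and the function names are hypothetical. It assumes independent random effects (a diagonal covariance), with one variance per column of Z. It treats sigma^2 and the variances as known, which is the simplest case and the source of the bias that ccAIC corrects; versions that account for estimating sigma^2 use a different penalty. The sketch solves Henderson's mixed-model equations and returns cAIC = -2 (conditional log-likelihood) + 2 rho, with rho = tr(H):

```python
import math


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is k x k (nonsingular) and B is k x r, both lists of lists.
    Returns X as a k x r list of lists.
    """
    k, r = len(A), len(B[0])
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    aug = [list(A[i]) + list(B[i]) for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for i in range(k):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[k:k + r] for row in aug]


def naive_caic(X, Z, y, sigma2, re_var):
    """Naive conditional AIC for y = X beta + Z b + e (hypothetical helper).

    X: n x p fixed-effects design; Z: n x q random-effects design;
    y: length-n response; sigma2: residual variance (held fixed);
    re_var: length-q random-effect variances, one per column of Z, all > 0.
    Returns (cAIC, rho, conditional log-likelihood).
    """
    n, p, q = len(y), len(X[0]), len(Z[0])
    k = p + q
    # Combined design C = [X | Z], one row per observation.
    C = [list(X[i]) + list(Z[i]) for i in range(n)]
    CtC = [[sum(C[i][a] * C[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Cty = [[sum(C[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Henderson's mixed-model equations: add sigma2 / var_j to the diagonal
    # of the random-effect block only (fixed effects are unpenalized).
    M = [row[:] for row in CtC]
    for j in range(q):
        M[p + j][p + j] += sigma2 / re_var[j]
    coef = [row[0] for row in solve(M, Cty)]  # [beta-hat..., b-hat (BLUPs)...]
    # rho = trace of the hat matrix H = C M^{-1} C' = trace(M^{-1} C'C).
    MinvCtC = solve(M, CtC)
    rho = sum(MinvCtC[a][a] for a in range(k))
    fitted = [sum(C[i][a] * coef[a] for a in range(k)) for i in range(n)]
    rss = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    # Gaussian log-likelihood of y given b-hat, with sigma2 held fixed.
    cond_ll = -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)
    return -2 * cond_ll + 2 * rho, rho, cond_ll


if __name__ == "__main__":
    # Toy balanced design: 3 groups of 2, intercept plus random intercept.
    X = [[1.0] for _ in range(6)]
    Z = [[1.0 if i // 2 == g else 0.0 for g in range(3)] for i in range(6)]
    y = [1.0, 2.0, 4.0, 3.0, 7.0, 6.0]
    caic, rho, _ = naive_caic(X, Z, y, sigma2=1.0, re_var=[1.0] * 3)
    print(round(rho, 4))  # 2.3333, i.e. 1 + (3 - 1) * 2 / 3 for n = 2, lambda = 1
```

With several random-effect terms, each term's variance from VarCorr(m) would be repeated over the columns of Z that belong to that term. A variance estimated at exactly zero has to be handled separately, for example by dropping that term's columns, because the penalty sigma2 / var would be infinite.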

  Ben Bolker

 [code snipped to make gmane happy -- sorry]