[R-sig-ME] Troublesome example of lmer fitting an overidentified model; Stata bad too.

Wed Feb 24 18:48:32 CET 2016

I agree that the model is overspecified.  What is happening in the two fits
is that the likelihood surface is flat, up to round-off, in certain
directions so the optimizer converges close to the starting estimates.
Stata is using an EM algorithm and after one step the fixed-effects terms
involving variety will have taken out the variation due to variety.  That's
why the estimates of the variances of the random effects from Stata are
close to zero.

In lmer the starting estimates for the covariance parameters correspond to
a relative covariance factor of the identity.   This, in turn, corresponds
to a covariance matrix for the random effects which is sigma * I.  Notice
that estimates of the residual standard error and both standard errors of
the random effects have converged to 0.592

Detecting this situation is not trivial.  You can do it symbolically but
that involves a lot of symbolic analysis to catch all the possibilities.
You can try to do it numerically by comparing columns in the fixed-effects
model matrix corresponding to model terms and those in the random-effects
model matrix but any such numeric procedure must involve a tolerance and it
is not clear how to set that.  Anyone who has taken a linear algebra course
knows that the rank of a matrix is well defined.  In practice, using
floating-point arithmetic, reliably determining the rank of a matrix is a
notoriously difficult problem - probably an unsolveable problem.

Writing code to detect ill-posed models or failure to converge or other
problematic conditions is a game of whack-a-mole.  You have to guess in
what way the model will be ill-posed then write code to detect this, etc.
This leads to code bloat.  Worse it slows things down, takes up memory,
etc. for all model fits and you need to contend with false positives, which
has been an ongoing issue with the convergence checks in lme4.

Eventually it comes down to the extent to which the developers of the code
feel the need to protect users from themselves. Ben is much more inclined
to do this that I am.  I take the approach of telling the user "don't do
that".  Of course, that doesn't help the next user who tries to do the same
thing.

On Wed, Feb 24, 2016 at 11:18 AM Paul Johnson <pauljohn32 at gmail.com> wrote:

> I'm teaching mixed models, working through a lot of homework problems
> in the 2 volume set Multilevel and Longitudinal Modeling using Stata,
> by S Rave-Hesketh and A. Skrondal. If you did not look at it yet, I
> expect you'll find lots of useful, interesting exercises.
>
> Their Exercise 4.4 is about yield from 10 wheat varieties.  The variables
> are:
>
> "plot"     "variety"  "yield"    "moist"
>
> variety is the grouping variable in the exercise, but it comes in as a
> floating point number. I thought that was the root of the surprise
> described here, but now I don't think so.
>
> I was reading one student's homework in Stata and I was
> stunned/surprised that Stata's "mixed" function would fit a random
> effects model that included the grouping variable as a fixed effect
> predictor as well.  This appeared to be grossly overidentified to me.
>
> Here's the stata input and output,
>
> . mixed yield c.moist##i.variety || variety:moist, mle stddev
>
> Performing EM optimization:
>
> Performing gradient-based optimization:
>
> Iteration 0:   log likelihood = -42.123671
> Iteration 1:   log likelihood = -41.540417
> Iteration 2:   log likelihood = -41.520012
> Iteration 3:   log likelihood = -41.520012
>
> Computing standard errors:
>
> Mixed-effects ML regression                     Number of obs     =
>  60
> Group variable: variety                         Number of groups  =
>  10
>
>                                                 Obs per group:
>                                                               min =
>   6
>                                                               avg =
> 6.0
>                                                               max =
>   6
>
>                                                 Wald chi2(19)     =
>  37795.42
> Log likelihood = -41.520012                     Prob > chi2       =
>  0.0000
>
>
> ---------------------------------------------------------------------------------
>           yield |      Coef.   Std. Err.      z    P>|z|     [95%
> Conf. Interval]
>
> ----------------+----------------------------------------------------------------
>           moist |   .6077128   .0124644    48.76   0.000      .583283
>   .6321425
>                 |
>         variety |
>              2  |  -2.844581   .8429087    -3.37   0.001    -4.496652
>  -1.192511
>              3  |  -1.871277   .7903544    -2.37   0.018    -3.420343
>  -.3222105
>              4  |  -.3432833   .7323671    -0.47   0.639    -1.778696
>    1.09213
>              5  |   .2294764   1.121912     0.20   0.838     -1.96943
>   2.428383
>              6  |    3.46207    .690504     5.01   0.000     2.108707
>   4.815433
>              7  |  -12.05622   .6729943   -17.91   0.000    -13.37527
>  -10.73718
>              8  |   1.175528   .7302294     1.61   0.107    -.2556957
>   2.606751
>              9  |  -1.434985   .7917805    -1.81   0.070    -2.986846
>   .1168767
>             10  |   3.494313   1.392236     2.51   0.012     .7655807
>   6.223044
>                 |
> variety#c.moist |
>              2  |  -.0354326   .0260328    -1.36   0.173    -.0864559
>   .0155908
>              3  |   .1297872   .0190785     6.80   0.000     .0923941
>   .1671804
>              4  |   .0264085   .0210603     1.25   0.210    -.0148689
>    .067686
>              5  |   .0284318   .0240342     1.18   0.237    -.0186744
>    .075538
>              6  |   .0790988   .0161154     4.91   0.000     .0475132
>   .1106844
>              7  |   .1164752   .0189384     6.15   0.000     .0793566
>   .1535938
>              8  |   .0791545   .0188487     4.20   0.000     .0422118
>   .1160972
>              9  |    .080266   .0190236     4.22   0.000     .0429804
>   .1175516
>             10  |   .0027604   .0282729     0.10   0.922    -.0526535
>   .0581742
>                 |
>           _cons |   34.58378   .5478187    63.13   0.000     33.51007
>   35.65748
>
> ---------------------------------------------------------------------------------
>
>
> ------------------------------------------------------------------------------
>   Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf.
> Interval]
>
> -----------------------------+------------------------------------------------
> variety: Independent         |
>                    sd(moist) |   3.46e-15   2.27e-14      8.96e-21
> 1.34e-09
>                    sd(_cons) |   1.18e-11   1.34e-07             0
>    .
>
> -----------------------------+------------------------------------------------
>                 sd(Residual) |   .4833867   .0441285      .4041925
> .5780976
>
> ------------------------------------------------------------------------------
> LR test vs. linear model: chi2(2) = 0.00                  Prob > chi2 =
> 1.0000
>
> Note: LR test is conservative and provided only for reference.
>
>
> As you can see, Stata deals with the overidentification by reporting
> miniscule estimates for the random effects.
>
> I think its a bug, wondered what lme4 would do. I predicted it would
> refuse to fit at all, as soon as we declare variety as a factor
> (varietyf).  However, that does not happen. The full R runable example
> is below, but here's the bad part:
>
> > m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
> > summary(m.wrong)
> Linear mixed model fit by REML ['lmerMod']
> Formula: yield ~ moist * varietyf + (moist | varietyf)
>    Data: dat
>
> REML criterion at convergence: 157.7
>
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -1.74078 -0.52638 -0.00039  0.56385  1.79231
>
> Random effects:
>  Groups   Name        Variance Std.Dev. Corr
>  varietyf (Intercept) 0.3505   0.592
>           moist       0.3505   0.592    0.00
>  Residual             0.3505   0.592
> Number of obs: 60, groups:  varietyf, 10
>
> Fixed effects:
>                   Estimate Std. Error t value
> (Intercept)       34.58378    0.89479   38.65
> moist              0.60771    0.59222    1.03
> varietyf2         -2.84458    1.32918   -2.14
> varietyf3         -1.87128    1.27984   -1.46
> varietyf4         -0.34328    1.22700   -0.28
> varietyf5          0.22948    1.60904    0.14
> varietyf6          3.46207    1.19003    2.91
> varietyf7        -12.05622    1.17489  -10.26
> varietyf8          1.17553    1.22509    0.96
> varietyf9         -1.43498    1.28116   -1.12
> varietyf10         3.49431    1.89960    1.84
> moist:varietyf2   -0.03543    0.83786   -0.04
> moist:varietyf3    0.12979    0.83758    0.15
> moist:varietyf4    0.02641    0.83765    0.03
> moist:varietyf5    0.02843    0.83777    0.03
> moist:varietyf6    0.07910    0.83748    0.09
> moist:varietyf7    0.11647    0.83757    0.14
> moist:varietyf8    0.07915    0.83757    0.09
> moist:varietyf9    0.08027    0.83757    0.10
> moist:varietyf10   0.00276    0.83797    0.00
>
>
>
>
> Here' s my full R
>
>
> ## Paul Johnson <pauljohn at ku.edu>
> ## 20160224
> library(foreign)
> dat <- read.dta("http://www.stata-press.com/data/mlmus3/wheat.dta")
> ## write.dta(dat, file = "whead.dta12")
> ## dat <- read.dta("wheat.dta12")
> summary(dat)
> ## Variety should be an integer, so should plot, but we don't use it
> dat$varietyf <- as.factor(dat$variety)
>
> library(lme4)
>
> ## This is obviously overidentified/incoherent.
> ## lmer should bounce the user out.
> m.wrong <- lmer(yield ~ moist*varietyf + (moist|varietyf), data = dat)
> summary(m.wrong)
>
> ## Continue with the homework
> m1 <-  lmer(yield ~ moist + (1|varietyf), data = dat)
>
> m2 <- lmer(yield ~ moist + (moist|varietyf), data = dat)
> library(lattice)
> dotplot(ranef(m2, condVar = TRUE))
>
> ## Here's a shocker
> anova(m2, m1)
> ## Really? Look at the pictures!
>
> ## Here's a spaghetti plot.
> plot(m2)
> m2coef <- coef(m2)[["varietyf"]]
> m2coef$variety <- rownames(m2coef)
> plot(yield ~ moist, data = dat, col = dat$variety)
> apply(m2coef, 1, function(x){ browser();  abline(a = x[1], b = x[2],
> col = x["variety"])})
>
> ## If Var(e) is small enough, even trivial slope differences are
> ## "statistically significant".
>
>
>
>
>
>
> --
> Paul E. Johnson
> Professor, Political Science        Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org              http://crmda.ku.edu
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]