[R-sig-ME] Nested random effect or unbalanced design?

Thu Sep 4 15:34:16 CEST 2014

Dear list, 

I am using lmer to fit a mixed-effects model, but I am puzzled about whether my design has nested random effects or is simply unbalanced.

In my experiment, each subject is repeatedly measured on a physiological variable A. At each measurement, the subject is exposed to one specific level of a variable B. Theory says that higher B gives higher A.

The trick is that the levels of B are drawn randomly from some distribution for each subject. These draws cannot be reproduced in other experiments, so B should somehow be incorporated as a random effect.

If I want to model how B affects A as a random effect, I start from the model

(M1) A ~ (1|B) + time + (1|subject)

However, in this model the variance of the random effect B is very low, so I instead specify B as a fixed effect:

(M2) A ~ B + time + (1|subject)

This model is preferable because I am mainly interested in how B affects A as a fixed effect, not in B as a nuisance variable.
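For concreteness, (M1) and (M2) can be fitted with lme4 along these lines. This is only a sketch on simulated data: the data frame `d`, the sizes `n_subj` and `n_rep`, the factor copy `Bf`, and the effect sizes are hypothetical assumptions, not the real experiment.

```r
library(lme4)

# Hypothetical simulated data: sizes and effects are assumptions
set.seed(1)
n_subj <- 30; n_rep <- 10
d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_rep)),
                time    = rep(seq_len(n_rep), times = n_subj))
d$B  <- sample(1:5, nrow(d), replace = TRUE)  # B drawn at random at each measurement
d$Bf <- factor(d$B)                           # factor copy of B, used as a grouping factor
u <- rnorm(n_subj)                            # subject-level random intercepts
d$A <- 10 + 0.5 * d$B + 0.2 * d$time + u[as.integer(d$subject)] + rnorm(nrow(d))

# (M1): B enters only as a random intercept
m1 <- lmer(A ~ (1 | Bf) + time + (1 | subject), data = d)
# (M2): B enters as a numeric fixed effect (a slope)
m2 <- lmer(A ~ B + time + (1 | subject), data = d)

VarCorr(m1)    # variance components, including the one for Bf
fixef(m2)      # fixed effects, including the slope for B
```

Note that in (M1) B contributes only five grouping levels, so its variance is estimated from very little information.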

However, if I understand nesting correctly, (M2) is not sufficient because subjects are nested in B. As a toy example, suppose B has levels 1, 2, 3, 4, 5 and there are five repeated measurements per subject: subject 1 sees B={1,2,3,2,2}, subject 2 sees B={4,2,1,1,2}, subject 3 sees B={1,2,3,4,5}, and subject 4 sees only one level, B={4,4,4,4,4}. A cross-tabulation looks like this:
> xtabs(~ B+subject, sparse=T)
5 x 4 sparse Matrix of class "dgCMatrix"
  S1 S2 S3 S4
1  1  2  1  .
2  3  2  1  .
3  1  .  1  .
4  .  1  1  5
5  .  .  1  .
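The toy design can be reproduced in base R to check the table above; the data frame name `toy` is just for illustration. Without `sparse = TRUE`, `xtabs()` prints zeros instead of dots.

```r
# Toy design from the text: four subjects, five measurements each
toy <- data.frame(
  subject = factor(rep(c("S1", "S2", "S3", "S4"), each = 5)),
  B       = c(1, 2, 3, 2, 2,   # S1
              4, 2, 1, 1, 2,   # S2
              1, 2, 3, 4, 5,   # S3
              4, 4, 4, 4, 4)   # S4
)
tab <- xtabs(~ B + subject, data = toy)
tab

# Number of distinct B levels each subject sees (S4 sees only one)
colSums(tab > 0)
```
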

Therefore, because each subject sees its own unique set of levels of B, I would use the model with nested random effects

(M3) A ~ B + time + (1|subject) + (1|B:subject)

With this model, the variance of the random effect B:subject is very small relative to the variance of B or the residual variance. A likelihood ratio test does not find (M3) to be significantly different from the simpler model (M2).
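A minimal sketch of the (M2) vs (M3) comparison in lme4, again on hypothetical simulated data (the names `d`, `Bf`, `n_subj`, `n_rep` and the effect sizes are assumptions). A factor copy `Bf` of B is used in the `Bf:subject` grouping so that the interaction is formed from two factors; `anova()` refits both models with ML before the likelihood ratio test.

```r
library(lme4)

# Hypothetical simulated data (assumed sizes and effects, not the real experiment)
set.seed(2)
n_subj <- 30; n_rep <- 10
d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_rep)),
                time    = rep(seq_len(n_rep), times = n_subj))
d$B  <- sample(1:5, nrow(d), replace = TRUE)  # B drawn at random per measurement
d$Bf <- factor(d$B)                           # factor copy of B for the B:subject grouping
u <- rnorm(n_subj)
d$A <- 10 + 0.5 * d$B + 0.2 * d$time + u[as.integer(d$subject)] + rnorm(nrow(d))

m2 <- lmer(A ~ B + time + (1 | subject), data = d)
m3 <- lmer(A ~ B + time + (1 | subject) + (1 | Bf:subject), data = d)

VarCorr(m3)            # compare the Bf:subject variance with the other components
lrt <- anova(m2, m3)   # refits both with ML, then a likelihood ratio test
lrt
```

Because the null hypothesis puts the Bf:subject variance on the boundary (zero), the chi-squared p-value reported by `anova()` tends to be conservative.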

If I decide to go with (M2), I try to keep the random-effects structure maximal by adding by-subject random slopes for B:

(M4) A ~ B + time + (1+B|subject)

However, the random intercept and slope are then highly, or even perfectly, correlated. My interpretation is that the by-subject random slopes are unidentifiable because of the nesting; for example, S4 in the toy example cannot have a "slope" for B, since it sees only one level.
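A sketch of fitting (M4) and inspecting the estimated intercept-slope correlation, on hypothetical simulated data with assumed names and effect sizes (no true random slope is simulated). `isSingular()`, available in recent lme4 versions, flags fits on the boundary of the parameter space, e.g. a variance estimated at zero or a correlation of +/-1.

```r
library(lme4)

# Hypothetical simulated data (assumed sizes and effects); no true random slope
set.seed(3)
n_subj <- 30; n_rep <- 10
d <- data.frame(subject = factor(rep(seq_len(n_subj), each = n_rep)),
                time    = rep(seq_len(n_rep), times = n_subj))
d$B <- sample(1:5, nrow(d), replace = TRUE)   # B drawn at random per measurement
u <- rnorm(n_subj)
d$A <- 10 + 0.5 * d$B + 0.2 * d$time + u[as.integer(d$subject)] + rnorm(nrow(d))

# (M4): correlated by-subject random intercepts and slopes for B
m4 <- lmer(A ~ B + time + (1 + B | subject), data = d)

# vcov column: variances and covariance; sdcor column: SDs and the correlation
vc <- as.data.frame(VarCorr(m4))
vc
isSingular(m4)   # TRUE if the fit sits on the boundary (e.g. correlation of +/-1)
```
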

How should I proceed? Should I just forget about the fact that B is a random effect and pretend that my design is unbalanced? 

Thank you in advance,

Ilkka Leppänen
Aalto University