[R-sig-ME] Group size in mixed/multilevel model: How to obtain weighted effects.

Sat Aug 9 20:12:14 CEST 2014

Dear list,

I have a relatively basic question about the influence of group size in a simple mixed model with 1 grouping factor (i.e., a one-level multilevel model). My data have one numerical DV, one numerical IV, and one grouping factor with 19 levels. Importantly, the number of observations per group differs dramatically, from 5 to 136 (complete data and code are given below).

My problem is that (a) only the small groups seem to show an effect, not the large groups, and (b) the results of a simple mixed model do not seem to take into account that the larger groups show no effect. Instead, they resemble something like unweighted means (i.e., weighting the effect of each group identically).

In other words: the overall or weighted mean of the DV (i.e., ignoring the grouping) and the overall correlation between DV and IV are basically 0: mean = -0.2 and r = -.04.
In contrast, the unweighted means (i.e., same weight for each group) show rather strong effects: mean = -1.1 and r = -.26. The mixed model seems to point strongly towards the unweighted means, despite the dramatic differences in group size: the estimated mean is -1.0 and the estimated effect of the IV is also substantial.
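
To make the distinction between the two kinds of mean concrete, here is a minimal sketch on made-up numbers (not the data from this post; the data frame `toy` and its two groups are purely illustrative):

```r
# Toy data (made up, not the data set from this post): one large group
# with mean 0 and one small group with mean -2.
toy <- data.frame(
  group = rep(c("big", "small"), times = c(8, 2)),
  dv    = c(rep(0, 8), -2, -2)
)

# "Weighted" (pooled) mean: every observation counts once, so the
# large group dominates.
pooled_mean <- mean(toy$dv)

# "Unweighted" mean: average the group means, so every group counts once.
group_means <- tapply(toy$dv, toy$group, mean)
unweighted_mean <- mean(group_means)

# The pooled mean is the same as the group means weighted by group size.
group_n <- table(toy$group)
size_weighted_mean <- weighted.mean(as.vector(group_means),
                                    w = as.vector(group_n))

print(c(pooled = pooled_mean,
        unweighted = unweighted_mean,
        size_weighted = size_weighted_mean))
```

With these toy numbers the pooled mean is -0.4, while the unweighted mean is -1: the small group pulls the unweighted mean much further than its share of the data would suggest.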

After removing the four smallest groups, which account for less than 4% of all data points and are basically the only ones showing a dramatic effect, the values become much more reasonable: estimated mean = -0.5, and the effect of the IV is also smaller.

My question is what to do in such a situation:
- Is this a good reason to remove the small groups?
- Is there a way to take group size into account like in a meta-analysis?
- Is there literature discussing this issue?
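
To make the second question concrete: by "like in a meta-analysis" I mean something along the lines of inverse-variance weighting of the per-group slopes, as in a fixed-effect meta-analysis. The following is only a rough sketch on simulated data (the group names, sizes, and the object `toy` are made up for illustration), not a claim about what the mixed model does or should do:

```r
set.seed(1)  # simulated data, only to make the sketch reproducible

# Hypothetical groups of very different size; the true slope is 0 everywhere.
sizes <- c(a = 100, b = 60, c = 6, d = 5)
toy <- do.call(rbind, lapply(names(sizes), function(g) {
  data.frame(group = g, iv = rnorm(sizes[[g]]), dv = rnorm(sizes[[g]]))
}))

# Per-group OLS slope of dv on iv, together with its standard error.
per_group <- do.call(rbind, lapply(split(toy, toy$group), function(d) {
  est <- coef(summary(lm(dv ~ iv, data = d)))["iv", ]
  data.frame(n = nrow(d), slope = est[["Estimate"]], se = est[["Std. Error"]])
}))

# Unweighted mean of the slopes: every group counts the same.
unweighted_slope <- mean(per_group$slope)

# Inverse-variance weighted mean: precisely estimated slopes, which
# usually come from the larger groups, get more weight.
w <- 1 / per_group$se^2
ivw_slope <- sum(w * per_group$slope) / sum(w)

print(per_group)
print(c(unweighted = unweighted_slope, inverse_variance = ivw_slope))
```

Here the weights depend on the standard error of each slope rather than on group size directly, so a small group with little noise could still receive a sizeable weight.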

Thanks in advance,
Henrik

###### Complete example code ######

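library(lattice)  # for plot of data
library(lme4)
library(plyr)  # for unweighted means
# group must be a factor for levels() below; since R 4.0.0 read.table()
# no longer converts strings to factors by default
dat <- read.table("http://pastebin.com/raw.php?i=KiQ1kkew", stringsAsFactors = TRUE)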

#plot data
dat_print <- within(dat, levels(group) <- paste0(levels(group), ", n= ", table(group)))
xyplot(dv ~ iv|group, dat_print, panel = function(x, y) {
          panel.xyplot(x, y)
          panel.abline(lm(y ~ x))
        })  # number is group size

# weighted means
cor(dat$dv, dat$iv)
mean(dat$dv)

# unweighted means:
mean(daply(dat, .(group), function(x) cor(x$dv, x$iv)))
mean(daply(dat, .(group), function(x) mean(x$dv)))

# full model:
summary(lmer(dv~I(scale(iv, scale=FALSE))+(iv|group), dat))

# model with small groups removed:
groups_exclude <- c("b", "c", "d", "s")
ndat <- dat[!(dat$group %in% groups_exclude),]
summary(lmer(dv~I(scale(iv, scale=FALSE))+(iv|group), ndat))

-- 
Dr. Henrik Singmann
PostDoc
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann