[R-sig-ME] Mixed effect models for a study with no (real) replication

Mon Jul 28 09:20:05 CEST 2014

Dear Thierry,

thank you for spotting the mistake. Some questions are still standing though:

1) Why lme4 estimates the st.dev of the fixed effect correctly when it is larger than the residual st.dev, while if the residual st.dev is higher than the st.dev of the fixed effect, the latter is overestimated?
2) Why blme is better than lme4 in detecting the st.dev of the ranef "Stream.ID” ? I have tested this only in the case the st.dev of both the fixed effect and the residual are small. What does blme do that lme4 does not?

Thank you very much for your help,

Marco

-----
Marco Plebani
Institute of Evolutionary Biology and Environmental Studies
University of Zurich

On 22/lug/2014, at 15:57, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:

> Dear Marco,
> 
> You are simulating underdispersed data by rounding the expected values. Use rpois() instead of round() and you will get more sensible results.
> 
> rr$pred.rich_e1_e2 <- rpois(nrow(rr), lambda = 13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1) * exp(rr$epsilon2)) # rounded to the closest integer
> 
> I find it easier to generate all expected values on the log scale.
> 
> rr$mu <- log(13) - 0.07 * rr$temperature + rr$epsilon1 + rr$epsilon2
> rr$pred.rich_e1_e2 <- rpois(nrow(rr), lambda = exp(rr$mu))
> 
> Best regards,
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
> 
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
> 
> The plural of anecdote is not data.
> ~ Roger Brinner
> 
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> 
> 
> -----Oorspronkelijk bericht-----
> Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Marco Plebani
> Verzonden: dinsdag 22 juli 2014 14:22
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] Mixed effect models for a study with no (real) replication
> 
> Dear list members,
> 
> I am analyzing the effects of temperature on species richness across a natural temperature gradient. The study is based on 13 streams, each at a different temperature; for each stream I have measures of species richness obtained from three samples i.e. 39 data points in total.
> I expect the observed variability to be due to four possible sources, three of which are discernible:
> - differences AMONG streams due to temperature,
> - differences AMONG streams not due to temperature (e.g. due to other environmental factors not accounted for in the study),
> - differences WITHIN streams (due to either natural heterogeneity and/or the observation process, indiscernible without having multiple measures for each sample).
> 
> I should point out that "temperature" is coded as a continuous variable while " Stream.ID" is a factor (I did this for clarity; I could have used "temperature" as both fixef and ranef).
> A very similar issue has been discussed recently (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022365.html, "Same variable as both fixed and random") so I fitted the following model:
> 
> glmer(species.richness ~ temperature + (1 | Stream.ID), data=dd, family=poisson)
> 
> I tested the above-mentioned model on simulated data obtained as follows (see script at the end of this message):
> 
> species.richness = a * exp(b* temperature * epsilon1 * epsilon2)
> 
> Where epsilon1 adds variability AMONG streams not explained by temperature, and epsilon2 adds variability WITHIN streams.
> The estimates of parameters a and b are coherent with the parameters of the simulation, but the st.dev estimates due to Stream.ID are often far off (somewhere between the actual st.dev due to Stream ID and the residual st.dev)  and often touching zero (when the simulated st.dev due to Stream ID and the residual st.dev are similar).
> Why does that happen? Here: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022386.html ("LMM: including ranef or not?") Ben Bolker foresees such an issue when the ranef has only few levels and suggests to use blme to fix the problem. I tried; blme still confounds the st.dev due to Stream ID and the residual st.dev when the latter is much higher then the former, but it does succeed in reducing the estimate of false zeroes for the st.dev of Stream ID when st.dev(Stream.ID)~st.dev(residual) and both <1.
> Why blme is better than lme4 in detecting the st.dev of the ranef "Stream.ID" when it is small?
> My familiarity with Bayesian stats is limited to studying Bayes's theorem as an undergrad, so to be honest I have no idea what's going on "inside" blme.
> 
> Thank you very much in advance for any help provided.
> Cheers,
> 
> Marco
> 
> -----
> Marco Plebani
> Institute of Evolutionary Biology and Environmental Studies University of Zurich http://www.ieu.uzh.ch/staff/phd/plebani.html
> 
> 
> 
> 
> 
> ###################################
> ###################################
> # Marco Plebani, July 17th 2014
> # Given a study based on 13 streams at 13 different temperatures, sampled 3 times each for species richness and total biomass, is it possible to discern the temperature effect on species richness and total biomass from the effect of variability among streams not due to temperature?
> # Note that the multiple samples are PSEUDO-replicates and that there is complete temperature=stream identity.
> ###################################
> ###################################
> 
> rm(list=ls())
> library(lme4)
> library(blme)
> #set.seed(2)
> 
> # Simulate data based on the observed temperature effect.
> # the model is:
> 
> # y = a * exp(b*x) + epsilon
> 
> #a, i.e. species richness at 0°C, is set at 13 # b is set at -0.07
> 
> # epsilon is the sum of:
> # epsilon1 due to variability AMONG streams not explained by temperature, # epsilon2 due to variability WITHIN streams (i.e. residual variuability, made up by both the streams natural heterogeneity and the sampling error)
> 
> # create empty vectors where to save estimates for the sd due to Stream ID, and the p-val of parameter "b" (see model above):
> 
> stream.ID.stdev<-rep(NA,50)
> stream.ID.stdev.bayes<-rep(NA,50)
> model.significance<-rep(NA,50)
> 
> # create the dataset:
> 
> reps <- 1:3
> Stream.ID <- letters[1:13]
> temperature <- seq(5,20, length.out=13)
> rr <- expand.grid(Stream.ID = Stream.ID, reps=reps) rr$temperature = rep(temperature,3)
> 
> ## predicted species richness for the observed temperature:
> rr$pred.rich <- 13 * exp(-0.07 * rr$temperature) # a and b values are realistic estimates
> 
> # continuous prediction:
> temp=seq(0,22,0.1)
> pred.richness <- 13 * exp(-0.07 * temp)
> 
> par(mfrow=c(1,3))
> plot(x= temp, y= pred.richness, xlim=c(0,23), ylim=c(0, 15), type="l")
> 
> # run the analysis on 50 datasets drawn from the same distribution:
> 
> for(i in 1:50){
> 
> ## epsilon1: stream.ID effect
> stream.eff <- data.frame(Stream.ID = rr$Stream.ID,
>                                               effect=rnorm(n=length(Stream.ID), mean=0, sd=0.2))
> # add epsilon1 to the dataset
> rr$epsilon1 <- stream.eff$effect[match(rr$Stream.ID, stream.eff$Stream.ID)]
> 
> ## epsilon2: residual variability within streams
> rr$epsilon2 <-  rnorm(length(rr$temperature), mean=0, sd=0.2)
> 
> # pred.richness accounting for epsilon1
> rr$pred.rich_e1 <- 13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1)
> 
> # pred.richness accounting for epsilon1 and epsilon2
> rr$pred.rich_e1_e2 <- round(13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1) * exp(rr$epsilon2)) # rounded to the closest integer
> 
> points(x= rr$temperature, y= rr$pred.rich_e1, pch=8, cex=1) #points(x= rr$temperature, y= rr$pred.rich_e1_e2, cex=1.5) # solid line represents predicted mean richness # *'s represent predicted mean richness + epsilon1 # circles represent predicted mean richness + epsilon1 + epsilon2, i.e. the observations from each sample
> 
> # glmm:
> 
> richness.glmm <- glmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson) # "pred.rich_e1_e2" is the observed richness in each sample.
> richness.glmm.bayes <- bglmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson)
> 
> stream.ID.stdev[i] <- as.numeric(attributes(summary(richness.glmm)$varcor[[1]])$stddev)
> stream.ID.stdev.bayes[i] <- as.numeric(attributes(summary(richness.glmm.bayes)$varcor[[1]])$stddev)
> model.significance[i] <- ifelse(summary(richness.glmm)$coefficients[2,4]<0.05,1,0)
> # library(MuMIn) # estimates pseudo-R^2 for MEM.
> # r.squaredGLMM(richness.glmm)
> 
> }
> 
> hist(stream.ID.stdev)
> abline(v=mean(stream.ID.stdev), col="red",lwd=2) legend("topright",title=paste("mean=", round(mean(stream.ID.stdev),2), "; sd=", round(sd(stream.ID.stdev),2)), legend=c(NA))
> 
> hist(stream.ID.stdev.bayes)
> abline(v=mean(stream.ID.stdev.bayes), col="red",lwd=2) legend("topright",title=paste("mean=", round(mean(stream.ID.stdev.bayes),2), "; sd=", round(sd(stream.ID.stdev.bayes),2)), legend=c(NA))
> 
> # end