[R-sig-ME] Mixed effect models for a study with no (real) replication

Tue Jul 22 15:57:30 CEST 2014

Dear Marco,

You are simulating underdispersed data by rounding the expected values. Use rpois() instead of round() and you will get more sensible results.

rr$pred.rich_e1_e2 <- rpois(nrow(rr), lambda = 13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1) * exp(rr$epsilon2)) # rounded to the closest integer

I find it easier to generate all expected values on the log scale.

rr$mu <- log(13) - 0.07 * rr$temperature + rr$epsilon1 + rr$epsilon2
rr$pred.rich_e1_e2 <- rpois(nrow(rr), lambda = exp(rr$mu))

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces op r-project.org [mailto:r-sig-mixed-models-bounces op r-project.org] Namens Marco Plebani
Verzonden: dinsdag 22 juli 2014 14:22
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] Mixed effect models for a study with no (real) replication

Dear list members,

I am analyzing the effects of temperature on species richness across a natural temperature gradient. The study is based on 13 streams, each at a different temperature; for each stream I have measures of species richness obtained from three samples i.e. 39 data points in total.
I expect the observed variability to be due to four possible sources, three of which are discernible:
- differences AMONG streams due to temperature,
- differences AMONG streams not due to temperature (e.g. due to other environmental factors not accounted for in the study),
- differences WITHIN streams (due to either natural heterogeneity and/or the observation process, indiscernible without having multiple measures for each sample).

I should point out that "temperature" is coded as a continuous variable while " Stream.ID" is a factor (I did this for clarity; I could have used "temperature" as both fixef and ranef).
A very similar issue has been discussed recently (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022365.html, "Same variable as both fixed and random") so I fitted the following model:

glmer(species.richness ~ temperature + (1 | Stream.ID), data=dd, family=poisson)

I tested the above-mentioned model on simulated data obtained as follows (see script at the end of this message):

species.richness = a * exp(b* temperature * epsilon1 * epsilon2)

Where epsilon1 adds variability AMONG streams not explained by temperature, and epsilon2 adds variability WITHIN streams.
The estimates of parameters a and b are coherent with the parameters of the simulation, but the st.dev estimates due to Stream.ID are often far off (somewhere between the actual st.dev due to Stream ID and the residual st.dev)  and often touching zero (when the simulated st.dev due to Stream ID and the residual st.dev are similar).
Why does that happen? Here: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022386.html ("LMM: including ranef or not?") Ben Bolker foresees such an issue when the ranef has only few levels and suggests to use blme to fix the problem. I tried; blme still confounds the st.dev due to Stream ID and the residual st.dev when the latter is much higher then the former, but it does succeed in reducing the estimate of false zeroes for the st.dev of Stream ID when st.dev(Stream.ID)~st.dev(residual) and both <1.
Why blme is better than lme4 in detecting the st.dev of the ranef "Stream.ID" when it is small?
My familiarity with Bayesian stats is limited to studying Bayes's theorem as an undergrad, so to be honest I have no idea what's going on "inside" blme.

Thank you very much in advance for any help provided.
Cheers,

Marco

-----
Marco Plebani
Institute of Evolutionary Biology and Environmental Studies University of Zurich http://www.ieu.uzh.ch/staff/phd/plebani.html

###################################
###################################
# Marco Plebani, July 17th 2014
# Given a study based on 13 streams at 13 different temperatures, sampled 3 times each for species richness and total biomass, is it possible to discern the temperature effect on species richness and total biomass from the effect of variability among streams not due to temperature?
# Note that the multiple samples are PSEUDO-replicates and that there is complete temperature=stream identity.
###################################
###################################

rm(list=ls())
library(lme4)
library(blme)
#set.seed(2)

# Simulate data based on the observed temperature effect.
# the model is:

# y = a * exp(b*x) + epsilon

#a, i.e. species richness at 0°C, is set at 13 # b is set at -0.07

# epsilon is the sum of:
# epsilon1 due to variability AMONG streams not explained by temperature, # epsilon2 due to variability WITHIN streams (i.e. residual variuability, made up by both the streams natural heterogeneity and the sampling error)

# create empty vectors where to save estimates for the sd due to Stream ID, and the p-val of parameter "b" (see model above):

stream.ID.stdev<-rep(NA,50)
stream.ID.stdev.bayes<-rep(NA,50)
model.significance<-rep(NA,50)

# create the dataset:

reps <- 1:3
Stream.ID <- letters[1:13]
temperature <- seq(5,20, length.out=13)
rr <- expand.grid(Stream.ID = Stream.ID, reps=reps) rr$temperature = rep(temperature,3)

## predicted species richness for the observed temperature:
rr$pred.rich <- 13 * exp(-0.07 * rr$temperature) # a and b values are realistic estimates

# continuous prediction:
temp=seq(0,22,0.1)
pred.richness <- 13 * exp(-0.07 * temp)

par(mfrow=c(1,3))
plot(x= temp, y= pred.richness, xlim=c(0,23), ylim=c(0, 15), type="l")

# run the analysis on 50 datasets drawn from the same distribution:

for(i in 1:50){

## epsilon1: stream.ID effect
stream.eff <- data.frame(Stream.ID = rr$Stream.ID,
                                                effect=rnorm(n=length(Stream.ID), mean=0, sd=0.2))
# add epsilon1 to the dataset
rr$epsilon1 <- stream.eff$effect[match(rr$Stream.ID, stream.eff$Stream.ID)]

## epsilon2: residual variability within streams
rr$epsilon2 <-  rnorm(length(rr$temperature), mean=0, sd=0.2)

# pred.richness accounting for epsilon1
rr$pred.rich_e1 <- 13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1)

# pred.richness accounting for epsilon1 and epsilon2
rr$pred.rich_e1_e2 <- round(13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1) * exp(rr$epsilon2)) # rounded to the closest integer

points(x= rr$temperature, y= rr$pred.rich_e1, pch=8, cex=1) #points(x= rr$temperature, y= rr$pred.rich_e1_e2, cex=1.5) # solid line represents predicted mean richness # *'s represent predicted mean richness + epsilon1 # circles represent predicted mean richness + epsilon1 + epsilon2, i.e. the observations from each sample

# glmm:

richness.glmm <- glmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson) # "pred.rich_e1_e2" is the observed richness in each sample.
richness.glmm.bayes <- bglmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson)

stream.ID.stdev[i] <- as.numeric(attributes(summary(richness.glmm)$varcor[[1]])$stddev)
stream.ID.stdev.bayes[i] <- as.numeric(attributes(summary(richness.glmm.bayes)$varcor[[1]])$stddev)
model.significance[i] <- ifelse(summary(richness.glmm)$coefficients[2,4]<0.05,1,0)
# library(MuMIn) # estimates pseudo-R^2 for MEM.
# r.squaredGLMM(richness.glmm)

}

hist(stream.ID.stdev)
abline(v=mean(stream.ID.stdev), col="red",lwd=2) legend("topright",title=paste("mean=", round(mean(stream.ID.stdev),2), "; sd=", round(sd(stream.ID.stdev),2)), legend=c(NA))

hist(stream.ID.stdev.bayes)
abline(v=mean(stream.ID.stdev.bayes), col="red",lwd=2) legend("topright",title=paste("mean=", round(mean(stream.ID.stdev.bayes),2), "; sd=", round(sd(stream.ID.stdev.bayes),2)), legend=c(NA))

# end
_______________________________________________
R-sig-mixed-models op r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.