[R-sig-ME] Pseudoreplication in a factorial GLM with a proportional response variable and categorical predictor variables
Lionel
hughes.dupond at gmx.de
Wed Jul 30 19:20:09 CEST 2014
Dear Berin Mackenzie,
I am no specialist of nested design so I am just giving you some of my
thoughts.
1. The random term is not specified correctly, if you look here:
http://glmm.wikidot.com/faq#modelspec you would note that Season should
come first and then PetriDish. Otherwise yes the model is correct.
2. See response above
3. I am not aware of any rules about this but looking at the other
coefficient which are 9 order of magnitude bigger than the estimated
standard deviation of the random term, I guess you could rather
confidently say that the petri dish influence (within season) is
negligible on your germination rates.
4. The estimated standard deviation value represent the added variation
by your petri dishes (within season treatment) on top of the fixed
effect predicted values, usually in such experimental studies random
terms are due to the design and interpreting their effect is not of
interest. For other type of studies model comparison with different
random term part is the way to go (PBmodcomp in the pbkrtest package is
one way to do this).
Hope that it helped,
Lionel
On 30/07/2014 05:03, Berin Mackenzie wrote:
> Dear list members,
>
>
>
> I'm very new to generalized linear models and mixed effects models, and I'm trying to teach myself how to apply them in R to analyse a series of factorial germination experiments with a proportional response variable and three categorical predictor variables. The problem is that one of the predictor variables is pseudoreplicated and I'm not sure how to deal with this. I came across this post - https://stat.ethz.ch/pipermail/r-help/2011-April/275895.html - which suggested a mixed effects model might be provide a solution, but I'd be extremely grateful for any advice from experienced analysts.
>
>
>
> Full details of the design, a sample dataset, and some models I've tried, appear below.
>
>
>
> Warm regards and many thanks in advance for your time and advice!
>
> Berin
>
>
> BACKGROUND:
> I performed a factorial seed germination experiment to test the effects of i) Heat shock (2 levels: heat or no heat), Smoke (2 levels: smoke or no smoke), and Season (3 levels: summer, autumn, winter), and their interactions, on germination. The experimental units were petri dishes of 20 seeds each, and there were 4 replicate dishes for each factorial treatment combination (2 x 2 x 3 =12 possible treatments x 4 dishes each = 48 petri dishes/observations in total).
> The Heat and Smoke treatments were independently applied, however, Season was pseudoreplicated as only three temperature-controlled incubators were available (one for each level of Season) and all 4 replicate dishes for each unique factorial treatment were placed inside the same incubator. For true independence, each replicate would have required a separate incubator (48 incubators in total for this experiment) which would have been prohibitively expensive and impractical. As such, pseudoreplication of temperature treatments is very common in germination studies but most researchers appear to accept/ignore this limitation in their analyses and treat replicates as though they're independent. I'm wondering if there might be a better way...
> So far, I've tried the following approaches:
> i) a binomial GLM (although this ignores the non-independence of replicates), and
> ii) a binomial GLMM with PetriDish as a random factor within Season (this idea is based on comments in the post I linked above)
> MODELS & RESULTS:
> Below are the relevant data:
>
> R version 3.1.1 (2014-07-10)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
>
> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_Australia.1252
>
>
>
> attached base packages:
>
> [1] stats graphics grDevices utils datasets methods base
>
>
>
> other attached packages:
>
> [1] lme4_1.1-7 Rcpp_0.11.2 Matrix_1.1-4
>
>
>
> loaded via a namespace (and not attached):
>
> [1] grid_3.1.1 lattice_0.20-29 MASS_7.3-33 minqa_1.2.3 nlme_3.1-117
>
> [6] nloptr_1.0.0 splines_3.1.1 tools_3.1.1
> I've performed this experiment for 7 different species but the table below comprises data on a single species as an example. 'Replicate' refers to the 4 replicate petri dishes for each of the 12 factorial treatments but is not used in my models, and 'CumNoGerm' is the cumulative number of germinants. All other column names are self-explanatory.
>
>> dat[]
> Heat Smoke Season PetriDish Replicate ViableSeed CumNoGerm
>
> 1 0NoHeat 0NoSmoke 1SUM PD01 1 14 0
>
> 2 0NoHeat 0NoSmoke 1SUM PD02 2 11 0
>
> 3 0NoHeat 0NoSmoke 1SUM PD03 3 20 0
>
> 4 0NoHeat 0NoSmoke 1SUM PD04 4 18 0
>
> 5 0NoHeat 0NoSmoke 2AUT PD05 1 13 0
>
> 6 0NoHeat 0NoSmoke 2AUT PD06 2 12 0
>
> 7 0NoHeat 0NoSmoke 2AUT PD07 3 17 0
>
> 8 0NoHeat 0NoSmoke 2AUT PD08 4 13 0
>
> 9 0NoHeat 0NoSmoke 3WIN PD09 1 16 0
>
> 10 0NoHeat 0NoSmoke 3WIN PD10 2 17 1
>
> 11 0NoHeat 0NoSmoke 3WIN PD11 3 13 1
>
> 12 0NoHeat 0NoSmoke 3WIN PD12 4 16 1
>
> 13 1Heat 0NoSmoke 1SUM PD13 1 6 0
>
> 14 1Heat 0NoSmoke 1SUM PD14 2 14 0
>
> 15 1Heat 0NoSmoke 1SUM PD15 3 14 0
>
> 16 1Heat 0NoSmoke 1SUM PD16 4 17 1
>
> 17 1Heat 0NoSmoke 2AUT PD17 1 13 0
>
> 18 1Heat 0NoSmoke 2AUT PD18 2 13 2
>
> 19 1Heat 0NoSmoke 2AUT PD19 3 15 2
>
> 20 1Heat 0NoSmoke 2AUT PD20 4 14 1
>
> 21 1Heat 0NoSmoke 3WIN PD21 1 16 0
>
> 22 1Heat 0NoSmoke 3WIN PD22 2 14 1
>
> 23 1Heat 0NoSmoke 3WIN PD23 3 8 0
>
> 24 1Heat 0NoSmoke 3WIN PD24 4 18 2
>
> 25 0NoHeat 1Smoke 1SUM PD25 1 18 3
>
> 26 0NoHeat 1Smoke 1SUM PD26 2 14 4
>
> 27 0NoHeat 1Smoke 1SUM PD27 3 9 1
>
> 28 0NoHeat 1Smoke 1SUM PD28 4 16 1
>
> 29 0NoHeat 1Smoke 2AUT PD29 1 19 8
>
> 30 0NoHeat 1Smoke 2AUT PD30 2 16 6
>
> 31 0NoHeat 1Smoke 2AUT PD31 3 13 9
>
> 32 0NoHeat 1Smoke 2AUT PD32 4 11 5
>
> 33 0NoHeat 1Smoke 3WIN PD33 1 19 7
>
> 34 0NoHeat 1Smoke 3WIN PD34 2 17 12
>
> 35 0NoHeat 1Smoke 3WIN PD35 3 14 7
>
> 36 0NoHeat 1Smoke 3WIN PD36 4 13 8
>
> 37 1Heat 1Smoke 1SUM PD37 1 14 3
>
> 38 1Heat 1Smoke 1SUM PD38 2 14 3
>
> 39 1Heat 1Smoke 1SUM PD39 3 13 4
>
> 40 1Heat 1Smoke 1SUM PD40 4 16 6
>
> 41 1Heat 1Smoke 2AUT PD41 1 14 5
>
> 42 1Heat 1Smoke 2AUT PD42 2 17 11
>
> 43 1Heat 1Smoke 2AUT PD43 3 5 1
>
> 44 1Heat 1Smoke 2AUT PD44 4 16 6
>
> 45 1Heat 1Smoke 3WIN PD45 1 15 2
>
> 46 1Heat 1Smoke 3WIN PD46 2 13 3
>
> 47 1Heat 1Smoke 3WIN PD47 3 14 4
>
> 48 1Heat 1Smoke 3WIN PD48 4 19 3
> 1. Generalized linear model with binomial error structure. First I fitted the global model (model1)...
>
>> model1 = glm(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat*Smoke*Season, family = binomial, data = dat)
> ...and then I used drop1() and AICc to identify the minimal adequate model (model1.ma)
>
>> model1.ma = glm(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat + Smoke + Season + Heat:Smoke + Heat:Season, family = binomial, data = dat)
>> anova(model1.ma, test = "Chisq")
> Analysis of Deviance Table
>
>
>
> Model: binomial, link: logit
>
>
>
> Response: cbind(CumNoGerm, ViableSeed - CumNoGerm)
>
>
>
> Terms added sequentially (first to last)
>
>
>
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
>
> NULL 47 200.822
>
> Heat 1 0.714 46 200.109 0.3982723
>
> Smoke 1 124.486 45 75.623 < 2.2e-16 ***
>
> Season 2 18.201 43 57.422 0.0001116 ***
>
> Heat:Smoke 1 6.086 42 51.336 0.0136276 *
>
> Heat:Season 2 16.414 40 34.922 0.0002727 ***
>
> ---
>
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>> summary(model1.ma)
>
>
> Call:
>
> glm(formula = cbind(CumNoGerm, ViableSeed - CumNoGerm) ~ Heat +
>
> Smoke + Season + Heat:Smoke + Heat:Season, family = binomial,
>
> data = dat)
>
>
>
> Deviance Residuals:
>
> Min 1Q Median 3Q Max
>
> -1.6767 -0.7036 -0.3749 0.5618 1.7105
>
>
>
> Coefficients:
>
> Estimate Std. Error z value Pr(>|z|)
>
> (Intercept) -5.5039 0.6908 -7.967 1.62e-15 ***
>
> Heat1Heat 2.3836 0.8130 2.932 0.003368 **
>
> Smoke1Smoke 3.7959 0.6090 6.233 4.58e-10 ***
>
> Season2AUT 1.5367 0.4414 3.481 0.000499 ***
>
> Season3WIN 1.9490 0.4362 4.468 7.88e-06 ***
>
> Heat1Heat:Smoke1Smoke -1.7194 0.7225 -2.380 0.017320 *
>
> Heat1Heat:Season2AUT -0.7232 0.5751 -1.258 0.208550
>
> Heat1Heat:Season3WIN -2.1981 0.5912 -3.718 0.000201 ***
>
> ---
>
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> (Dispersion parameter for binomial family taken to be 1)
>
>
>
> Null deviance: 200.822 on 47 degrees of freedom
>
> Residual deviance: 34.922 on 40 degrees of freedom
>
> AIC: 138.32
>
>
>
> Number of Fisher Scoring iterations: 5
> 2. Next, I tried a generalized linear mixed effects model in the 'lme4' package with binomial error and PetriDish as a random factor within Season to account for pseudoreplication. Below is the global model (model2):
>
>> model2 = glmer(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat*Smoke*Season + (1|PetriDish:Season), family = binomial, data = dat)
> ...and then I fitted the minimal adequate model I'd identified with the GLM in 1) above.
>
>> model2.ma = glmer(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat + Smoke + Season + Heat:Smoke + Heat:Season + (1|PetriDish:Season), family = binomial, data = dat)
>> summary(model2.ma)
> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod
>
> ]
>
> Family: binomial ( logit )
>
> Formula: cbind(CumNoGerm, ViableSeed - CumNoGerm) ~ Heat + Smoke + Season +
>
> Heat:Smoke + Heat:Season + (1 | PetriDish:Season)
>
> Data: dat
>
>
>
> AIC BIC logLik deviance df.resid
>
> 140.3 157.2 -61.2 122.3 39
>
>
>
> Scaled residuals:
>
> Min 1Q Median 3Q Max
>
> -1.6819 -0.5778 -0.3011 0.5930 1.8417
>
>
>
> Random effects:
>
> Groups Name Variance Std.Dev.
>
> PetriDish:Season (Intercept) 1.401e-17 3.743e-09
>
> Number of obs: 48, groups: PetriDish:Season, 48
>
>
>
> Fixed effects:
>
> Estimate Std. Error z value Pr(>|z|)
>
> (Intercept) -5.5039 0.6908 -7.967 1.62e-15 ***
>
> Heat1Heat 2.3836 0.8130 2.932 0.003368 **
>
> Smoke1Smoke 3.7959 0.6090 6.233 4.58e-10 ***
>
> Season2AUT 1.5367 0.4414 3.481 0.000499 ***
>
> Season3WIN 1.9490 0.4362 4.468 7.88e-06 ***
>
> Heat1Heat:Smoke1Smoke -1.7194 0.7225 -2.380 0.017320 *
>
> Heat1Heat:Season2AUT -0.7232 0.5751 -1.258 0.208550
>
> Heat1Heat:Season3WIN -2.1981 0.5912 -3.718 0.000201 ***
>
> ---
>
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> Correlation of Fixed Effects:
>
> (Intr) Het1Ht Smk1Sm Ss2AUT Ss3WIN H1H:S1 H1H:S2
>
> Heat1Heat -0.850
>
> Smoke1Smoke -0.852 0.724
>
> Season2AUT -0.465 0.395 0.043
>
> Season3WIN -0.510 0.433 0.090 0.682
>
> Ht1Ht:Smk1S 0.718 -0.827 -0.843 -0.036 -0.076
>
> Ht1Ht:S2AUT 0.357 -0.495 -0.033 -0.768 -0.524 0.065
>
> Ht1Ht:S3WIN 0.376 -0.477 -0.066 -0.503 -0.738 0.052 0.611
>
> Specifically, I'd appreciate advice on the following:
>
> 1. Is model 2 more appropriate than model 1, given the pseudoreplication of Season?
>
> 2. If so, is the correct random effect to fit PetriDish:Season, instead of PetriDish/Season, as Season is included as a fixed effect?
>
> 3. Does the near-zero variance of PetriDish:Season suggest there is no dish effect between , in which case wouldn't model 1 (GLM) suffice (the coefficients and standard errors are identical between the two models anyway...)?
>
> 4. How would I interpret a non-zero variance for PetriDish:Season, and would that depend on how large it was?
>
> 5. If model 2 isn't adequate to address the pseudoreplication, could anyone suggest other approaches or resources I could try?
>
> Thanks again for your time and assistance - it's very much appreciated.
>
> Warm regards,
> Berin
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> This email is intended for the addressee(s) named and may contain confidential and/or privileged information.
> If you are not the intended recipient, please notify the sender and then delete it immediately.
> Any views expressed in this email are those of the individual sender except where the sender expressly and with authority states them to be the views of the NSW Office of Environment and Heritage.
>
> PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS EMAIL
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list