[R-sig-ME] Pseudoreplication in a factorial GLM with a proportional response variable and categorical predictor variables

Wed Jul 30 19:20:09 CEST 2014

Dear Berin Mackenzie,

I am no specialist of nested design so I am just giving you some of my 
thoughts.
1. The random term is not specified correctly, if you look here: 
http://glmm.wikidot.com/faq#modelspec you would note that Season should 
come first and then PetriDish. Otherwise yes the model is correct.
2. See response above
3. I am not aware of any rules about this but looking at the other 
coefficient which are 9 order of magnitude bigger than the estimated 
standard deviation of the random term, I guess you could rather 
confidently say that the petri dish influence (within season) is 
negligible on your germination rates.
4. The estimated standard deviation value represent the added variation 
by your petri dishes (within season treatment) on top of the fixed 
effect predicted values, usually in such experimental studies random 
terms are due to the design and interpreting their effect is not of 
interest. For other type of studies model comparison with different 
random term part is the way to go (PBmodcomp in the pbkrtest package is 
one way to do this).

Hope that it helped,
Lionel

On 30/07/2014 05:03, Berin Mackenzie wrote:
> Dear list members,
>
>
>
> I'm very new to generalized linear models and mixed effects models, and I'm trying to teach myself how to apply them in R to analyse a series of factorial germination experiments with a proportional response variable and three categorical predictor variables. The problem is that one of the predictor variables is pseudoreplicated and I'm not sure how to deal with this. I came across this post - https://stat.ethz.ch/pipermail/r-help/2011-April/275895.html - which suggested a mixed effects model might be provide a solution, but I'd be extremely grateful for any advice from experienced analysts.
>
>
>
> Full details of the design, a sample dataset, and some models I've tried, appear below.
>
>
>
> Warm regards and many thanks in advance for your time and advice!
>
> Berin
>
>
> BACKGROUND:
> I performed a factorial seed germination experiment to test the effects of i) Heat shock (2 levels: heat or no heat), Smoke (2 levels: smoke or no smoke), and Season (3 levels: summer, autumn, winter), and their interactions, on germination. The experimental units were petri dishes of 20 seeds each, and there were 4 replicate dishes for each factorial treatment combination (2 x 2 x 3 =12 possible treatments x 4 dishes each = 48 petri dishes/observations in total).
> The Heat and Smoke treatments were independently applied, however, Season was pseudoreplicated as only three temperature-controlled incubators were available (one for each level of Season) and all 4 replicate dishes for each unique factorial treatment were placed inside the same incubator. For true independence, each replicate would have required a separate incubator (48 incubators in total for this experiment) which would have been prohibitively expensive and impractical. As such, pseudoreplication of temperature treatments is very common in germination studies but most researchers appear to accept/ignore this limitation in their analyses and treat replicates as though they're independent. I'm wondering if there might be a better way...
> So far, I've tried the following approaches:
>              i) a binomial GLM (although this ignores the non-independence of replicates), and
> ii) a binomial GLMM with PetriDish as a random factor within Season (this idea is based on comments in the post I linked above)
> MODELS & RESULTS:
> Below are the relevant data:
>
> R version 3.1.1 (2014-07-10)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
>
> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_Australia.1252
>
>
>
> attached base packages:
>
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> other attached packages:
>
> [1] lme4_1.1-7   Rcpp_0.11.2  Matrix_1.1-4
>
>
>
> loaded via a namespace (and not attached):
>
> [1] grid_3.1.1      lattice_0.20-29 MASS_7.3-33     minqa_1.2.3     nlme_3.1-117
>
> [6] nloptr_1.0.0    splines_3.1.1   tools_3.1.1
> I've performed this experiment for 7 different species but the table below comprises data on a single species as an example. 'Replicate' refers to the 4 replicate petri dishes for each of the 12 factorial treatments but is not used in my models, and 'CumNoGerm' is the cumulative number of germinants. All other column names are self-explanatory.
>
>> dat[]
>        Heat    Smoke Season PetriDish Replicate ViableSeed CumNoGerm
>
> 1  0NoHeat 0NoSmoke   1SUM      PD01         1         14         0
>
> 2  0NoHeat 0NoSmoke   1SUM      PD02         2         11         0
>
> 3  0NoHeat 0NoSmoke   1SUM      PD03         3         20         0
>
> 4  0NoHeat 0NoSmoke   1SUM      PD04         4         18         0
>
> 5  0NoHeat 0NoSmoke   2AUT      PD05         1         13         0
>
> 6  0NoHeat 0NoSmoke   2AUT      PD06         2         12         0
>
> 7  0NoHeat 0NoSmoke   2AUT      PD07         3         17         0
>
> 8  0NoHeat 0NoSmoke   2AUT      PD08         4         13         0
>
> 9  0NoHeat 0NoSmoke   3WIN      PD09         1         16         0
>
> 10 0NoHeat 0NoSmoke   3WIN      PD10         2         17         1
>
> 11 0NoHeat 0NoSmoke   3WIN      PD11         3         13         1
>
> 12 0NoHeat 0NoSmoke   3WIN      PD12         4         16         1
>
> 13   1Heat 0NoSmoke   1SUM      PD13         1          6         0
>
> 14   1Heat 0NoSmoke   1SUM      PD14         2         14         0
>
> 15   1Heat 0NoSmoke   1SUM      PD15         3         14         0
>
> 16   1Heat 0NoSmoke   1SUM      PD16         4         17         1
>
> 17   1Heat 0NoSmoke   2AUT      PD17         1         13         0
>
> 18   1Heat 0NoSmoke   2AUT      PD18         2         13         2
>
> 19   1Heat 0NoSmoke   2AUT      PD19         3         15         2
>
> 20   1Heat 0NoSmoke   2AUT      PD20         4         14         1
>
> 21   1Heat 0NoSmoke   3WIN      PD21         1         16         0
>
> 22   1Heat 0NoSmoke   3WIN      PD22         2         14         1
>
> 23   1Heat 0NoSmoke   3WIN      PD23         3          8         0
>
> 24   1Heat 0NoSmoke   3WIN      PD24         4         18         2
>
> 25 0NoHeat   1Smoke   1SUM      PD25         1         18         3
>
> 26 0NoHeat   1Smoke   1SUM      PD26         2         14         4
>
> 27 0NoHeat   1Smoke   1SUM      PD27         3          9         1
>
> 28 0NoHeat   1Smoke   1SUM      PD28         4         16         1
>
> 29 0NoHeat   1Smoke   2AUT      PD29         1         19         8
>
> 30 0NoHeat   1Smoke   2AUT      PD30         2         16         6
>
> 31 0NoHeat   1Smoke   2AUT      PD31         3         13         9
>
> 32 0NoHeat   1Smoke   2AUT      PD32         4         11         5
>
> 33 0NoHeat   1Smoke   3WIN      PD33         1         19         7
>
> 34 0NoHeat   1Smoke   3WIN      PD34         2         17        12
>
> 35 0NoHeat   1Smoke   3WIN      PD35         3         14         7
>
> 36 0NoHeat   1Smoke   3WIN      PD36         4         13         8
>
> 37   1Heat   1Smoke   1SUM      PD37         1         14         3
>
> 38   1Heat   1Smoke   1SUM      PD38         2         14         3
>
> 39   1Heat   1Smoke   1SUM      PD39         3         13         4
>
> 40   1Heat   1Smoke   1SUM      PD40         4         16         6
>
> 41   1Heat   1Smoke   2AUT      PD41         1         14         5
>
> 42   1Heat   1Smoke   2AUT      PD42         2         17        11
>
> 43   1Heat   1Smoke   2AUT      PD43         3          5         1
>
> 44   1Heat   1Smoke   2AUT      PD44         4         16         6
>
> 45   1Heat   1Smoke   3WIN      PD45         1         15         2
>
> 46   1Heat   1Smoke   3WIN      PD46         2         13         3
>
> 47   1Heat   1Smoke   3WIN      PD47         3         14         4
>
> 48   1Heat   1Smoke   3WIN      PD48         4         19         3
> 1. Generalized linear model with binomial error structure. First I fitted the global model (model1)...
>
>> model1 = glm(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat*Smoke*Season, family = binomial, data = dat)
> ...and then I used drop1() and AICc to identify the minimal adequate model (model1.ma)
>
>> model1.ma = glm(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat + Smoke + Season + Heat:Smoke + Heat:Season, family = binomial, data = dat)
>> anova(model1.ma, test = "Chisq")
> Analysis of Deviance Table
>
>
>
> Model: binomial, link: logit
>
>
>
> Response: cbind(CumNoGerm, ViableSeed - CumNoGerm)
>
>
>
> Terms added sequentially (first to last)
>
>
>
>
>
>              Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
>
> NULL                           47    200.822
>
> Heat         1    0.714        46    200.109 0.3982723
>
> Smoke        1  124.486        45     75.623 < 2.2e-16 ***
>
> Season       2   18.201        43     57.422 0.0001116 ***
>
> Heat:Smoke   1    6.086        42     51.336 0.0136276 *
>
> Heat:Season  2   16.414        40     34.922 0.0002727 ***
>
> ---
>
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>> summary(model1.ma)
>
>
> Call:
>
> glm(formula = cbind(CumNoGerm, ViableSeed - CumNoGerm) ~ Heat +
>
>      Smoke + Season + Heat:Smoke + Heat:Season, family = binomial,
>
>      data = dat)
>
>
>
> Deviance Residuals:
>
>      Min       1Q   Median       3Q      Max
>
> -1.6767  -0.7036  -0.3749   0.5618   1.7105
>
>
>
> Coefficients:
>
>                        Estimate Std. Error z value Pr(>|z|)
>
> (Intercept)            -5.5039     0.6908  -7.967 1.62e-15 ***
>
> Heat1Heat               2.3836     0.8130   2.932 0.003368 **
>
> Smoke1Smoke             3.7959     0.6090   6.233 4.58e-10 ***
>
> Season2AUT              1.5367     0.4414   3.481 0.000499 ***
>
> Season3WIN              1.9490     0.4362   4.468 7.88e-06 ***
>
> Heat1Heat:Smoke1Smoke  -1.7194     0.7225  -2.380 0.017320 *
>
> Heat1Heat:Season2AUT   -0.7232     0.5751  -1.258 0.208550
>
> Heat1Heat:Season3WIN   -2.1981     0.5912  -3.718 0.000201 ***
>
> ---
>
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> (Dispersion parameter for binomial family taken to be 1)
>
>
>
>      Null deviance: 200.822  on 47  degrees of freedom
>
> Residual deviance:  34.922  on 40  degrees of freedom
>
> AIC: 138.32
>
>
>
> Number of Fisher Scoring iterations: 5
> 2. Next, I tried a generalized linear mixed effects model in the 'lme4' package with binomial error and PetriDish as a random factor within Season to account for pseudoreplication. Below is the global model (model2):
>
>> model2 = glmer(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat*Smoke*Season + (1|PetriDish:Season), family = binomial, data = dat)
> ...and then I fitted the minimal adequate model I'd identified with the GLM in 1) above.
>
>> model2.ma = glmer(cbind(CumNoGerm,ViableSeed-CumNoGerm) ~ Heat + Smoke + Season + Heat:Smoke + Heat:Season + (1|PetriDish:Season), family = binomial, data = dat)
>> summary(model2.ma)
> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod
>
> ]
>
>   Family: binomial  ( logit )
>
> Formula: cbind(CumNoGerm, ViableSeed - CumNoGerm) ~ Heat + Smoke + Season +
>
>      Heat:Smoke + Heat:Season + (1 | PetriDish:Season)
>
>     Data: dat
>
>
>
>       AIC      BIC   logLik deviance df.resid
>
>     140.3    157.2    -61.2    122.3       39
>
>
>
> Scaled residuals:
>
>      Min      1Q  Median      3Q     Max
>
> -1.6819 -0.5778 -0.3011  0.5930  1.8417
>
>
>
> Random effects:
>
>   Groups           Name        Variance  Std.Dev.
>
>   PetriDish:Season (Intercept) 1.401e-17 3.743e-09
>
> Number of obs: 48, groups:  PetriDish:Season, 48
>
>
>
> Fixed effects:
>
>                        Estimate Std. Error z value Pr(>|z|)
>
> (Intercept)            -5.5039     0.6908  -7.967 1.62e-15 ***
>
> Heat1Heat               2.3836     0.8130   2.932 0.003368 **
>
> Smoke1Smoke             3.7959     0.6090   6.233 4.58e-10 ***
>
> Season2AUT              1.5367     0.4414   3.481 0.000499 ***
>
> Season3WIN              1.9490     0.4362   4.468 7.88e-06 ***
>
> Heat1Heat:Smoke1Smoke  -1.7194     0.7225  -2.380 0.017320 *
>
> Heat1Heat:Season2AUT   -0.7232     0.5751  -1.258 0.208550
>
> Heat1Heat:Season3WIN   -2.1981     0.5912  -3.718 0.000201 ***
>
> ---
>
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> Correlation of Fixed Effects:
>
>              (Intr) Het1Ht Smk1Sm Ss2AUT Ss3WIN H1H:S1 H1H:S2
>
> Heat1Heat   -0.850
>
> Smoke1Smoke -0.852  0.724
>
> Season2AUT  -0.465  0.395  0.043
>
> Season3WIN  -0.510  0.433  0.090  0.682
>
> Ht1Ht:Smk1S  0.718 -0.827 -0.843 -0.036 -0.076
>
> Ht1Ht:S2AUT  0.357 -0.495 -0.033 -0.768 -0.524  0.065
>
> Ht1Ht:S3WIN  0.376 -0.477 -0.066 -0.503 -0.738  0.052  0.611
>
> Specifically, I'd appreciate advice on the following:
>
> 1. Is model 2 more appropriate than model 1, given the pseudoreplication of Season?
>
> 2. If so, is the correct random effect to fit PetriDish:Season, instead of PetriDish/Season, as Season is included as a fixed effect?
>
> 3. Does the near-zero variance of PetriDish:Season suggest there is no dish effect between , in which case wouldn't model 1 (GLM) suffice (the coefficients and standard errors are identical between the two models anyway...)?
>
> 4. How would I interpret a non-zero variance for PetriDish:Season, and would that depend on how large it was?
>
> 5. If model 2 isn't adequate to address the pseudoreplication, could anyone suggest other approaches or resources I could try?
>
> Thanks again for your time and assistance - it's very much appreciated.
>
> Warm regards,
> Berin
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> This email is intended for the addressee(s) named and may contain confidential and/or privileged information.
> If you are not the intended recipient, please notify the sender and then delete it immediately.
> Any views expressed in this email are those of the individual sender except where the sender expressly and with authority states them to be the views of the NSW Office of Environment and Heritage.
>
> PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS EMAIL
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models