[R-sig-ME] sums of squares and F values in anova

Tue Sep 30 17:18:01 CEST 2014

John,

Regarding K-R etc. it's good to know that I can get up-to-date statistical tests on lmer() output.  I can use this in a current analysis.  For a literature citation, I can say that statistical tests were conducted using the car package.  However, current documentation indicates that Anova can be used with lmer(), and that pbkrtest is recommended along with car.  One may therefore infer what is likely going on.  It might be good to see a more direct statement in ?Anova.  

Any thoughts about literature citation?  

fwiw I'm a fan of car and the associated text.

I may also do some parametric bootstrapping. 

David Farrar, 

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of John Fox
Sent: Wednesday, September 24, 2014 6:04 PM
To: Ben Bolker
Cc: r-sig-mixed-models at r-project.org; Alexandra Kuznetsova
Subject: Re: [R-sig-ME] sums of squares and F values in anova

Dear Ben,

anova() computes sequential ("type I") tests, so one wouldn't for correlated Xs expect the result to be the same as the "type II" or "type III" tests computed by Anova(), which, however, for an additive model should be the same as each other. You *would* expect the last test in the sequence produced by anova() to be the same as the corresponding type-II or -III test. 

In the case of LMMs fit by lmer(), Anova() computes Wald F-tests using the Kenward-Roger coefficient covariance matrix and Satterthwaite df; for the data/model in your example, that should produce a small difference in the F-statistics.

Here's what I get for your example:

------------- snip ----------

> anova(fit)
Analysis of Variance Table

Response: sr
          Df Sum Sq Mean Sq F value    Pr(>F)    
pop15      1 204.12 204.118 14.1157 0.0004922 ***
pop75      1  53.34  53.343  3.6889 0.0611255 .  
dpi        1  12.40  12.401  0.8576 0.3593551    
ddpi       1  63.05  63.054  4.3605 0.0424711 *  
Residuals 45 650.71  14.460                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(fit2) ## practically equal to anova(fit) above
Analysis of Variance Table
      Df  Sum Sq Mean Sq F value
pop15  1 204.118 204.118 14.1157
pop75  1  53.343  53.343  3.6889
dpi    1  12.401  12.401  0.8576
ddpi   1  63.054  63.054  4.3605

> Anova(fit)
Anova Table (Type II tests)

Response: sr
          Sum Sq Df F value   Pr(>F)   
pop15     147.01  1 10.1666 0.002603 **
pop75      35.24  1  2.4367 0.125530   
dpi         1.89  1  0.1309 0.719173   
ddpi       63.05  1  4.3605 0.042471 * 
Residuals 650.71 45
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

> Anova(fit, type=3)
Anova Table (Type III tests)

Response: sr
            Sum Sq Df F value    Pr(>F)    
(Intercept) 218.16  1 15.0867 0.0003338 ***
pop15       147.01  1 10.1666 0.0026030 ** 
pop75        35.24  1  2.4367 0.1255298    
dpi           1.89  1  0.1309 0.7191732    
ddpi         63.05  1  4.3605 0.0424711 *  
Residuals   650.71 45                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

> Anova(fit2)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: sr
        Chisq Df Pr(>Chisq)   
pop15 10.1666  1    0.00143 **
pop75  2.4367  1    0.11852   
dpi    0.1309  1    0.71748   
ddpi   4.3605  1    0.03678 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> Anova(fit2, test="F")
Loading required package: pbkrtest
Loading required package: MASS
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)

Response: sr
            F Df Df.res  Pr(>F)   
pop15 10.1519  1 44.031 0.00265 **
pop75  2.4326  1 44.036 0.12599   
dpi    0.1245  1 44.811 0.72588   
ddpi   4.2179  1 44.600 0.04589 * 
---                  
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> Anova(fit2, test="F", type=3)
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

Response: sr
                  F Df Df.res    Pr(>F)    
(Intercept) 14.9643  1 44.560 0.0003538 ***
pop15       10.1519  1 44.031 0.0026503 ** 
pop75        2.4326  1 44.036 0.1259919    
dpi          0.1245  1 44.811 0.7258840    
ddpi         4.2179  1 44.600 0.0458898 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

----------------- snip ----------

These results are as expected.

Best,
 John

On Wed, 24 Sep 2014 17:14:31 -0400
 Ben Bolker <bbolker at gmail.com> wrote:
> On 14-09-24 04:21 PM, Alexandra Kuznetsova wrote:
> > Dear lme4 authors,
> > 
> > I have a question regarding the calculation of sums of squares in 
> > anova for lmerMod object. I know that there were some discussions 
> > regarding how they are calculated (but still remains unclear to 
> > me..).
> > 
> > As far as I understand the way they are calculated is similar to the 
> > way they are calculated in lm objects, that is transforming Y into 
> > orthogonal Q space, and then computing sums of squares for the 
> > independent effects.
> > 
> > I have found in your  JSS paper "Fitting linear mixed effects models 
> > using lme4" some explanations (in equations 67 and 68). Would it be 
> > possible to give some more comments on these equations? And what 
> > about the partial (type 3) sums of squares -  is there a way to 
> > calculate them using the same way?
> > 
> > Hope my questions were clear! Thank you in advance!
> 
>   I'm not sure exactly what your questions are.  Could you please 
> clarify (sorry!) what you mean about partial sums of squares?  Here's 
> an example showing that the results of lme4's anova.merMod and base R's:
> anova.lm do in fact agree for a model with the random effects variance 
> forced to (almost) zero.  At the risk of further muddying the water, 
> I'll point out that car::Anova(fit,type="II") and
> car::Anova(fit,type="III") give two different answers for this 
> problem, neither of which matches the computation below ...
> 
> ===================
> fit <- lm(sr ~ ., data = LifeCycleSavings)
> anova(fit)
> 
> 
> ## construct lmer model with near-zero variance
> LC2 <- transform(LifeCycleSavings,f=factor(1:2)) ## bogus
> library("lme4")
> form <- sr ~ pop15 + pop75 + dpi + ddpi + (1|f)  ## hack ## to avoid 
> (Error in terms.formula(formula(x, fixed.only = TRUE)) :
> ##   '.' in formula and no 'data' argument)
> 
> lmod <- lFormula(form, data=LC2)
> d2 <- lmer(form, data=LC2,devFunOnly=TRUE) llik <- d2(1e-5)
> fit2 <- mkMerMod(environment(d2),opt=list(par=1e-5,
>                            fval=llik,
>                            feval=1,
>                            conv=0,
>                            message=NULL),
>                          lmod$reTrms, fr = lmod$fr)
> all.equal(coef(fit),fixef(fit2))
> anova(fit2)   ## practically equal to anova(fit) above
> ==================
> 
> For those following along, the paper is at
> http://arxiv.org/abs/1406.5823 (or http://arxiv.org/pdf/1406.5823v1 
> for a direct link to the PDF), and the raw LaTeX for the specific section is:
> 
> ===============
> To understand how these quantities are computed, let $\bm R_i$ contain 
> the rows of $\bm R_X$ (Equation~\ref{eq:blockCholeskyDecomp}) 
> associated with the $i$th fixed-effects term.  Then the sum of squares 
> for term $i$ is, \begin{equation}
>   \label{eq:SS}
>   SS_i = \widehat{\bm\beta}\trans\bm R_i\trans \bm R_i 
> \widehat{\bm\beta} \end{equation} If $DF_i$ is the number of columns 
> in $\bm R_i$, then the $F$~statistic for term $i$ is, \begin{equation}
>   \label{eq:Fstat}
>   F_i = \frac{SS_i}{\widehat{\sigma}^2 DF_i} \end{equation}
> 
> 
> > 
> > Alexandra Kuznetsova _______________________________________________ 
> > R-sig-mixed-models at r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models