[R] anova.lm and F-test
SKrishna
madzientist at gmail.com
Mon Jul 9 18:36:13 CEST 2012
Dear Peter,
Thank you very much for that excellent answer to a rather stupid question :)
I did not notice that the RSS actually increased for the model with more
parameters, so in this case the F-statistic is negative and a p-value from
the F-distribution is meaningless. But I guess your answer also clarifies
that as long as the F-statistic is in the valid range (>=0), anova() will
calculate it and return a p-value whether or not the models are nested, even
though that p-value only makes sense for nested models.
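
For the record, here is a minimal sketch of the arithmetic behind the second
table, assuming the same simulated data as in my original post. The seed and
the names rss1, rss2, df1, df2 and f_stat are my own additions for
illustration; the ratio is just the usual extra-sum-of-squares F statistic
computed by hand, not a copy of what anova.lm does internally.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

m1 <- lm(y ~ x)
m2 <- lm(y ~ I(x^2) + I(x^3))     # not nested in m1

rss1 <- deviance(m1)              # residual sum of squares of model 1
rss2 <- deviance(m2)              # residual sum of squares of model 2
df1  <- df.residual(m1)           # 98
df2  <- df.residual(m2)           # 97

# Extra-sum-of-squares F statistic: change in RSS per extra parameter,
# divided by the residual mean square of the model with more parameters.
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
print(f_stat)                     # negative here, because rss2 > rss1

tab <- anova(m1, m2, test = "F")
print(tab)
```

Since m2 fits worse than m1 despite having one more parameter, the numerator
is negative, which is why no F value or p-value appears in that row.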
Best, Suresh
Peter Dalgaard-2 wrote
>
> On Jul 9, 2012, at 15:40 , Suresh Krishna wrote:
>
>>
>> Hello,
>>
>> Why does anova.lm sometimes return a p-value and at other times not? Is
>> it because it distinguishes nested models from non-nested ones?
>>
>>> x<-seq(1,100,1)
>>> y<-3*x+rnorm(100)
>>> anova(lm(y~x),lm(y~x+I(x^2)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ x + I(x^2)
>>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
>> 1     98 90.449
>> 2     97 90.288  1   0.16117 0.1732 0.6782
>>
>>> anova(lm(y~x),lm(y~I(x^2)+I(x^3)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ I(x^2) + I(x^3)
>>   Res.Df    RSS Df Sum of Sq F Pr(>F)
>> 1     98   90.4
>> 2     97 7345.7  1   -7255.3
>>
>
> You have Df and Sum of Sq with opposite sign, so more parameters with a
> worse fit. The models are not nested, so the F test makes no sense.
>
> I'd say that the real question is why anova.lm doesn't protest loudly when
> detecting this? One possible answer is that it also misses other
> non-nested cases where the signs do not clash, and warning only in some of
> the incorrect cases could lead the naive user to believe that the other
> ones are OK. E.g. this F test is equally meaningless
>
>> anova(lm(y~I(x^4)),lm(y~I(x^2)+I(x^3)),test="F")
> Analysis of Variance Table
>
> Model 1: y ~ I(x^4)
> Model 2: y ~ I(x^2) + I(x^3)
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
> 1     98 186639
> 2     97   7101  1    179538 2452.4 < 2.2e-16 ***
>
> (Non-nestedness could in principle be determined by checking whether
> cbind(model.matrix(m1), model.matrix(m2)) has higher rank than both of its
> constituents, but numerical rank determination is a bit error-prone and
> slow, so this was not implemented).
>
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes@ Priv: PDalgd@
>
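
The rank check described in the quoted message can be sketched roughly as
follows. This is only an illustration under stated assumptions: looks_nested
is a hypothetical helper name (nothing like it exists in base R as far as I
know), the seed is arbitrary, and the result depends on the default tolerance
of qr(), so it inherits the numerical fragility mentioned above.

```r
# Hypothetical helper, not part of R: returns TRUE when the combined model
# matrix has no higher rank than the larger of the two individual ones,
# i.e. when one column space appears to contain the other.
looks_nested <- function(m1, m2) {
  X1  <- model.matrix(m1)
  X2  <- model.matrix(m2)
  r1  <- qr(X1)$rank
  r2  <- qr(X2)$rank
  r12 <- qr(cbind(X1, X2))$rank   # numerical rank, default tolerance
  r12 == max(r1, r2)
}

set.seed(1)                       # assumed seed, only for reproducibility
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

nested_ok   <- looks_nested(lm(y ~ x), lm(y ~ x + I(x^2)))
nonnested_a <- looks_nested(lm(y ~ x), lm(y ~ I(x^2) + I(x^3)))
nonnested_b <- looks_nested(lm(y ~ I(x^4)), lm(y ~ I(x^2) + I(x^3)))
print(c(nested_ok, nonnested_a, nonnested_b))
```

With these three pairs the helper should flag only the first as nested,
including the case where the signs of Df and Sum of Sq do not clash. For
badly scaled or nearly collinear predictors the rank decision may well be
wrong, so treat it as a sanity check rather than a guarantee.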