[R] Don't understand removing constant on 1-way ANOVA
Berwin A Turlach
berwin at maths.uwa.edu.au
Fri Feb 8 07:08:25 CET 2008
G'day Brian,
On Thu, 07 Feb 2008 17:56:07 -0500
Brian McGill <brian.mcgill at mcgill.ca> wrote:
> I am playing with the a 1-way anova with and without the "-1" option.
>
> [...]
>
> From what I can tell:
> 1) the estimated means of the different levels are correctly
> estimated either way (although reported as means with the -1 and as
> contrasts without the -1 as expected)
> 2) the residuals are identical (in this contrived example they differ
> slightly due to numeric instability but in a more real-world example
> they truly are identical)
> 3) BUT the r2/F/p-value are different (in my real-world example they
> are drastically different)
>
> How can a model that gets the same parameter estimates on the same
> data leading to the same residuals get different r2/F/p-value?
In an R session, type "help(summary.lm)" and, under the section
"Value", read about the way r.squared is calculated.
Note that if your model specifies a "-1" (or a "+0"), then the model is
assumed to have no intercept. Or, rather, that the null model (i.e.
the model when all covariates are removed) is a response of 0 + noise.
This should explain the differences that you see.
R does not check whether a vector of 1s is in the column space of the
design matrix and, of course, does not base the decision on whether the
model has an intercept or not (i.e. whether the null model should be
"mu + noise" or "0 + noise) on this not-performed test.
It is a design issue, presumably to be compatible to S and, obviously,
it also makes implementation of summary.lm easier, I guess. :)
Occasionally, I use "-1" in the formula to construct a specific design
matrix and, hence, get estimates and standard errors for quantities
that are of interest. It is a bit annoying to have to refit the model
with out the "-1" to get the r2/F that I want, but one has to live
with it. I would prefer that the decision on whether the null model
has an intercept or not would be based on whether a vector of 1s is in
the column space of the design matrix and not on whether the formula
has a "-1" on it. But I can also see the argument for the alternative
preference.
But I just noticed, if you so wish, you could be subversive:
> fm1<-lm(y~x-1)
> attr(fm1$terms, "intercept") <- 1
> summary(fm1)
Call:
lm(formula = y ~ x - 1)
Residuals:
1 2 3 4 5 6
1.000e-01 3.608e-16 -1.000e-01 -3.469e-18 1.000e-01 -1.000e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 1.00000 0.05774 17.32 6.52e-05 ***
x2 2.00000 0.05774 34.64 4.14e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.974, Adjusted R-squared: 0.9675
F-statistic: 150 on 1 and 4 DF, p-value: 0.0002552
Hope this helps.
Cheers,
Berwin
=========================== Full address =============================
Berwin A Turlach Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability +65 6516 6650 (self)
Faculty of Science FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7 e-mail: statba at nus.edu.sg
Singapore 117546 http://www.stat.nus.edu.sg/~statba
More information about the R-help
mailing list