[R] R-squared value for linear regression passing through origin using lm()
Berwin A Turlach
berwin at maths.uwa.edu.au
Fri Oct 19 11:32:34 CEST 2007
G'day Ralf,
On Fri, 19 Oct 2007 09:51:37 +0200
Ralf Goertz <R_Goertz at web.de> wrote:
> Thanks to Thomas Lumley there is another convincing example. But still
> I've got a problem with it:
>
> > x<-c(2,3,4);y<-c(2,3,3)
>
> [...]
> That's okay, but neither [...] nor [...]
> give the result of summary(lm(y~x+0)), which is 0.9796.
Why should either of those formulas yield the output of
summary(lm(y~x+0))? The R-squared reported by that command is
documented in help(summary.lm):
r.squared: R^2, the 'fraction of variance explained by the model',
R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
where y* is the mean of y[i] if there is an intercept and
zero otherwise.
And, indeed:
> 1-sum(residuals(lm(y~x+0))^2)/sum((y-0)^2)
[1] 0.9796238
confirms this.
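For comparison, the same documented formula with y* = mean(y)
reproduces the R-squared of the model *with* an intercept. A quick
check on the same toy data:

```r
x <- c(2, 3, 4); y <- c(2, 3, 3)
fit <- lm(y ~ x)  # model with intercept
# help(summary.lm) formula, with y* = mean(y) because there is an intercept:
rsq_doc <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
all.equal(rsq_doc, summary(fit)$r.squared)  # TRUE; both equal 0.75 here
```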
Note: if you do not have an intercept in your model, the residuals do
not have to sum to zero; and, typically, they will not. Hence,
var(residuals(lm(y~x+0))) is not proportional to the residual sum of
squares: var() subtracts the (non-zero) mean of the residuals first.
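A quick illustration with the same toy data: the residuals of the
no-intercept fit have a clearly non-zero sum, so var(), which centres
them at their own mean first, understates the residual sum of squares:

```r
x <- c(2, 3, 4); y <- c(2, 3, 3)
fit0 <- lm(y ~ x + 0)
sum(residuals(fit0))                    # non-zero: 7/29, about 0.241
sum(residuals(fit0)^2)                  # the RSS: 13/29, about 0.448
(length(y) - 1) * var(residuals(fit0))  # smaller, since var() centres first
```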
> In order to save the role of R^2 as a goodness-of-fit indicator
R^2 is not a goodness-of-fit indicator, neither in models with an
intercept nor in models without one. So I do not see how you could
save its role as a goodness-of-fit indicator. :)
Since you are posting from a .de domain, I assume you will understand
the following quote from Tutz (2000), "Die Analyse kategorialer Daten",
page 18 (translated here from the German):
R^2 does *not* measure the goodness of fit of the linear model; it
says nothing about whether the linear specification is true or false,
but only about whether individual observations are predictable from
the linear specification. R^2 is determined essentially by the
design, i.e. by the values that x takes (cf. Kockelkorn (1998)).
The latter reference is:
Kockelkorn, U. (1998). Lineare Modelle. Skript, TU Berlin.
> in zero-intercept models one could use the same formula as in models
> with a constant. I mean, if R^2 is the proportion of variance
> explained by the model, we should use the a priori variance of y[i].
>
> > 1-var(residuals(lm(y~x+0)))/var(y)
> [1] 0.3567182
>
> But I assume that this has probably been discussed at length somewhere
> more appropriate than r-help.
I am sure it has been, but it was also discussed here on r-help (long
ago). The problem is that this compares two models that are not nested
in each other, which is quite a controversial thing to do; some might
even go so far as to say that it makes no sense at all. The other
problem with this approach is illustrated by the following example:
> set.seed(20070807)
> x <- runif(100)*2+10
> y <- 4+rnorm(x, sd=1)
> 1-var(residuals(lm(y~x+0)))/var(y)
[1] -0.04848273
How do you explain that a quantity called R-squared, implying that it
is the square of something and hence always non-negative, can become
negative?
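The negative value is no accident: the mixed formula compares the
residuals of the no-intercept fit against the variance of y about its
*mean*, and nothing forces the former to be smaller. In this example
the no-intercept fit does worse than the constant mean(y), so the
ratio exceeds 1 and the "R-squared" turns negative:

```r
set.seed(20070807)
x <- runif(100) * 2 + 10
y <- 4 + rnorm(x, sd = 1)
fit0 <- lm(y ~ x + 0)
# var() of the no-intercept residuals exceeds var(y), so
# 1 - var(resid)/var(y) drops below zero:
var(residuals(fit0)) > var(y)       # TRUE
1 - var(residuals(fit0)) / var(y)   # -0.04848273, as above
```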
Cheers,
Berwin
=========================== Full address =============================
Berwin A Turlach Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability +65 6515 6650 (self)
Faculty of Science FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7 e-mail: statba at nus.edu.sg
Singapore 117546 http://www.stat.nus.edu.sg/~statba