[R] R-squared value for linear regression passing through origin using lm()
Ralf Goertz
R_Goertz at web.de
Fri Oct 19 09:51:37 CEST 2007
Berwin A Turlach, Thursday, 18 October 2007:
> G'day all,
>
> I must admit that I have not read the previous e-mails in this thread,
> but why should that stop me from commenting? ;-)
Your comments are very welcome.
> On Thu, 18 Oct 2007 16:17:38 +0200
> Ralf Goertz <R_Goertz at web.de> wrote:
>
> > But in that case the numerator is very large, too, isn't it?
>
> Not necessarily.
>
> > I don't want to argue, though.
>
> Good, you might lose the argument. :)
Yes, I admit I lost. :-(
> > But so far, I have not managed to create a dataset where R^2 is
> > larger for the model with forced zero intercept (although I have not
> > tried very hard). It would be very convincing to see one (Etienne?)
>
> Indeed, you haven't tried hard. It is not difficult. Here are my
> canonical commands to convince people why regression through the
> origin is evil; the pictures should illustrate what is going on:
> [example snipped]
Thanks to Thomas Lumley there is another convincing example. But I still
have a problem with it:
> x<-c(2,3,4);y<-c(2,3,3)
> 1-2*var(residuals(lm(y~x+1)))/sum((y-mean(y))^2)
[1] 0.75
That's fine: the factor 2 is n-1, which turns var() back into the
residual sum of squares (these residuals have mean zero). But neither
> 1-3*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.97076
nor
> 1-2*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.9805066
gives the R^2 reported by summary(lm(y~x+0)), which is 0.9796.
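Meanwhile I think I see where the difference comes from: the residuals
of the zero-intercept fit do not sum to zero, so var() centres them at
their nonzero mean, and neither 2*var() nor 3*var() recovers the
residual sum of squares. If I read summary.lm correctly, for a model
without intercept it works with the raw sums of squares, and that does
reproduce the value:
> r<-residuals(lm(y~x+0))
> 1-sum(r^2)/sum(y^2)
[1] 0.9796238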
> > IIRC, I have not been told so. Perhaps my teachers were not as good
> > as they should have been. So what is R^2 good for, if not to indicate
> > the goodness of fit?
>
> I am wondering about that too sometimes. :) It always puzzled me that
> my lecturers described R^2 as the square of the correlation between
> the x and the y variate, while at the same time maintaining that x was
> fixed and selected by the experimenter (or should be regarded as
> such). If x is fixed and y is random, it does not make sense to me to
> speak of a correlation between x and y (at least not on the population
> level).
I see the point. But I was raised with that description, too, and it's
hard to drop that idea.
> My best guess at the moment is that R^2 was adopted by users of
> statistics before it was properly understood; and by the time it was
> properly understood, it was too entrenched to abandon. Try not
> to teach it these days and see what your "client faculties" will tell
> you.
In order to save the role of R^2 as a goodness-of-fit indicator in
zero-intercept models, one could use the same formula as in models with
a constant. That is, if R^2 is the proportion of variance explained by
the model, we should use the a priori variance of the y[i]:
> 1-var(residuals(lm(y~x+0)))/var(y)
[1] 0.3567182
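Using the same yardstick for both models at least makes them
comparable; with the x and y from above, the with-intercept fit
recovers the usual value:
> 1-var(residuals(lm(y~x+1)))/var(y)
[1] 0.75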
But this has probably been discussed at length somewhere more
appropriate than r-help.
Thanks,
Ralf