[R] R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect
John Sorkin
jsorkin at grecc.umaryland.edu
Fri Jul 13 19:22:30 CEST 2012
Pamela
R-squared from a model with an estimated intercept and from a model forced through the origin can be very different, because the two fitted regression lines can be very different. Have you plotted your data, plot(k[,2], k[,1]), to see whether a zero intercept is reasonable for your data? Have you drawn the regression lines from both models and compared them with the plot of your data?
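A minimal sketch of that check, using made-up data rather than yours (k_demo, x and y are hypothetical names chosen only for illustration). It fits both models and draws both lines over the scatter plot, with the axes extended to zero so the forced origin is visible:

```r
# Hypothetical data, invented for illustration (NOT the poster's data).
set.seed(1)
x <- runif(30, min = 0.09, max = 0.15)
y <- 0.05 + 0.5 * x + rnorm(30, sd = 0.01)
k_demo <- data.frame(y = y, x = x)

M1 <- lm(y ~ x + 0, data = k_demo)  # line forced through the origin
M2 <- lm(y ~ x, data = k_demo)      # intercept estimated from the data

# Start both axes at zero so the origin is part of the picture.
plot(k_demo$x, k_demo$y,
     xlim = c(0, max(k_demo$x)), ylim = c(0, max(k_demo$y)),
     xlab = "x", ylab = "y")
abline(M1, lty = 2)  # a single coefficient is drawn as a slope through the origin
abline(M2, lty = 1)
legend("topleft", legend = c("zero intercept", "estimated intercept"),
       lty = c(2, 1))
```

If the dashed line cuts across the cloud of points at a visibly wrong angle, that suggests the zero-intercept model is not a reasonable description of the data.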
John
>>> Pamela Krone-Davis <pkrone-davis at csumb.edu> 7/13/2012 12:00:36 PM >>>
Hi,
I have been using lm in R to do a linear regression and find the slope
coefficients and value for R-squared. The R-squared value reported by R
(R^2 = 0.9558) is very different than the R-squared value when I use the
same equation in Excel (R^2 = 0.328). I manually computed R-squared and the
Excel value is correct. I show my code for the determination of R^2 in R.
When I do not set 0 as the intercept, the R^2 value is the same in R and
Excel. In both cases the slope coefficient from R and from Excel are
identical.
k is a data frame with two columns.
M1 = lm(k[,1] ~ k[,2] + 0)  ## intercept set to 0: different R^2 values in R and Excel
M2 = lm(k[,1] ~ k[,2])
sumM1 = summary(M1)
sumM2 = summary(M2)  ## same value as Excel when intercept is not set to 0
Below is what R returns for sumM1:
lm(formula = k[, 1] ~ k[, 2] + 0)
Residuals:
      Min        1Q    Median        3Q       Max
-0.057199 -0.015857  0.003793  0.013737  0.056178
Coefficients:
       Estimate Std. Error t value Pr(>|t|)
k[, 2]  1.05022    0.04266   24.62   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02411 on 28 degrees of freedom
Multiple R-squared: 0.9558, Adjusted R-squared: 0.9543
F-statistic: 606.2 on 1 and 28 DF, p-value: < 2.2e-16
This is how I computed R^2 manually. The value returned agrees with
the value from Excel:
#### trying to figure out why the R^2 for R and Excel are so different.
sqerr = (k[,1] - predict(M1))^2
sqtot = (k[,1] - mean(k[,1]))^2
R2 = 1 - sum(sqerr)/sum(sqtot)  ## for 1D get 0.328, same as Excel value
I am very puzzled by this. How does R compute the value for R^2 in this
case? Did I write the lm call incorrectly?
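For reference, ?summary.lm defines R^2 as 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), where y* is the mean of y if the model has an intercept and zero otherwise. So with the intercept removed, the total sum of squares is taken about zero, whereas the manual calculation above takes it about the mean. A minimal sketch comparing the two definitions on the data listed in the PS below (the names y, x, r2_centered and r2_uncentered are assumptions made for this illustration):

```r
# Data copied from the post: y is k[,1], x is k[,2].
y <- c(0.17170228, 0.10881539, 0.11843669, 0.11619201, 0.08441067,
       0.09424441, 0.04782264, 0.09526496, 0.11596476, 0.10323453,
       0.06487894, 0.08916484, 0.06358752, 0.07945473, 0.11213532,
       0.06531185, 0.11503484, 0.13679548, 0.13762677, 0.13126827,
       0.12350649, 0.12842441, 0.13075654, 0.15026602, 0.14536351,
       0.07841638, 0.08419016, 0.11995240, 0.14425678)
x <- c(0.11, 0.10, 0.11, 0.10, 0.10, 0.09, 0.10, 0.09, 0.09, 0.11,
       0.09, 0.10, 0.09, 0.10, 0.09, 0.10, 0.10, 0.10, 0.11, 0.10,
       0.11, 0.11, 0.12, 0.13, 0.15, 0.10, 0.09, 0.11, 0.12)

M1 <- lm(y ~ x + 0)           # regression through the origin
rss <- sum(residuals(M1)^2)   # residual sum of squares

# Total sum of squares about the mean (the manual calculation in the post).
r2_centered <- 1 - rss / sum((y - mean(y))^2)

# Total sum of squares about zero (what summary.lm uses without an intercept).
r2_uncentered <- 1 - rss / sum(y^2)

print(c(centered = r2_centered,
        uncentered = r2_uncentered,
        summary_lm = summary(M1)$r.squared))
```

The two numbers answer different questions: the centered version compares the fit with a horizontal line at the mean of y, the uncentered version compares it with a line fixed at zero. Neither is a calculation error, but they are not comparable with each other.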
Thanks
Pam
PS In case you are interested, the data I am using for the two columns is
below.
> k[,1]
 [1] 0.17170228 0.10881539 0.11843669 0.11619201 0.08441067 0.09424441 0.04782264 0.09526496 0.11596476 0.10323453 0.06487894 0.08916484 0.06358752 0.07945473
[15] 0.11213532 0.06531185 0.11503484 0.13679548 0.13762677 0.13126827 0.12350649 0.12842441 0.13075654 0.15026602 0.14536351 0.07841638 0.08419016 0.11995240
[29] 0.14425678
> k[,2]
 [1] 0.11 0.10 0.11 0.10 0.10 0.09 0.10 0.09 0.09 0.11 0.09 0.10 0.09 0.10 0.09 0.10 0.10 0.10 0.11 0.10 0.11 0.11 0.12 0.13 0.15 0.10 0.09 0.11 0.12
--
Pam Krone-Davis
Project Research Assistant and Grant Manager
PO Box 22122
Carmel, CA 93922
(831)582-3684 (o)
(831)324-0391 (h)