[R] discrepancy in the result of R and SAS on same data in logistics regression
Atul Malik
a.malik at decisioncraft.com
Fri Oct 5 07:39:00 CEST 2007
Dear Members,
Greetings!
I have come across a discrepancy between the results R and SAS give on the same data for logistic regression.
When I processed the above csv file (1000.csv) to predict Action (i/c) from AgeGroup (1-7, Na) and Gender (M, F, Na) with glm() in R, I get:
R result
Call:
glm(formula = Action ~ Gender + AgeGroup, family = binomial,
    data = mydata1, na.action = na.pass)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.828  -0.973  -0.709   1.087   1.734

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.2939     0.3180   4.069 4.73e-05 ***
GenderM      -0.8794     0.1637  -5.371 7.85e-08 ***
GenderNa     -1.4407     0.2749  -5.240 1.60e-07 ***
AgeGroup2    -1.2053     0.3971  -3.035  0.00240 **
AgeGroup3    -1.6670     0.3262  -5.110 3.21e-07 ***
AgeGroup4    -1.0786     0.3714  -2.904  0.00368 **
AgeGroup5    -0.8232     0.3829  -2.150  0.03156 *
AgeGroup6     0.1682     0.3501   0.481  0.63081
AgeGroup7    -0.3361     0.3617  -0.929  0.35281
AgeGroupNa   -1.7956     0.3433  -5.231 1.69e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1342.7 on 999 degrees of freedom
Residual deviance: 1213.2 on 990 degrees of freedom
AIC: 1233.2

Number of Fisher Scoring iterations: 4
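
For a package-to-package comparison, the printed R coefficients can be turned into a predicted probability for any Gender/AgeGroup combination. The following is a minimal sketch, not the poster's code. It assumes Action is a factor with levels c, i in R's default alphabetical order, in which case glm() with family = binomial models the probability of the second level ("i"), so P(Action = c) is one minus the fitted value. The helper name p_c_from_r is made up for illustration.

```r
# Coefficients copied from the glm() summary above (treatment contrasts:
# the reference levels are Gender F and AgeGroup 1).
r_coef <- c("(Intercept)" = 1.2939,
            GenderM = -0.8794, GenderNa = -1.4407,
            AgeGroup2 = -1.2053, AgeGroup3 = -1.6670, AgeGroup4 = -1.0786,
            AgeGroup5 = -0.8232, AgeGroup6 = 0.1682, AgeGroup7 = -0.3361,
            AgeGroupNa = -1.7956)

# Hypothetical helper: P(Action = "c") for one Gender/AgeGroup combination.
# Assumption: glm() modelled P(Action = "i"), the second factor level.
p_c_from_r <- function(gender, age) {
  eta <- r_coef[["(Intercept)"]]
  if (gender != "F") eta <- eta + r_coef[[paste0("Gender", gender)]]
  if (age != "1")    eta <- eta + r_coef[[paste0("AgeGroup", age)]]
  1 - plogis(eta)   # plogis() is the inverse logit
}

p_c_from_r("F", "1")   # reference cell
p_c_from_r("M", "3")
```

With the fitted model object at hand, predict() with type = "response" returns the fitted probabilities directly, without the manual arithmetic.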
whereas SAS gives, on the same data:
Analysis of Maximum Likelihood Estimates

Parameter     Action  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept     c        1    0.3217          0.0953          11.4025      0.0007
AgeGroup 2    c        1    0.3631          0.2434           2.2260      0.1357
AgeGroup 3    c        1    0.8248          0.1411          34.1508      <.0001
AgeGroup 4    c        1    0.2364          0.2146           1.2136      0.2706
AgeGroup 5    c        1   -0.0190          0.2299           0.0068      0.9343
AgeGroup 6    c        1   -1.0104          0.1822          30.7454      <.0001
AgeGroup 7    c        1   -0.5061          0.1974           6.5711      0.0104
AgeGroup Na   c        1    0.9534          0.1718          30.7884      <.0001
Gender   M    c        1    0.1060          0.1103           0.9246      0.3363
Gender   N    c        1    0.6674          0.1686          15.6744      <.0001
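
One possible reading of this table, offered as an assumption rather than something the output states: the default CLASS parameterization in SAS PROC LOGISTIC is effect (deviation-from-mean) coding, and the "Action c" column suggests the modelled event is Action = c. Under that reading the levels missing from the table (AgeGroup 1 and Gender F) each get minus the sum of the listed estimates for their factor. The sketch below computes the log-odds of Action = c on that basis. The helper name logit_c_from_sas is made up for illustration.

```r
# Estimates copied from the SAS table above (modelled event assumed: Action = c).
sas_int    <- 0.3217
sas_age    <- c("2" = 0.3631, "3" = 0.8248, "4" = 0.2364, "5" = -0.0190,
                "6" = -1.0104, "7" = -0.5061, "Na" = 0.9534)
sas_gender <- c("M" = 0.1060, "Na" = 0.6674)   # printed as "N" in the table

# Assumption: effect coding, so the omitted level's effect is minus the sum
# of the listed effects for that factor.
sas_age    <- c("1" = -sum(sas_age), sas_age)
sas_gender <- c("F" = -sum(sas_gender), sas_gender)

# Hypothetical helper: log-odds of Action = "c" for one combination.
logit_c_from_sas <- function(gender, age) {
  sas_int + sas_gender[[gender]] + sas_age[[age]]
}

logit_c_from_sas("F", "1")
plogis(logit_c_from_sas("F", "1"))   # corresponding probability of "c"
```

Under these assumptions the cell Gender F, AgeGroup 1 comes out at about -1.2939, the negative of the R intercept. That is what one would expect if both programs fitted the same model with different factor coding and a different event level, so it may be worth checking how the SAS probabilities were derived before concluding that the fits differ.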
I compared the resulting probabilities of Action "c" from all three packages (R, SAS and StatGraphics). R and StatGraphics give the same results, but SAS gives different results for some combinations of AgeGroup and Gender, as shown in the attached document for the probability of Action.
I would appreciate it if you could help me sort out the issue.
Thanks and Best Regards
Atul Malik
The StatGraphics results are as follows:
Estimated Regression Model (Maximum Likelihood)

Parameter     Estimate  Standard Error  Estimated Odds Ratio
CONSTANT      -1.94239        0.298622
AgeGroup=1     1.79555        0.343277               6.02282
AgeGroup=2    0.590229        0.316943                1.8044
AgeGroup=3    0.128605        0.216341               1.13724
AgeGroup=4    0.716996        0.288917               2.04827
AgeGroup=5    0.972326         0.30544               2.64409
AgeGroup=6      1.9638        0.262721               7.12638
AgeGroup=7     1.45945        0.275966                4.3036
Gender=F       1.44072        0.274922               4.22375
Gender=M       0.56134        0.256286               1.75302

Analysis of Deviance

Source         Deviance   Df  P-Value
Model           129.506    9   0.0000
Residual        1213.21  990   0.0000
Total (corr.)   1342.71  999
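
The StatGraphics estimates can be converted to probabilities in the same way. The sketch below treats them as treatment-coded effects whose reference cell is AgeGroup Na and Gender Na (the levels missing from the table) and assumes the modelled event is Action = i. Both points are assumptions, although the constant plus the AgeGroup=1 and Gender=F effects does reproduce the R intercept of 1.2939. The helper name p_c_from_sg is made up for illustration.

```r
# Estimates copied from the StatGraphics table above.
# Assumption: the missing levels (AgeGroup Na, Gender Na) are the reference
# cell, so their effects are zero.
sg_const  <- -1.94239
sg_age    <- c("1" = 1.79555, "2" = 0.590229, "3" = 0.128605, "4" = 0.716996,
               "5" = 0.972326, "6" = 1.9638, "7" = 1.45945, "Na" = 0)
sg_gender <- c("F" = 1.44072, "M" = 0.56134, "Na" = 0)

# Hypothetical helper: P(Action = "c") for one combination.
# Assumption: the linear predictor is the log-odds of Action = "i".
p_c_from_sg <- function(gender, age) {
  eta <- sg_const + sg_gender[[gender]] + sg_age[[age]]
  1 - plogis(eta)
}

p_c_from_sg("F", "1")
p_c_from_sg("M", "3")

# Consistency check on the Analysis of Deviance table:
# total minus residual deviance should be close to the model deviance (129.506).
model_dev <- 1342.71 - 1213.21
model_dev
```

The deviance figures also line up with the R summary (null deviance 1342.7, residual deviance 1213.2 on 990 degrees of freedom), which is consistent with R and StatGraphics having fitted the same model.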