[R] Specify model with polynomial interaction terms up to degree n
David Winsemius
dwinsemius at comcast.net
Tue Jul 3 00:05:46 CEST 2012
On Jul 2, 2012, at 4:13 PM, YTP wrote:
> I am not sure that method works. It appears to be doing something
> close, but
> returns too many slope coefficients, since I think it is returning
> interaction terms of degree smaller and greater than what was passed
> to it.
No. It _was_ doing what you asked for. I think you don't understand
the expansion.
>
> Here is a small example of degree 2:
>
>> X = data.frame(cbind(c(1,2,3), c(4,5,6)))
>> X
> X1 X2
> 1 1 4
> 2 2 5
> 3 3 6
>
>> y = c(0,1,0)
>
>
>> mymodel2 = glm(y ~ poly(X$X1,2)*poly(X$X2,2),
>> family=binomial(link="logit"))
>
>
> # We see slope coefficients returned for interaction terms of degree
> 1, 3,
> and 4.
Degree 3 and 4 ??? That wasn't how I would have described the results
(there being only first and second order terms as determined by the
'degree' argument), but the fact remains.... You cannot estimate more
than three parameters with a dataset of only 3 points. That is basic
math.
It seems you have demonstrated that these weapons are "too sharp" to
be wielded safely in your hands, so maybe you should step back a few
paces and re-consider what you are really trying to accomplish. Is the
goal really curve fitting with with N^2 polynomial parameters. And do
you _really_ want to be describing to your audience the interpretation
of models created with a logit link that have high degree polynomial
terms? Or are you instead interested in flexible regression for
purposes of description or exploratory analyses such as provided by
loess (one form of local regression) or perhaps GAM's with smoothing
terms?
test = data.frame(y = rnorm(1000),X1=rnorm(1000), X2=rnorm(1000) )
plot.new()
persp( x= seq(-2, 2,by=0.1), y= seq(-2, 2,by=0.1),
z= predict(mymodel2,
newdata =expand.grid(X1=seq(-2, 2,by=0.1), X2=seq(-2,
2,by=0.1)) ) ,
ticktype="detailed")
This has the advantage that the polynomial basis is hidden and you
only end up looking at the predictions. I also like working with Frank
Harrell's 'rms' package because his perimeter function restricts
plotted estimates to regions with sufficient data and the restricted
cubic splines have less tendency to blow-up near the boundaries of hte
data.
>
>> summary(mymodel2)
>
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -7.855e+00 4.588e+04 0 1
> poly(X$X1, 2)1 6.059e-15 7.946e+04 0 1
> poly(X$X1, 2)2 -3.848e+01 7.946e+04 0 1
> poly(X$X2, 2)1 NA NA NA NA
> poly(X$X2, 2)2 NA NA NA NA
> poly(X$X1, 2)1:poly(X$X2, 2)1 NA NA NA NA
> poly(X$X1, 2)2:poly(X$X2, 2)1 NA NA NA NA
> poly(X$X1, 2)1:poly(X$X2, 2)2 NA NA NA NA
> poly(X$X1, 2)2:poly(X$X2, 2)2 NA NA NA NA
>
>
>
>
>
> # Data used in the model
>
>> mymodel2$model
>
> y poly(X$X1, 2).1 poly(X$X1, 2).2 poly(X$X2, 2).1 poly(X$X2, 2).2
> 1 0 -7.071068e-01 4.082483e-01 -7.071068e-01 4.082483e-01
> 2 1 4.350720e-18 -8.164966e-01 4.350720e-18 -8.164966e-01
> 3 0 7.071068e-01 4.082483e-01 7.071068e-01 4.082483e-01
>
>
Orthogonal basis.
> where instead I would expect the data to be like
>
> 1 4 16
> 4 10 25
> 9 18 36
?poly
You can get something like that with poly(X1, degree=2, raw=TRUE)
> poly(1:3, degree=2, raw=TRUE)
1 2
[1,] 1 1
[2,] 2 4
[3,] 3 9
attr(,"degree")
[1] 1 2
attr(,"class")
[1] "poly" "matrix"
--
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list