[R] help with the by term in the smoother of gam - mgcv
    Miguel Lurgi Rivera 
    miguel.lurgirivera at adelaide.edu.au
       
    Tue Sep  8 11:03:02 CEST 2015
    
    
  
Hi,
I have fitted a GAM with two predictor variables, one of which is a factor with 7 levels. I want to model an interaction between x0 and x1, so I am using a ti() smoother, because I think that is the appropriate way to model interactions.
The general form of the model is y ~ ti(x0, by=x1), where x0 is a continuous numeric variable and x1 is the factor (with 7 levels).
However, the fit is not as good as I expected. When I plot the smooth for each level of the factor, the curves are very poor and do not follow the shape of the data points at all (see attached Figure 1).
Searching the help pages, I found the following note on page 56 of the manual: 'Note that when using factor by variables, centering constraints are applied to the smooths, which usually means that the by variable should be included as a parametric term, as well'.
Even though I wasn't sure what that meant (!), I proceeded to add my categorical by variable (x1) as a parametric term and ended up with the following model:
y ~ x1 + ti(x0, by=x1)
With this change the results improved greatly (see attached Figure 2), but I don't quite understand why. I suspect it is because of the 'centering constraints' mentioned in the manual, but I am not sure.
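To make the comparison reproducible, here is a minimal sketch on hypothetical simulated data (the level means, curves and noise level are invented for illustration and are not taken from the attached figures). It fits both formulations and compares them by AIC and adjusted R-squared. If the centering explanation is right, the second model should fit far better whenever the level means differ.

```r
library(mgcv)

set.seed(1)
# Hypothetical simulated data: 7 factor levels, 50 points each. Every
# level has its own mean (0, 2, ..., 12) and its own curve in x0.
x1 <- factor(rep(letters[1:7], each = 50))
x0 <- runif(length(x1))
level_mean  <- 2 * (as.integer(x1) - 1)
level_curve <- sin(2 * pi * x0 * as.integer(x1) / 7)
y   <- level_mean + level_curve + rnorm(length(x1), sd = 0.3)
dat <- data.frame(y, x0, x1)

# Formulation 1: by-factor smooths only. The smooths are centred, so
# (beyond the single global intercept) they cannot give each level
# its own mean.
m1 <- gam(y ~ ti(x0, by = x1), data = dat, method = "REML")

# Formulation 2: x1 also enters as a parametric term, supplying one mean
# per level; the centred smooths then only describe the shape around it.
m2 <- gam(y ~ x1 + ti(x0, by = x1), data = dat, method = "REML")

AIC(m1, m2)
c(m1 = summary(m1)$r.sq, m2 = summary(m2)$r.sq)
```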
So, my questions are:
- Is the second model the correct formulation?
- In plainer terms, what is the difference between these two model formulations, and why should I also include the by factor as a parametric term? I thought that any covariates listed in an interaction were usually modelled automatically as main effects as well.
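The expectation about main effects comes from ordinary R formulas, where `*` expands an interaction into its main effects. A small base-R sketch (no packages needed, since `terms()` does not evaluate `ti()`) shows that a `by =` argument inside a smooth is not expanded in that way, so x1 only appears as a main effect when it is written explicitly:

```r
# Ordinary formula: `*` adds both main effects plus the interaction.
attr(terms(y ~ x0 * x1), "term.labels")
# "x0" "x1" "x0:x1"

# Smooth with a by factor: a single term, no main effect for x1.
attr(terms(y ~ ti(x0, by = x1)), "term.labels")
# "ti(x0, by = x1)"

# The second formulation from the post: x1 is listed explicitly.
attr(terms(y ~ x1 + ti(x0, by = x1)), "term.labels")
# "x1" "ti(x0, by = x1)"
```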
On a different topic: as you can see from Figure 2, the curves fitted to the data are still not ideal. Do you have any suggestions on how to improve them? I have lowered k to 3 in the smoother (a smaller value prevents the model from converging), and I have used bs = 'fs' as the basis for the smoother because the manual says it is more heavily penalised. Neither of these changes has helped.
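For reference, a sketch of how the two adjustments above are commonly written in mgcv syntax, again on hypothetical simulated data. The post does not show the exact calls, so this assumes the 'fs' attempt used the documented factor-smooth form `s(x0, x1, bs = "fs")`, in which the factor is passed as the second covariate of `s()` rather than through `by =`.

```r
library(mgcv)

set.seed(2)
# Hypothetical simulated data with the structure described in the post:
# 7 factor levels, each with its own mean and its own curve in x0.
x1  <- factor(rep(letters[1:7], each = 50))
x0  <- runif(length(x1))
y   <- 2 * as.integer(x1) + sin(2 * pi * x0 * as.integer(x1) / 7) +
       rnorm(length(x1), sd = 0.3)
dat <- data.frame(y, x0, x1)

# (a) By-factor smooths with the reduced basis dimension k = 3
#     mentioned in the post (one smooth per level of x1).
m_k3 <- gam(y ~ x1 + ti(x0, by = x1, k = 3), data = dat, method = "REML")

# (b) Factor-smooth interaction: one "fs" term with a curve per level of
#     x1; its null space is penalised too, so level-specific intercepts
#     are included as penalised (random-effect-like) terms.
m_fs <- gam(y ~ s(x0, x1, bs = "fs", k = 5), data = dat, method = "REML")

AIC(m_k3, m_fs)
```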
Thanks in advance!
Cheers,
Miguel.-
-------------- next part --------------
Attachments (scrubbed from the list archive):
- Figure 1: Figure-1.png (image/png, 116451 bytes), <https://stat.ethz.ch/pipermail/r-help/attachments/20150908/74d33a9b/attachment.png>
- Figure 2: Figure-2.png (image/png, 152094 bytes), <https://stat.ethz.ch/pipermail/r-help/attachments/20150908/74d33a9b/attachment-0001.png>
    
    
More information about the R-help
mailing list