[R] How to avoid overfitting in gam(mgcv)

Simon Wood s.wood at bath.ac.uk
Wed Oct 3 11:58:01 CEST 2007


On Wednesday 03 October 2007 10:49, Ariyo Kanno wrote:
> I appreciate your quick reply.
> I am using the model of the following structure :
>
> fit <- gam(y~x1+s(x2))
>
> where y, x1, and x2 are quantitative variables.
> So the response distribution is assumed to be gaussian (the default).
>
> Now I understand that the data size was too small.
-- Well, the 10 end of that range is definitely too small, but you can get
quite reasonable estimates of a single smoothing parameter from 30+ gaussian
observations.
-- You can force smoother models either by setting the smoothing parameter
yourself using the `sp' argument to `gam', or by using the `min.sp' argument
to set a lower bound on the smoothing parameter.
-- I'm surprised that `gamma' had no effect - how high did you try?
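
For illustration only, here is a minimal sketch of the three options on
simulated data. The data, the seed, and the particular values sp = 10 and
min.sp = 1 are assumptions made up for this example, not recommendations;
sensible values depend on the scale of your own data.

```r
library(mgcv)

## Simulated stand-in data (an assumption; not the poster's data)
set.seed(1)
n  <- 30
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + sin(2 * pi * x2) + rnorm(n, sd = 0.5)

## Baseline: smoothing parameter estimated automatically
fit0 <- gam(y ~ x1 + s(x2))

## Option 1: fix the smoothing parameter by hand via `sp'
## (one value per penalty; larger values give smoother fits)
fit_sp <- gam(y ~ x1 + s(x2), sp = 10)

## Option 2: still estimate it, but impose a lower bound via `min.sp'
fit_min <- gam(y ~ x1 + s(x2), min.sp = 1)

## Option 3: inflate the effective degrees of freedom in the GCV score
fit_gamma <- gam(y ~ x1 + s(x2), gamma = 1.4, method = "GCV.Cp")

## Total effective degrees of freedom of each fit, for comparison
edf_total <- sapply(list(default = fit0, sp = fit_sp,
                         min.sp = fit_min, gamma = fit_gamma),
                    function(m) sum(m$edf))
print(edf_total)
```

Comparing the total effective degrees of freedom (and plotting each fit with
plot()) is one way to check whether a given setting actually changed the
smoothness of the estimate.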

best,
Simon



> Thank you.
>
> Best Wishes,
>
> Ariyo
>
> 2007/10/3, Simon Wood <s.wood at bath.ac.uk>:
> > What sort of model structure are you using? In particular, what is the
> > response distribution? For poisson and binomial responses, overfitting
> > can be a sign of overdispersion, and quasipoisson or quasibinomial may be
> > better. Also, I would not expect to get useful smoothing parameter
> > estimates from 10 data!
> >
> > best,
> > Simon
> >
> > On Wednesday 03 October 2007 06:55, 神野有生 wrote:
> > > Dear listers,
> > >
> > > I'm using gam (from mgcv) for semi-parametric regression on small and
> > > noisy datasets (10 to 200 observations), and facing a problem of
> > > overfitting.
> > >
> > > According to the book (Simon N. Wood / Generalized Additive Models: An
> > > Introduction with R), it is suggested to avoid overfitting by inflating
> > > the effective degrees of freedom in GCV evaluation with an increased
> > > "gamma" value (e.g. 1.4). But in my case, it didn't make a significant
> > > change in the results.
> > >
> > > The only way I've found to suppress overfitting is to set the basis
> > > dimension "k" at very low values (3 to 5). However, I don't think this
> > > is reasonable because knot selection will then be an important issue.
> > >
> > > Is there any other means to avoid overfitting when analyzing small
> > > datasets?
> > >
> > > Thank you for your help in advance,
> > > Ariyo Kanno
> > >
> > > --
> > > Ariyo Kanno
> > > 1st-year doctor's degree student at
> > > Institute of Environmental Studies,
> > > The University of Tokyo
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> >
> > --
> >
> > > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > > +44 1225 386603  www.maths.bath.ac.uk/~sw283
> >



