[Rd] stringsAsFactors
William Dunlap
wdunlap at tibco.com
Mon Feb 11 18:13:09 CET 2013
Note that changing this does not just mean getting rid of "silly warnings".
Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.
> d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4, 15:17, 28.1,28.8,30.1))
> fit_ab <- lm(y ~ x + f, data = d, subset = f!="B")
Warning message:
In model.matrix.default(mt, mf, contrasts) :
variable 'f' converted to a factor
> predict(fit_ab, newdata=d)
1 2 3 4 5 6 7 8 9 10
1 2 3 4 25 26 27 8 9 10
Warning messages:
1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
variable 'f' converted to a factor
2: In predict.lm(fit_ab, newdata = d) :
prediction from a rank-deficient fit may be misleading
fit_ab is not rank-deficient, and predict() should report
1 2 3 4 NA NA NA 28 29 30
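One possible workaround is to convert f to a factor yourself before fitting. This is a minimal sketch that assumes explicit conversion is acceptable. lm() then records the levels actually used in the fit ("A" and "C") in fit_ab$xlevels, and predict() checks newdata against them. The "C" rows get the correct values, and the unseen level "B" raises an error instead of producing silently misaligned predictions. It does not give the NA rows shown above: it trades the wrong answer for an error. The data are rebuilt with stringsAsFactors = FALSE passed explicitly, so the sketch does not depend on the global option. The name `seen` is illustrative.

```r
# Rebuild the data with f stored as character, independent of the
# global stringsAsFactors setting.
d <- data.frame(x = 1:10,
                f = rep(c("A", "B", "C"), c(4, 3, 3)),
                y = c(1:4, 15:17, 28.1, 28.8, 30.1),
                stringsAsFactors = FALSE)

# Workaround: make f a factor before fitting. lm() drops the unused
# level "B" for the subset and stores the levels it used in xlevels.
d$f <- factor(d$f)
fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
fit_ab$xlevels

# Rows whose level was seen in the fit predict correctly:
# values 1, 2, 3, 4, 28, 29, 30 for rows 1-4 and 8-10.
seen <- d$f != "B"
pred_seen <- predict(fit_ab, newdata = d[seen, ])
pred_seen

# Rows with the unseen level "B" now make predict() stop with an
# error instead of returning wrong numbers.
err <- tryCatch(predict(fit_ab, newdata = d), error = function(e) e)
conditionMessage(err)
```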
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
> Of Terry Therneau
> Sent: Monday, February 11, 2013 5:50 AM
> To: r-devel at r-project.org; Duncan Murdoch
> Subject: Re: [Rd] stringsAsFactors
>
> I think your idea to remove the warnings is excellent, and a good compromise.
> Characters already work fine in modeling functions except for the silly warning.
>
> It is interesting how often the defaults for a program reflect the data sets in use at the
> time the defaults were chosen. There are some such in my own survival package whose
> proper value is no longer as "obvious" as it was when I chose them. Factors are very
> handy for variables which have only a few levels and will be used in modeling. Every
> character variable of every dataset in "Statistical Models in S", which introduced
> factors, is of this type, so auto-transformation made a lot of sense. The "solder" data
> set there is one for which Helmert contrasts are proper, so guess what the default
> contrast option was? (I think there are only a few data sets in the world for which
> Helmert makes sense, however, and R eventually changed the default.)
>
> For character variables that should not be factors, such as a street address,
> stringsAsFactors can be a real PITA, and I expect that people's preference for the option
> depends almost entirely on how often these arise in their own work. As long as there is
> an option that can be overridden, I'm okay. Yes, I'd prefer FALSE as the default, partly
> because the current value is a tripwire in the hallway that eventually catches every new
> user.
>
> Terry Therneau
>
> On 02/11/2013 05:00 AM, r-devel-request at r-project.org wrote:
> > Both of these were discussed by R Core. I think it's unlikely the
> > default for stringsAsFactors will be changed (some R Core members like
> > the current behaviour), but it's fairly likely the show.signif.stars
> > default will change. (That's if someone gets around to it: I
> > personally don't care about that one. P-values are commonly used
> > statistics, and the stars are just a simple graphical display of them.
> > I find some p-values to be useful, and the display to be harmless.)
> >
> > I think it's really unlikely the more extreme changes (i.e. dropping
> > show.signif.stars completely, or dropping p-values) will happen.
> >
> > Regarding stringsAsFactors: I'm not going to defend keeping it as is,
> > I'll let the people who like it defend it. What I will likely do is
> > make a few changes so that character vectors are automatically changed
> > to factors in modelling functions, so that operating with
> > stringsAsFactors=FALSE doesn't trigger silly warnings.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel