[Rd] Regression stars
Hervé Pagès
hpages at fhcrc.org
Tue Feb 12 19:47:31 CET 2013
On 02/12/2013 08:20 AM, peter dalgaard wrote:
>
> On Feb 12, 2013, at 17:05 , Brian Lee Yung Rowe wrote:
>
>>
>> I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
>
> I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variables like name and address, let's handle those as a special case".
<sarcasm>
Since character vectors are sooooo bad and people use them where
they should instead use a factor, I propose to go all the way and
add the stringsAsFactors arg to character() too. That way
people are put on the right track from the very start.
</sarcasm>
No, seriously, if my variable is categorical, it's already in a factor
and that's how I pass it to data.frame(). But if I have it in a
character vector, it's because that's how I want it. It's my choice.
How could anybody ever think that having data.frame() alter his/her
data is a good thing?
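
To make the conversion concrete, here is a minimal sketch. The vector
and data frame names are made up for illustration, and stringsAsFactors
is set explicitly in both calls so the result does not depend on
whatever the default happens to be:

```r
# Hypothetical character data that is meant to stay character.
x <- c("apple", "banana", "apple")

# With stringsAsFactors = TRUE, data.frame() converts the column to a factor.
df_factor <- data.frame(x = x, stringsAsFactors = TRUE)
class(df_factor$x)

# With stringsAsFactors = FALSE, the column is stored as passed in.
df_char <- data.frame(x = x, stringsAsFactors = FALSE)
class(df_char$x)
```

In the first call the column comes back as a factor even though a
character vector was passed in; in the second it stays character.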
Please *remove* the stringsAsFactors arg of data.frame() in R 3.0.
You'll do a big favor to your user base.
Thanks,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319