[R] configure ddply() to avoid reordering of '.variables'
Liviu Andronic
landronimirc at gmail.com
Mon May 27 10:47:24 CEST 2013
Hello,
I'm using ddply() from plyr, and I notice that it re-orders its output
by the levels of the '.variables' on which the splitting is done. I'm
concerned about correctly recovering the original row ordering.
Consider:
library(plyr)
x <- iris[order(iris$Species, decreasing = TRUE), ]
head(x)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#101           6.3         3.3          6.0         2.5 virginica
#102           5.8         2.7          5.1         1.9 virginica
#103           7.1         3.0          5.9         2.1 virginica
#104           6.3         2.9          5.6         1.8 virginica
#105           6.5         3.0          5.8         2.2 virginica
#106           7.6         3.0          6.6         2.1 virginica
xa <- ddply(x, .(Species), function(x) {
  data.frame(Sepal.Length = x$Sepal.Length,
             mean.adj = x$Sepal.Length - mean(x$Sepal.Length))
})
# |==============================================================================================| 100%
##notice how the ordering of Species is different
##from that in the input data frame
head(xa)
#  Species Sepal.Length mean.adj
#1  setosa          5.1    0.094
#2  setosa          4.9   -0.106
#3  setosa          4.7   -0.306
#4  setosa          4.6   -0.406
#5  setosa          5.0   -0.006
#6  setosa          5.4    0.394
all.equal(xa$Species, x$Species)
#[1] "100 string mismatches"
all.equal(xa[order(xa$Species, decreasing = TRUE), ]$Species, x$Species)
#[1] TRUE
all.equal(xa$Sepal.Length, x$Sepal.Length)
#[1] "Mean relative difference: 0.2785"
all.equal(xa[order(xa$Species, decreasing = TRUE), ]$Sepal.Length, x$Sepal.Length)
#[1] TRUE
In my real data, should I be concerned that simply reordering by the
'.variables' variable wouldn't necessarily restore the original
ordering as in the input data frame? Is it possible to instruct
ddply() to avoid re-ordering the supplied '.variables' variable?
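One possible workaround, sketched here under assumptions rather than as a ddply() option: re-sorting by '.variables' can only restore the input order when the input was itself sorted by those variables, so a more robust approach may be to carry an explicit row index through the call and sort on it afterwards. The column name orig.row is a hypothetical helper introduced for illustration, and the input is shuffled on purpose so that it is not sorted by Species.

```r
library(plyr)

# Input deliberately NOT sorted by the grouping variable.
set.seed(1)
x <- iris[sample(nrow(iris)), ]

# Hypothetical helper column: position of each row in the input.
x$orig.row <- seq_len(nrow(x))

xa <- ddply(x, .(Species), function(d) {
  data.frame(orig.row = d$orig.row,   # carry the index through the split
             Sepal.Length = d$Sepal.Length,
             mean.adj = d$Sepal.Length - mean(d$Sepal.Length))
})

# Restore the input order from the carried-along index.
xa <- xa[order(xa$orig.row), ]

all.equal(as.character(xa$Species), as.character(x$Species))
all.equal(xa$Sepal.Length, x$Sepal.Length)
```

Sorting on the index does not depend on how the groups are laid out in the input, whereas ordering by Species alone assumes the rows of each group were contiguous to begin with.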
Regards,
Liviu
--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail