[Rd] bug and enhancement to split?
Martin Morgan
mtmorgan at fhcrc.org
Sun Jan 27 20:20:37 CET 2013
With
> R.version.string
[1] "R Under development (unstable) (2013-01-26 r61752)"
'split.default' recycles a short factor when 'x' is unclassed, but not when
'x' is an instance of a class
> split(1:5, 1:2)
$`1`
[1] 1 3 5
$`2`
[1] 2 4
Warning message:
In split.default(1:5, 1:2) :
data length is not a multiple of split variable
> x = structure(1:5, class="A")
> split(x, 1:2)
$`1`
[1] 1
$`2`
[1] 2
This is also inconsistent with split<-, which does recycle:
> split(x, 1:2) <- 1:2
Warning message:
In split.default(seq_along(x), f, drop = drop, ...) :
data length is not a multiple of split variable
> x
[1] 1 2 1 2 1
attr(,"class")
[1] "A"
A solution is to change a call to seq_along(f) toward the end of split.default
to seq_along(x).
@@ -32,7 +32,7 @@
     lf <- levels(f)
     y <- vector("list", length(lf))
     names(y) <- lf
-    ind <- .Internal(split(seq_along(f), f))
+    ind <- .Internal(split(seq_along(x), f))
     for(k in lf) y[[k]] <- x[ind[[k]]]
     y
 }
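As a rough illustration of why the index vector matters, the sketch below reproduces the two index computations with the public split() rather than .Internal(). The names ind_old, ind_new, res_old and res_new are mine, for illustration only, and the warning from recycling is suppressed on purpose.

```r
# Classed object and short split variable, as in the example above
x <- structure(1:5, class = "A")
f <- factor(1:2)

# Before the change: indices cover only the positions of f (here 1 and 2)
ind_old <- split(seq_along(f), f)

# After the change: indices cover every position of x; f is recycled
# (split() warns that the data length is not a multiple of the split variable)
ind_new <- suppressWarnings(split(seq_along(x), f))

# Mimic the final loop of split.default: y[[k]] <- x[ind[[k]]]
res_old <- lapply(ind_old, function(i) x[i])
res_new <- lapply(ind_new, function(i) x[i])

print(ind_old)
print(ind_new)
```

With seq_along(f), elements 3 to 5 of x are never indexed, which is the truncated result shown for the classed 'x'.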
The following is maybe a little harder to argue, but consider a class for
which one might wish to develop factor-like behaviour, e.g.,
Rle = setClass("Rle", representation(values="integer", lengths="integer"))
f = Rle(values=1:2, lengths=2:3)
the code
    if (is.list(f))
        f <- interaction(f, drop = drop, sep = sep)
    else if (drop || !is.factor(f))
        f <- factor(f)
requires that one make factor a generic and develop a method for factor.Rle.
This contradicts the documentation
f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
grouping, or a list of such factors in which case their
interaction is used for the grouping.
and perhaps the more common (?) pattern of coercion using as.*. A solution is to
make as.factor a generic and revise the code above to use something like
    if (is.list(f)) f <- interaction(f, drop = drop, sep = sep)
    else if (!is.factor(f)) f <- as.factor(f)
    else if (drop) f <- factor(f)
One then gets split behaviour if there is an as.factor.Rle method
    as.factor.Rle <- function(x, ...)
        factor(rep(x@values, x@lengths), levels=unique(x@values))
    setAs("Rle", "factor", function(from) as.factor.Rle(from))
These more elaborate changes are in the attached diff.
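For readers without the patch applied, here is a minimal sketch of the intended grouping. It assumes as.factor is not generic in the running R, so as.factor.Rle is called explicitly as an ordinary function rather than dispatched from inside split.default. The variable names g and res are illustrative.

```r
library(methods)

# Toy run-length class from the example above
Rle <- setClass("Rle", representation(values = "integer", lengths = "integer"))
f <- Rle(values = 1:2, lengths = 2:3)

# Expand the runs into an ordinary factor; levels follow the order of the values
as.factor.Rle <- function(x, ...)
    factor(rep(x@values, x@lengths), levels = unique(x@values))

# Explicit coercion stands in for what the patched split.default would do
g <- as.factor.Rle(f)
res <- split(1:5, g)
print(res)
```

Here the first two elements fall in group "1" and the remaining three in group "2", matching the run lengths 2 and 3.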
Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split.diff.tar.gz
Type: application/x-gzip
Size: 1184 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20130127/8541e1ea/attachment.gz>