[Rd] Suggestions for 'diff.default'
Suharto Anggono Suharto Anggono
suharto_anggono at yahoo.com
Mon Feb 4 06:28:44 CET 2013
Inspired by discussion in "Need very fast application of 'diff' - ideas?" (around https://stat.ethz.ch/pipermail/r-help/2012-January/301873.html), I have another suggestion.
Suggestion 3: Make 'diff.default' run faster.
For the vector case (if suggestion 2 is not applied or if unclassed input is treated specially), without resorting to C, I found that a speedup may be gained by replacing
r[-length(r):-(length(r)-lag+1L)]
with
`length<-`(r, length(r)-lag)
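A small check that the two expressions select the same elements for a plain vector. This is only a sketch: the values of 'r' and 'lag' below are made up for illustration, and no timing is shown.

```r
# Illustrative values only; in 'diff.default', 'r' is the unclassed input.
r <- c(3, 1, 4, 1, 5, 9, 2, 6)
lag <- 2L

# Current approach: drop the last 'lag' elements by negative indexing.
a <- r[-length(r):-(length(r) - lag + 1L)]

# Suggested approach: shorten the vector with the replacement function.
b <- `length<-`(r, length(r) - lag)

identical(a, b)
```

Both forms keep the first length(r) - lag elements; the second avoids building an index vector.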
Another way, based on a similar idea but triggering a warning, is as follows.
{
for (i in seq_len(differences)) r <- r[i1] - r
length(r) <- xlen - lag * differences
}
Variables 'i1' and 'xlen' are as defined in function 'diff.default' in R.
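A worked sketch of that loop on a short vector, under the assumption that 'i1' is -seq_len(lag) and 'xlen' is length(x) for the vector case. The input values are made up, and suppressWarnings() is added here only to hide the recycling warning mentioned above.

```r
x <- c(1, 4, 9, 16, 25, 36)   # illustrative input
lag <- 1L
differences <- 2L

xlen <- length(x)             # assumed definition for the vector case
i1 <- -seq_len(lag)           # assumed definition: drop the first 'lag' elements
r <- x

# r[i1] is shorter than r, so it is recycled; only the leading elements of
# each result are meaningful, and the trailing ones are discarded at the end.
for (i in seq_len(differences)) r <- suppressWarnings(r[i1] - r)
length(r) <- xlen - lag * differences

identical(r, diff(x, lag = lag, differences = differences))
```

The truncation happens once, after the loop, instead of once per iteration.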
--- On Tue, 29/1/13, Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com> wrote:
> From: Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> Subject: Re: Suggestions for 'diff.default'
> To: R-devel at lists.R-project.org
> Date: Tuesday, 29 January, 2013, 10:32 AM
>
>
> --- On Mon, 28/1/13, Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> wrote:
>
> > From: Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
> > Subject: Suggestions for 'diff.default'
> > To: R-devel at lists.R-project.org
> > Date: Monday, 28 January, 2013, 5:31 PM
> > I have suggestions for function 'diff.default' in R.
> >
> >
> > Suggestion 1: If the input is a matrix, always return a matrix,
> > even if empty.
> >
> > What happens in R 2.15.2:
> >
> > > rbind(1:2) # matrix
> >      [,1] [,2]
> > [1,]    1    2
> > > diff(rbind(1:2)) # not matrix
> > integer(0)
> > > sessionInfo()
> > R version 2.15.2 (2012-10-26)
> > Platform: i386-w64-mingw32/i386 (32-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> > The documentation for 'diff' says, "If 'x' is a matrix then the
> > difference operations are carried out on each column separately."
> > If the result is empty, I expect the result to still have as many
> > columns as the input.
> >
> >
> > Suggestion 2: Make 'diff.default' applicable more generally by
> > (a) not performing 'unclass';
> > (b) generalizing (changing)
> > ismat <- is.matrix(x)
> > to become
> > ismat <- length(dim(x)) == 2L
> >
> >
> > If suggestion 1 is to be applied and 'unclass' is not wanted
> > (that is, point (a) in suggestion 2 is also applied),
> >
> > if (lag * differences >= xlen)
> > return(x[0L])
> >
> > can be changed to
> >
> > if (lag * differences >= xlen)
> >     return(
> >         if (ismat) x[0L, , drop = FALSE] - x[0L, , drop = FALSE]
> >         else x[0L] - x[0L])
> >
> > This handles classes for which the subtraction (minus) operation
> > changes the class.
> Sorry, I wasn't careful enough. To obtain the correct class
> for the result, differencing should be done as many times as
> specified by argument 'differences'.
>
> I consider the case of
> diff(as.POSIXct(c("2012-01-01", "2012-02-01"), tz="UTC"), d=2)
> versus
> diff(diff(as.POSIXct(c("2012-01-01", "2012-02-01"), tz="UTC")))
> To be safe, maybe just compute as usual, even when it is
> known that the end result will be empty. It can be done like
> this.
>
> empty <- integer()
> if (ismat)
>     for (i in seq_len(differences))
>         r <- if (lag >= nrow(r))
>             r[empty, , drop = FALSE] - r[empty, , drop = FALSE] else
>             ...
> else
>     for (i in seq_len(differences))
>         r <- if (lag >= length(r))
>             r[empty] - r[empty] else
>             ...
>
> If that way is used, 'xlen' is no longer needed.
> >
> > Otherwise, if 'unclass' is wanted, maybe the handling of an empty
> > result can be moved to after 'unclass', to be consistent with a
> > non-empty result.
> >
> >
> > If point (a) in suggestion 2 is applied, 'diff.default' can
> > handle input of class "Date" and "POSIXt". If, in addition,
> > point (b) in suggestion 2 is also applied, 'diff.default' can
> > handle a data frame as input.
> >
>
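To make the ideas in the quoted messages concrete, here is a minimal sketch that combines them: no 'unclass', the generalized 'ismat' test, and differencing performed 'differences' times even when the result is known to be empty. The function name 'diff_sketch' is hypothetical and not part of R, argument checking is omitted, and the elided "..." branches are filled in with ordinary head-and-tail indexing, which is an assumption rather than the proposed code.

```r
# Hypothetical helper for illustration only; not the code of 'diff.default'.
diff_sketch <- function(x, lag = 1L, differences = 1L) {
  ismat <- length(dim(x)) == 2L      # point (b) of suggestion 2
  r <- x                             # point (a): no 'unclass'
  empty <- integer()
  i1 <- -seq_len(lag)
  for (i in seq_len(differences)) {
    if (ismat) {
      n <- nrow(r)
      r <- if (lag >= n) {
        # Empty result that still has the columns of the input.
        r[empty, , drop = FALSE] - r[empty, , drop = FALSE]
      } else {
        r[i1, , drop = FALSE] - r[seq_len(n - lag), , drop = FALSE]
      }
    } else {
      n <- length(r)
      r <- if (lag >= n) {
        # Empty result, but with the class produced by subtraction.
        r[empty] - r[empty]
      } else {
        r[i1] - r[seq_len(n - lag)]
      }
    }
  }
  r
}

m <- diff_sketch(rbind(1:2))         # matrix input with a single row
dim(m)

p <- as.POSIXct(c("2012-01-01", "2012-02-01"), tz = "UTC")
d2 <- diff_sketch(p, differences = 2L)
class(d2)
```

Because subtraction is applied in every iteration, the empty result for the date-time input has the class produced by differencing, and the empty matrix result keeps its two columns.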