[R] vectorized mle / optim
Arnaud Le Rouzic
a.p.s.lerouzic at bio.uio.no
Wed Oct 24 19:00:05 CEST 2007
Hi the list,
I would need some advice on something that looks like a FAQ: the
possibility of providing vectors to optim() function.
Here is a stupid and short example summarizing the problem:
-------------------------------- example 1 ------------ 8<
----------------------
library(stats4)
data <- rnorm(100,0,1)
lik1 <- function(m, v, data) {
N <- length(data)
lik.mean <- dnorm(mean(data), m, sqrt(v/N), log=T)
lik.var <- dchisq(N*var(data)/v, N-1, log=T)
return(-lik.mean - lik.var)
}
ml.result <- mle(lik1, start=list(m=2, v=2), fixed=list(data=data))
summary(ml.result)
---------------------------------------------------------------------------------------
This works perfectly (except that the default algorithm sometimes tries
some negative variance parameters).
However, if the parameters (m and v) are considered as a vector of
parameters, the result is very disappointing:
-------------------------------- example 2 ------------ 8<
----------------------
lik2 <- function(param, data) {
N <- length(data)
lik.mean <- dnorm(mean(data), param["m"], sqrt(param["v"]/N), log=T)
lik.var <- dchisq(N*var(data)/param["v"], N-1, log=T)
return(-lik.mean - lik.var)
}
ml.result <- mle(lik2, start=list(param=c(m=2, v=2)), fixed=list(data=data))
---------------------------------------------------------------------------------------
"Error in dnorm(mean(data), param["m"], sqrt(param["v"]/N), log = T) :
argument "param" is missing, with no default"
One could trust the error message and provide default values, but
unfortunately,
-------------------------------- example 2b ------------ 8<
----------------------
lik2b <- function(param=c(m=1, v=1), data) {
N <- length(data)
lik.mean <- dnorm(mean(data), param["m"], sqrt(param["v"]/N), log=T)
lik.var <- dchisq(N*var(data)/param["v"], N-1, log=T)
return(-lik.mean - lik.var)
}
ml.result <- mle(lik2b, start=list(param=c(m=2, v=2)),
fixed=list(data=data))
---------------------------------------------------------------------------------------
"Error in validObject(.Object) :
invalid class "mle" object: invalid object for slot "fullcoef" in
class "mle": got class "list", should be or extend class "numeric""
The example above is very stupid, but there are cases where the estimate
of vectors of parameters can hardly be avoided. In particular, I'm
working with time series models with random effects that has to be
estimated at each time point (the parameters underlying the distribution
of these random effects are "real" parameters, but the effect itself is
a nuisance parameter). Of course, the number of variables to estimate
will depend on the size of the data set, and the function is supposed to
work for any dataset (convergence is another issue).
The archives of r-help are quite clear on the fact that mle (and optim)
are not vectorized, dot. I tried to trick mle by defining a likelihood
function with a "..." parameter, and converting the c(...) into named
parameters afterwards, but it does not work: mle checks the list of
parameters against "fullcoef <- formals(minuslogl)". I derived my own
mle function to force the call of optim() without the "fool-proof"
tests, but it was probably very naive because it crashes as well.
The fact is that the code of mle(), as well as the R part of optim(), is
full of (not always clean) tricks to ensure that the parameters are
exactly as they are supposed to be. I guess the aim of this code is not
only to slow down the function and to annoy the user, so I would like to
get some feedback before trying to modify mle() and optim() to force
them, in one way or another, to turn lists of vectors into lists of
numeric values. I'm a bit afraid that the problem goes down to the
.Internal optim function, and I will probably give up if it is the case.
Thanks for any comment,
Arnaud.
PS: By the way, the "paranoid parameter tests" in mle / optim lead to an
annoying bug: the "minuslogl" function cannot have "computed" default
parameters, because all of them must be specified either as "fixed" or
"start".
-------------------------------- example 3 ------------ 8<
----------------------
lik1b <- function(m, v=var(data), data) {
N <- length(data)
lik.mean <- dnorm(mean(data), m, sqrt(v/N), log=T)
lik.var <- dchisq(N*var(data)/v, N-1, log=T)
return(-lik.mean - lik.var)
}
ml.result.1 <- mle(lik1b, start=list(m=2, v=2), fixed=list(data=data))
ml.result.2 <- mle(lik1b, start=list(m=2), fixed=list(data=data))
---------------------------------------------------------------------------------------
The second call to mle crashes
"Error in validObject(.Object) :
invalid class "mle" object: invalid object for slot "fullcoef" in
class "mle": got class "list", should be or extend class "numeric""
I don't see any reason for that: a call to lik1b(m=0, data=data) returns
the expected value. The bug can be bypassed by using a default such as
v=NULL, and then if(is.null(v)) v <- var(data) , but it does not seem
very elegant...
More information about the R-help
mailing list