[R-sig-ME] glmer takes long time even after restricting iterations
Ben Bolker
bbolker at gmail.com
Fri Sep 5 23:34:11 CEST 2014
On 14-09-05 05:05 PM, Douglas Bates wrote:
> I will take it as a compliment that you have sufficient confidence in our
> software to try to fit such a model. :-)
>
> Sadly, even with 400,000 observations it is highly unlikely that you
> would be able to converge to parameter estimates for these models, and
> even more unlikely that the estimates would be meaningful.
>
> The optimization in glmer is different from the optimization in lmer.
> For a linear mixed model the optimization is over the parameters of the
> relative covariance matrix only. In this case it looks like there would
> be 10 such parameters. Even the optimization problem involving only
> these parameters would be difficult, as the solution is likely to lie
> on the boundary of the feasible region, representing a singular
> covariance matrix. For glmer the optimization is much more difficult
> because it is over the concatenation of the fixed-effects parameters
> and the covariance parameters. I lost track of the number of
> fixed-effects parameters, but it is large. As you have seen, the first
> model failed to converge in 10,000 function evaluations. That is not
> encouraging.
>
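To make the two parameter blocks concrete, here is a quick sketch on
the built-in cbpp data (not your model), using getME() to pull out the
pieces that glmer optimizes:

  library(lme4)
  gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)
  length(getME(gm, "theta"))  ## covariance parameters: 1 here
  length(getME(gm, "beta"))   ## fixed-effect parameters: 4 here

lmer optimizes over the "theta" block only; glmer with nAGQ >= 1
optimizes over the concatenation c(theta, beta).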
> Regarding the warning messages, I will let Ben or Steve respond, as
> they know more about the convergence checks than I do. I believe those
> diagnostics involve creating finite-difference approximations to the
> gradient vector and the Hessian matrix. The approximation of the
> Hessian matrix at the optimum is probably where the time is being
> spent.
>
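When those derivatives are computed they are stored on the fitted
object, so you can inspect them directly (a sketch, assuming a recent
lme4 and a fitted merMod such as gm above):

  derivs <- gm@optinfo$derivs  ## list with $gradient and $Hessian
  max(abs(derivs$gradient))    ## the quantity compared to the tolerance

The $Hessian component is the finite-difference approximation Doug
mentions.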
For speeding things up I would try setting nAGQ=0, together with

  control = glmerControl(check.conv.grad     = "ignore",
                         check.conv.singular = "ignore",
                         check.conv.hess     = "ignore")

-- this silences the gradient and Hessian convergence checks (although
at some point you will probably want to turn these checks back on!).
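Concretely, a minimal sketch on the cbpp example again (not your
model). The calc.derivs=FALSE setting, available in recent lme4
versions, goes one step further and skips the post-fit derivative
computation entirely, which is where I suspect most of your 19 hours
went:

  gm0 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
               data = cbpp, family = binomial, nAGQ = 0,
               control = glmerControl(check.conv.grad     = "ignore",
                                      check.conv.singular = "ignore",
                                      check.conv.hess     = "ignore",
                                      calc.derivs = FALSE))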
It looks like you have 79 fixed-effect parameters, plus what looks
like 10 random-effect (variance-covariance) parameters (a quick count,
assuming all your variables are numeric) -- this means that the
Hessian computation has to do approximately 4000 (n*(n+1)/2) function
evaluations ...
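Back-of-the-envelope, in R:

  n <- 79 + 10     ## fixed-effect plus variance-covariance parameters
  n * (n + 1) / 2  ## 4005 function evaluations for the Hessian

and each of those evaluations is a deviance computation over all
400,000 rows.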
You can also try using the bobyqa implementation from nloptr, with
appropriate convergence settings, as described here:
https://github.com/lme4/lme4/issues/150#issuecomment-45813306
I believe these are the same settings that are implemented in ?nloptwrap.
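If your installed lme4 exports nloptwrap, you can also select that
wrapper directly rather than writing your own (a sketch, on the cbpp
example again):

  gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
               data = cbpp, family = binomial,
               control = glmerControl(optimizer = "nloptwrap"))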
> The best advice is to simplify the model. You say that ALS is a
> binary variable, which means that even with 400,000 observations you
> have only 400,000 bits of information with which to fit the model.
> That's not a lot. A continuous response provides much more information
> per observation than a binary response.
>
> Try to fit the fixed-effects only using glm. I'm confident that most of
> the coefficients will not be significant.
>
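A sketch of that check: drop the random term from your Original Model
and fit the fixed part alone with glm(), keeping the formula otherwise
the same:

  fixed_only <- glm(ALS ~ -1 + AMI + (injury + stroke + resp) *
                        (FEMALE + AGE + MTUS_CNT + Asian + Black +
                         Hispanic + Other + Custodial + Nursing +
                         Scene + WhiteHigh + BlackHigh + BlackLow +
                         IntegratedHigh + IntegratedLow + combinedscore +
                         Year06 + Year07 + Year08 + Year09 + Year10 +
                         Metro + Per_College_Plus + Per_Gen_Prac +
                         Any_MedSchlAff + Any_Trauma),
                    family = binomial, data = rbind(IARS, IARS2))
  summary(fixed_only)  ## see which coefficients are actually significant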
> On Fri, Sep 5, 2014 at 1:19 PM, Prachi Sanghavi <prachi.sanghavi at gmail.com>
> wrote:
>
>> Hello!
>>
>> I have a fairly complex multilevel, multivariate logistic model that I am
>> trying to fit. In both models below, the variables injury, AMI, stroke,
>> and resp are binary, as are ALS and most other variables. There are a
>> total of about 400,000 observations. When I try to fit the model (Original
>> Model), I get several warnings, which I have pasted below. I am
>> mainly concerned about warning number 4. I think the problem is due
>> to having too many parameters in the model, so I removed several
>> interactions that were unnecessary anyway (Modified Model). I ran the
>> Modified Model
>> with a fixed number of iterations, and it finished these quickly enough
>> (maybe 20 minutes?). But then it took another 19 hours to actually stop
>> running, during which time I suspect R was doing various checks that led to
>> the warnings. I'm not sure. When the Modified Model finished, it produced
>> the warnings below.
>>
>> My biggest problem right now is the amount of time it takes for R to stop
>> running, even after restricting the number of iterations to 100. Because
>> of this problem, it is impractical to try to figure out how to address the
>> warnings.
>>
>> Can somebody please help me figure out why R is taking so long, even after
>> it has finished the 100 iterations? And what can I do about it?
>>
>> Thank you!!
>>
>> Prachi Sanghavi
>> Harvard University
>>
>>
>> Original Model and Warnings:
>>
>> AMI_county_final_2 <- glmer(
>>     ALS ~ -1 + AMI +
>>         (injury + stroke + resp) * (FEMALE + AGE + MTUS_CNT + Asian +
>>             Black + Hispanic + Other + Custodial + Nursing + Scene +
>>             WhiteHigh + BlackHigh + BlackLow + IntegratedHigh +
>>             IntegratedLow + combinedscore + Year06 + Year07 + Year08 +
>>             Year09 + Year10 + Metro + Per_College_Plus + Per_Gen_Prac +
>>             Any_MedSchlAff + Any_Trauma) +
>>         (-1 + injury + AMI + stroke + resp | fullcounty),
>>     family = binomial, data = rbind(IARS, IARS2), verbose = 2,
>>     control = glmerControl(optCtrl = list(maxfun = 100)))
>>
>> Warning messages:
>> 1: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
>>    failure to converge in 10000 evaluations
>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> Model failed to converge with max|grad| = 480.605 (tol = 0.001)
>> 3: In if (resHess$code != 0) { :
>> the condition has length > 1 and only the first element will be used
>> 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>>   - Rescale variables?;
>> Model is nearly unidentifiable: large eigenvalue ratio
>>   - Rescale variables?
>>
>> Modified Model and Warnings:
>>
>> AMI_county_final_2 <- glmer(
>>     ALS ~ -1 + Year06 + Year07 + Year08 + Year09 + Year10 + Metro +
>>         AMI +
>>         (injury + stroke + resp) * (FEMALE + AGE + MTUS_CNT + Asian +
>>             Black + Hispanic + Other + Custodial + Nursing + Scene +
>>             WhiteHigh + BlackHigh + BlackLow + IntegratedHigh +
>>             IntegratedLow + combinedscore) +
>>         (-1 + injury + AMI + stroke + resp | fullcounty),
>>     family = binomial, data = rbind(IARS, IARS2), verbose = 2,
>>     control = glmerControl(optCtrl = list(maxfun = 100)))
>>
>> Warning messages:
>> 1: In commonArgs(par, fn, control, environment()) :
>> maxfun < 10 * length(par)^2 is not recommended.
>> 2: In optwrap(optimizer, devfun, start, rho$lower, control = control, :
>> convergence code 1 from bobyqa: bobyqa -- maximum number of function
>> evaluations exceeded
>> 3: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
>>    failure to converge in 100 evaluations
>> 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> Model failed to converge with max|grad| = 15923.5 (tol = 0.001)
>> 5: In if (resHess$code != 0) { :
>> the condition has length > 1 and only the first element will be used
>> 6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
>> Model is nearly unidentifiable: very large eigenvalue
>>   - Rescale variables?;
>> Model is nearly unidentifiable: large eigenvalue ratio
>>   - Rescale variables?
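One footnote on warning 1 of the Modified Model: the recommendation is
maxfun >= 10 * length(par)^2, where par is the full parameter vector,
so with roughly 89 parameters the suggested floor is far above 100:

  n_par <- 79 + 10  ## approximate total parameter count (see above)
  10 * n_par^2      ## 79210 recommended evaluations, vs. maxfun = 100

Treat 89 as a rough count; the exact length(par) depends on the model.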