[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors
Jarrod Hadfield
j.hadfield at ed.ac.uk
Mon Aug 25 18:14:14 CEST 2014
Hi Ruben,
You will need to provide over-dispersed starting values for
multiple-chain convergence diagnostics to be useful (GLMM are so
simple I am generally happy if the output of a single run looks
reasonable).
With non-Gaussian data everything is Gibbs sampled conditional on the
latent variables, so you only need to pass them:
l<-rnorm(200, -1, sqrt(1))
t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
y<-rpois(200,exp(l)-t)+1
# generate zero-truncated data with an intercept of -1
dat<-data.frame(y=y)
set.seed(1)
m1<-MCMCglmm(y~1, family="ztpoisson", data=dat, start=list(Liab=l))
# use true latent variable as starting values
set.seed(1)
m2<-MCMCglmm(y~1, family="ztpoisson", data=dat, start=list(Liab=rnorm(200)))
# use some very bad starting values
plot(mcmc.list(m1$Sol, m2$Sol))
# not identical despite the same seed because of different starting
values but clearly sampling the same posterior distribution:
gelman.diag(mcmc.list(m1$Sol, m2$Sol))
Cheers,
Jarrod
Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug 2014
18:00:08 +0200:
> Dear Jarrod,
>
> thanks for the quick reply. Please, don't waste time looking into
> doMPI – I am happy that I
> get the expected result, when I specify that reproducible seed,
> whyever that may be.
> I'm pretty sure that is the deciding factor, because I tested it
> explicitly, I just have no idea
> how/why it interacts with the choice of family.
>
> That said, is setting up different RNG streams for my workers (now
> that it works) __sufficient__
> so that I get independent chains and can use gelman.diag() for
> convergence diagnostics?
> Or should I still tinker with the starting values myself?
> I've never found a worked example of supplying starting values and
> am thus a bit lost.
>
> Sorry for sending further questions, I hope someone else takes pity while
> you're busy with lectures.
>
> Best wishes
>
> Ruben
>
>
>
> On 25 Aug 2014, at 17:29, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi Ruben,
>>
>> I do not think the issue is with the starting values, because even
>> if the same starting values were used the chains would still differ
>> because of the randomness in the Markov Chain (if I interpret your
>> `identical' test correctly). This just involves a call to
>> GetRNGstate() in the C++ code (L 871 ofMCMCglmm.cc) so I think for
>> some reason doMPI/foreach is not doing what you expect. I am not
>> familiar with doMPI and am in the middle of writing lectures so
>> haven't got time to look into it carefully. Outside of the context
>> of doMPI I get the behaviour I expect:
>>
>>
>> l<-rnorm(200, -1, sqrt(1))
>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>> y<-rpois(200,exp(l)-t)+1
>> # generate zero-truncated data with an intercept of -1
>>
>> dat<-data.frame(y=y)
>> set.seed(1)
>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>> set.seed(2)
>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>> set.seed(2)
>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>
>> plot(mcmc.list(m1$Sol, m2$Sol))
>> # different, as expected
>> plot(mcmc.list(m2$Sol, m3$Sol))
>> # the same, as expected
>>
>>
>>
>>
>>
>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug 2014
>> 16:58:06 +0200:
>>
>>> Dear list,
>>>
>>> sorry for bumping my old post, I hope to elicit a response with a
>>> more focused question:
>>>
>>> When does MCMCglmm automatically start from different values when
>>> using doMPI/foreach?
>>>
>>> I have done some tests with models of varying complexity. For
>>> example, the script in my last
>>> post (using "zapoisson") yielded 40 identical chains:
>>>> identical(mcmclist[1], mcmclist[30])
>>> TRUE
>>>
>>> A simpler (?) model (using "ztpoisson" and no specified prior),
>>> however, yielded different chains
>>> and I could use them to calculate gelman.diag()
>>>
>>> Changing my script to the version below, i.e. seeding foreach
>>> using .options.mpi=list( seed= 1337)
>>> so as to make RNGstreams reproducible (or so I thought), led to
>>> different chains even for the
>>> "zapoisson" model.
>>>
>>> In no case have I (successfully) tried to supplant the default of
>>> MCMCglmm's "start" argument.
>>> Is starting my models from different RNGsubstreams inadequate
>>> compared to manipulating
>>> the start argument explicitly? If so, is there any worked example
>>> of explicit starting value manipulation
>>> in parallel computation?
>>> I've browsed the MCMCglmm source to understand how the default
>>> starting values are generated,
>>> but didn't find any differences with respect to RNG for the two
>>> families "ztpoisson" and "zapoisson"
>>> (granted, I did not dig very deep).
>>>
>>> Best regards,
>>>
>>> Ruben Arslan
>>>
>>>
>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H localhost -n 41 R
>>> --slave -f
>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>
>>> library(doMPI)
>>> cl <-
>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>> registerDoMPI(cl)
>>> Children_mcmc1 = foreach(i=1:clusterSize(cl),.options.mpi =
>>> list(seed=1337) ) %dopar% {
>>> library(MCMCglmm);library(data.table)
>>> load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>
>>> nitt = 130000; thin = 100; burnin = 30000
>>> prior.m5d.2 = list(
>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>> )
>>>
>>> rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents, children,
>>> male, urban, spouses, paternalage.mean, paternalage.factor)])
>>> (m1 = MCMCglmm( children ~ trait * (male + urban + spouses +
>>> paternalage.mean + paternalage.factor),
>>> rcov=~us(trait):units,
>>> random=~us(trait):idParents,
>>> family="zapoisson",
>>> prior = prior.m5d.2,
>>> data=rpqa.1,
>>> pr = F, saveX = F, saveZ = F,
>>> nitt=nitt,thin=thin,burnin=burnin))
>>> }
>>>
>>> library(coda)
>>> mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
>>> save(Children_mcmc1,mcmclist, file =
>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>> closeCluster(cl)
>>> mpi.quit()
>>>
>>>
>>>
>>> On 04 Aug 2014, at 20:25, Ruben Arslan <rubenarslan at gmail.com> wrote:
>>>
>>>> Dear list,
>>>>
>>>> would someone be willing to share her or his efforts in
>>>> parallelising a MCMCglmm analysis?
>>>>
>>>> I had something viable using harvestr that seemed to properly initialise
>>>> the starting values from different random number streams (which
>>>> is desirable,
>>>> as far as I could find out), but I ended up being unable to use
>>>> harvestr, because
>>>> it uses an old version of plyr, where parallelisation works only
>>>> for multicore, not for
>>>> MPI.
>>>>
>>>> I pasted my working version, that does not do anything about
>>>> starting values or RNG
>>>> at the end of this email. I can try to fumble further in the dark
>>>> or try to update harvestr,
>>>> but maybe someone has gone through all this already.
>>>>
>>>> I'd also appreciate any tips for elegantly post-processing such
>>>> parallel data, as some of my usual
>>>> extraction functions and routines are hampered by the fact that
>>>> some coda functions
>>>> do not aggregate results over chains. (What I get from a
>>>> single-chain summary in MCMCglmm
>>>> is a bit more comprehensive, than what I managed to cobble
>>>> together with my own extraction
>>>> functions).
>>>>
>>>> The reason I'm parallelising my analyses is that I'm having
>>>> trouble getting a good effective
>>>> sample size for any parameter having to do with the many zeroes
>>>> in my data.
>>>> Any pointers are very appreciated, I'm quite inexperienced with MCMCglmm.
>>>>
>>>> Best wishes
>>>>
>>>> Ruben
>>>>
>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H localhost -n
>>>> 41 R --slave -f "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>> library(doMPI)
>>>> cl <- startMPIcluster()
>>>> registerDoMPI(cl)
>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>> library(MCMCglmm)
>>>> load("rpqa1.rdata")
>>>>
>>>> nitt = 40000; thin = 100; burnin = 10000
>>>> prior = list(
>>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>> )
>>>>
>>>> MCMCglmm( children ~ trait -1 + at.level(trait,1):male +
>>>> at.level(trait,1):urban + at.level(trait,1):spouses +
>>>> at.level(trait,1):paternalage.mean +
>>>> at.level(trait,1):paternalage.factor,
>>>> rcov=~us(trait):units,
>>>> random=~us(trait):idParents,
>>>> family="zapoisson",
>>>> prior = prior,
>>>> data=rpqa.1,
>>>> pr = F, saveX = T, saveZ = T,
>>>> nitt=nitt,thin=thin,burnin=burnin)
>>>> }
>>>>
>>>> library(coda)
>>>> mcmclist = mcmc.list(lapply(Children_mcmc1,FUN=function(x) { x$Sol}))
>>>> save(Children_mcmc1,mcmclist, file = "rpqa_mcmc_kids_za.rdata")
>>>> closeCluster(cl)
>>>> mpi.quit()
>>>>
>>>>
>>>> --
>>>> Ruben C. Arslan
>>>>
>>>> Georg August University G�ttingen
>>>> Biological Personality Psychology and Psychological Assessment
>>>> Georg Elias M�ller Institute of Psychology
>>>> Go�lerstr. 14
>>>> 37073 G�ttingen
>>>> Germany
>>>> Tel.: +49 551 3920704
>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>
>>>>
>>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>
>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list