[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors
Jarrod Hadfield
j.hadfield at ed.ac.uk
Fri Aug 29 09:13:33 CEST 2014
Hi Ruben,
Actually I might know what it is. When you sample different starting
values do you inadvertently sample a new residual variance for the
unidentified za part? You need to make sure that this is always fixed
at the same value (otherwise the model is different). This is not a
problem under the trait:units specification.
Cheers,
Jarrod.
Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Fri, 29 Aug 2014
08:04:42 +0100:
> Hi Ruben,
>
> Can you share your data and I will take a look. Its definitely not
> Monte Carlo error.
>
> Cheers,
>
> Jarrod
>
>
> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014
> 22:44:47 +0200:
>
>> Hi Jarrod,
>>
>> those two matched up quite well yes. I just completed another 20
>> chains, using more variable
>> starting values. There's still two fixed effects
>> traitza_children:spouses and :male which haven't converged
>> according to multi-chain (gelman), but have according to geweke.
>> The offending traces: http://imgur.com/Qm6Ovfr
>> These specific effects aren't of interest to me, so if this doesn't
>> affect the rest of my estimates, I can be happy
>> with this, but I can't conclude that, can I?
>>
>> I'm now also doing a run to see how it deals with the more
>> intensely zero-inflated data when including
>> the unmarried.
>>
>> Thanks a lot for all that help,
>>
>> Ruben
>>
>>> gelman.diag(mcmclist)
>> Potential scale reduction factors:
>>
>> Point est. Upper C.I.
>> (Intercept) 1.00 1.00
>> traitza_children 1.40 1.65
>> male 1.00 1.00
>> spouses 1.00 1.00
>> paternalage.mean 1.00 1.00
>> paternalage.factor(25,30] 1.00 1.00
>> paternalage.factor(30,35] 1.00 1.00
>> paternalage.factor(35,40] 1.00 1.00
>> paternalage.factor(40,45] 1.00 1.00
>> paternalage.factor(45,50] 1.00 1.00
>> paternalage.factor(50,55] 1.00 1.00
>> paternalage.factor(55,90] 1.00 1.00
>> traitza_children:male 1.33 1.54
>> traitza_children:spouses 2.21 2.83
>> traitza_children:paternalage.mean 1.01 1.02
>> traitza_children:paternalage.factor(25,30] 1.05 1.08
>> traitza_children:paternalage.factor(30,35] 1.08 1.13
>> traitza_children:paternalage.factor(35,40] 1.15 1.25
>> traitza_children:paternalage.factor(40,45] 1.15 1.26
>> traitza_children:paternalage.factor(45,50] 1.26 1.43
>> traitza_children:paternalage.factor(50,55] 1.15 1.25
>> traitza_children:paternalage.factor(55,90] 1.14 1.23
>>
>> Multivariate psrf
>>
>> 8.99
>>
>>> summary(mcmclist)
>>
>> Iterations = 100001:149951
>> Thinning interval = 50
>> Number of chains = 20
>> Sample size per chain = 1000
>>
>> 1. Empirical mean and standard deviation for each variable,
>> plus standard error of the mean:
>>
>> Mean SD Naive
>> SE Time-series SE
>> (Intercept) 1.36326 0.04848
>> 0.0003428 0.0003542
>> traitza_children -0.76679 0.28738
>> 0.0020321 0.0016682
>> male 0.09980 0.01633
>> 0.0001155 0.0001222
>> spouses 0.12333 0.01957
>> 0.0001384 0.0001414
>> paternalage.mean 0.07215 0.02194
>> 0.0001551 0.0001596
>> paternalage.factor(25,30] -0.03381 0.04184
>> 0.0002959 0.0003066
>> paternalage.factor(30,35] -0.08380 0.04270
>> 0.0003019 0.0003118
>> paternalage.factor(35,40] -0.16502 0.04569
>> 0.0003231 0.0003289
>> paternalage.factor(40,45] -0.16738 0.05090
>> 0.0003599 0.0003697
>> paternalage.factor(45,50] -0.18383 0.05880
>> 0.0004158 0.0004242
>> paternalage.factor(50,55] -0.18241 0.07277
>> 0.0005146 0.0005302
>> paternalage.factor(55,90] -0.40612 0.09875
>> 0.0006983 0.0007467
>> traitza_children:male 0.12092 0.08223
>> 0.0005815 0.0004697
>> traitza_children:spouses 0.64881 0.21132
>> 0.0014942 0.0008511
>> traitza_children:paternalage.mean -0.02741 0.08550
>> 0.0006046 0.0006221
>> traitza_children:paternalage.factor(25,30] -0.17296 0.18680
>> 0.0013209 0.0013750
>> traitza_children:paternalage.factor(30,35] -0.19027 0.19267
>> 0.0013624 0.0013901
>> traitza_children:paternalage.factor(35,40] -0.24911 0.21282
>> 0.0015049 0.0014391
>> traitza_children:paternalage.factor(40,45] -0.29772 0.23403
>> 0.0016548 0.0015956
>> traitza_children:paternalage.factor(45,50] -0.51782 0.28589
>> 0.0020215 0.0017602
>> traitza_children:paternalage.factor(50,55] -0.46126 0.32064
>> 0.0022673 0.0021397
>> traitza_children:paternalage.factor(55,90] -0.38612 0.41461
>> 0.0029317 0.0027396
>>
>> 2. Quantiles for each variable:
>>
>> 2.5% 25%
>> 50% 75% 97.5%
>> (Intercept) 1.26883 1.33106
>> 1.36322 1.39575 1.4589722
>> traitza_children -1.20696 -0.95751
>> -0.81076 -0.63308 -0.0365042
>> male 0.06785 0.08878
>> 0.09970 0.11085 0.1320168
>> spouses 0.08467 0.11030
>> 0.12343 0.13643 0.1617869
>> paternalage.mean 0.02950 0.05751
>> 0.07202 0.08683 0.1153881
>> paternalage.factor(25,30] -0.11581 -0.06174
>> -0.03397 -0.00574 0.0473783
>> paternalage.factor(30,35] -0.16656 -0.11250
>> -0.08358 -0.05519 0.0003065
>> paternalage.factor(35,40] -0.25518 -0.19530
>> -0.16500 -0.13440 -0.0757366
>> paternalage.factor(40,45] -0.26887 -0.20164
>> -0.16675 -0.13335 -0.0677407
>> paternalage.factor(45,50] -0.30080 -0.22320
>> -0.18339 -0.14440 -0.0687967
>> paternalage.factor(50,55] -0.32663 -0.23034
>> -0.18227 -0.13317 -0.0415547
>> paternalage.factor(55,90] -0.60202 -0.47303
>> -0.40454 -0.33994 -0.2139128
>> traitza_children:male -0.01083 0.06634
>> 0.11024 0.16109 0.3295892
>> traitza_children:spouses 0.37857 0.51072
>> 0.59398 0.71395 1.2127940
>> traitza_children:paternalage.mean -0.19138 -0.08250
>> -0.02985 0.02493 0.1468989
>> traitza_children:paternalage.factor(25,30] -0.57457 -0.28481
>> -0.16489 -0.05151 0.1728148
>> traitza_children:paternalage.factor(30,35] -0.61499 -0.30350
>> -0.17736 -0.06299 0.1555147
>> traitza_children:paternalage.factor(35,40] -0.74251 -0.36752
>> -0.22966 -0.10777 0.1151897
>> traitza_children:paternalage.factor(40,45] -0.84165 -0.42691
>> -0.27729 -0.14322 0.1032436
>> traitza_children:paternalage.factor(45,50] -1.21782 -0.66568
>> -0.48420 -0.32873 -0.0476720
>> traitza_children:paternalage.factor(50,55] -1.21327 -0.63623
>> -0.43432 -0.24957 0.0955360
>> traitza_children:paternalage.factor(55,90] -1.33772 -0.62227
>> -0.35364 -0.11050 0.3361684
>>
>>> effectiveSize(mcmclist)
>> (Intercept)
>> traitza_children
>> 18814.05
>> 16359.33
>> male
>> spouses
>> 18132.98
>> 19547.05
>> paternalage.mean
>> paternalage.factor(25,30]
>> 19238.72
>> 18974.81
>> paternalage.factor(30,35]
>> paternalage.factor(35,40]
>> 18874.33
>> 19406.63
>> paternalage.factor(40,45]
>> paternalage.factor(45,50]
>> 19075.18
>> 19401.77
>> paternalage.factor(50,55]
>> paternalage.factor(55,90]
>> 18960.11
>> 17893.23
>> traitza_children:male
>> traitza_children:spouses
>> 18545.55
>> 14438.51
>> traitza_children:paternalage.mean
>> traitza_children:paternalage.factor(25,30]
>> 18464.09
>> 16943.43
>> traitza_children:paternalage.factor(30,35]
>> traitza_children:paternalage.factor(35,40]
>> 16827.44
>> 17230.04
>> traitza_children:paternalage.factor(40,45]
>> traitza_children:paternalage.factor(45,50]
>> 17144.78
>> 18191.67
>> traitza_children:paternalage.factor(50,55]
>> traitza_children:paternalage.factor(55,90]
>> 17466.60
>> 18540.59
>>
>>
>> ### current script:
>>
>> # bsub -q mpi -W 24:00 -n 21 -R np20 mpirun -H localhost -n 21 R
>> --slave -f "/usr/users/rarslan/rpqa/krmh_main/children.R"
>> setwd("/usr/users/rarslan/rpqa/")
>> library(doMPI)
>> cl <-
>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/krmh_main/")
>> registerDoMPI(cl)
>> Children = foreach(i=1:clusterSize(cl),.options.mpi =
>> list(seed=1337) ) %dopar% {
>> library(MCMCglmm);library(data.table)
>> setwd("/usr/users/rarslan/rpqa/krmh_main/")
>> source("../1 - extraction functions.r")
>> load("../krmh1.rdata")
>>
>> krmh.1 = recenter.pat(na.omit(krmh.1[spouses>0, list(idParents,
>> children, male, spouses, paternalage)]))
>>
>> samples = 1000
>> thin = 50; burnin = 100000
>> nitt = samples * thin + burnin
>>
>> prior <- list(
>> R=list(V=diag(2), nu=1.002, fix=2),
>> G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
>> )
>>
>> start <- list(
>> liab=c(rnorm( nrow(krmh.1)*2 )),
>> R = list(R1 = rIW(diag(2), 10 )),
>> G = list(G1 = rIW(diag(2), 10 ))
>> )
>>
>> ( m1 = MCMCglmm( children ~ trait * (male + spouses +
>> paternalage.mean + paternalage.factor),
>> rcov=~idh(trait):units,
>> random=~idh(trait):idParents,
>> family="zapoisson",
>> start = start,
>> prior = prior,
>> data=krmh.1,
>> pr = F, saveX = F, saveZ = F,
>> nitt=nitt,thin=thin,burnin=burnin)
>> )
>> m1$Residual$nrt<-2
>> m1
>> }
>>
>> save(Children,file = "Children.rdata")
>> closeCluster(cl)
>> mpi.quit()
>>
>> On 28 Aug 2014, at 20:59, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>
>>> Hi,
>>>
>>> The posteriors for the two models look pretty close to me. Are the
>>> scale reduction factors really as high as previously reported?
>>> Before you had 1.83 for traitza_children:spouses, but the plot
>>> suggests that it should be close to 1?
>>>
>>> Cheers,
>>>
>>> Jarrod
>>>
>>>
>>>
>>>
>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014
>>> 19:59:16 +0200:
>>>
>>>> Sure! Thanks a lot.
>>>> I am using ~idh(trait):units already, sorry for saying that
>>>> incorrectly in my last email.
>>>> These models aren't the final thing, I will replace the
>>>> paternalage.factor variable
>>>> with its linear equivalent if that seems defensible (does so far)
>>>> and in this model it seems
>>>> okay to remove the za-effects for all predictors except spouses.
>>>> So a final model would have fewer fixed effects. I also have
>>>> datasets of 200k+ and 5m+,
>>>> but I'm learning MCMCglmm with this smaller one because my wrong
>>>> turns take less time.
>>>>
>>>> I've uploaded a comparison coef plot of two models:
>>>> http://i.imgur.com/sHUfnmd.png
>>>> m7 is with the default starting values, m1 is with the
>>>> specification I sent in my last email. I don't
>>>> know if such differences are something to worry about.
>>>>
>>>> I don't know what qualifies as highly overdispersed, here's a
>>>> plot of the outcome for ever
>>>> married people (slate=real data):
>>>> http://imgur.com/14MywgZ
>>>> here's with everybody born (incl. some stillborn etc.):
>>>> http://imgur.com/knRGa1v
>>>> I guess my approach (generating an overdispersed poisson with the
>>>> parameters from
>>>> the data and checking if it has as excess zeroes) is not the best
>>>> way to diagnose zero-inflation,
>>>> but especially in the second case it seems fairly clear-cut.
>>>>
>>>> Best regards,
>>>>
>>>> Ruben
>>>>
>>>>> summary(m1)
>>>>
>>>> Iterations = 50001:149951
>>>> Thinning interval = 50
>>>> Sample size = 2000
>>>>
>>>> DIC: 31249.73
>>>>
>>>> G-structure: ~idh(trait):idParents
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> children.idParents 0.006611 4.312e-08 0.0159 523.9
>>>> za_children.idParents 0.193788 7.306e-02 0.3283 369.3
>>>>
>>>> R-structure: ~idh(trait):units
>>>>
>>>> post.mean l-95% CI u-95% CI eff.samp
>>>> children.units 0.1285 0.1118 0.1452 716.1
>>>> za_children.units 0.9950 0.9950 0.9950 0.0
>>>>
>>>> Location effects: children ~ trait * (male + spouses +
>>>> paternalage.mean + paternalage.factor)
>>>>
>>>> post.mean l-95% CI
>>>> u-95% CI eff.samp pMCMC
>>>> (Intercept) 1.3413364 1.2402100
>>>> 1.4326099 1789 <5e-04 ***
>>>> traitza_children -0.8362879 -1.2007980
>>>> -0.5016730 1669 <5e-04 ***
>>>> male 0.0994902 0.0679050
>>>> 0.1297394 2000 <5e-04 ***
>>>> spouses 0.1236033 0.0839000
>>>> 0.1624939 2000 <5e-04 ***
>>>> paternalage.mean 0.0533892 0.0119569
>>>> 0.0933960 2000 0.015 *
>>>> paternalage.factor(25,30] -0.0275822 -0.1116421
>>>> 0.0537359 1842 0.515
>>>> paternalage.factor(30,35] -0.0691025 -0.1463214
>>>> 0.0122393 1871 0.097 .
>>>> paternalage.factor(35,40] -0.1419933 -0.2277379
>>>> -0.0574678 1845 <5e-04 ***
>>>> paternalage.factor(40,45] -0.1364952 -0.2362714
>>>> -0.0451874 1835 0.007 **
>>>> paternalage.factor(45,50] -0.1445342 -0.2591767
>>>> -0.0421178 1693 0.008 **
>>>> paternalage.factor(50,55] -0.1302972 -0.2642965
>>>> 0.0077061 2000 0.064 .
>>>> paternalage.factor(55,90] -0.3407879 -0.5168972
>>>> -0.1493652 1810 <5e-04 ***
>>>> traitza_children:male 0.0926888 -0.0147379
>>>> 0.2006142 1901 0.098 .
>>>> traitza_children:spouses 0.5531197 0.3870616
>>>> 0.7314289 1495 <5e-04 ***
>>>> traitza_children:paternalage.mean 0.0051463 -0.1279396
>>>> 0.1460099 1617 0.960
>>>> traitza_children:paternalage.factor(25,30] -0.1538957 -0.4445749
>>>> 0.1462955 1781 0.321
>>>> traitza_children:paternalage.factor(30,35] -0.1747883 -0.4757851
>>>> 0.1162476 1998 0.261
>>>> traitza_children:paternalage.factor(35,40] -0.2261843 -0.5464379
>>>> 0.0892582 1755 0.166
>>>> traitza_children:paternalage.factor(40,45] -0.2807543 -0.6079678
>>>> 0.0650281 1721 0.100 .
>>>> traitza_children:paternalage.factor(45,50] -0.4905843 -0.8649214
>>>> -0.1244174 1735 0.010 **
>>>> traitza_children:paternalage.factor(50,55] -0.4648579 -0.9215759
>>>> -0.0002083 1687 0.054 .
>>>> traitza_children:paternalage.factor(55,90] -0.3945406 -1.0230155
>>>> 0.2481568 1793 0.195
>>>> ---
>>>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>
>>>>> describe(krmh.1[spouses>0,])
>>>> vars n mean sd median trimmed mad min
>>>> max range skew kurtosis se
>>>> children 2 6829 3.81 2.93 4.00 3.61 2.97 0.00
>>>> 16.00 16.00 0.47 -0.46 0.04
>>>> male 3 6829 0.46 0.50 0.00 0.45 0.00 0.00
>>>> 1.00 1.00 0.14 -1.98 0.01
>>>> spouses 4 6829 1.14 0.38 1.00 1.03 0.00 1.00
>>>> 4.00 3.00 2.87 8.23 0.00
>>>> paternalage 5 6829 3.65 0.80 3.57 3.60 0.80 1.83
>>>> 7.95 6.12 0.69 0.70 0.01
>>>> paternalage_c 6 6829 0.00 0.80 -0.08 -0.05 0.80 -1.82
>>>> 4.30 6.12 0.69 0.70 0.01
>>>> paternalage.mean 7 6829 0.00 0.68 -0.08 -0.05 0.59 -1.74
>>>> 4.30 6.04 0.95 1.97 0.01
>>>> paternalage.diff 8 6829 0.00 0.42 0.00 -0.01 0.38 -1.51
>>>> 1.48 2.99 0.17 0.17 0.01
>>>>
>>>>> table(krmh.1$paternalage.factor)
>>>>
>>>> [0,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,90]
>>>> 309 1214 1683 1562 1039 623 269 130
>>>>
>>>> On 28 Aug 2014, at 19:05, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>
>>>>> Hi Ruben,
>>>>>
>>>>> It might be hard to detect (near) ECPs with so many fixed
>>>>> effects (can you post the model summary (and give us the mean
>>>>> and standard deviation of any continuous covariates)). Also, the
>>>>> complementary log-log link (which is the za specification) is
>>>>> non-symmetric and runs into problems outside the range -35 to
>>>>> 3.5 so there may be a problem there, particularly if you use
>>>>> rcov=~trait:units and the Poisson part is highly over-dispersed.
>>>>> You could try rcov=~idh(trait):units and fix the
>>>>> non-identifiable za residual variance to something smaller than
>>>>> 1 (say 0.5) - it will mix slower but it will reduce the chance
>>>>> of over/underflow.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jarrod
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014
>>>>> 18:45:30 +0200:
>>>>>
>>>>>> Hi Jarrod,
>>>>>>
>>>>>>> 1) it did not return an error with rcov = ~trait:units because
>>>>>>> you used R1=rpois(2,1)+1 and yet this specification only fits
>>>>>>> a single variance (not a 2x2 covariance matrix).
>>>>>>> R1=rpois(2,1)+1 is a bit of a weird specification since it has
>>>>>>> to be integer. I would obtain starting values using rIW().
>>>>>>
>>>>>> I agree it's a weird specification, I was a bit lost and
>>>>>> thought I could get away with just putting some random numbers
>>>>>> in the starting value.
>>>>>> I didn't do R1=rpois(2,1)+1 though, I did
>>>>>> R1=diag(rpois(2,1)+1), so I got a 2x2 matrix, but yes, bound to
>>>>>> be integer.
>>>>>> I didn't know starting values should come from a conjugate
>>>>>> distribution, though that probably means I didn't think about
>>>>>> it much.
>>>>>>
>>>>>> I'm now doing
>>>>>> start <- list(
>>>>>> liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>> R = list(R1 = rIW( diag(2), nrow(krmh.1)) ),
>>>>>> G = list(G1 = rIW( diag(2), nrow(krmh.1)) )
>>>>>> )
>>>>>>
>>>>>> Is this what you had in mind?
>>>>>> I am especially unsure if I am supposed to use such a low
>>>>>> sampling variability (my sample size is probably not even
>>>>>> relevant for the starting values) and if I should start from
>>>>>> diag(2).
>>>>>>
>>>>>> And, I am still happily confused that this specification still
>>>>>> doesn't lead to errors with respect to rcov = ~trait:units .
>>>>>> Does this mean I'm doing it wrong?
>>>>>>
>>>>>>> 3) a) how many effective samples do you have for each
>>>>>>> parameter? and b) are you getting extreme category
>>>>>>> problems/numerical issues? If you store the latent variables
>>>>>>> (pl=TUE) what is their range for the Zi/za part?
>>>>>>
>>>>>> My parallel run using the above starting values isn't finished yet.
>>>>>> a) After applying the above starting values I get, for the
>>>>>> location effects 1600-2000 samples for a 2000 sample chain
>>>>>> (with thin set to 50). G and R-structure are from 369
>>>>>> (za_children.idParents) to 716 (and 0 for the fixed part).
>>>>>> Effective sample sizes were similar for my run using the
>>>>>> starting values for G/R that I drew from rpois, and using 40
>>>>>> chains I of course get
>>>>>> b) I don't think I am getting extreme categories. I would
>>>>>> probably be getting extreme categories if I included the
>>>>>> forever-alones (they almost never have children), but this way
>>>>>> no.
>>>>>> I wasn't sure how to examine the range of the latents
>>>>>> separately for the za part, but for a single chain it looks okay:
>>>>>>> quantile(as.numeric(m1$Liab),probs = c(0,0.01,0,0.99,1))
>>>>>> 0% 1% 0% 99% 100%
>>>>>> -4.934111 -1.290728 -4.934111 3.389847 7.484206
>>>>>>
>>>>>> Well, all considered now that I use the above starting value
>>>>>> specification I get slightly different estimates for all
>>>>>> za-coefficients. Nothing major, but still leading me to
>>>>>> think my estimates aren't exactly independent of the starting
>>>>>> values I use. I'll see what the parallel run yields.
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> Ruben
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jarrod
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 27 Aug
>>>>>>> 2014 19:23:42 +0200:
>>>>>>>
>>>>>>>> Hi Jarrod,
>>>>>>>>
>>>>>>>> thanks again. I was able to get it running with your advice.
>>>>>>>> Some points of confusion remain:
>>>>>>>>
>>>>>>>> - You wrote that zi/za models would return an error with rcov
>>>>>>>> = ~trait:units + starting values. This did not happen in my
>>>>>>>> case, so I didn't build MCMCglmm myself with your suggested
>>>>>>>> edits. Also, have you considered putting your own MCMCglmm
>>>>>>>> repo on Github? Your users would be able to install
>>>>>>>> pre-releases and I'd think you'd get some time-saving pull
>>>>>>>> requests too.
>>>>>>>> - In my attempts to get my models to run properly, I messed
>>>>>>>> up a prior and did not use fix=2 in my prior specification
>>>>>>>> for my za models. This led to crappy convergence, it's much
>>>>>>>> better now and for some of my simpler models I think I won't
>>>>>>>> need parallel chains. I'm reminded of Gelman's folk theorem
>>>>>>>> of statistical computing.
>>>>>>>> - I followed your advice, but of course I could not set the
>>>>>>>> true values as starting values, but wanted to set random, bad
>>>>>>>> starting values. I pasted below what I arrived at, I'm
>>>>>>>> especially unsure whether I specified the starting values for
>>>>>>>> G and R properly (I think not).
>>>>>>>> start <- list(
>>>>>>>> liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>> R = list(R1 = diag(rpois(2, 1)+1)),
>>>>>>>> G = list(G1 = diag(rpois(2, 1)+1))
>>>>>>>> )
>>>>>>>>
>>>>>>>>
>>>>>>>> However, even though I may not need multiple chains for some
>>>>>>>> of my simpler models, I've now run into conflicting
>>>>>>>> diagnostics. The geweke.diag for each chain (and examination
>>>>>>>> of the traces) gives
>>>>>>>> satisfactory diagnostics. Comparing multiple chains using
>>>>>>>> gelman.diag, however, leads to one bad guy, namely the
>>>>>>>> traitza_children:spouses interaction.
>>>>>>>> I think this implies that I've got some starting value
>>>>>>>> dependence for this parameter, that won't be easily rectified
>>>>>>>> through longer burnin?
>>>>>>>> Do you have any ideas how to rectify this?
>>>>>>>> I am currently doing sequential analyses on episodes of
>>>>>>>> selection and in historical human data only those who marry
>>>>>>>> have a chance at having kids. I exclude the unmarried
>>>>>>>> from my sample where I predict number of children, because I
>>>>>>>> examine that in a previous model and the zero-inflation (65%
>>>>>>>> zeros, median w/o unmarried = 4) when including the unmarried
>>>>>>>> is so excessive.
>>>>>>>> Number of spouses is easily the strongest predictor in the
>>>>>>>> model, but only serves as a covariate here. Since my other
>>>>>>>> estimates are stable across chains and runs and agree well
>>>>>>>> across models and with theory, I'm
>>>>>>>> inclined to shrug this off. But probably I shouldn't ignore
>>>>>>>> this sign of non-convergence?
>>>>>>>>
>>>>>>>>> gelman.diag(mcmc_1)
>>>>>>>> Potential scale reduction factors:
>>>>>>>>
>>>>>>>> Point est. Upper C.I.
>>>>>>>> (Intercept) 1.00 1.00
>>>>>>>> traitza_children 1.27 1.39
>>>>>>>> male 1.00 1.00
>>>>>>>> spouses 1.00 1.00
>>>>>>>> paternalage.mean 1.00 1.00
>>>>>>>> paternalage.factor(25,30] 1.00 1.00
>>>>>>>> paternalage.factor(30,35] 1.00 1.00
>>>>>>>> paternalage.factor(35,40] 1.00 1.00
>>>>>>>> paternalage.factor(40,45] 1.00 1.00
>>>>>>>> paternalage.factor(45,50] 1.00 1.00
>>>>>>>> paternalage.factor(50,55] 1.00 1.00
>>>>>>>> paternalage.factor(55,90] 1.00 1.00
>>>>>>>> traitza_children:male 1.22 1.32
>>>>>>>> traitza_children:spouses 1.83 2.13
>>>>>>>> traitza_children:paternalage.mean 1.02 1.02
>>>>>>>> traitza_children:paternalage.factor(25,30] 1.03 1.05
>>>>>>>> traitza_children:paternalage.factor(30,35] 1.05 1.08
>>>>>>>> traitza_children:paternalage.factor(35,40] 1.10 1.15
>>>>>>>> traitza_children:paternalage.factor(40,45] 1.12 1.17
>>>>>>>> traitza_children:paternalage.factor(45,50] 1.19 1.28
>>>>>>>> traitza_children:paternalage.factor(50,55] 1.12 1.18
>>>>>>>> traitza_children:paternalage.factor(55,90] 1.11 1.17
>>>>>>>>
>>>>>>>> Multivariate psrf
>>>>>>>>
>>>>>>>> 7.27
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Ruben
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26 Aug 2014, at 13:04, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>
>>>>>>>>> Hi Ruben,
>>>>>>>>>
>>>>>>>>> There are 400 liabilities in a zapoisson model (2 per
>>>>>>>>> datum). This code should work:
>>>>>>>>>
>>>>>>>>> g <-sample(letters[1:10], size = 200, replace = T)
>>>>>>>>> pred <- rnorm(200)
>>>>>>>>>
>>>>>>>>> l1<-rnorm(200, -1, sqrt(1))
>>>>>>>>> l2<-rnorm(200, -1, sqrt(1))
>>>>>>>>>
>>>>>>>>> y<-VGAM::rzapois(200, exp(l1), exp(-exp(l2)))
>>>>>>>>>
>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>> (because the intercept and variance are the same for both
>>>>>>>>> processes this is just standard Poisson)
>>>>>>>>>
>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> start.1<-list(liab=c(l1,l2), R = list(R1=diag(2)),
>>>>>>>>> G=list(G1=diag(2)))
>>>>>>>>> prior.1<-list(R=list(V=diag(2), nu=1.002, fix=2),
>>>>>>>>> G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0),
>>>>>>>>> alpha.V=diag(2)*1000)))
>>>>>>>>>
>>>>>>>>> m1<-MCMCglmm(y~trait + pred:trait, random=~us(trait):g,
>>>>>>>>> family="zapoisson",rcov=~idh(trait):units, data=dat,
>>>>>>>>> prior=prior.1, start= start.1)
>>>>>>>>>
>>>>>>>>> However, there are 2 bugs in the current version of MCMCglmm
>>>>>>>>> that return an error message when the documentation implies
>>>>>>>>> it should be fine:
>>>>>>>>>
>>>>>>>>> a) it should be possible to have R=diag(2) rather than R =
>>>>>>>>> list(R1=diag(2)). This bug cropped up when I implemented
>>>>>>>>> block-diagonal R structures. It can be fixed by inserting:
>>>>>>>>>
>>>>>>>>> if(!is.list(start$R)){
>>>>>>>>> start$R<-list(R1=start$R)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> on L514 of MCMCglmm.R below
>>>>>>>>>
>>>>>>>>> if(!is.list(prior$R[[1]])){
>>>>>>>>> prior$R<-list(R1=prior$R)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> b) rcov=~trait:units models for zi/za models will return an
>>>>>>>>> error when passing starting values. To fix this insert
>>>>>>>>>
>>>>>>>>> if(diagR==3){
>>>>>>>>> if(dim(start)[1]!=1){
>>>>>>>>> stop("V is the wrong dimension for some
>>>>>>>>> strart$G/start$R elements")
>>>>>>>>> }
>>>>>>>>> start<-diag(sum(nfl))*start[1]
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> on L90 of priorfromat.R below
>>>>>>>>>
>>>>>>>>> if(is.matrix(start)==FALSE){
>>>>>>>>> start<-as.matrix(start)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I will put these in the new version.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Jarrod
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug
>>>>>>>>> 2014 21:52:30 +0200:
>>>>>>>>>
>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>
>>>>>>>>>> thanks for these pointers.
>>>>>>>>>>
>>>>>>>>>>>> You will need to provide over-dispersed starting values
>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful
>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output of
>>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>>
>>>>>>>>>> Oh, I would be happy with single chains, but since
>>>>>>>>>> computation would take weeks this way, I wanted to
>>>>>>>>>> parallelise and I would use the multi-chain convergence as
>>>>>>>>>> a criterion that my parallelisation was proper
>>>>>>>>>> and is as informative as a single long chain. There don't
>>>>>>>>>> seem to be any such checks built-in – I was analysing my 40
>>>>>>>>>> chains for a bit longer than I like to admit until I
>>>>>>>>>> noticed they were identical (effectiveSize
>>>>>>>>>> and summary.mcmc.list did not yell at me for this).
>>>>>>>>>>
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>> I get that these values are bad, but that is the goal for
>>>>>>>>>> my multi-chain aim, right?
>>>>>>>>>>
>>>>>>>>>> I can apply this to my zero-truncated model, but am again
>>>>>>>>>> getting stuck with the zero-altered one.
>>>>>>>>>> Maybe I need only specify the Liab values for this?
>>>>>>>>>> At least I'm getting nowhere with specifying R and G
>>>>>>>>>> starting values here. When I got an error, I always
>>>>>>>>>> went to the MCMCglmm source to understand why the checks
>>>>>>>>>> failed, but I didn't always understand
>>>>>>>>>> what was being checked and couldn't get it to work.
>>>>>>>>>>
>>>>>>>>>> Here's a failing example:
>>>>>>>>>>
>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>> g = sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>> pred = rnorm(200)
>>>>>>>>>> y<-rpois(200,exp(l)-t)
>>>>>>>>>> y[1:40] = 0
>>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>>
>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>> set.seed(1)
>>>>>>>>>> start_true = list(Liab=l, R = 1, G = 1 )
>>>>>>>>>> m1<-MCMCglmm(y~1 + pred,random = ~ g,
>>>>>>>>>> family="zapoisson",rcov=~us(trait):units, data=dat, start=
>>>>>>>>>> start_true)
>>>>>>>>>>
>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>> set.seed(1)
>>>>>>>>>> # use some very bad starting values
>>>>>>>>>> start_rand = list(Liab=rnorm(200), R = rpois(1, 1)+1, G =
>>>>>>>>>> rpois(1, 1)+1 )
>>>>>>>>>> m2<-MCMCglmm(y~1 + pred,random = ~ g,rcov=~us(trait):units,
>>>>>>>>>> family="zapoisson", data=dat, start = start_rand)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Ruben
>>>>>>>>>>
>>>>>>>>>> On 25 Aug 2014, at 18:29, Jarrod Hadfield
>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>
>>>>>>>>>>> Sorry - I was wrong when I said that everything is Gibbs
>>>>>>>>>>> sampled conditional on the latent variables. The location
>>>>>>>>>>> effects (fixed and random effects) are also sampled
>>>>>>>>>>> conditional on the (co)variance components so you should
>>>>>>>>>>> add them to the starting values. In the case where the
>>>>>>>>>>> true values are used:
>>>>>>>>>>>
>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>> start=list(Liab=l,R=1))
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jarrod
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Mon, 25
>>>>>>>>>>> Aug 2014 17:14:14 +0100:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>
>>>>>>>>>>>> You will need to provide over-dispersed starting values
>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful
>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output of
>>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>>>>
>>>>>>>>>>>> With non-Gaussian data everything is Gibbs sampled
>>>>>>>>>>>> conditional on the latent variables, so you only need to
>>>>>>>>>>>> pass them:
>>>>>>>>>>>>
>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>
>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>>> start=list(Liab=l))
>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>>> start=list(Liab=rnorm(200)))
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>>
>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>> # not identical despite the same seed because of
>>>>>>>>>>>> different starting values but clearly sampling the same
>>>>>>>>>>>> posterior distribution:
>>>>>>>>>>>>
>>>>>>>>>>>> gelman.diag(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25
>>>>>>>>>>>> Aug 2014 18:00:08 +0200:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Jarrod,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the quick reply. Please, don't waste time
>>>>>>>>>>>>> looking into doMPI – I am happy that I
>>>>>>>>>>>>> get the expected result, when I specify that
>>>>>>>>>>>>> reproducible seed, whyever that may be.
>>>>>>>>>>>>> I'm pretty sure that is the deciding factor, because I
>>>>>>>>>>>>> tested it explicitly, I just have no idea
>>>>>>>>>>>>> how/why it interacts with the choice of family.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, is setting up different RNG streams for my
>>>>>>>>>>>>> workers (now that it works) __sufficient__
>>>>>>>>>>>>> so that I get independent chains and can use
>>>>>>>>>>>>> gelman.diag() for convergence diagnostics?
>>>>>>>>>>>>> Or should I still tinker with the starting values myself?
>>>>>>>>>>>>> I've never found a worked example of supplying starting
>>>>>>>>>>>>> values and am thus a bit lost.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for sending further questions, I hope someone else
>>>>>>>>>>>>> takes pity while
>>>>>>>>>>>>> you're busy with lectures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 25 Aug 2014, at 17:29, Jarrod Hadfield
>>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not think the issue is with the starting values,
>>>>>>>>>>>>>> because even if the same starting values were used the
>>>>>>>>>>>>>> chains would still differ because of the randomness in
>>>>>>>>>>>>>> the Markov Chain (if I interpret your `identical' test
>>>>>>>>>>>>>> correctly). This just involves a call to GetRNGstate()
>>>>>>>>>>>>>> in the C++ code (L 871 ofMCMCglmm.cc) so I think for
>>>>>>>>>>>>>> some reason doMPI/foreach is not doing what you expect.
>>>>>>>>>>>>>> I am not familiar with doMPI and am in the middle of
>>>>>>>>>>>>>> writing lectures so haven't got time to look into it
>>>>>>>>>>>>>> carefully. Outside of the context of doMPI I get the
>>>>>>>>>>>>>> behaviour I expect:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>> # different, as expected
>>>>>>>>>>>>>> plot(mcmc.list(m2$Sol, m3$Sol))
>>>>>>>>>>>>>> # the same, as expected
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25
>>>>>>>>>>>>>> Aug 2014 16:58:06 +0200:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sorry for bumping my old post, I hope to elicit a
>>>>>>>>>>>>>>> response with a more focused question:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When does MCMCglmm automatically start from different
>>>>>>>>>>>>>>> values when using doMPI/foreach?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have done some tests with models of varying
>>>>>>>>>>>>>>> complexity. For example, the script in my last
>>>>>>>>>>>>>>> post (using "zapoisson") yielded 40 identical chains:
>>>>>>>>>>>>>>>> identical(mcmclist[1], mcmclist[30])
>>>>>>>>>>>>>>> TRUE
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A simpler (?) model (using "ztpoisson" and no
>>>>>>>>>>>>>>> specified prior), however, yielded different chains
>>>>>>>>>>>>>>> and I could use them to calculate gelman.diag()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changing my script to the version below, i.e. seeding
>>>>>>>>>>>>>>> foreach using .options.mpi=list( seed= 1337)
>>>>>>>>>>>>>>> so as to make RNGstreams reproducible (or so I
>>>>>>>>>>>>>>> thought), led to different chains even for the
>>>>>>>>>>>>>>> "zapoisson" model.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In no case have I (successfully) tried to supplant the
>>>>>>>>>>>>>>> default of MCMCglmm's "start" argument.
>>>>>>>>>>>>>>> Is starting my models from different RNGsubstreams
>>>>>>>>>>>>>>> inadequate compared to manipulating
>>>>>>>>>>>>>>> the start argument explicitly? If so, is there any
>>>>>>>>>>>>>>> worked example of explicit starting value manipulation
>>>>>>>>>>>>>>> in parallel computation?
>>>>>>>>>>>>>>> I've browsed the MCMCglmm source to understand how the
>>>>>>>>>>>>>>> default starting values are generated,
>>>>>>>>>>>>>>> but didn't find any differences with respect to RNG
>>>>>>>>>>>>>>> for the two families "ztpoisson" and "zapoisson"
>>>>>>>>>>>>>>> (granted, I did not dig very deep).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ruben Arslan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H
>>>>>>>>>>>>>>> localhost -n 41 R --slave -f
>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>> cl <-
>>>>>>>>>>>>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>> Children_mcmc1 =
>>>>>>>>>>>>>>> foreach(i=1:clusterSize(cl),.options.mpi =
>>>>>>>>>>>>>>> list(seed=1337) ) %dopar% {
>>>>>>>>>>>>>>> library(MCMCglmm);library(data.table)
>>>>>>>>>>>>>>> load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> nitt = 130000; thin = 100; burnin = 30000
>>>>>>>>>>>>>>> prior.m5d.2 = list(
>>>>>>>>>>>>>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents,
>>>>>>>>>>>>>>> children, male, urban, spouses, paternalage.mean,
>>>>>>>>>>>>>>> paternalage.factor)])
>>>>>>>>>>>>>>> (m1 = MCMCglmm( children ~ trait * (male + urban +
>>>>>>>>>>>>>>> spouses + paternalage.mean + paternalage.factor),
>>>>>>>>>>>>>>> rcov=~us(trait):units,
>>>>>>>>>>>>>>> random=~us(trait):idParents,
>>>>>>>>>>>>>>> family="zapoisson",
>>>>>>>>>>>>>>> prior = prior.m5d.2,
>>>>>>>>>>>>>>> data=rpqa.1,
>>>>>>>>>>>>>>> pr = F, saveX = F, saveZ = F,
>>>>>>>>>>>>>>> nitt=nitt,thin=thin,burnin=burnin))
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>> mcmclist =
>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {
>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =
>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 04 Aug 2014, at 20:25, Ruben Arslan
>>>>>>>>>>>>>>> <rubenarslan at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> would someone be willing to share her or his efforts
>>>>>>>>>>>>>>>> in parallelising a MCMCglmm analysis?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had something viable using harvestr that seemed to
>>>>>>>>>>>>>>>> properly initialise
>>>>>>>>>>>>>>>> the starting values from different random number
>>>>>>>>>>>>>>>> streams (which is desirable,
>>>>>>>>>>>>>>>> as far as I could find out), but I ended up being
>>>>>>>>>>>>>>>> unable to use harvestr, because
>>>>>>>>>>>>>>>> it uses an old version of plyr, where parallelisation
>>>>>>>>>>>>>>>> works only for multicore, not for
>>>>>>>>>>>>>>>> MPI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I pasted my working version, that does not do
>>>>>>>>>>>>>>>> anything about starting values or RNG
>>>>>>>>>>>>>>>> at the end of this email. I can try to fumble further
>>>>>>>>>>>>>>>> in the dark or try to update harvestr,
>>>>>>>>>>>>>>>> but maybe someone has gone through all this already.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd also appreciate any tips for elegantly
>>>>>>>>>>>>>>>> post-processing such parallel data, as some of my usual
>>>>>>>>>>>>>>>> extraction functions and routines are hampered by the
>>>>>>>>>>>>>>>> fact that some coda functions
>>>>>>>>>>>>>>>> do not aggregate results over chains. (What I get
>>>>>>>>>>>>>>>> from a single-chain summary in MCMCglmm
>>>>>>>>>>>>>>>> is a bit more comprehensive, than what I managed to
>>>>>>>>>>>>>>>> cobble together with my own extraction
>>>>>>>>>>>>>>>> functions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The reason I'm parallelising my analyses is that I'm
>>>>>>>>>>>>>>>> having trouble getting a good effective
>>>>>>>>>>>>>>>> sample size for any parameter having to do with the
>>>>>>>>>>>>>>>> many zeroes in my data.
>>>>>>>>>>>>>>>> Any pointers are very appreciated, I'm quite
>>>>>>>>>>>>>>>> inexperienced with MCMCglmm.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H
>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f
>>>>>>>>>>>>>>>> "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>> cl <- startMPIcluster()
>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>>>>>>>>>>>>>> library(MCMCglmm)
>>>>>>>>>>>>>>>> load("rpqa1.rdata")
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> nitt = 40000; thin = 100; burnin = 10000
>>>>>>>>>>>>>>>> prior = list(
>>>>>>>>>>>>>>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> MCMCglmm( children ~ trait -1 +
>>>>>>>>>>>>>>>> at.level(trait,1):male + at.level(trait,1):urban +
>>>>>>>>>>>>>>>> at.level(trait,1):spouses +
>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.mean +
>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.factor,
>>>>>>>>>>>>>>>> rcov=~us(trait):units,
>>>>>>>>>>>>>>>> random=~us(trait):idParents,
>>>>>>>>>>>>>>>> family="zapoisson",
>>>>>>>>>>>>>>>> prior = prior,
>>>>>>>>>>>>>>>> data=rpqa.1,
>>>>>>>>>>>>>>>> pr = F, saveX = T, saveZ = T,
>>>>>>>>>>>>>>>> nitt=nitt,thin=thin,burnin=burnin)
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>> mcmclist =
>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {
>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =
>>>>>>>>>>>>>>>> "rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ruben C. Arslan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Georg August University G�ttingen
>>>>>>>>>>>>>>>> Biological Personality Psychology and Psychological Assessment
>>>>>>>>>>>>>>>> Georg Elias M�ller Institute of Psychology
>>>>>>>>>>>>>>>> Go�lerstr. 14
>>>>>>>>>>>>>>>> 37073 G�ttingen
>>>>>>>>>>>>>>>> Germany
>>>>>>>>>>>>>>>> Tel.: +49 551 3920704
>>>>>>>>>>>>>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list