[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors
Jarrod Hadfield
j.hadfield at ed.ac.uk
Fri Aug 29 12:02:59 CEST 2014
Hi,
You need to make sure that they are fixed at the same value, not just fixed.
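
(A minimal sketch of the distinction, assuming MCMCglmm is loaded: rIW() takes
the same fix argument as the prior, so the starting value can hold the
unidentified element at V[2,2] in every chain while the rest is still drawn at
random.)

library(MCMCglmm)
set.seed(1)
rIW(diag(2), 10)           # every element, including the [2,2] entry, differs between draws
rIW(diag(2), 10, fix = 2)  # the [2,2] entry is held at V[2,2] = 1, so all chains can share it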
Cheers,
Jarrod
Quoting Ruben Arslan <rubenarslan at gmail.com> on Fri, 29 Aug 2014
11:07:21 +0200:
> Hi Jarrod,
>
> that was exactly it. I hadn't checked the $VCV mcmclist, but I'll do
> so in the future
> as the mistake is blindingly obvious that way: http://imgur.com/nlA0QwZ
>
> After adding fix=2 to R1 in my starting values, my parallel chains
> converged as well.
>
> For future amateurs reading this:
> prior <- list(
> R=list(V=diag(2), nu=1.002, fix=2),
> G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
> )
>
> start <- list(
> liab=c(rnorm( nrow(krmh.1)*2 )),
> R = list(R1 = rIW(diag(2), 10 , fix = 2)), ### FIX=2 HERE AS WELL
> G = list(G1 = rIW(diag(2), 10 ))
> )
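>
> A quick way to verify that after a run (a sketch; Children is the chain list
> and the za residual column name is taken from the model summary elsewhere in
> this thread):
>
> sapply(Children, function(m) range(m$VCV[, grep("za_children.*units", colnames(m$VCV))]))
> # every chain should return one and the same constant; any spread means a
> # chain was started with a different value for the fixed za residual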
>
> Thank you so much. This sort of remote diagnosis with this little information
> seems borderline psychic to me.
> Now, onwards to modelling the pedigrees :-)
>
> Best wishes,
>
> Ruben
>
> PS: I never actually figured out what kind of variability to use
> for rIW() in my starting values, but it doesn't seem to matter,
> which I suppose is what I should want. Even with the mis-specified
> starting values my posterior looked pretty similar to what I get now.
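>
> (For what it's worth, a rough sketch of what the second argument of rIW()
> does to the spread of the draws, i.e. the "variability" in question:
> smaller nu gives more over-dispersed starting values, larger nu gives draws
> hugging V. It assumes each row of the rIW output holds one sampled matrix.)
>
> library(MCMCglmm)
> draws_loose <- rIW(diag(2), 10, n = 1000)    # like the starting values above
> draws_tight <- rIW(diag(2), 1000, n = 1000)  # nearly pinned to diag(2)
> apply(draws_loose, 2, sd)  # element-wise spread across draws: large
> apply(draws_tight, 2, sd)  # element-wise spread across draws: small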
>
> On 29 Aug 2014, at 09:13, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi Ruben,
>>
>> Actually I might know what it is. When you sample different
>> starting values do you inadvertently sample a new residual variance
>> for the unidentified za part? You need to make sure that this is
>> always fixed at the same value (otherwise the model is different).
>> This is not a problem under the trait:units specification.
>>
>> Cheers,
>>
>> Jarrod.
>>
>>
>>
>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Fri, 29 Aug 2014
>> 08:04:42 +0100:
>>
>>> Hi Ruben,
>>>
>>> Can you share your data and I will take a look? It's definitely not
>>> Monte Carlo error.
>>>
>>> Cheers,
>>>
>>> Jarrod
>>>
>>>
>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014
>>> 22:44:47 +0200:
>>>
>>>> Hi Jarrod,
>>>>
>>>> those two matched up quite well, yes. I just completed another 20
>>>> chains, using more variable starting values. There are still two
>>>> fixed effects, traitza_children:spouses and :male, which haven't
>>>> converged according to the multi-chain (gelman) diagnostic, but
>>>> have according to geweke.
>>>> The offending traces: http://imgur.com/Qm6Ovfr
>>>> These specific effects aren't of interest to me, so if this
>>>> doesn't affect the rest of my estimates, I can be happy
>>>> with this, but I can't conclude that, can I?
>>>>
>>>> I'm now also doing a run to see how it deals with the more
>>>> intensely zero-inflated data when including
>>>> the unmarried.
>>>>
>>>> Thanks a lot for all that help,
>>>>
>>>> Ruben
>>>>
>>>>> gelman.diag(mcmclist)
>>>> Potential scale reduction factors:
>>>>
>>>> Point est. Upper C.I.
>>>> (Intercept) 1.00 1.00
>>>> traitza_children 1.40 1.65
>>>> male 1.00 1.00
>>>> spouses 1.00 1.00
>>>> paternalage.mean 1.00 1.00
>>>> paternalage.factor(25,30] 1.00 1.00
>>>> paternalage.factor(30,35] 1.00 1.00
>>>> paternalage.factor(35,40] 1.00 1.00
>>>> paternalage.factor(40,45] 1.00 1.00
>>>> paternalage.factor(45,50] 1.00 1.00
>>>> paternalage.factor(50,55] 1.00 1.00
>>>> paternalage.factor(55,90] 1.00 1.00
>>>> traitza_children:male 1.33 1.54
>>>> traitza_children:spouses 2.21 2.83
>>>> traitza_children:paternalage.mean 1.01 1.02
>>>> traitza_children:paternalage.factor(25,30] 1.05 1.08
>>>> traitza_children:paternalage.factor(30,35] 1.08 1.13
>>>> traitza_children:paternalage.factor(35,40] 1.15 1.25
>>>> traitza_children:paternalage.factor(40,45] 1.15 1.26
>>>> traitza_children:paternalage.factor(45,50] 1.26 1.43
>>>> traitza_children:paternalage.factor(50,55] 1.15 1.25
>>>> traitza_children:paternalage.factor(55,90] 1.14 1.23
>>>>
>>>> Multivariate psrf
>>>>
>>>> 8.99
>>>>
>>>>> summary(mcmclist)
>>>>
>>>> Iterations = 100001:149951
>>>> Thinning interval = 50
>>>> Number of chains = 20
>>>> Sample size per chain = 1000
>>>>
>>>> 1. Empirical mean and standard deviation for each variable,
>>>> plus standard error of the mean:
>>>>
>>>>                                                 Mean      SD  Naive SE Time-series SE
>>>> (Intercept)                                  1.36326 0.04848 0.0003428      0.0003542
>>>> traitza_children                            -0.76679 0.28738 0.0020321      0.0016682
>>>> male                                         0.09980 0.01633 0.0001155      0.0001222
>>>> spouses                                      0.12333 0.01957 0.0001384      0.0001414
>>>> paternalage.mean                             0.07215 0.02194 0.0001551      0.0001596
>>>> paternalage.factor(25,30]                   -0.03381 0.04184 0.0002959      0.0003066
>>>> paternalage.factor(30,35]                   -0.08380 0.04270 0.0003019      0.0003118
>>>> paternalage.factor(35,40]                   -0.16502 0.04569 0.0003231      0.0003289
>>>> paternalage.factor(40,45]                   -0.16738 0.05090 0.0003599      0.0003697
>>>> paternalage.factor(45,50]                   -0.18383 0.05880 0.0004158      0.0004242
>>>> paternalage.factor(50,55]                   -0.18241 0.07277 0.0005146      0.0005302
>>>> paternalage.factor(55,90]                   -0.40612 0.09875 0.0006983      0.0007467
>>>> traitza_children:male                        0.12092 0.08223 0.0005815      0.0004697
>>>> traitza_children:spouses                     0.64881 0.21132 0.0014942      0.0008511
>>>> traitza_children:paternalage.mean           -0.02741 0.08550 0.0006046      0.0006221
>>>> traitza_children:paternalage.factor(25,30] -0.17296 0.18680 0.0013209      0.0013750
>>>> traitza_children:paternalage.factor(30,35] -0.19027 0.19267 0.0013624      0.0013901
>>>> traitza_children:paternalage.factor(35,40] -0.24911 0.21282 0.0015049      0.0014391
>>>> traitza_children:paternalage.factor(40,45] -0.29772 0.23403 0.0016548      0.0015956
>>>> traitza_children:paternalage.factor(45,50] -0.51782 0.28589 0.0020215      0.0017602
>>>> traitza_children:paternalage.factor(50,55] -0.46126 0.32064 0.0022673      0.0021397
>>>> traitza_children:paternalage.factor(55,90] -0.38612 0.41461 0.0029317      0.0027396
>>>>
>>>> 2. Quantiles for each variable:
>>>>
>>>>                                                 2.5%      25%      50%      75%      97.5%
>>>> (Intercept)                                  1.26883  1.33106  1.36322  1.39575  1.4589722
>>>> traitza_children                            -1.20696 -0.95751 -0.81076 -0.63308 -0.0365042
>>>> male                                         0.06785  0.08878  0.09970  0.11085  0.1320168
>>>> spouses                                      0.08467  0.11030  0.12343  0.13643  0.1617869
>>>> paternalage.mean                             0.02950  0.05751  0.07202  0.08683  0.1153881
>>>> paternalage.factor(25,30]                   -0.11581 -0.06174 -0.03397 -0.00574  0.0473783
>>>> paternalage.factor(30,35]                   -0.16656 -0.11250 -0.08358 -0.05519  0.0003065
>>>> paternalage.factor(35,40]                   -0.25518 -0.19530 -0.16500 -0.13440 -0.0757366
>>>> paternalage.factor(40,45]                   -0.26887 -0.20164 -0.16675 -0.13335 -0.0677407
>>>> paternalage.factor(45,50]                   -0.30080 -0.22320 -0.18339 -0.14440 -0.0687967
>>>> paternalage.factor(50,55]                   -0.32663 -0.23034 -0.18227 -0.13317 -0.0415547
>>>> paternalage.factor(55,90]                   -0.60202 -0.47303 -0.40454 -0.33994 -0.2139128
>>>> traitza_children:male                       -0.01083  0.06634  0.11024  0.16109  0.3295892
>>>> traitza_children:spouses                     0.37857  0.51072  0.59398  0.71395  1.2127940
>>>> traitza_children:paternalage.mean           -0.19138 -0.08250 -0.02985  0.02493  0.1468989
>>>> traitza_children:paternalage.factor(25,30] -0.57457 -0.28481 -0.16489 -0.05151  0.1728148
>>>> traitza_children:paternalage.factor(30,35] -0.61499 -0.30350 -0.17736 -0.06299  0.1555147
>>>> traitza_children:paternalage.factor(35,40] -0.74251 -0.36752 -0.22966 -0.10777  0.1151897
>>>> traitza_children:paternalage.factor(40,45] -0.84165 -0.42691 -0.27729 -0.14322  0.1032436
>>>> traitza_children:paternalage.factor(45,50] -1.21782 -0.66568 -0.48420 -0.32873 -0.0476720
>>>> traitza_children:paternalage.factor(50,55] -1.21327 -0.63623 -0.43432 -0.24957  0.0955360
>>>> traitza_children:paternalage.factor(55,90] -1.33772 -0.62227 -0.35364 -0.11050  0.3361684
>>>>
>>>>> effectiveSize(mcmclist)
>>>> (Intercept)                                 18814.05
>>>> traitza_children                            16359.33
>>>> male                                        18132.98
>>>> spouses                                     19547.05
>>>> paternalage.mean                            19238.72
>>>> paternalage.factor(25,30]                   18974.81
>>>> paternalage.factor(30,35]                   18874.33
>>>> paternalage.factor(35,40]                   19406.63
>>>> paternalage.factor(40,45]                   19075.18
>>>> paternalage.factor(45,50]                   19401.77
>>>> paternalage.factor(50,55]                   18960.11
>>>> paternalage.factor(55,90]                   17893.23
>>>> traitza_children:male                       18545.55
>>>> traitza_children:spouses                    14438.51
>>>> traitza_children:paternalage.mean           18464.09
>>>> traitza_children:paternalage.factor(25,30]  16943.43
>>>> traitza_children:paternalage.factor(30,35]  16827.44
>>>> traitza_children:paternalage.factor(35,40]  17230.04
>>>> traitza_children:paternalage.factor(40,45]  17144.78
>>>> traitza_children:paternalage.factor(45,50]  18191.67
>>>> traitza_children:paternalage.factor(50,55]  17466.60
>>>> traitza_children:paternalage.factor(55,90]  18540.59
>>>>
>>>>
>>>> ### current script:
>>>>
>>>> # bsub -q mpi -W 24:00 -n 21 -R np20 mpirun -H localhost -n 21 R
>>>> --slave -f "/usr/users/rarslan/rpqa/krmh_main/children.R"
>>>> setwd("/usr/users/rarslan/rpqa/")
>>>> library(doMPI)
>>>> cl <-
>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/krmh_main/")
>>>> registerDoMPI(cl)
>>>> Children = foreach(i=1:clusterSize(cl),.options.mpi =
>>>> list(seed=1337) ) %dopar% {
>>>> library(MCMCglmm);library(data.table)
>>>> setwd("/usr/users/rarslan/rpqa/krmh_main/")
>>>> source("../1 - extraction functions.r")
>>>> load("../krmh1.rdata")
>>>>
>>>> krmh.1 = recenter.pat(na.omit(krmh.1[spouses>0, list(idParents,
>>>> children, male, spouses, paternalage)]))
>>>>
>>>> samples = 1000
>>>> thin = 50; burnin = 100000
>>>> nitt = samples * thin + burnin
>>>>
>>>> prior <- list(
>>>> R=list(V=diag(2), nu=1.002, fix=2),
>>>> G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
>>>> )
>>>>
>>>> start <- list(
>>>> liab=c(rnorm( nrow(krmh.1)*2 )),
>>>> R = list(R1 = rIW(diag(2), 10 )),
>>>> G = list(G1 = rIW(diag(2), 10 ))
>>>> )
>>>>
>>>> ( m1 = MCMCglmm( children ~ trait * (male + spouses +
>>>> paternalage.mean + paternalage.factor),
>>>> rcov=~idh(trait):units,
>>>> random=~idh(trait):idParents,
>>>> family="zapoisson",
>>>> start = start,
>>>> prior = prior,
>>>> data=krmh.1,
>>>> pr = F, saveX = F, saveZ = F,
>>>> nitt=nitt,thin=thin,burnin=burnin)
>>>> )
>>>> m1$Residual$nrt<-2
>>>> m1
>>>> }
>>>>
>>>> save(Children,file = "Children.rdata")
>>>> closeCluster(cl)
>>>> mpi.quit()
>>>>
>>>> On 28 Aug 2014, at 20:59, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The posteriors for the two models look pretty close to me. Are
>>>>> the scale reduction factors really as high as previously
>>>>> reported? Before you had 1.83 for traitza_children:spouses, but
>>>>> the plot suggests that it should be close to 1?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jarrod
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014
>>>>> 19:59:16 +0200:
>>>>>
>>>>>> Sure! Thanks a lot.
>>>>>> I am using ~idh(trait):units already, sorry for saying that
>>>>>> incorrectly in my last email.
>>>>>> These models aren't the final thing: I will replace the
>>>>>> paternalage.factor variable with its linear equivalent if that
>>>>>> seems defensible (it does so far), and in this model it seems
>>>>>> okay to remove the za-effects for all predictors except spouses.
>>>>>> So a final model would have fewer fixed effects. I also have
>>>>>> datasets of 200k+ and 5m+,
>>>>>> but I'm learning MCMCglmm with this smaller one because my
>>>>>> wrong turns take less time.
>>>>>>
>>>>>> I've uploaded a comparison coef plot of two models:
>>>>>> http://i.imgur.com/sHUfnmd.png
>>>>>> m7 is with the default starting values, m1 is with the
>>>>>> specification I sent in my last email. I don't
>>>>>> know if such differences are something to worry about.
>>>>>>
>>>>>> I don't know what qualifies as highly overdispersed; here's a
>>>>>> plot of the outcome for ever-married
>>>>>> people (slate = real data):
>>>>>> http://imgur.com/14MywgZ
>>>>>> here's with everybody born (incl. some stillborn etc.):
>>>>>> http://imgur.com/knRGa1v
>>>>>> I guess my approach (generating an overdispersed Poisson with
>>>>>> the parameters from the data and checking whether it shows the
>>>>>> same excess of zeroes) is not the best way to diagnose
>>>>>> zero-inflation, but especially in the second case it seems
>>>>>> fairly clear-cut.
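>>>>>>
>>>>>> One way to make that check concrete (a sketch, and not necessarily a
>>>>>> better diagnostic; MASS::glm.nb stands in here for "an overdispersed
>>>>>> Poisson with the parameters from the data"):
>>>>>>
>>>>>> library(MASS); library(data.table)
>>>>>> dd  <- krmh.1[spouses > 0]                  # ever-married subset, as in the first plot
>>>>>> fit <- glm.nb(children ~ 1, data = dd)      # over-dispersed baseline fit
>>>>>> sim <- rnbinom(1e5, size = fit$theta, mu = exp(coef(fit)[1]))
>>>>>> c(observed = mean(dd$children == 0), simulated = mean(sim == 0))
>>>>>> # a clearly larger observed share of zeroes points towards zero-inflation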
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Ruben
>>>>>>
>>>>>>> summary(m1)
>>>>>>
>>>>>> Iterations = 50001:149951
>>>>>> Thinning interval = 50
>>>>>> Sample size = 2000
>>>>>>
>>>>>> DIC: 31249.73
>>>>>>
>>>>>> G-structure: ~idh(trait):idParents
>>>>>>
>>>>>> post.mean l-95% CI u-95% CI eff.samp
>>>>>> children.idParents 0.006611 4.312e-08 0.0159 523.9
>>>>>> za_children.idParents 0.193788 7.306e-02 0.3283 369.3
>>>>>>
>>>>>> R-structure: ~idh(trait):units
>>>>>>
>>>>>> post.mean l-95% CI u-95% CI eff.samp
>>>>>> children.units 0.1285 0.1118 0.1452 716.1
>>>>>> za_children.units 0.9950 0.9950 0.9950 0.0
>>>>>>
>>>>>> Location effects: children ~ trait * (male + spouses +
>>>>>> paternalage.mean + paternalage.factor)
>>>>>>
>>>>>>                                              post.mean   l-95% CI   u-95% CI eff.samp  pMCMC
>>>>>> (Intercept)                                  1.3413364  1.2402100  1.4326099     1789 <5e-04 ***
>>>>>> traitza_children                            -0.8362879 -1.2007980 -0.5016730     1669 <5e-04 ***
>>>>>> male                                         0.0994902  0.0679050  0.1297394     2000 <5e-04 ***
>>>>>> spouses                                      0.1236033  0.0839000  0.1624939     2000 <5e-04 ***
>>>>>> paternalage.mean                             0.0533892  0.0119569  0.0933960     2000  0.015 *
>>>>>> paternalage.factor(25,30]                   -0.0275822 -0.1116421  0.0537359     1842  0.515
>>>>>> paternalage.factor(30,35]                   -0.0691025 -0.1463214  0.0122393     1871  0.097 .
>>>>>> paternalage.factor(35,40]                   -0.1419933 -0.2277379 -0.0574678     1845 <5e-04 ***
>>>>>> paternalage.factor(40,45]                   -0.1364952 -0.2362714 -0.0451874     1835  0.007 **
>>>>>> paternalage.factor(45,50]                   -0.1445342 -0.2591767 -0.0421178     1693  0.008 **
>>>>>> paternalage.factor(50,55]                   -0.1302972 -0.2642965  0.0077061     2000  0.064 .
>>>>>> paternalage.factor(55,90]                   -0.3407879 -0.5168972 -0.1493652     1810 <5e-04 ***
>>>>>> traitza_children:male                        0.0926888 -0.0147379  0.2006142     1901  0.098 .
>>>>>> traitza_children:spouses                     0.5531197  0.3870616  0.7314289     1495 <5e-04 ***
>>>>>> traitza_children:paternalage.mean            0.0051463 -0.1279396  0.1460099     1617  0.960
>>>>>> traitza_children:paternalage.factor(25,30] -0.1538957 -0.4445749  0.1462955     1781  0.321
>>>>>> traitza_children:paternalage.factor(30,35] -0.1747883 -0.4757851  0.1162476     1998  0.261
>>>>>> traitza_children:paternalage.factor(35,40] -0.2261843 -0.5464379  0.0892582     1755  0.166
>>>>>> traitza_children:paternalage.factor(40,45] -0.2807543 -0.6079678  0.0650281     1721  0.100 .
>>>>>> traitza_children:paternalage.factor(45,50] -0.4905843 -0.8649214 -0.1244174     1735  0.010 **
>>>>>> traitza_children:paternalage.factor(50,55] -0.4648579 -0.9215759 -0.0002083     1687  0.054 .
>>>>>> traitza_children:paternalage.factor(55,90] -0.3945406 -1.0230155  0.2481568     1793  0.195
>>>>>> ---
>>>>>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>>
>>>>>>> describe(krmh.1[spouses>0,])
>>>>>>                  vars    n mean   sd median trimmed  mad   min   max range skew kurtosis   se
>>>>>> children            2 6829 3.81 2.93   4.00    3.61 2.97  0.00 16.00 16.00 0.47    -0.46 0.04
>>>>>> male                3 6829 0.46 0.50   0.00    0.45 0.00  0.00  1.00  1.00 0.14    -1.98 0.01
>>>>>> spouses             4 6829 1.14 0.38   1.00    1.03 0.00  1.00  4.00  3.00 2.87     8.23 0.00
>>>>>> paternalage         5 6829 3.65 0.80   3.57    3.60 0.80  1.83  7.95  6.12 0.69     0.70 0.01
>>>>>> paternalage_c       6 6829 0.00 0.80  -0.08   -0.05 0.80 -1.82  4.30  6.12 0.69     0.70 0.01
>>>>>> paternalage.mean    7 6829 0.00 0.68  -0.08   -0.05 0.59 -1.74  4.30  6.04 0.95     1.97 0.01
>>>>>> paternalage.diff    8 6829 0.00 0.42   0.00   -0.01 0.38 -1.51  1.48  2.99 0.17     0.17 0.01
>>>>>>
>>>>>>> table(krmh.1$paternalage.factor)
>>>>>>
>>>>>> [0,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,90]
>>>>>> 309 1214 1683 1562 1039 623 269 130
>>>>>>
>>>>>> On 28 Aug 2014, at 19:05, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>>
>>>>>>> Hi Ruben,
>>>>>>>
>>>>>>> It might be hard to detect (near) ECPs (extreme category
>>>>>>> problems) with so many fixed effects. Can you post the model
>>>>>>> summary, and give us the mean and standard deviation of any
>>>>>>> continuous covariates? Also,
>>>>>>> the complementary log-log link (which is the za specification)
>>>>>>> is non-symmetric and runs into problems outside the range -35
>>>>>>> to 3.5 so there may be a problem there, particularly if you
>>>>>>> use rcov=~trait:units and the Poisson part is highly
>>>>>>> over-dispersed. You could try rcov=~idh(trait):units and fix
>>>>>>> the non-identifiable za residual variance to something smaller
>>>>>>> than 1 (say 0.5) - it will mix slower but it will reduce the
>>>>>>> chance of over/underflow.
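>>>>>>>
>>>>>>> In prior notation that might look something like this (a sketch; the G
>>>>>>> line just mirrors the parameter-expanded prior used elsewhere in this
>>>>>>> thread, and fix=2 holds the second, za, residual variance at whatever
>>>>>>> value is placed in V[2,2]):
>>>>>>>
>>>>>>> prior <- list(
>>>>>>>   R = list(V = diag(c(1, 0.5)), nu = 1.002, fix = 2),
>>>>>>>   G = list(G1 = list(V = diag(2), nu = 1, alpha.mu = c(0, 0),
>>>>>>>                      alpha.V = diag(2) * 1000))
>>>>>>> )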
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jarrod
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug
>>>>>>> 2014 18:45:30 +0200:
>>>>>>>
>>>>>>>> Hi Jarrod,
>>>>>>>>
>>>>>>>>> 1) it did not return an error with rcov = ~trait:units
>>>>>>>>> because you used R1=rpois(2,1)+1 and yet this specification
>>>>>>>>> only fits a single variance (not a 2x2 covariance matrix).
>>>>>>>>> R1=rpois(2,1)+1 is a bit of a weird specification since it
>>>>>>>>> has to be integer. I would obtain starting values using rIW().
>>>>>>>>
>>>>>>>> I agree it's a weird specification; I was a bit lost and
>>>>>>>> thought I could get away with just putting some random
>>>>>>>> numbers in as starting values.
>>>>>>>> I didn't do R1=rpois(2,1)+1 though, I did
>>>>>>>> R1=diag(rpois(2,1)+1), so I got a 2x2 matrix, but yes, bound
>>>>>>>> to be integer.
>>>>>>>> I didn't know starting values should come from a conjugate
>>>>>>>> distribution, though that probably means I didn't think about
>>>>>>>> it much.
>>>>>>>>
>>>>>>>> I'm now doing
>>>>>>>> start <- list(
>>>>>>>> liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>> R = list(R1 = rIW( diag(2), nrow(krmh.1)) ),
>>>>>>>> G = list(G1 = rIW( diag(2), nrow(krmh.1)) )
>>>>>>>> )
>>>>>>>>
>>>>>>>> Is this what you had in mind?
>>>>>>>> I am especially unsure if I am supposed to use such a low
>>>>>>>> sampling variability (my sample size is probably not even
>>>>>>>> relevant for the starting values) and if I should start from
>>>>>>>> diag(2).
>>>>>>>>
>>>>>>>> And I am still happily confused that this specification
>>>>>>>> still doesn't lead to errors with respect to rcov =
>>>>>>>> ~trait:units. Does this mean I'm doing it wrong?
>>>>>>>>
>>>>>>>>> 3) a) how many effective samples do you have for each
>>>>>>>>> parameter? and b) are you getting extreme category
>>>>>>>>> problems/numerical issues? If you store the latent variables
>>>>>>>>> (pl=TRUE) what is their range for the zi/za part?
>>>>>>>>
>>>>>>>> My parallel run using the above starting values isn't finished yet.
>>>>>>>> a) After applying the above starting values I get, for the
>>>>>>>> location effects, 1600-2000 effective samples for a 2000-sample
>>>>>>>> chain (with thin set to 50). The G- and R-structure effective
>>>>>>>> sample sizes range from 369 (za_children.idParents) to 716 (and
>>>>>>>> 0 for the fixed part).
>>>>>>>> Effective sample sizes were similar for my run using the
>>>>>>>> starting values for G/R that I drew from rpois, and using 40
>>>>>>>> chains I of course get correspondingly more.
>>>>>>>> b) I don't think I am getting extreme categories. I would
>>>>>>>> probably be getting extreme categories if I included the
>>>>>>>> forever-alones (they almost never have children), but not
>>>>>>>> this way.
>>>>>>>> I wasn't sure how to examine the range of the latents
>>>>>>>> separately for the za part, but for a single chain it looks
>>>>>>>> okay:
>>>>>>>>> quantile(as.numeric(m1$Liab),probs = c(0,0.01,0,0.99,1))
>>>>>>>> 0% 1% 0% 99% 100%
>>>>>>>> -4.934111 -1.290728 -4.934111 3.389847 7.484206
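>>>>>>>>
>>>>>>>> (For the record, a sketch of how one might split them, assuming the
>>>>>>>> latents are stored Poisson part first and za part second, as in the
>>>>>>>> liab = c(l1, l2) example further down the thread, and that the model
>>>>>>>> was run with pl = TRUE:)
>>>>>>>>
>>>>>>>> n <- nrow(krmh.1)
>>>>>>>> range(m1$Liab[, 1:n])              # Poisson (log) part
>>>>>>>> range(m1$Liab[, (n + 1):(2 * n)])  # za (complementary log-log) part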
>>>>>>>>
>>>>>>>> All things considered: now that I use the above starting-value
>>>>>>>> specification I get slightly different estimates for all
>>>>>>>> za-coefficients. Nothing major, but it still leads me to
>>>>>>>> think my estimates aren't exactly independent of the starting
>>>>>>>> values I use. I'll see what the parallel run yields.
>>>>>>>>
>>>>>>>> Thanks a lot,
>>>>>>>>
>>>>>>>> Ruben
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Jarrod
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 27 Aug
>>>>>>>>> 2014 19:23:42 +0200:
>>>>>>>>>
>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>
>>>>>>>>>> thanks again. I was able to get it running with your advice.
>>>>>>>>>> Some points of confusion remain:
>>>>>>>>>>
>>>>>>>>>> - You wrote that zi/za models would return an error with
>>>>>>>>>> rcov = ~trait:units + starting values. This did not happen
>>>>>>>>>> in my case, so I didn't build MCMCglmm myself with your
>>>>>>>>>> suggested edits. Also, have you considered putting your own
>>>>>>>>>> MCMCglmm repo on Github? Your users would be able to
>>>>>>>>>> install pre-releases and I'd think you'd get some
>>>>>>>>>> time-saving pull requests too.
>>>>>>>>>> - In my attempts to get my models to run properly, I messed
>>>>>>>>>> up a prior and did not use fix=2 in my prior specification
>>>>>>>>>> for my za models. This led to crappy convergence; it's much
>>>>>>>>>> better now, and for some of my simpler models I think I
>>>>>>>>>> won't need parallel chains. I'm reminded of Gelman's folk
>>>>>>>>>> theorem of statistical computing.
>>>>>>>>>> - I followed your advice, but of course I could not set the
>>>>>>>>>> true values as starting values; instead I wanted to set random,
>>>>>>>>>> bad starting values. I pasted below what I arrived at; I'm
>>>>>>>>>> especially unsure whether I specified the starting values
>>>>>>>>>> for G and R properly (I think not).
>>>>>>>>>> start <- list(
>>>>>>>>>> liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>>>> R = list(R1 = diag(rpois(2, 1)+1)),
>>>>>>>>>> G = list(G1 = diag(rpois(2, 1)+1))
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> However, even though I may not need multiple chains for
>>>>>>>>>> some of my simpler models, I've now run into conflicting
>>>>>>>>>> diagnostics. The geweke.diag for each chain (and
>>>>>>>>>> examination of the traces) gives
>>>>>>>>>> satisfactory diagnostics. Comparing multiple chains using
>>>>>>>>>> gelman.diag, however, leads to one bad guy, namely the
>>>>>>>>>> traitza_children:spouses interaction.
>>>>>>>>>> I think this implies that I've got some starting-value
>>>>>>>>>> dependence for this parameter, which won't be easily
>>>>>>>>>> rectified through a longer burn-in?
>>>>>>>>>> Do you have any ideas how to rectify this?
>>>>>>>>>> I am currently doing sequential analyses on episodes of
>>>>>>>>>> selection, and in historical human data only those who marry
>>>>>>>>>> have a chance at having kids. I exclude the unmarried
>>>>>>>>>> from the sample where I predict number of children, because
>>>>>>>>>> I examine that in a previous model and because the
>>>>>>>>>> zero-inflation when including the unmarried is so excessive
>>>>>>>>>> (65% zeroes; median without the unmarried = 4).
>>>>>>>>>> Number of spouses is easily the strongest predictor in the
>>>>>>>>>> model, but only serves as a covariate here. Since my other
>>>>>>>>>> estimates are stable across chains and runs and agree well
>>>>>>>>>> across models and with theory, I'm
>>>>>>>>>> inclined to shrug this off. But probably I shouldn't ignore
>>>>>>>>>> this sign of non-convergence?
>>>>>>>>>>
>>>>>>>>>>> gelman.diag(mcmc_1)
>>>>>>>>>> Potential scale reduction factors:
>>>>>>>>>>
>>>>>>>>>> Point est. Upper C.I.
>>>>>>>>>> (Intercept) 1.00 1.00
>>>>>>>>>> traitza_children 1.27 1.39
>>>>>>>>>> male 1.00 1.00
>>>>>>>>>> spouses 1.00 1.00
>>>>>>>>>> paternalage.mean 1.00 1.00
>>>>>>>>>> paternalage.factor(25,30] 1.00 1.00
>>>>>>>>>> paternalage.factor(30,35] 1.00 1.00
>>>>>>>>>> paternalage.factor(35,40] 1.00 1.00
>>>>>>>>>> paternalage.factor(40,45] 1.00 1.00
>>>>>>>>>> paternalage.factor(45,50] 1.00 1.00
>>>>>>>>>> paternalage.factor(50,55] 1.00 1.00
>>>>>>>>>> paternalage.factor(55,90] 1.00 1.00
>>>>>>>>>> traitza_children:male 1.22 1.32
>>>>>>>>>> traitza_children:spouses 1.83 2.13
>>>>>>>>>> traitza_children:paternalage.mean 1.02 1.02
>>>>>>>>>> traitza_children:paternalage.factor(25,30] 1.03 1.05
>>>>>>>>>> traitza_children:paternalage.factor(30,35] 1.05 1.08
>>>>>>>>>> traitza_children:paternalage.factor(35,40] 1.10 1.15
>>>>>>>>>> traitza_children:paternalage.factor(40,45] 1.12 1.17
>>>>>>>>>> traitza_children:paternalage.factor(45,50] 1.19 1.28
>>>>>>>>>> traitza_children:paternalage.factor(50,55] 1.12 1.18
>>>>>>>>>> traitza_children:paternalage.factor(55,90] 1.11 1.17
>>>>>>>>>>
>>>>>>>>>> Multivariate psrf
>>>>>>>>>>
>>>>>>>>>> 7.27
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Ruben
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 26 Aug 2014, at 13:04, Jarrod Hadfield
>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>
>>>>>>>>>>> There are 400 liabilities in a zapoisson model (2 per
>>>>>>>>>>> datum). This code should work:
>>>>>>>>>>>
>>>>>>>>>>> g <-sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>>> pred <- rnorm(200)
>>>>>>>>>>>
>>>>>>>>>>> l1<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>> l2<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>
>>>>>>>>>>> y<-VGAM::rzapois(200, exp(l1), exp(-exp(l2)))
>>>>>>>>>>>
>>>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>>> (because the intercept and variance are the same for both
>>>>>>>>>>> processes this is just standard Poisson)
>>>>>>>>>>>
>>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> start.1<-list(liab=c(l1,l2), R = list(R1=diag(2)),
>>>>>>>>>>> G=list(G1=diag(2)))
>>>>>>>>>>> prior.1<-list(R=list(V=diag(2), nu=1.002, fix=2),
>>>>>>>>>>> G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0),
>>>>>>>>>>> alpha.V=diag(2)*1000)))
>>>>>>>>>>>
>>>>>>>>>>> m1<-MCMCglmm(y~trait + pred:trait, random=~us(trait):g,
>>>>>>>>>>> family="zapoisson",rcov=~idh(trait):units, data=dat,
>>>>>>>>>>> prior=prior.1, start= start.1)
>>>>>>>>>>>
>>>>>>>>>>> However, there are 2 bugs in the current version of
>>>>>>>>>>> MCMCglmm that return an error message when the
>>>>>>>>>>> documentation implies it should be fine:
>>>>>>>>>>>
>>>>>>>>>>> a) it should be possible to have R=diag(2) rather than R =
>>>>>>>>>>> list(R1=diag(2)). This bug cropped up when I implemented
>>>>>>>>>>> block-diagonal R structures. It can be fixed by inserting:
>>>>>>>>>>>
>>>>>>>>>>> if(!is.list(start$R)){
>>>>>>>>>>> start$R<-list(R1=start$R)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> on L514 of MCMCglmm.R below
>>>>>>>>>>>
>>>>>>>>>>> if(!is.list(prior$R[[1]])){
>>>>>>>>>>> prior$R<-list(R1=prior$R)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> b) rcov=~trait:units models for zi/za models will return
>>>>>>>>>>> an error when passing starting values. To fix this insert
>>>>>>>>>>>
>>>>>>>>>>> if(diagR==3){
>>>>>>>>>>> if(dim(start)[1]!=1){
>>>>>>>>>>> stop("V is the wrong dimension for some
>>>>>>>>>>> start$G/start$R elements")
>>>>>>>>>>> }
>>>>>>>>>>> start<-diag(sum(nfl))*start[1]
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> on L90 of priorformat.R below
>>>>>>>>>>>
>>>>>>>>>>> if(is.matrix(start)==FALSE){
>>>>>>>>>>> start<-as.matrix(start)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> I will put these in the new version.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jarrod
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25
>>>>>>>>>>> Aug 2014 21:52:30 +0200:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for these pointers.
>>>>>>>>>>>>
>>>>>>>>>>>>>> You will need to provide over-dispersed starting values
>>>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful
>>>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output
>>>>>>>>>>>>>> of a single run looks reasonable).
>>>>>>>>>>>>
>>>>>>>>>>>> Oh, I would be happy with single chains, but since
>>>>>>>>>>>> computation would take weeks that way, I wanted to
>>>>>>>>>>>> parallelise, and I would use the multi-chain convergence
>>>>>>>>>>>> diagnostics as a criterion that my parallelisation was done
>>>>>>>>>>>> properly and is as informative as a single long chain. There
>>>>>>>>>>>> don't seem to be any such checks built in – I was analysing my
>>>>>>>>>>>> 40 chains for a bit longer than I like to admit before I
>>>>>>>>>>>> noticed they were identical (effectiveSize
>>>>>>>>>>>> and summary.mcmc.list did not yell at me for this).
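>>>>>>>>>>>>
>>>>>>>>>>>> A cheap sanity check that would have caught this (a sketch, with
>>>>>>>>>>>> mcmclist built from the chains' $Sol as in the scripts in this
>>>>>>>>>>>> thread):
>>>>>>>>>>>>
>>>>>>>>>>>> sum(duplicated(lapply(mcmclist, as.numeric)))
>>>>>>>>>>>> # anything above 0 means at least two "chains" are exact copies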
>>>>>>>>>>>>
>>>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>> I get that these values are bad, but that is the goal for
>>>>>>>>>>>> my multi-chain aim, right?
>>>>>>>>>>>>
>>>>>>>>>>>> I can apply this to my zero-truncated model, but am again
>>>>>>>>>>>> getting stuck with the zero-altered one.
>>>>>>>>>>>> Maybe I need only specify the Liab values for this?
>>>>>>>>>>>> At least I'm getting nowhere with specifying R and G
>>>>>>>>>>>> starting values here. When I got an error, I always
>>>>>>>>>>>> went to the MCMCglmm source to understand why the checks
>>>>>>>>>>>> failed, but I didn't always understand
>>>>>>>>>>>> what was being checked and couldn't get it to work.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a failing example:
>>>>>>>>>>>>
>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>> g = sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>>>> pred = rnorm(200)
>>>>>>>>>>>> y<-rpois(200,exp(l)-t)
>>>>>>>>>>>> y[1:40] = 0
>>>>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>>>>
>>>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> start_true = list(Liab=l, R = 1, G = 1 )
>>>>>>>>>>>> m1<-MCMCglmm(y~1 + pred,random = ~ g,
>>>>>>>>>>>> family="zapoisson",rcov=~us(trait):units, data=dat,
>>>>>>>>>>>> start= start_true)
>>>>>>>>>>>>
>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>> start_rand = list(Liab=rnorm(200), R = rpois(1, 1)+1, G =
>>>>>>>>>>>> rpois(1, 1)+1 )
>>>>>>>>>>>> m2<-MCMCglmm(y~1 + pred,random = ~
>>>>>>>>>>>> g,rcov=~us(trait):units, family="zapoisson", data=dat,
>>>>>>>>>>>> start = start_rand)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Ruben
>>>>>>>>>>>>
>>>>>>>>>>>> On 25 Aug 2014, at 18:29, Jarrod Hadfield
>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry - I was wrong when I said that everything is
>>>>>>>>>>>>> Gibbs sampled conditional on the latent variables. The
>>>>>>>>>>>>> location effects (fixed and random effects) are also
>>>>>>>>>>>>> sampled conditional on the (co)variance components so
>>>>>>>>>>>>> you should add them to the starting values. In the case
>>>>>>>>>>>>> where the true values are used:
>>>>>>>>>>>>>
>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>>>> start=list(Liab=l,R=1))
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Mon, 25
>>>>>>>>>>>>> Aug 2014 17:14:14 +0100:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You will need to provide over-dispersed starting values
>>>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful
>>>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output
>>>>>>>>>>>>>> of a single run looks reasonable).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With non-Gaussian data everything is Gibbs sampled
>>>>>>>>>>>>>> conditional on the latent variables, so you only need
>>>>>>>>>>>>>> to pass them:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>>>>> start=list(Liab=l))
>>>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat,
>>>>>>>>>>>>>> start=list(Liab=rnorm(200)))
>>>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>> # not identical despite the same seed because of
>>>>>>>>>>>>>> different starting values but clearly sampling the same
>>>>>>>>>>>>>> posterior distribution:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gelman.diag(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25
>>>>>>>>>>>>>> Aug 2014 18:00:08 +0200:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear Jarrod,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for the quick reply. Please don't waste time
>>>>>>>>>>>>>>> looking into doMPI – I am happy that I
>>>>>>>>>>>>>>> get the expected result when I specify that
>>>>>>>>>>>>>>> reproducible seed, whatever the reason may be.
>>>>>>>>>>>>>>> I'm pretty sure that is the deciding factor, because I
>>>>>>>>>>>>>>> tested it explicitly; I just have no idea
>>>>>>>>>>>>>>> how or why it interacts with the choice of family.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That said, is setting up different RNG streams for my
>>>>>>>>>>>>>>> workers (now that it works) __sufficient__
>>>>>>>>>>>>>>> so that I get independent chains and can use
>>>>>>>>>>>>>>> gelman.diag() for convergence diagnostics?
>>>>>>>>>>>>>>> Or should I still tinker with the starting values myself?
>>>>>>>>>>>>>>> I've never found a worked example of supplying
>>>>>>>>>>>>>>> starting values and am thus a bit lost.
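>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (One hedged sketch of what that could look like, using the start
>>>>>>>>>>>>>>> format the top of this thread eventually settles on; the object
>>>>>>>>>>>>>>> names are illustrative. Draw one over-dispersed start list per
>>>>>>>>>>>>>>> chain and pass starts[[i]] to MCMCglmm's start argument inside
>>>>>>>>>>>>>>> the i-th worker:)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(MCMCglmm)
>>>>>>>>>>>>>>> n.chains <- 4; n.obs <- 200   # n.obs = number of rows in the data
>>>>>>>>>>>>>>> starts <- lapply(seq_len(n.chains), function(i)
>>>>>>>>>>>>>>>   list(liab = rnorm(2 * n.obs),                      # 2 latents per datum (zapoisson)
>>>>>>>>>>>>>>>        R    = list(R1 = rIW(diag(2), 10, fix = 2)),  # unidentified za residual held fixed
>>>>>>>>>>>>>>>        G    = list(G1 = rIW(diag(2), 10))))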
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for sending further questions, I hope someone
>>>>>>>>>>>>>>> else takes pity while
>>>>>>>>>>>>>>> you're busy with lectures.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 25 Aug 2014, at 17:29, Jarrod Hadfield
>>>>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not think the issue is with the starting values,
>>>>>>>>>>>>>>>> because even if the same starting values were used
>>>>>>>>>>>>>>>> the chains would still differ because of the
>>>>>>>>>>>>>>>> randomness in the Markov Chain (if I interpret your
>>>>>>>>>>>>>>>> `identical' test correctly). This just involves a
>>>>>>>>>>>>>>>> call to GetRNGstate() in the C++ code (L 871
>>>>>>>>>>>>>>>> of MCMCglmm.cc), so I think for some reason
>>>>>>>>>>>>>>>> doMPI/foreach is not doing what you expect. I am not
>>>>>>>>>>>>>>>> familiar with doMPI and am in the middle of writing
>>>>>>>>>>>>>>>> lectures so haven't got time to look into it
>>>>>>>>>>>>>>>> carefully. Outside of the context of doMPI I get the
>>>>>>>>>>>>>>>> behaviour I expect:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>>>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>>>> # different, as expected
>>>>>>>>>>>>>>>> plot(mcmc.list(m2$Sol, m3$Sol))
>>>>>>>>>>>>>>>> # the same, as expected
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon,
>>>>>>>>>>>>>>>> 25 Aug 2014 16:58:06 +0200:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sorry for bumping my old post, I hope to elicit a
>>>>>>>>>>>>>>>>> response with a more focused question:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When does MCMCglmm automatically start from
>>>>>>>>>>>>>>>>> different values when using doMPI/foreach?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have done some tests with models of varying
>>>>>>>>>>>>>>>>> complexity. For example, the script in my last
>>>>>>>>>>>>>>>>> post (using "zapoisson") yielded 40 identical chains:
>>>>>>>>>>>>>>>>>> identical(mcmclist[1], mcmclist[30])
>>>>>>>>>>>>>>>>> TRUE
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A simpler (?) model (using "ztpoisson" and no
>>>>>>>>>>>>>>>>> specified prior), however, yielded different chains
>>>>>>>>>>>>>>>>> and I could use them to calculate gelman.diag()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changing my script to the version below, i.e.
>>>>>>>>>>>>>>>>> seeding foreach using .options.mpi=list( seed= 1337)
>>>>>>>>>>>>>>>>> so as to make RNGstreams reproducible (or so I
>>>>>>>>>>>>>>>>> thought), led to different chains even for the
>>>>>>>>>>>>>>>>> "zapoisson" model.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In no case have I (successfully) tried to supplant
>>>>>>>>>>>>>>>>> the default of MCMCglmm's "start" argument.
>>>>>>>>>>>>>>>>> Is starting my models from different RNG substreams
>>>>>>>>>>>>>>>>> inadequate compared to manipulating
>>>>>>>>>>>>>>>>> the start argument explicitly? If so, is there any
>>>>>>>>>>>>>>>>> worked example of explicit starting value manipulation
>>>>>>>>>>>>>>>>> in parallel computation?
>>>>>>>>>>>>>>>>> I've browsed the MCMCglmm source to understand how
>>>>>>>>>>>>>>>>> the default starting values are generated,
>>>>>>>>>>>>>>>>> but didn't find any differences with respect to RNG
>>>>>>>>>>>>>>>>> for the two families "ztpoisson" and "zapoisson"
>>>>>>>>>>>>>>>>> (granted, I did not dig very deep).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ruben Arslan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H
>>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f
>>>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>>> cl <-
>>>>>>>>>>>>>>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>>> Children_mcmc1 =
>>>>>>>>>>>>>>>>> foreach(i=1:clusterSize(cl),.options.mpi =
>>>>>>>>>>>>>>>>> list(seed=1337) ) %dopar% {
>>>>>>>>>>>>>>>>> library(MCMCglmm);library(data.table)
>>>>>>>>>>>>>>>>> load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> nitt = 130000; thin = 100; burnin = 30000
>>>>>>>>>>>>>>>>> prior.m5d.2 = list(
>>>>>>>>>>>>>>>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents,
>>>>>>>>>>>>>>>>> children, male, urban, spouses, paternalage.mean,
>>>>>>>>>>>>>>>>> paternalage.factor)])
>>>>>>>>>>>>>>>>> (m1 = MCMCglmm( children ~ trait * (male + urban +
>>>>>>>>>>>>>>>>> spouses + paternalage.mean + paternalage.factor),
>>>>>>>>>>>>>>>>> rcov=~us(trait):units,
>>>>>>>>>>>>>>>>> random=~us(trait):idParents,
>>>>>>>>>>>>>>>>> family="zapoisson",
>>>>>>>>>>>>>>>>> prior = prior.m5d.2,
>>>>>>>>>>>>>>>>> data=rpqa.1,
>>>>>>>>>>>>>>>>> pr = F, saveX = F, saveZ = F,
>>>>>>>>>>>>>>>>> nitt=nitt,thin=thin,burnin=burnin))
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>>> mcmclist =
>>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {
>>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =
>>>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 04 Aug 2014, at 20:25, Ruben Arslan
>>>>>>>>>>>>>>>>> <rubenarslan at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> would someone be willing to share her or his
>>>>>>>>>>>>>>>>>> efforts in parallelising a MCMCglmm analysis?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had something viable using harvestr that seemed
>>>>>>>>>>>>>>>>>> to properly initialise
>>>>>>>>>>>>>>>>>> the starting values from different random number
>>>>>>>>>>>>>>>>>> streams (which is desirable,
>>>>>>>>>>>>>>>>>> as far as I could find out), but I ended up being
>>>>>>>>>>>>>>>>>> unable to use harvestr, because
>>>>>>>>>>>>>>>>>> it uses an old version of plyr, where
>>>>>>>>>>>>>>>>>> parallelisation works only for multicore, not for
>>>>>>>>>>>>>>>>>> MPI.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I pasted my working version, which does not do
>>>>>>>>>>>>>>>>>> anything about starting values or RNG,
>>>>>>>>>>>>>>>>>> at the end of this email. I can try to fumble
>>>>>>>>>>>>>>>>>> further in the dark or try to update harvestr,
>>>>>>>>>>>>>>>>>> but maybe someone has gone through all this already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd also appreciate any tips for elegantly
>>>>>>>>>>>>>>>>>> post-processing such parallel data, as some of my
>>>>>>>>>>>>>>>>>> usual
>>>>>>>>>>>>>>>>>> extraction functions and routines are hampered by
>>>>>>>>>>>>>>>>>> the fact that some coda functions
>>>>>>>>>>>>>>>>>> do not aggregate results over chains. (What I get
>>>>>>>>>>>>>>>>>> from a single-chain summary in MCMCglmm
>>>>>>>>>>>>>>>>>> is a bit more comprehensive than what I managed to
>>>>>>>>>>>>>>>>>> cobble together with my own extraction
>>>>>>>>>>>>>>>>>> functions).
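>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (For what it's worth, a sketch of the basic coda multi-chain
>>>>>>>>>>>>>>>>>> post-processing, with Children_mcmc1 as produced by the foreach
>>>>>>>>>>>>>>>>>> call below; summary() and gelman.diag() work across the chains
>>>>>>>>>>>>>>>>>> of an mcmc.list, and effectiveSize() sums over them:)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>>>> sols <- mcmc.list(lapply(Children_mcmc1, function(x) x$Sol))
>>>>>>>>>>>>>>>>>> summary(sols)        # means and quantiles pooled over all chains
>>>>>>>>>>>>>>>>>> effectiveSize(sols)  # effective sizes summed over chains
>>>>>>>>>>>>>>>>>> gelman.diag(sols)    # between/within-chain convergence check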
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason I'm parallelising my analyses is that
>>>>>>>>>>>>>>>>>> I'm having trouble getting a good effective
>>>>>>>>>>>>>>>>>> sample size for any parameter having to do with the
>>>>>>>>>>>>>>>>>> many zeroes in my data.
>>>>>>>>>>>>>>>>>> Any pointers are very appreciated, I'm quite
>>>>>>>>>>>>>>>>>> inexperienced with MCMCglmm.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H
>>>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f
>>>>>>>>>>>>>>>>>> "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>>>> cl <- startMPIcluster()
>>>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>>>>>>>>>>>>>>>> library(MCMCglmm)
>>>>>>>>>>>>>>>>>> load("rpqa1.rdata")
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> nitt = 40000; thin = 100; burnin = 10000
>>>>>>>>>>>>>>>>>> prior = list(
>>>>>>>>>>>>>>>>>> R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>>>> G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> MCMCglmm( children ~ trait -1 +
>>>>>>>>>>>>>>>>>> at.level(trait,1):male + at.level(trait,1):urban +
>>>>>>>>>>>>>>>>>> at.level(trait,1):spouses +
>>>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.mean +
>>>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.factor,
>>>>>>>>>>>>>>>>>> rcov=~us(trait):units,
>>>>>>>>>>>>>>>>>> random=~us(trait):idParents,
>>>>>>>>>>>>>>>>>> family="zapoisson",
>>>>>>>>>>>>>>>>>> prior = prior,
>>>>>>>>>>>>>>>>>> data=rpqa.1,
>>>>>>>>>>>>>>>>>> pr = F, saveX = T, saveZ = T,
>>>>>>>>>>>>>>>>>> nitt=nitt,thin=thin,burnin=burnin)
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>>>> mcmclist =
>>>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {
>>>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =
>>>>>>>>>>>>>>>>>> "rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Ruben C. Arslan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Georg August University Göttingen
>>>>>>>>>>>>>>>>>> Biological Personality Psychology and Psychological
>>>>>>>>>>>>>>>>>> Assessment
>>>>>>>>>>>>>>>>>> Georg Elias Müller Institute of Psychology
>>>>>>>>>>>>>>>>>> Goßlerstr. 14
>>>>>>>>>>>>>>>>>> 37073 Göttingen
>>>>>>>>>>>>>>>>>> Germany
>>>>>>>>>>>>>>>>>> Tel.: +49 551 3920704
>>>>>>>>>>>>>>>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> The University of Edinburgh is a charitable body,
>>>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.