[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors

Fri Aug 29 09:04:42 CEST 2014

Hi Ruben,

Can you share your data so I can take a look? It's definitely not  
Monte Carlo error.

Cheers,

Jarrod

Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
22:44:47 +0200:

> Hi Jarrod,
>
> those two matched up quite well, yes. I just completed another 20  
> chains, using more variable
> starting values. There are still two fixed effects,  
> traitza_children:spouses and :male, which haven't converged
> according to the multi-chain diagnostic (gelman.diag), but have according to geweke.diag.
> The offending traces: http://imgur.com/Qm6Ovfr
> These specific effects aren't of interest to me, so if this doesn't  
> affect the rest of my estimates, I can be happy
> with this, but I can't conclude that, can I?
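A note on why the two diagnostics can disagree: geweke.diag compares the early and late parts of a single chain, while gelman.diag compares chains with each other, so chains that are each stable but centred on different values can pass the first and fail the second. Below is a minimal base R sketch of the potential scale reduction factor on simulated chains, not on the model above. The helper name psrf and the simulated values are illustrative, and coda's gelman.diag applies an additional correction, so its numbers will differ slightly.

```r
# Hand-rolled potential scale reduction factor (no sampling-variability
# correction, so values differ slightly from coda::gelman.diag).
psrf <- function(chains) {
  # chains: numeric matrix, one column per chain, one row per saved draw
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(1)
same    <- matrix(rnorm(4000), ncol = 4)        # four chains, same target
shifted <- sweep(same, 2, c(0, 0, 2, 2), "+")   # two chains offset by 2

psrf(same)     # close to 1
psrf(shifted)  # clearly above 1, although each chain alone looks stationary
```

Each column of shifted would pass a within-chain check, because nothing drifts over time; only the comparison across chains reveals the disagreement.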
>
> I'm now also doing a run to see how it deals with the more intensely  
> zero-inflated data when including
> the unmarried.
>
> Thanks a lot for all that help,
>
> Ruben
>
>> gelman.diag(mcmclist)
> Potential scale reduction factors:
>
>                                            Point est. Upper C.I.
> (Intercept)                                      1.00       1.00
> traitza_children                                 1.40       1.65
> male                                             1.00       1.00
> spouses                                          1.00       1.00
> paternalage.mean                                 1.00       1.00
> paternalage.factor(25,30]                        1.00       1.00
> paternalage.factor(30,35]                        1.00       1.00
> paternalage.factor(35,40]                        1.00       1.00
> paternalage.factor(40,45]                        1.00       1.00
> paternalage.factor(45,50]                        1.00       1.00
> paternalage.factor(50,55]                        1.00       1.00
> paternalage.factor(55,90]                        1.00       1.00
> traitza_children:male                            1.33       1.54
> traitza_children:spouses                         2.21       2.83
> traitza_children:paternalage.mean                1.01       1.02
> traitza_children:paternalage.factor(25,30]       1.05       1.08
> traitza_children:paternalage.factor(30,35]       1.08       1.13
> traitza_children:paternalage.factor(35,40]       1.15       1.25
> traitza_children:paternalage.factor(40,45]       1.15       1.26
> traitza_children:paternalage.factor(45,50]       1.26       1.43
> traitza_children:paternalage.factor(50,55]       1.15       1.25
> traitza_children:paternalage.factor(55,90]       1.14       1.23
>
> Multivariate psrf
>
> 8.99
>
>> summary(mcmclist)
>
> Iterations = 100001:149951
> Thinning interval = 50
> Number of chains = 20
> Sample size per chain = 1000
>
> 1. Empirical mean and standard deviation for each variable,
>    plus standard error of the mean:
>
>                                                Mean      SD  Naive SE Time-series SE
> (Intercept)                                 1.36326 0.04848 0.0003428      0.0003542
> traitza_children                           -0.76679 0.28738 0.0020321      0.0016682
> male                                        0.09980 0.01633 0.0001155      0.0001222
> spouses                                     0.12333 0.01957 0.0001384      0.0001414
> paternalage.mean                            0.07215 0.02194 0.0001551      0.0001596
> paternalage.factor(25,30]                  -0.03381 0.04184 0.0002959      0.0003066
> paternalage.factor(30,35]                  -0.08380 0.04270 0.0003019      0.0003118
> paternalage.factor(35,40]                  -0.16502 0.04569 0.0003231      0.0003289
> paternalage.factor(40,45]                  -0.16738 0.05090 0.0003599      0.0003697
> paternalage.factor(45,50]                  -0.18383 0.05880 0.0004158      0.0004242
> paternalage.factor(50,55]                  -0.18241 0.07277 0.0005146      0.0005302
> paternalage.factor(55,90]                  -0.40612 0.09875 0.0006983      0.0007467
> traitza_children:male                       0.12092 0.08223 0.0005815      0.0004697
> traitza_children:spouses                    0.64881 0.21132 0.0014942      0.0008511
> traitza_children:paternalage.mean          -0.02741 0.08550 0.0006046      0.0006221
> traitza_children:paternalage.factor(25,30] -0.17296 0.18680 0.0013209      0.0013750
> traitza_children:paternalage.factor(30,35] -0.19027 0.19267 0.0013624      0.0013901
> traitza_children:paternalage.factor(35,40] -0.24911 0.21282 0.0015049      0.0014391
> traitza_children:paternalage.factor(40,45] -0.29772 0.23403 0.0016548      0.0015956
> traitza_children:paternalage.factor(45,50] -0.51782 0.28589 0.0020215      0.0017602
> traitza_children:paternalage.factor(50,55] -0.46126 0.32064 0.0022673      0.0021397
> traitza_children:paternalage.factor(55,90] -0.38612 0.41461 0.0029317      0.0027396
>
> 2. Quantiles for each variable:
>
>                                                2.5%      25%      50%      75%      97.5%
> (Intercept)                                 1.26883  1.33106  1.36322  1.39575  1.4589722
> traitza_children                           -1.20696 -0.95751 -0.81076 -0.63308 -0.0365042
> male                                        0.06785  0.08878  0.09970  0.11085  0.1320168
> spouses                                     0.08467  0.11030  0.12343  0.13643  0.1617869
> paternalage.mean                            0.02950  0.05751  0.07202  0.08683  0.1153881
> paternalage.factor(25,30]                  -0.11581 -0.06174 -0.03397 -0.00574  0.0473783
> paternalage.factor(30,35]                  -0.16656 -0.11250 -0.08358 -0.05519  0.0003065
> paternalage.factor(35,40]                  -0.25518 -0.19530 -0.16500 -0.13440 -0.0757366
> paternalage.factor(40,45]                  -0.26887 -0.20164 -0.16675 -0.13335 -0.0677407
> paternalage.factor(45,50]                  -0.30080 -0.22320 -0.18339 -0.14440 -0.0687967
> paternalage.factor(50,55]                  -0.32663 -0.23034 -0.18227 -0.13317 -0.0415547
> paternalage.factor(55,90]                  -0.60202 -0.47303 -0.40454 -0.33994 -0.2139128
> traitza_children:male                      -0.01083  0.06634  0.11024  0.16109  0.3295892
> traitza_children:spouses                    0.37857  0.51072  0.59398  0.71395  1.2127940
> traitza_children:paternalage.mean          -0.19138 -0.08250 -0.02985  0.02493  0.1468989
> traitza_children:paternalage.factor(25,30] -0.57457 -0.28481 -0.16489 -0.05151  0.1728148
> traitza_children:paternalage.factor(30,35] -0.61499 -0.30350 -0.17736 -0.06299  0.1555147
> traitza_children:paternalage.factor(35,40] -0.74251 -0.36752 -0.22966 -0.10777  0.1151897
> traitza_children:paternalage.factor(40,45] -0.84165 -0.42691 -0.27729 -0.14322  0.1032436
> traitza_children:paternalage.factor(45,50] -1.21782 -0.66568 -0.48420 -0.32873 -0.0476720
> traitza_children:paternalage.factor(50,55] -1.21327 -0.63623 -0.43432 -0.24957  0.0955360
> traitza_children:paternalage.factor(55,90] -1.33772 -0.62227 -0.35364 -0.11050  0.3361684
>
>> effectiveSize(mcmclist)
> (Intercept)                                18814.05
> traitza_children                           16359.33
> male                                       18132.98
> spouses                                    19547.05
> paternalage.mean                           19238.72
> paternalage.factor(25,30]                  18974.81
> paternalage.factor(30,35]                  18874.33
> paternalage.factor(35,40]                  19406.63
> paternalage.factor(40,45]                  19075.18
> paternalage.factor(45,50]                  19401.77
> paternalage.factor(50,55]                  18960.11
> paternalage.factor(55,90]                  17893.23
> traitza_children:male                      18545.55
> traitza_children:spouses                   14438.51
> traitza_children:paternalage.mean          18464.09
> traitza_children:paternalage.factor(25,30] 16943.43
> traitza_children:paternalage.factor(30,35] 16827.44
> traitza_children:paternalage.factor(35,40] 17230.04
> traitza_children:paternalage.factor(40,45] 17144.78
> traitza_children:paternalage.factor(45,50] 18191.67
> traitza_children:paternalage.factor(50,55] 17466.60
> traitza_children:paternalage.factor(55,90] 18540.59
>
>
> ### current script:
>
> # bsub -q mpi -W 24:00 -n 21 -R np20 mpirun -H localhost -n 21 R --slave -f "/usr/users/rarslan/rpqa/krmh_main/children.R"
> setwd("/usr/users/rarslan/rpqa/")
> library(doMPI)
> cl <- startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/krmh_main/")
> registerDoMPI(cl)
> Children = foreach(i=1:clusterSize(cl),.options.mpi =  
> list(seed=1337) ) %dopar% {
> 	library(MCMCglmm);library(data.table)
>     setwd("/usr/users/rarslan/rpqa/krmh_main/")
> 	source("../1 - extraction functions.r")
>     load("../krmh1.rdata")
>
> 	krmh.1 = recenter.pat(na.omit(krmh.1[spouses>0, list(idParents,  
> children, male, spouses, paternalage)]))
>
> 	samples = 1000
> 	thin = 50; burnin = 100000
> 	nitt = samples * thin + burnin
>
> 	prior <- list(
> 		R=list(V=diag(2), nu=1.002, fix=2),
> 		G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
> 	)
>
> 	start <- list(
> 		liab=c(rnorm( nrow(krmh.1)*2 )),
> 		R = list(R1 = rIW(diag(2), 10 )),
> 		G = list(G1 = rIW(diag(2), 10 ))
> 	)
>
> 	( m1 = MCMCglmm( children ~ trait * (male + spouses +  
> paternalage.mean + paternalage.factor),
> 						rcov=~idh(trait):units,
> 						random=~idh(trait):idParents,
> 						family="zapoisson",
> 						start = start,
> 						prior = prior,
> 						data=krmh.1,
> 						pr = F, saveX = F, saveZ = F,
> 						nitt=nitt,thin=thin,burnin=burnin)
> 	)
> 		m1$Residual$nrt<-2
> 	m1
> }
>
> save(Children,file = "Children.rdata")
> closeCluster(cl)
> mpi.quit()
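The iteration settings in this script can be checked against the summary printed earlier in this message with plain arithmetic. The sketch below assumes draws are stored every thin iterations starting at burnin + 1, which is what the reported range 100001:149951 suggests; the variable name saved is illustrative.

```r
samples <- 1000
thin    <- 50
burnin  <- 100000
nitt    <- samples * thin + burnin      # total iterations per chain

# Iterations that are kept after burn-in and thinning
saved <- seq(burnin + 1, nitt, by = thin)

nitt           # 150000
length(saved)  # 1000
range(saved)   # 100001 149951
```

With 20 chains this gives 20 * 1000 = 20000 stored draws in total, which is why effective sample sizes in the range of 14000 to 19500 are plausible for the pooled output.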
>
> On 28 Aug 2014, at 20:59, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi,
>>
>> The posteriors for the two models look pretty close to me. Are the  
>> scale reduction factors really as high as previously reported?  
>> Before you had 1.83 for traitza_children:spouses, but the plot  
>> suggests that it should be close to 1?
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>>
>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>> 19:59:16 +0200:
>>
>>> Sure! Thanks a lot.
>>> I am using ~idh(trait):units already, sorry for saying that  
>>> incorrectly in my last email.
>>> These models aren't the final thing, I will replace the  
>>> paternalage.factor variable
>>> with its linear equivalent if that seems defensible (does so far)  
>>> and in this model it seems
>>> okay to remove the za-effects for all predictors except spouses.
>>> So a final model would have fewer fixed effects. I also have  
>>> datasets of 200k+ and 5m+,
>>> but I'm learning MCMCglmm with this smaller one because my wrong  
>>> turns take less time.
>>>
>>> I've uploaded a comparison coef plot of two models:
>>> http://i.imgur.com/sHUfnmd.png
>>> m7 is with the default starting values, m1 is with the  
>>> specification I sent in my last email. I don't
>>> know if such differences are something to worry about.
>>>
>>> I don't know what qualifies as highly overdispersed. Here's a plot  
>>> of the outcome for ever-married people (slate = real data):
>>> http://imgur.com/14MywgZ
>>> Here it is with everybody born (incl. some stillborn etc.):
>>> http://imgur.com/knRGa1v
>>> I guess my approach (generating an overdispersed Poisson with the  
>>> parameters from the data and checking whether the real data have  
>>> excess zeroes by comparison) is not the best way to diagnose  
>>> zero-inflation, but especially in the second case it seems fairly clear-cut.
>>>
>>> Best regards,
>>>
>>> Ruben
>>>
>>>> summary(m1)
>>>
>>> Iterations = 50001:149951
>>> Thinning interval  = 50
>>> Sample size  = 2000
>>>
>>> DIC: 31249.73
>>>
>>> G-structure:  ~idh(trait):idParents
>>>
>>>                      post.mean  l-95% CI u-95% CI eff.samp
>>> children.idParents     0.006611 4.312e-08   0.0159    523.9
>>> za_children.idParents  0.193788 7.306e-02   0.3283    369.3
>>>
>>> R-structure:  ~idh(trait):units
>>>
>>>                  post.mean l-95% CI u-95% CI eff.samp
>>> children.units       0.1285   0.1118   0.1452    716.1
>>> za_children.units    0.9950   0.9950   0.9950      0.0
>>>
>>> Location effects: children ~ trait * (male + spouses + paternalage.mean + paternalage.factor)
>>>
>>>                                            post.mean   l-95% CI   u-95% CI eff.samp  pMCMC
>>> (Intercept)                                 1.3413364  1.2402100  1.4326099     1789 <5e-04 ***
>>> traitza_children                           -0.8362879 -1.2007980 -0.5016730     1669 <5e-04 ***
>>> male                                        0.0994902  0.0679050  0.1297394     2000 <5e-04 ***
>>> spouses                                     0.1236033  0.0839000  0.1624939     2000 <5e-04 ***
>>> paternalage.mean                            0.0533892  0.0119569  0.0933960     2000  0.015 *
>>> paternalage.factor(25,30]                  -0.0275822 -0.1116421  0.0537359     1842  0.515
>>> paternalage.factor(30,35]                  -0.0691025 -0.1463214  0.0122393     1871  0.097 .
>>> paternalage.factor(35,40]                  -0.1419933 -0.2277379 -0.0574678     1845 <5e-04 ***
>>> paternalage.factor(40,45]                  -0.1364952 -0.2362714 -0.0451874     1835  0.007 **
>>> paternalage.factor(45,50]                  -0.1445342 -0.2591767 -0.0421178     1693  0.008 **
>>> paternalage.factor(50,55]                  -0.1302972 -0.2642965  0.0077061     2000  0.064 .
>>> paternalage.factor(55,90]                  -0.3407879 -0.5168972 -0.1493652     1810 <5e-04 ***
>>> traitza_children:male                       0.0926888 -0.0147379  0.2006142     1901  0.098 .
>>> traitza_children:spouses                    0.5531197  0.3870616  0.7314289     1495 <5e-04 ***
>>> traitza_children:paternalage.mean           0.0051463 -0.1279396  0.1460099     1617  0.960
>>> traitza_children:paternalage.factor(25,30] -0.1538957 -0.4445749  0.1462955     1781  0.321
>>> traitza_children:paternalage.factor(30,35] -0.1747883 -0.4757851  0.1162476     1998  0.261
>>> traitza_children:paternalage.factor(35,40] -0.2261843 -0.5464379  0.0892582     1755  0.166
>>> traitza_children:paternalage.factor(40,45] -0.2807543 -0.6079678  0.0650281     1721  0.100 .
>>> traitza_children:paternalage.factor(45,50] -0.4905843 -0.8649214 -0.1244174     1735  0.010 **
>>> traitza_children:paternalage.factor(50,55] -0.4648579 -0.9215759 -0.0002083     1687  0.054 .
>>> traitza_children:paternalage.factor(55,90] -0.3945406 -1.0230155  0.2481568     1793  0.195
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>>> describe(krmh.1[spouses>0,])
>>>                    vars    n mean   sd median trimmed  mad   min   max range skew kurtosis   se
>>> children               2 6829 3.81 2.93   4.00    3.61 2.97  0.00 16.00 16.00 0.47    -0.46 0.04
>>> male                   3 6829 0.46 0.50   0.00    0.45 0.00  0.00  1.00  1.00 0.14    -1.98 0.01
>>> spouses                4 6829 1.14 0.38   1.00    1.03 0.00  1.00  4.00  3.00 2.87     8.23 0.00
>>> paternalage            5 6829 3.65 0.80   3.57    3.60 0.80  1.83  7.95  6.12 0.69     0.70 0.01
>>> paternalage_c          6 6829 0.00 0.80  -0.08   -0.05 0.80 -1.82  4.30  6.12 0.69     0.70 0.01
>>> paternalage.mean       7 6829 0.00 0.68  -0.08   -0.05 0.59 -1.74  4.30  6.04 0.95     1.97 0.01
>>> paternalage.diff       8 6829 0.00 0.42   0.00   -0.01 0.38 -1.51  1.48  2.99 0.17     0.17 0.01
>>>
>>>> table(krmh.1$paternalage.factor)
>>>
>>> [0,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,90]
>>>    309    1214    1683    1562    1039     623     269     130
>>>
>>> On 28 Aug 2014, at 19:05, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>
>>>> Hi Ruben,
>>>>
>>>> It might be hard to detect (near) extreme category problems (ECPs)  
>>>> with so many fixed effects. Can you post the model summary, and  
>>>> give us the mean and standard deviation of any continuous  
>>>> covariates? Also, the  
>>>> complementary log-log link (which is the za specification) is  
>>>> non-symmetric and runs into problems outside the range -35 to 3.5  
>>>> so there may be a problem there, particularly if you use  
>>>> rcov=~trait:units and the Poisson part is highly over-dispersed.   
>>>> You could try rcov=~idh(trait):units and fix the non-identifiable  
>>>> za residual variance to something smaller than 1 (say 0.5)  - it  
>>>> will mix slower but it will reduce the chance of over/underflow.
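To make the numerical point concrete: in the simulation code further down this thread the zero probability is written as exp(-exp(l)) for a latent value l. The base R sketch below only evaluates that expression in double precision. The function name p_zero and the test values are illustrative, and the exact point at which MCMCglmm itself runs into trouble may differ from what plain R arithmetic shows.

```r
# Zero probability implied by a latent value l (form taken from the
# rzapois() call quoted later in the thread)
p_zero <- function(l) exp(-exp(l))

l <- c(-40, -35, -1, 0, 1, 3.5, 5)
data.frame(l = l, p_zero = p_zero(l), p_nonzero = 1 - p_zero(l))

# Non-symmetric: unlike a logit or probit link, p(l) + p(-l) is not 1
p_zero(1) + p_zero(-1)

# At the extremes the probabilities become indistinguishable from 0 or 1
p_zero(-40) == 1        # TRUE: exp(-40) is below double-precision resolution near 1
1 - p_zero(5) == 1      # TRUE: p_zero(5) is far smaller than machine epsilon
```

The asymmetry is why the usable range is lopsided: large positive latents exhaust precision much sooner than large negative ones.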
>>>>
>>>> Cheers,
>>>>
>>>> Jarrod
>>>>
>>>>
>>>>
>>>>
>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>>>> 18:45:30 +0200:
>>>>
>>>>> Hi Jarrod,
>>>>>
>>>>>> 1) it did not return an error with rcov = ~trait:units because  
>>>>>> you used R1=rpois(2,1)+1 and yet this specification only fits a  
>>>>>> single variance (not a 2x2 covariance matrix). R1=rpois(2,1)+1  
>>>>>> is a bit of a weird specification since it has to be integer. I  
>>>>>> would obtain starting values using rIW().
>>>>>
>>>>> I agree it's a weird specification, I was a bit lost and thought  
>>>>> I could get away with just putting some random numbers in the  
>>>>> starting value.
>>>>> I didn't do R1=rpois(2,1)+1 though, I did R1=diag(rpois(2,1)+1),  
>>>>> so I got a 2x2 matrix, but yes, bound to be integer.
>>>>> I didn't know starting values should come from a conjugate  
>>>>> distribution, though that probably means I didn't think about it  
>>>>> much.
>>>>>
>>>>> I'm now doing
>>>>> start <- list(
>>>>> 	liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>> 	R = list(R1 = rIW( diag(2), nrow(krmh.1)) ),
>>>>> 	G = list(G1 = rIW( diag(2), nrow(krmh.1)) )
>>>>> )
>>>>>
>>>>> Is this what you had in mind?
>>>>> I am especially unsure if I am supposed to use such a low  
>>>>> sampling variability (my sample size is probably not even  
>>>>> relevant for the starting values) and if I should start from  
>>>>> diag(2).
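On the question of sampling variability: a very large nu makes the draws almost identical to the matrix they are centred on, so every chain would start at nearly the same place, whereas a small nu such as 10 spreads the starting values out. The sketch below illustrates this with base R only. riw_sketch is a hypothetical stand-in built on stats::rWishart; it is assumed, not verified here, to share rIW()'s convention that draws approach V as nu grows.

```r
# Hypothetical base-R stand-in for an inverse-Wishart draw centred on V
riw_sketch <- function(V, nu) {
  W <- rWishart(1, df = nu, Sigma = solve(V) / nu)[, , 1]
  solve(W)
}

set.seed(42)
tight <- replicate(200, riw_sketch(diag(2), 6829)[1, 1])  # nu = sample size
loose <- replicate(200, riw_sketch(diag(2), 10)[1, 1])    # nu = 10

sd(tight)  # very small: all chains would start at almost the same value
sd(loose)  # much larger: starting values are spread out
```

For multi-chain diagnostics the second behaviour is the useful one, since the point of the starting values is to be over-dispersed relative to the posterior.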
>>>>>
>>>>> And, I am still happily confused that this specification still  
>>>>> doesn't lead to errors with respect to rcov = ~trait:units .  
>>>>> Does this mean I'm doing it wrong?
>>>>>
>>>>>> 3) a) how many effective samples do you have for each  
>>>>>> parameter? and b) are you getting extreme category  
>>>>>> problems/numerical issues? If you store the latent variables  
>>>>>> (pl=TRUE) what is their range for the Zi/za part?
>>>>>
>>>>> My parallel run using the above starting values isn't finished yet.
>>>>> a) After applying the above starting values I get, for the  
>>>>> location effects 1600-2000 samples for a 2000 sample chain (with  
>>>>> thin set to 50). G and R-structure are from 369  
>>>>> (za_children.idParents) to 716 (and 0 for the fixed part).
>>>>> Effective sample sizes were similar for my run using the  
>>>>> starting values for G/R that I drew from rpois, and using 40  
>>>>> chains I of course get
>>>>> b) I don't think I am getting extreme categories. I would  
>>>>> probably be getting extreme categories if I included the  
>>>>> forever-alones (they almost never have children), but this way no.
>>>>> I wasn't sure how to examine the range of the latents separately  
>>>>> for the za part, but for a single chain it looks okay:
>>>>>> quantile(as.numeric(m1$Liab),probs = c(0,0.01,0,0.99,1))
>>>>>      0%        1%        0%       99%      100%
>>>>> -4.934111 -1.290728 -4.934111  3.389847  7.484206
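One way to look at the zero-alteration latents on their own is to split the columns of the stored latent variables. The sketch below uses a simulated matrix in place of m1$Liab and assumes the columns are ordered with all count-process latents first and all za latents second (the ordering used for liab=c(l1,l2) elsewhere in this thread). That ordering is an assumption and should be checked against the fitted object before relying on it.

```r
set.seed(3)
n_obs   <- 100                                      # stand-in for nrow(krmh.1)
n_draws <- 50                                       # stand-in for saved iterations
liab <- cbind(matrix(rnorm(n_draws * n_obs, 1, 0.5), n_draws),   # count part
              matrix(rnorm(n_draws * n_obs, -1, 1),  n_draws))   # za part

za_cols <- (n_obs + 1):(2 * n_obs)                  # assumed position of za latents
quantile(as.numeric(liab[, za_cols]), probs = c(0, 0.01, 0.5, 0.99, 1))
range(liab[, za_cols])
```

Pooling both halves, as in the quantile call above, can hide extreme values in the za part behind the much narrower count-process latents.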
>>>>>
>>>>> All things considered, now that I use the above starting value  
>>>>> specification I get slightly different estimates for all  
>>>>> za-coefficients. Nothing major, but it still leads me to
>>>>> think my estimates aren't exactly independent of the starting  
>>>>> values I use. I'll see what the parallel run yields.
>>>>>
>>>>> Thanks a lot,
>>>>>
>>>>> Ruben
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Jarrod
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 27 Aug  
>>>>>> 2014 19:23:42 +0200:
>>>>>>
>>>>>>> Hi Jarrod,
>>>>>>>
>>>>>>> thanks again. I was able to get it running with your advice.
>>>>>>> Some points of confusion remain:
>>>>>>>
>>>>>>> - You wrote that zi/za models would return an error with rcov  
>>>>>>> = ~trait:units + starting values. This did not happen in my  
>>>>>>> case, so I didn't build MCMCglmm myself with your suggested  
>>>>>>> edits. Also, have you considered putting your own MCMCglmm  
>>>>>>> repo on Github? Your users would be able to install  
>>>>>>> pre-releases and I'd think you'd get some time-saving pull  
>>>>>>> requests too.
>>>>>>> - In my attempts to get my models to run properly, I messed up  
>>>>>>> a prior and did not use fix=2 in my prior specification for my  
>>>>>>> za models. This led to crappy convergence, it's much better  
>>>>>>> now and for some of my simpler models I think I won't need  
>>>>>>> parallel chains. I'm reminded of Gelman's folk theorem of  
>>>>>>> statistical computing.
>>>>>>> - I followed your advice, but of course I could not set the  
>>>>>>> true values as starting values, but wanted to set random, bad  
>>>>>>> starting values. I pasted below what I arrived at, I'm  
>>>>>>> especially unsure whether I specified the starting values for  
>>>>>>> G and R properly (I think not).
>>>>>>> 	start <- list(
>>>>>>> 		liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>> 		R = list(R1 = diag(rpois(2, 1)+1)),
>>>>>>> 		G = list(G1 = diag(rpois(2, 1)+1))
>>>>>>> 	)
>>>>>>>
>>>>>>>
>>>>>>> However, even though I may not need multiple chains for some  
>>>>>>> of my simpler models, I've now run into conflicting  
>>>>>>> diagnostics. The geweke.diag for each chain (and examination  
>>>>>>> of the traces) gives
>>>>>>> satisfactory diagnostics. Comparing multiple chains using  
>>>>>>> gelman.diag, however, leads to one bad guy, namely the  
>>>>>>> traitza_children:spouses interaction.
>>>>>>> I think this implies that I've got some starting value  
>>>>>>> dependence for this parameter, that won't be easily rectified  
>>>>>>> through longer burnin?
>>>>>>> Do you have any ideas how to rectify this?
>>>>>>> I am currently doing sequential analyses on episodes of  
>>>>>>> selection and in historical human data only those who marry  
>>>>>>> have a chance at having kids. I exclude the unmarried
>>>>>>> from my sample where I predict number of children, because I  
>>>>>>> examine that in a previous model and the zero-inflation (65%  
>>>>>>> zeros, median w/o unmarried = 4) when including the unmarried  
>>>>>>> is so excessive.
>>>>>>> Number of spouses is easily the strongest predictor in the  
>>>>>>> model, but only serves as a covariate here. Since my other  
>>>>>>> estimates are stable across chains and runs and agree well  
>>>>>>> across models and with theory, I'm
>>>>>>> inclined to shrug this off. But probably I shouldn't ignore  
>>>>>>> this sign of non-convergence?
>>>>>>>
>>>>>>>> gelman.diag(mcmc_1)
>>>>>>> Potential scale reduction factors:
>>>>>>>
>>>>>>>                                         Point est. Upper C.I.
>>>>>>> (Intercept)                                      1.00       1.00
>>>>>>> traitza_children                                 1.27       1.39
>>>>>>> male                                             1.00       1.00
>>>>>>> spouses                                          1.00       1.00
>>>>>>> paternalage.mean                                 1.00       1.00
>>>>>>> paternalage.factor(25,30]                        1.00       1.00
>>>>>>> paternalage.factor(30,35]                        1.00       1.00
>>>>>>> paternalage.factor(35,40]                        1.00       1.00
>>>>>>> paternalage.factor(40,45]                        1.00       1.00
>>>>>>> paternalage.factor(45,50]                        1.00       1.00
>>>>>>> paternalage.factor(50,55]                        1.00       1.00
>>>>>>> paternalage.factor(55,90]                        1.00       1.00
>>>>>>> traitza_children:male                            1.22       1.32
>>>>>>> traitza_children:spouses                         1.83       2.13
>>>>>>> traitza_children:paternalage.mean                1.02       1.02
>>>>>>> traitza_children:paternalage.factor(25,30]       1.03       1.05
>>>>>>> traitza_children:paternalage.factor(30,35]       1.05       1.08
>>>>>>> traitza_children:paternalage.factor(35,40]       1.10       1.15
>>>>>>> traitza_children:paternalage.factor(40,45]       1.12       1.17
>>>>>>> traitza_children:paternalage.factor(45,50]       1.19       1.28
>>>>>>> traitza_children:paternalage.factor(50,55]       1.12       1.18
>>>>>>> traitza_children:paternalage.factor(55,90]       1.11       1.17
>>>>>>>
>>>>>>> Multivariate psrf
>>>>>>>
>>>>>>> 7.27
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Ruben
>>>>>>>
>>>>>>>
>>>>>>> On 26 Aug 2014, at 13:04, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>>>
>>>>>>>> Hi Ruben,
>>>>>>>>
>>>>>>>> There are 400 liabilities in a zapoisson model (2 per datum).  
>>>>>>>> This code should work:
>>>>>>>>
>>>>>>>> g <-sample(letters[1:10], size = 200, replace = T)
>>>>>>>> pred <- rnorm(200)
>>>>>>>>
>>>>>>>> l1<-rnorm(200, -1, sqrt(1))
>>>>>>>> l2<-rnorm(200, -1, sqrt(1))
>>>>>>>>
>>>>>>>> y<-VGAM::rzapois(200, exp(l1), exp(-exp(l2)))
>>>>>>>>
>>>>>>>> # generate zero-altered data with an intercept of -1 (because  
>>>>>>>> # the intercept and variance are the same for both processes  
>>>>>>>> # this is just standard Poisson)
>>>>>>>>
>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>
>>>>>>>>
>>>>>>>> start.1<-list(liab=c(l1,l2), R = list(R1=diag(2)), G=list(G1=diag(2)))
>>>>>>>> prior.1<-list(R=list(V=diag(2), nu=1.002, fix=2),  
>>>>>>>> G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0),  
>>>>>>>> alpha.V=diag(2)*1000)))
>>>>>>>>
>>>>>>>> m1<-MCMCglmm(y~trait + pred:trait, random=~us(trait):g,  
>>>>>>>> family="zapoisson",rcov=~idh(trait):units, data=dat,  
>>>>>>>> prior=prior.1, start= start.1)
>>>>>>>>
>>>>>>>> However, there are 2 bugs in the current version of MCMCglmm  
>>>>>>>> that return an error message when the documentation implies  
>>>>>>>> it should be fine:
>>>>>>>>
>>>>>>>> a) it should be possible to have R=diag(2) rather than R =  
>>>>>>>> list(R1=diag(2)). This bug cropped up when I implemented  
>>>>>>>> block-diagonal R structures. It can be fixed by inserting:
>>>>>>>>
>>>>>>>>       if(!is.list(start$R)){
>>>>>>>>          start$R<-list(R1=start$R)
>>>>>>>>       }
>>>>>>>>
>>>>>>>> on L514 of MCMCglmm.R below
>>>>>>>>
>>>>>>>>       if(!is.list(prior$R[[1]])){
>>>>>>>>          prior$R<-list(R1=prior$R)
>>>>>>>>       }
>>>>>>>>
>>>>>>>> b) rcov=~trait:units models for zi/za models will return an  
>>>>>>>> error when passing starting values. To fix this insert
>>>>>>>>
>>>>>>>>      if(diagR==3){
>>>>>>>>        if(dim(start)[1]!=1){
>>>>>>>>          stop("V is the wrong dimension for some start$G/start$R elements")
>>>>>>>>        }
>>>>>>>>        start<-diag(sum(nfl))*start[1]
>>>>>>>>      }
>>>>>>>>
>>>>>>>> on L90 of priorformat.R below
>>>>>>>>
>>>>>>>>      if(is.matrix(start)==FALSE){
>>>>>>>>        start<-as.matrix(start)
>>>>>>>>      }
>>>>>>>>
>>>>>>>> I will put these in the new version.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Jarrod
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug  
>>>>>>>> 2014 21:52:30 +0200:
>>>>>>>>
>>>>>>>>> Hi Jarrod,
>>>>>>>>>
>>>>>>>>> thanks for these pointers.
>>>>>>>>>
>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>> (GLMM are so simple I am generally happy if the output of  
>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>
>>>>>>>>> Oh, I would be happy with single chains, but since  
>>>>>>>>> computation would take weeks this way, I wanted to  
>>>>>>>>> parallelise and I would use the multi-chain convergence as a  
>>>>>>>>> criterion that my parallelisation was proper
>>>>>>>>> and is as informative as a single long chain. There don't  
>>>>>>>>> seem to be any such checks built-in – I was analysing my 40  
>>>>>>>>> chains for a bit longer than I like to admit until I noticed  
>>>>>>>>> they were identical (effectiveSize
>>>>>>>>> and summary.mcmc.list did not yell at me for this).
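A cheap guard against this is to compare the chains directly before pooling them. Below is a minimal base R sketch; chains_identical is a hypothetical helper, and the small matrices stand in for the stored samples of each chain.

```r
# TRUE if any two chains contain exactly the same draws
chains_identical <- function(chains) {
  pairs <- combn(length(chains), 2)
  any(apply(pairs, 2, function(p) identical(chains[[p[1]]], chains[[p[2]]])))
}

set.seed(1337)
a <- matrix(rnorm(20), ncol = 2)
set.seed(1337)
b <- matrix(rnorm(20), ncol = 2)   # same seed, so an exact copy of a
d <- matrix(rnorm(20), ncol = 2)   # continues the stream, so different

chains_identical(list(a, b, d))  # TRUE
chains_identical(list(a, d))     # FALSE
```

Running such a check right after the parallel job finishes costs almost nothing compared with the sampling itself.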
>>>>>>>>>
>>>>>>>>>>> # use some very bad starting values
>>>>>>>>> I get that these values are bad, but that is the goal for my  
>>>>>>>>> multi-chain aim, right?
>>>>>>>>>
>>>>>>>>> I can apply this to my zero-truncated model, but am again  
>>>>>>>>> getting stuck with the zero-altered one.
>>>>>>>>> Maybe I need only specify the Liab values for this?
>>>>>>>>> At least I'm getting nowhere with specifying R and G  
>>>>>>>>> starting values here. When I got an error, I always
>>>>>>>>> went to the MCMCglmm source to understand why the checks  
>>>>>>>>> failed, but I didn't always understand
>>>>>>>>> what was being checked and couldn't get it to work.
>>>>>>>>>
>>>>>>>>> Here's a failing example:
>>>>>>>>>
>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>> g = sample(letters[1:10], size = 200, replace = T)
>>>>>>>>> pred = rnorm(200)
>>>>>>>>> y<-rpois(200,exp(l)-t)
>>>>>>>>> y[1:40] = 0
>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>
>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>> set.seed(1)
>>>>>>>>> start_true = list(Liab=l, R = 1, G = 1 )
>>>>>>>>> m1<-MCMCglmm(y~1 + pred,random = ~ g,  
>>>>>>>>> family="zapoisson",rcov=~us(trait):units, data=dat, start=  
>>>>>>>>> start_true)
>>>>>>>>>
>>>>>>>>> # use true latent variable as starting values
>>>>>>>>> set.seed(1)
>>>>>>>>> # use some very bad starting values
>>>>>>>>> start_rand = list(Liab=rnorm(200), R = rpois(1, 1)+1, G =  
>>>>>>>>> rpois(1, 1)+1 )
>>>>>>>>> m2<-MCMCglmm(y~1 + pred,random = ~ g,rcov=~us(trait):units,   
>>>>>>>>> family="zapoisson", data=dat, start = start_rand)
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Ruben
>>>>>>>>>
>>>>>>>>> On 25 Aug 2014, at 18:29, Jarrod Hadfield  
>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ruben,
>>>>>>>>>>
>>>>>>>>>> Sorry  - I was wrong when I said that everything is Gibbs  
>>>>>>>>>> sampled conditional on the latent variables. The location  
>>>>>>>>>> effects (fixed and random effects) are also sampled  
>>>>>>>>>> conditional on the (co)variance components so you should  
>>>>>>>>>> add them to the starting values. In the case where the true  
>>>>>>>>>> values are used:
>>>>>>>>>>
>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>> start=list(Liab=l,R=1))
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Jarrod
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Mon, 25  
>>>>>>>>>> Aug 2014 17:14:14 +0100:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>
>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>> (GLMMs are so simple that I am generally happy if the output of  
>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>>>
>>>>>>>>>>> With non-Gaussian data everything is Gibbs sampled  
>>>>>>>>>>> conditional on the latent variables, so you only need to  
>>>>>>>>>>> pass them:
>>>>>>>>>>>
>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>
>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>> set.seed(1)
>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat, start=list(Liab=l))
>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>> set.seed(1)
>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>> start=list(Liab=rnorm(200)))
>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>
>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>> # not identical despite the same seed because of different  
>>>>>>>>>>> starting values but clearly sampling the same posterior  
>>>>>>>>>>> distribution:
>>>>>>>>>>>
>>>>>>>>>>> gelman.diag(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jarrod
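
To make clearer what the multiple-chain diagnostic above compares, here is a rough base-R sketch of the basic potential scale reduction factor for a single parameter. The function name psrf_basic and the simulated chains are assumptions for illustration. coda's gelman.diag() applies additional corrections, so its numbers will not match this simplified version exactly.

```r
# Basic potential scale reduction factor for one parameter.
# `chains` is a matrix with one column per chain and one row per iteration.
psrf_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B_over_n <- var(colMeans(chains))    # variance of the chain means (B / n)
  var_plus <- (n - 1) / n * W + B_over_n
  sqrt(var_plus / W)
}

set.seed(1)
# Two chains sampling the same distribution
same <- cbind(rnorm(1000), rnorm(1000))
# Two chains stuck in different places
shifted <- cbind(rnorm(1000), rnorm(1000, mean = 3))

psrf_same <- psrf_basic(same)        # close to 1
psrf_shifted <- psrf_basic(shifted)  # well above 1
print(c(same = psrf_same, shifted = psrf_shifted))
```

If all chains start from nearly the same values, the between-chain term can be small even when the chains have not explored the whole posterior, which is why over-dispersed starting values matter for this diagnostic.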
>>>>>>>>>>>
>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>> Aug 2014 18:00:08 +0200:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Jarrod,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for the quick reply. Please, don't waste time  
>>>>>>>>>>>> looking into doMPI; I am happy that I
>>>>>>>>>>>> get the expected result when I specify that reproducible  
>>>>>>>>>>>> seed, whatever the reason may be.
>>>>>>>>>>>> I'm pretty sure that is the deciding factor, because I  
>>>>>>>>>>>> tested it explicitly, I just have no idea
>>>>>>>>>>>> how/why it interacts with the choice of family.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, is setting up different RNG streams for my  
>>>>>>>>>>>> workers (now that it works) __sufficient__
>>>>>>>>>>>> so that I get independent chains and can use  
>>>>>>>>>>>> gelman.diag() for convergence diagnostics?
>>>>>>>>>>>> Or should I still tinker with the starting values myself?
>>>>>>>>>>>> I've never found a worked example of supplying starting  
>>>>>>>>>>>> values and am thus a bit lost.
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for sending further questions, I hope someone else  
>>>>>>>>>>>> takes pity while
>>>>>>>>>>>> you're busy with lectures.
>>>>>>>>>>>>
>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>
>>>>>>>>>>>> Ruben
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 25 Aug 2014, at 17:29, Jarrod Hadfield  
>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not think the issue is with the starting values,  
>>>>>>>>>>>>> because even if the same starting values were used the  
>>>>>>>>>>>>> chains would still differ because of the randomness in  
>>>>>>>>>>>>> the Markov Chain (if I interpret your `identical' test  
>>>>>>>>>>>>> correctly). This just involves a call to GetRNGstate()  
>>>>>>>>>>>>> in the C++ code (line 871 of MCMCglmm.cc), so I think for  
>>>>>>>>>>>>> some reason doMPI/foreach is not doing what you expect.  
>>>>>>>>>>>>> I am not familiar with doMPI and am in the middle of  
>>>>>>>>>>>>> writing lectures so haven't got time to look into it  
>>>>>>>>>>>>> carefully. Outside of the context of doMPI I get the  
>>>>>>>>>>>>> behaviour I expect:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>
>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>
>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>> # different, as expected
>>>>>>>>>>>>> plot(mcmc.list(m2$Sol, m3$Sol))
>>>>>>>>>>>>> # the same, as expected
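
The same seed behaviour can be reproduced without MCMCglmm. The sketch below uses a toy AR(1) series as a stand-in for a Markov chain; toy_chain and its arguments are hypothetical names. All runs share the same starting value, so any difference between them comes from the random number stream alone.

```r
# Toy "chain": every run starts at the same value, so any difference between
# runs is due to the random number generator state alone.
toy_chain <- function(seed, n = 100, start = 0, rho = 0.5) {
  set.seed(seed)
  x <- numeric(n)
  x[1] <- start
  for (i in 2:n) {
    x[i] <- rho * x[i - 1] + rnorm(1)
  }
  x
}

c1 <- toy_chain(seed = 1)
c2 <- toy_chain(seed = 2)
c3 <- toy_chain(seed = 2)

identical(c2, c3)  # same seed, same chain
identical(c1, c2)  # different seed, different chain despite the same start
```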
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>>>> Aug 2014 16:58:06 +0200:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sorry for bumping my old post, I hope to elicit a  
>>>>>>>>>>>>>> response with a more focused question:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When does MCMCglmm automatically start from different  
>>>>>>>>>>>>>> values when using doMPI/foreach?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have done some tests with models of varying  
>>>>>>>>>>>>>> complexity. For example, the script in my last
>>>>>>>>>>>>>> post (using "zapoisson") yielded 40 identical chains:
>>>>>>>>>>>>>>> identical(mcmclist[1], mcmclist[30])
>>>>>>>>>>>>>> TRUE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A simpler (?) model (using "ztpoisson" and no specified  
>>>>>>>>>>>>>> prior), however, yielded different chains
>>>>>>>>>>>>>> and I could use them to calculate gelman.diag()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changing my script to the version below, i.e. seeding  
>>>>>>>>>>>>>> foreach using .options.mpi=list( seed= 1337)
>>>>>>>>>>>>>> so as to make RNGstreams reproducible (or so I   
>>>>>>>>>>>>>> thought), led to different chains even for the
>>>>>>>>>>>>>> "zapoisson" model.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In no case have I (successfully) tried to supplant the  
>>>>>>>>>>>>>> default of MCMCglmm's "start" argument.
>>>>>>>>>>>>>> Is starting my models from different RNGsubstreams  
>>>>>>>>>>>>>> inadequate compared to manipulating
>>>>>>>>>>>>>> the start argument explicitly? If so, is there any  
>>>>>>>>>>>>>> worked example of explicit starting value manipulation
>>>>>>>>>>>>>> in parallel computation?
>>>>>>>>>>>>>> I've browsed the MCMCglmm source to understand how the  
>>>>>>>>>>>>>> default starting values are generated,
>>>>>>>>>>>>>> but didn't find any differences with respect to RNG for  
>>>>>>>>>>>>>> the two families "ztpoisson" and "zapoisson"
>>>>>>>>>>>>>> (granted, I did not dig very deep).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ruben Arslan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H  
>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>> cl <-  
>>>>>>>>>>>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>> Children_mcmc1 =  
>>>>>>>>>>>>>> foreach(i=1:clusterSize(cl),.options.mpi =  
>>>>>>>>>>>>>> list(seed=1337) ) %dopar% {
>>>>>>>>>>>>>> 	library(MCMCglmm);library(data.table)
>>>>>>>>>>>>>> 	load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 	nitt = 130000; thin = 100; burnin = 30000
>>>>>>>>>>>>>> 	prior.m5d.2 = list(
>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 	rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents,  
>>>>>>>>>>>>>> children, male, urban, spouses, paternalage.mean,  
>>>>>>>>>>>>>> paternalage.factor)])
>>>>>>>>>>>>>> 	(m1 = MCMCglmm( children ~ trait * (male + urban +  
>>>>>>>>>>>>>> spouses + paternalage.mean + paternalage.factor),
>>>>>>>>>>>>>> 						rcov=~us(trait):units,
>>>>>>>>>>>>>> 						random=~us(trait):idParents,
>>>>>>>>>>>>>> 						family="zapoisson",
>>>>>>>>>>>>>> 						prior = prior.m5d.2,
>>>>>>>>>>>>>> 						data=rpqa.1,
>>>>>>>>>>>>>> 						pr = F, saveX = F, saveZ = F,
>>>>>>>>>>>>>> 						nitt=nitt,thin=thin,burnin=burnin))
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =  
>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>> mpi.quit()
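
As a rough illustration of what separate, reproducible RNG streams per worker look like, the sketch below uses the L'Ecuyer-CMRG generator and nextRNGStream() from the base "parallel" package. It is not a description of what doMPI does internally with its seed option; the names seeds, draw_with_stream and n_chains are made up for this example.

```r
library(parallel)

# Switch to a generator that supports multiple independent streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(1337)

n_chains <- 4
seeds <- vector("list", n_chains)
seeds[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_chains)[-1]) {
  # Each call jumps ahead to the start of the next stream.
  seeds[[i]] <- nextRNGStream(seeds[[i - 1]])
}

# Install a given stream as the current RNG state, then draw from it.
draw_with_stream <- function(seed, n = 5) {
  assign(".Random.seed", seed, envir = globalenv())
  rnorm(n)
}

draws <- lapply(seeds, draw_with_stream)
print(draws)
```

Each worker gets different random numbers, and re-running with the same master seed gives the same set of streams. Separate streams make the chains differ, but they do not by themselves make the starting values over-dispersed.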
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 04 Aug 2014, at 20:25, Ruben Arslan  
>>>>>>>>>>>>>> <rubenarslan at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> would someone be willing to share her or his efforts  
>>>>>>>>>>>>>>> in parallelising an MCMCglmm analysis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I had something viable using harvestr that seemed to  
>>>>>>>>>>>>>>> properly initialise
>>>>>>>>>>>>>>> the starting values from different random number  
>>>>>>>>>>>>>>> streams (which is desirable,
>>>>>>>>>>>>>>> as far as I could find out), but I ended up being  
>>>>>>>>>>>>>>> unable to use harvestr, because
>>>>>>>>>>>>>>> it uses an old version of plyr, where parallelisation  
>>>>>>>>>>>>>>> works only for multicore, not for
>>>>>>>>>>>>>>> MPI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I pasted my working version, which does not do anything  
>>>>>>>>>>>>>>> about starting values or RNG,
>>>>>>>>>>>>>>> at the end of this email. I can try to fumble further  
>>>>>>>>>>>>>>> in the dark or try to update harvestr,
>>>>>>>>>>>>>>> but maybe someone has gone through all this already.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd also appreciate any tips for elegantly  
>>>>>>>>>>>>>>> post-processing such parallel data, as some of my usual
>>>>>>>>>>>>>>> extraction functions and routines are hampered by the  
>>>>>>>>>>>>>>> fact that some coda functions
>>>>>>>>>>>>>>> do not aggregate results over chains. (What I get from  
>>>>>>>>>>>>>>> a single-chain summary in MCMCglmm
>>>>>>>>>>>>>>> is a bit more comprehensive than what I managed to  
>>>>>>>>>>>>>>> cobble together with my own extraction
>>>>>>>>>>>>>>> functions).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The reason I'm parallelising my analyses is that I'm  
>>>>>>>>>>>>>>> having trouble getting a good effective
>>>>>>>>>>>>>>> sample size for any parameter having to do with the  
>>>>>>>>>>>>>>> many zeroes in my data.
>>>>>>>>>>>>>>> Any pointers are much appreciated, I'm quite  
>>>>>>>>>>>>>>> inexperienced with MCMCglmm.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H  
>>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>>> "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>> cl <- startMPIcluster()
>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>>>>>>>>>>>>> 	library(MCMCglmm)
>>>>>>>>>>>>>>> 	load("rpqa1.rdata")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 	nitt = 40000; thin = 100; burnin = 10000
>>>>>>>>>>>>>>> 	prior = list(
>>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 	MCMCglmm( children ~ trait -1 +  
>>>>>>>>>>>>>>> at.level(trait,1):male + at.level(trait,1):urban +  
>>>>>>>>>>>>>>> at.level(trait,1):spouses +  
>>>>>>>>>>>>>>> at.level(trait,1):paternalage.mean +  
>>>>>>>>>>>>>>> at.level(trait,1):paternalage.factor,
>>>>>>>>>>>>>>> 		rcov=~us(trait):units,
>>>>>>>>>>>>>>> 		random=~us(trait):idParents,
>>>>>>>>>>>>>>> 		family="zapoisson",
>>>>>>>>>>>>>>> 		prior = prior,
>>>>>>>>>>>>>>> 		data=rpqa.1,
>>>>>>>>>>>>>>> 		pr = F, saveX = T, saveZ = T,
>>>>>>>>>>>>>>> 		nitt=nitt,thin=thin,burnin=burnin)
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file = "rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>> mpi.quit()
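
One simple way to post-process such parallel output is to stack the stored draws from all chains and summarise the pooled sample. The sketch below uses simulated matrices as a stand-in for lapply(Children_mcmc1, function(x) x$Sol); the column names and the object names chains, pooled and pooled_summary are assumptions for illustration. Pooling only makes sense once the chains appear to have converged, and the intervals here are plain quantile intervals, which need not equal the intervals reported by a single-chain MCMCglmm summary.

```r
# Number of stored draws per chain for the settings in the script above.
nitt <- 40000; burnin <- 10000; thin <- 100
n_stored <- (nitt - burnin) / thin

# Stand-in for the per-chain $Sol matrices (simulated, two fixed effects).
set.seed(1)
n_chains <- 40
chains <- lapply(seq_len(n_chains), function(i) {
  matrix(rnorm(n_stored * 2), ncol = 2,
         dimnames = list(NULL, c("(Intercept)", "male")))
})

# Stack all chains and summarise each parameter over the pooled draws.
pooled <- do.call(rbind, lapply(chains, as.matrix))
pooled_summary <- t(apply(pooled, 2, function(x) {
  c(mean = mean(x), quantile(x, c(0.025, 0.975)))
}))
print(dim(pooled))
print(pooled_summary)
```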
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Ruben C. Arslan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Georg August University Göttingen
>>>>>>>>>>>>>>> Biological Personality Psychology and Psychological Assessment
>>>>>>>>>>>>>>> Georg Elias Müller Institute of Psychology
>>>>>>>>>>>>>>> Goßlerstr. 14
>>>>>>>>>>>>>>> 37073 Göttingen
>>>>>>>>>>>>>>> Germany
>>>>>>>>>>>>>>> Tel.: +49 551 3920704
>>>>>>>>>>>>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.