[R-sig-ME] parallel MCMCglmm, RNGstreams, starting values & priors

Jarrod Hadfield j.hadfield at ed.ac.uk
Fri Aug 29 09:13:33 CEST 2014


Hi Ruben,

Actually I might know what it is. When you sample different starting  
values do you inadvertently sample a new residual variance for the  
unidentified za part? You need to make sure that this is always fixed  
at the same value (otherwise the model is different). This is not a  
problem under the trait:units specification.

Cheers,

Jarrod.



Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Fri, 29 Aug 2014  
08:04:42 +0100:

> Hi Ruben,
>
> Can you share your data and I will take a look. Its definitely not  
> Monte Carlo error.
>
> Cheers,
>
> Jarrod
>
>
> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
> 22:44:47 +0200:
>
>> Hi Jarrod,
>>
>> those two matched up quite well yes. I just completed another 20  
>> chains, using more variable
>> starting values. There's still two fixed effects  
>> traitza_children:spouses  and :male which haven't converged
>> according to multi-chain (gelman), but have according to geweke.
>> The offending traces: http://imgur.com/Qm6Ovfr
>> These specific effects aren't of interest to me, so if this doesn't  
>> affect the rest of my estimates, I can be happy
>> with this, but I can't conclude that, can I?
>>
>> I'm now also doing a run to see how it deals with the more  
>> intensely zero-inflated data when including
>> the unmarried.
>>
>> Thanks a lot for all that help,
>>
>> Ruben
>>
>>> gelman.diag(mcmclist)
>> Potential scale reduction factors:
>>
>>                                           Point est. Upper C.I.
>> (Intercept)                                      1.00       1.00
>> traitza_children                                 1.40       1.65
>> male                                             1.00       1.00
>> spouses                                          1.00       1.00
>> paternalage.mean                                 1.00       1.00
>> paternalage.factor(25,30]                        1.00       1.00
>> paternalage.factor(30,35]                        1.00       1.00
>> paternalage.factor(35,40]                        1.00       1.00
>> paternalage.factor(40,45]                        1.00       1.00
>> paternalage.factor(45,50]                        1.00       1.00
>> paternalage.factor(50,55]                        1.00       1.00
>> paternalage.factor(55,90]                        1.00       1.00
>> traitza_children:male                            1.33       1.54
>> traitza_children:spouses                         2.21       2.83
>> traitza_children:paternalage.mean                1.01       1.02
>> traitza_children:paternalage.factor(25,30]       1.05       1.08
>> traitza_children:paternalage.factor(30,35]       1.08       1.13
>> traitza_children:paternalage.factor(35,40]       1.15       1.25
>> traitza_children:paternalage.factor(40,45]       1.15       1.26
>> traitza_children:paternalage.factor(45,50]       1.26       1.43
>> traitza_children:paternalage.factor(50,55]       1.15       1.25
>> traitza_children:paternalage.factor(55,90]       1.14       1.23
>>
>> Multivariate psrf
>>
>> 8.99
>>
>>> summary(mcmclist)
>>
>> Iterations = 100001:149951
>> Thinning interval = 50
>> Number of chains = 20
>> Sample size per chain = 1000
>>
>> 1. Empirical mean and standard deviation for each variable,
>>   plus standard error of the mean:
>>
>>                                               Mean      SD  Naive  
>> SE Time-series SE
>> (Intercept)                                 1.36326 0.04848  
>> 0.0003428      0.0003542
>> traitza_children                           -0.76679 0.28738  
>> 0.0020321      0.0016682
>> male                                        0.09980 0.01633  
>> 0.0001155      0.0001222
>> spouses                                     0.12333 0.01957  
>> 0.0001384      0.0001414
>> paternalage.mean                            0.07215 0.02194  
>> 0.0001551      0.0001596
>> paternalage.factor(25,30]                  -0.03381 0.04184  
>> 0.0002959      0.0003066
>> paternalage.factor(30,35]                  -0.08380 0.04270  
>> 0.0003019      0.0003118
>> paternalage.factor(35,40]                  -0.16502 0.04569  
>> 0.0003231      0.0003289
>> paternalage.factor(40,45]                  -0.16738 0.05090  
>> 0.0003599      0.0003697
>> paternalage.factor(45,50]                  -0.18383 0.05880  
>> 0.0004158      0.0004242
>> paternalage.factor(50,55]                  -0.18241 0.07277  
>> 0.0005146      0.0005302
>> paternalage.factor(55,90]                  -0.40612 0.09875  
>> 0.0006983      0.0007467
>> traitza_children:male                       0.12092 0.08223  
>> 0.0005815      0.0004697
>> traitza_children:spouses                    0.64881 0.21132  
>> 0.0014942      0.0008511
>> traitza_children:paternalage.mean          -0.02741 0.08550  
>> 0.0006046      0.0006221
>> traitza_children:paternalage.factor(25,30] -0.17296 0.18680  
>> 0.0013209      0.0013750
>> traitza_children:paternalage.factor(30,35] -0.19027 0.19267  
>> 0.0013624      0.0013901
>> traitza_children:paternalage.factor(35,40] -0.24911 0.21282  
>> 0.0015049      0.0014391
>> traitza_children:paternalage.factor(40,45] -0.29772 0.23403  
>> 0.0016548      0.0015956
>> traitza_children:paternalage.factor(45,50] -0.51782 0.28589  
>> 0.0020215      0.0017602
>> traitza_children:paternalage.factor(50,55] -0.46126 0.32064  
>> 0.0022673      0.0021397
>> traitza_children:paternalage.factor(55,90] -0.38612 0.41461  
>> 0.0029317      0.0027396
>>
>> 2. Quantiles for each variable:
>>
>>                                               2.5%      25%       
>> 50%      75%      97.5%
>> (Intercept)                                 1.26883  1.33106   
>> 1.36322  1.39575  1.4589722
>> traitza_children                           -1.20696 -0.95751  
>> -0.81076 -0.63308 -0.0365042
>> male                                        0.06785  0.08878   
>> 0.09970  0.11085  0.1320168
>> spouses                                     0.08467  0.11030   
>> 0.12343  0.13643  0.1617869
>> paternalage.mean                            0.02950  0.05751   
>> 0.07202  0.08683  0.1153881
>> paternalage.factor(25,30]                  -0.11581 -0.06174  
>> -0.03397 -0.00574  0.0473783
>> paternalage.factor(30,35]                  -0.16656 -0.11250  
>> -0.08358 -0.05519  0.0003065
>> paternalage.factor(35,40]                  -0.25518 -0.19530  
>> -0.16500 -0.13440 -0.0757366
>> paternalage.factor(40,45]                  -0.26887 -0.20164  
>> -0.16675 -0.13335 -0.0677407
>> paternalage.factor(45,50]                  -0.30080 -0.22320  
>> -0.18339 -0.14440 -0.0687967
>> paternalage.factor(50,55]                  -0.32663 -0.23034  
>> -0.18227 -0.13317 -0.0415547
>> paternalage.factor(55,90]                  -0.60202 -0.47303  
>> -0.40454 -0.33994 -0.2139128
>> traitza_children:male                      -0.01083  0.06634   
>> 0.11024  0.16109  0.3295892
>> traitza_children:spouses                    0.37857  0.51072   
>> 0.59398  0.71395  1.2127940
>> traitza_children:paternalage.mean          -0.19138 -0.08250  
>> -0.02985  0.02493  0.1468989
>> traitza_children:paternalage.factor(25,30] -0.57457 -0.28481  
>> -0.16489 -0.05151  0.1728148
>> traitza_children:paternalage.factor(30,35] -0.61499 -0.30350  
>> -0.17736 -0.06299  0.1555147
>> traitza_children:paternalage.factor(35,40] -0.74251 -0.36752  
>> -0.22966 -0.10777  0.1151897
>> traitza_children:paternalage.factor(40,45] -0.84165 -0.42691  
>> -0.27729 -0.14322  0.1032436
>> traitza_children:paternalage.factor(45,50] -1.21782 -0.66568  
>> -0.48420 -0.32873 -0.0476720
>> traitza_children:paternalage.factor(50,55] -1.21327 -0.63623  
>> -0.43432 -0.24957  0.0955360
>> traitza_children:paternalage.factor(55,90] -1.33772 -0.62227  
>> -0.35364 -0.11050  0.3361684
>>
>>> effectiveSize(mcmclist)
>>                               (Intercept)                            
>> traitza_children
>>                                  18814.05                            
>>         16359.33
>>                                      male                            
>>          spouses
>>                                  18132.98                            
>>         19547.05
>>                          paternalage.mean                   
>> paternalage.factor(25,30]
>>                                  19238.72                            
>>         18974.81
>>                 paternalage.factor(30,35]                   
>> paternalage.factor(35,40]
>>                                  18874.33                            
>>         19406.63
>>                 paternalage.factor(40,45]                   
>> paternalage.factor(45,50]
>>                                  19075.18                            
>>         19401.77
>>                 paternalage.factor(50,55]                   
>> paternalage.factor(55,90]
>>                                  18960.11                            
>>         17893.23
>>                     traitza_children:male                    
>> traitza_children:spouses
>>                                  18545.55                            
>>         14438.51
>>         traitza_children:paternalage.mean  
>> traitza_children:paternalage.factor(25,30]
>>                                  18464.09                            
>>         16943.43
>> traitza_children:paternalage.factor(30,35]  
>> traitza_children:paternalage.factor(35,40]
>>                                  16827.44                            
>>         17230.04
>> traitza_children:paternalage.factor(40,45]  
>> traitza_children:paternalage.factor(45,50]
>>                                  17144.78                            
>>         18191.67
>> traitza_children:paternalage.factor(50,55]  
>> traitza_children:paternalage.factor(55,90]
>>                                  17466.60                            
>>         18540.59
>>
>>
>> ### current script:
>>
>> # bsub -q mpi -W 24:00 -n 21 -R np20 mpirun -H localhost -n 21 R  
>> --slave -f "/usr/users/rarslan/rpqa/krmh_main/children.R"
>> setwd("/usr/users/rarslan/rpqa/")
>> library(doMPI)
>> cl <-  
>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/krmh_main/")
>> registerDoMPI(cl)
>> Children = foreach(i=1:clusterSize(cl),.options.mpi =  
>> list(seed=1337) ) %dopar% {
>> 	library(MCMCglmm);library(data.table)
>>    setwd("/usr/users/rarslan/rpqa/krmh_main/")
>> 	source("../1 - extraction functions.r")
>>    load("../krmh1.rdata")
>>
>> 	krmh.1 = recenter.pat(na.omit(krmh.1[spouses>0, list(idParents,  
>> children, male, spouses, paternalage)]))
>>
>> 	samples = 1000
>> 	thin = 50; burnin = 100000
>> 	nitt = samples * thin + burnin
>>
>> 	prior <- list(
>> 		R=list(V=diag(2), nu=1.002, fix=2),
>> 		G=list(G1=list(V=diag(2), nu=1, alpha.mu=c(0,0), alpha.V=diag(2)*1000))
>> 	)
>>
>> 	start <- list(
>> 		liab=c(rnorm( nrow(krmh.1)*2 )),
>> 		R = list(R1 = rIW(diag(2), 10 )),
>> 		G = list(G1 = rIW(diag(2), 10 ))
>> 	)
>>
>> 	( m1 = MCMCglmm( children ~ trait * (male + spouses +  
>> paternalage.mean + paternalage.factor),
>> 						rcov=~idh(trait):units,
>> 						random=~idh(trait):idParents,
>> 						family="zapoisson",
>> 						start = start,
>> 						prior = prior,
>> 						data=krmh.1,
>> 						pr = F, saveX = F, saveZ = F,
>> 						nitt=nitt,thin=thin,burnin=burnin)
>> 	)
>> 		m1$Residual$nrt<-2
>> 	m1
>> }
>>
>> save(Children,file = "Children.rdata")
>> closeCluster(cl)
>> mpi.quit()
>>
>> On 28 Aug 2014, at 20:59, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>
>>> Hi,
>>>
>>> The posteriors for the two models look pretty close to me. Are the  
>>> scale reduction factors really as high as previously reported?  
>>> Before you had 1.83 for traitza_children:spouses, but the plot  
>>> suggests that it should be close to 1?
>>>
>>> Cheers,
>>>
>>> Jarrod
>>>
>>>
>>>
>>>
>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>>> 19:59:16 +0200:
>>>
>>>> Sure! Thanks a lot.
>>>> I am using ~idh(trait):units already, sorry for saying that  
>>>> incorrectly in my last email.
>>>> These models aren't the final thing, I will replace the  
>>>> paternalage.factor variable
>>>> with its linear equivalent if that seems defensible (does so far)  
>>>> and in this model it seems
>>>> okay to remove the za-effects for all predictors except spouses.
>>>> So a final model would have fewer fixed effects. I also have  
>>>> datasets of 200k+ and 5m+,
>>>> but I'm learning MCMCglmm with this smaller one because my wrong  
>>>> turns take less time.
>>>>
>>>> I've uploaded a comparison coef plot of two models:
>>>> http://i.imgur.com/sHUfnmd.png
>>>> m7 is with the default starting values, m1 is with the  
>>>> specification I sent in my last email. I don't
>>>> know if such differences are something to worry about.
>>>>
>>>> I don't know what qualifies as highly overdispersed, here's a  
>>>> plot of the outcome for ever
>>>> married people (slate=real data):
>>>> http://imgur.com/14MywgZ
>>>> here's with everybody born (incl. some stillborn etc.):
>>>> http://imgur.com/knRGa1v
>>>> I guess my approach (generating an overdispersed poisson with the  
>>>> parameters from
>>>> the data and checking if it has as excess zeroes) is not the best  
>>>> way to diagnose zero-inflation,
>>>> but especially in the second case it seems fairly clear-cut.
>>>>
>>>> Best regards,
>>>>
>>>> Ruben
>>>>
>>>>> summary(m1)
>>>>
>>>> Iterations = 50001:149951
>>>> Thinning interval  = 50
>>>> Sample size  = 2000
>>>>
>>>> DIC: 31249.73
>>>>
>>>> G-structure:  ~idh(trait):idParents
>>>>
>>>>                     post.mean  l-95% CI u-95% CI eff.samp
>>>> children.idParents     0.006611 4.312e-08   0.0159    523.9
>>>> za_children.idParents  0.193788 7.306e-02   0.3283    369.3
>>>>
>>>> R-structure:  ~idh(trait):units
>>>>
>>>>                 post.mean l-95% CI u-95% CI eff.samp
>>>> children.units       0.1285   0.1118   0.1452    716.1
>>>> za_children.units    0.9950   0.9950   0.9950      0.0
>>>>
>>>> Location effects: children ~ trait * (male + spouses +  
>>>> paternalage.mean + paternalage.factor)
>>>>
>>>>                                           post.mean   l-95% CI    
>>>> u-95% CI eff.samp  pMCMC
>>>> (Intercept)                                 1.3413364  1.2402100   
>>>> 1.4326099     1789 <5e-04 ***
>>>> traitza_children                           -0.8362879 -1.2007980  
>>>> -0.5016730     1669 <5e-04 ***
>>>> male                                        0.0994902  0.0679050   
>>>> 0.1297394     2000 <5e-04 ***
>>>> spouses                                     0.1236033  0.0839000   
>>>> 0.1624939     2000 <5e-04 ***
>>>> paternalage.mean                            0.0533892  0.0119569   
>>>> 0.0933960     2000  0.015 *
>>>> paternalage.factor(25,30]                  -0.0275822 -0.1116421   
>>>> 0.0537359     1842  0.515
>>>> paternalage.factor(30,35]                  -0.0691025 -0.1463214   
>>>> 0.0122393     1871  0.097 .
>>>> paternalage.factor(35,40]                  -0.1419933 -0.2277379  
>>>> -0.0574678     1845 <5e-04 ***
>>>> paternalage.factor(40,45]                  -0.1364952 -0.2362714  
>>>> -0.0451874     1835  0.007 **
>>>> paternalage.factor(45,50]                  -0.1445342 -0.2591767  
>>>> -0.0421178     1693  0.008 **
>>>> paternalage.factor(50,55]                  -0.1302972 -0.2642965   
>>>> 0.0077061     2000  0.064 .
>>>> paternalage.factor(55,90]                  -0.3407879 -0.5168972  
>>>> -0.1493652     1810 <5e-04 ***
>>>> traitza_children:male                       0.0926888 -0.0147379   
>>>> 0.2006142     1901  0.098 .
>>>> traitza_children:spouses                    0.5531197  0.3870616   
>>>> 0.7314289     1495 <5e-04 ***
>>>> traitza_children:paternalage.mean           0.0051463 -0.1279396   
>>>> 0.1460099     1617  0.960
>>>> traitza_children:paternalage.factor(25,30] -0.1538957 -0.4445749   
>>>> 0.1462955     1781  0.321
>>>> traitza_children:paternalage.factor(30,35] -0.1747883 -0.4757851   
>>>> 0.1162476     1998  0.261
>>>> traitza_children:paternalage.factor(35,40] -0.2261843 -0.5464379   
>>>> 0.0892582     1755  0.166
>>>> traitza_children:paternalage.factor(40,45] -0.2807543 -0.6079678   
>>>> 0.0650281     1721  0.100 .
>>>> traitza_children:paternalage.factor(45,50] -0.4905843 -0.8649214  
>>>> -0.1244174     1735  0.010 **
>>>> traitza_children:paternalage.factor(50,55] -0.4648579 -0.9215759  
>>>> -0.0002083     1687  0.054 .
>>>> traitza_children:paternalage.factor(55,90] -0.3945406 -1.0230155   
>>>> 0.2481568     1793  0.195
>>>> ---
>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>
>>>>> describe(krmh.1[spouses>0,])
>>>>                   vars    n mean   sd median trimmed  mad   min    
>>>> max range skew kurtosis   se
>>>> children               2 6829 3.81 2.93   4.00    3.61 2.97  0.00  
>>>> 16.00 16.00 0.47    -0.46 0.04
>>>> male                   3 6829 0.46 0.50   0.00    0.45 0.00  0.00  
>>>>  1.00  1.00 0.14    -1.98 0.01
>>>> spouses                4 6829 1.14 0.38   1.00    1.03 0.00  1.00  
>>>>  4.00  3.00 2.87     8.23 0.00
>>>> paternalage            5 6829 3.65 0.80   3.57    3.60 0.80  1.83  
>>>>  7.95  6.12 0.69     0.70 0.01
>>>> paternalage_c          6 6829 0.00 0.80  -0.08   -0.05 0.80 -1.82  
>>>>  4.30  6.12 0.69     0.70 0.01
>>>> paternalage.mean       7 6829 0.00 0.68  -0.08   -0.05 0.59 -1.74  
>>>>  4.30  6.04 0.95     1.97 0.01
>>>> paternalage.diff       8 6829 0.00 0.42   0.00   -0.01 0.38 -1.51  
>>>>  1.48  2.99 0.17     0.17 0.01
>>>>
>>>>> table(krmh.1$paternalage.factor)
>>>>
>>>> [0,25] (25,30] (30,35] (35,40] (40,45] (45,50] (50,55] (55,90]
>>>>   309    1214    1683    1562    1039     623     269     130
>>>>
>>>> On 28 Aug 2014, at 19:05, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>
>>>>> Hi Ruben,
>>>>>
>>>>> It might be hard to detect (near) ECPs with so many fixed  
>>>>> effects (can you post the model summary (and give us the mean  
>>>>> and standard deviation of any continuous covariates)). Also, the  
>>>>> complementary log-log link (which is the za specification) is  
>>>>> non-symmetric and runs into problems outside the range -35 to  
>>>>> 3.5 so there may be a problem there, particularly if you use  
>>>>> rcov=~trait:units and the Poisson part is highly over-dispersed.  
>>>>>  You could try rcov=~idh(trait):units and fix the  
>>>>> non-identifiable za residual variance to something smaller than  
>>>>> 1 (say 0.5)  - it will mix slower but it will reduce the chance  
>>>>> of over/underflow.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jarrod
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Thu, 28 Aug 2014  
>>>>> 18:45:30 +0200:
>>>>>
>>>>>> Hi Jarrod,
>>>>>>
>>>>>>> 1) it did not return an error with rcov = ~trait:units because  
>>>>>>> you used R1=rpois(2,1)+1 and yet this specification only fits  
>>>>>>> a single variance (not a 2x2 covariance matrix).  
>>>>>>> R1=rpois(2,1)+1 is a bit of a weird specification since it has  
>>>>>>> to be integer. I would obtain starting values using rIW().
>>>>>>
>>>>>> I agree it's a weird specification, I was a bit lost and  
>>>>>> thought I could get away with just putting some random numbers  
>>>>>> in the starting value.
>>>>>> I didn't do R1=rpois(2,1)+1 though, I did  
>>>>>> R1=diag(rpois(2,1)+1), so I got a 2x2 matrix, but yes, bound to  
>>>>>> be integer.
>>>>>> I didn't know starting values should come from a conjugate  
>>>>>> distribution, though that probably means I didn't think about  
>>>>>> it much.
>>>>>>
>>>>>> I'm now doing
>>>>>> start <- list(
>>>>>> 	liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>> 	R = list(R1 = rIW( diag(2), nrow(krmh.1)) ),
>>>>>> 	G = list(G1 = rIW( diag(2), nrow(krmh.1)) )
>>>>>> )
>>>>>>
>>>>>> Is this what you had in mind?
>>>>>> I am especially unsure if I am supposed to use such a low  
>>>>>> sampling variability (my sample size is probably not even  
>>>>>> relevant for the starting values) and if I should start from  
>>>>>> diag(2).
>>>>>>
>>>>>> And, I am still happily confused that this specification still  
>>>>>> doesn't lead to errors with respect to rcov = ~trait:units .  
>>>>>> Does this mean I'm doing it wrong?
>>>>>>
>>>>>>> 3) a) how many effective samples do you have for each  
>>>>>>> parameter? and b) are you getting extreme category  
>>>>>>> problems/numerical issues? If you store the latent variables  
>>>>>>> (pl=TUE) what is their range for the Zi/za part?
>>>>>>
>>>>>> My parallel run using the above starting values isn't finished yet.
>>>>>> a) After applying the above starting values I get, for the  
>>>>>> location effects 1600-2000 samples for a 2000 sample chain  
>>>>>> (with thin set to 50). G and R-structure are from 369  
>>>>>> (za_children.idParents) to 716 (and 0 for the fixed part).
>>>>>> Effective sample sizes were similar for my run using the  
>>>>>> starting values for G/R that I drew from rpois, and using 40  
>>>>>> chains I of course get
>>>>>> b) I don't think I am getting extreme categories. I would  
>>>>>> probably be getting extreme categories if I included the  
>>>>>> forever-alones (they almost never have children), but this way  
>>>>>> no.
>>>>>> I wasn't sure how to examine the range of the latents  
>>>>>> separately for the za part, but for a single chain it looks okay:
>>>>>>> quantile(as.numeric(m1$Liab),probs = c(0,0.01,0,0.99,1))
>>>>>>     0%        1%        0%       99%      100%
>>>>>> -4.934111 -1.290728 -4.934111  3.389847  7.484206
>>>>>>
>>>>>> Well, all considered now that I use the above starting value  
>>>>>> specification I get slightly different estimates for all  
>>>>>> za-coefficients. Nothing major, but still leading me to
>>>>>> think my estimates aren't exactly independent of the starting  
>>>>>> values I use. I'll see what the parallel run yields.
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> Ruben
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jarrod
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 27 Aug  
>>>>>>> 2014 19:23:42 +0200:
>>>>>>>
>>>>>>>> Hi Jarrod,
>>>>>>>>
>>>>>>>> thanks again. I was able to get it running with your advice.
>>>>>>>> Some points of confusion remain:
>>>>>>>>
>>>>>>>> - You wrote that zi/za models would return an error with rcov  
>>>>>>>> = ~trait:units + starting values. This did not happen in my  
>>>>>>>> case, so I didn't build MCMCglmm myself with your suggested  
>>>>>>>> edits. Also, have you considered putting your own MCMCglmm  
>>>>>>>> repo on Github? Your users would be able to install  
>>>>>>>> pre-releases and I'd think you'd get some time-saving pull  
>>>>>>>> requests too.
>>>>>>>> - In my attempts to get my models to run properly, I messed  
>>>>>>>> up a prior and did not use fix=2 in my prior specification  
>>>>>>>> for my za models. This led to crappy convergence, it's much  
>>>>>>>> better now and for some of my simpler models I think I won't  
>>>>>>>> need parallel chains. I'm reminded of Gelman's folk theorem  
>>>>>>>> of statistical computing.
>>>>>>>> - I followed your advice, but of course I could not set the  
>>>>>>>> true values as starting values, but wanted to set random, bad  
>>>>>>>> starting values. I pasted below what I arrived at, I'm  
>>>>>>>> especially unsure whether I specified the starting values for  
>>>>>>>> G and R properly (I think not).
>>>>>>>> 	start <- list(
>>>>>>>> 		liab=c(rnorm( nrow(krmh.1)*2 )),
>>>>>>>> 		R = list(R1 = diag(rpois(2, 1)+1)),
>>>>>>>> 		G = list(G1 = diag(rpois(2, 1)+1))
>>>>>>>> 	)
>>>>>>>>
>>>>>>>>
>>>>>>>> However, even though I may not need multiple chains for some  
>>>>>>>> of my simpler models, I've now run into conflicting  
>>>>>>>> diagnostics. The geweke.diag for each chain (and examination  
>>>>>>>> of the traces) gives
>>>>>>>> satisfactory diagnostics. Comparing multiple chains using  
>>>>>>>> gelman.diag, however, leads to one bad guy, namely the  
>>>>>>>> traitza_children:spouses interaction.
>>>>>>>> I think this implies that I've got some starting value  
>>>>>>>> dependence for this parameter, that won't be easily rectified  
>>>>>>>> through longer burnin?
>>>>>>>> Do you have any ideas how to rectify this?
>>>>>>>> I am currently doing sequential analyses on episodes of  
>>>>>>>> selection and in historical human data only those who marry  
>>>>>>>> have a chance at having kids. I exclude the unmarried
>>>>>>>> from my sample where I predict number of children, because I  
>>>>>>>> examine that in a previous model and the zero-inflation (65%  
>>>>>>>> zeros, median w/o unmarried = 4) when including the unmarried  
>>>>>>>> is so excessive.
>>>>>>>> Number of spouses is easily the strongest predictor in the  
>>>>>>>> model, but only serves as a covariate here. Since my other  
>>>>>>>> estimates are stable across chains and runs and agree well  
>>>>>>>> across models and with theory, I'm
>>>>>>>> inclined to shrug this off. But probably I shouldn't ignore  
>>>>>>>> this sign of non-convergence?
>>>>>>>>
>>>>>>>>> gelman.diag(mcmc_1)
>>>>>>>> Potential scale reduction factors:
>>>>>>>>
>>>>>>>>                                        Point est. Upper C.I.
>>>>>>>> (Intercept)                                      1.00       1.00
>>>>>>>> traitza_children                                 1.27       1.39
>>>>>>>> male                                             1.00       1.00
>>>>>>>> spouses                                          1.00       1.00
>>>>>>>> paternalage.mean                                 1.00       1.00
>>>>>>>> paternalage.factor(25,30]                        1.00       1.00
>>>>>>>> paternalage.factor(30,35]                        1.00       1.00
>>>>>>>> paternalage.factor(35,40]                        1.00       1.00
>>>>>>>> paternalage.factor(40,45]                        1.00       1.00
>>>>>>>> paternalage.factor(45,50]                        1.00       1.00
>>>>>>>> paternalage.factor(50,55]                        1.00       1.00
>>>>>>>> paternalage.factor(55,90]                        1.00       1.00
>>>>>>>> traitza_children:male                            1.22       1.32
>>>>>>>> traitza_children:spouses                         1.83       2.13
>>>>>>>> traitza_children:paternalage.mean                1.02       1.02
>>>>>>>> traitza_children:paternalage.factor(25,30]       1.03       1.05
>>>>>>>> traitza_children:paternalage.factor(30,35]       1.05       1.08
>>>>>>>> traitza_children:paternalage.factor(35,40]       1.10       1.15
>>>>>>>> traitza_children:paternalage.factor(40,45]       1.12       1.17
>>>>>>>> traitza_children:paternalage.factor(45,50]       1.19       1.28
>>>>>>>> traitza_children:paternalage.factor(50,55]       1.12       1.18
>>>>>>>> traitza_children:paternalage.factor(55,90]       1.11       1.17
>>>>>>>>
>>>>>>>> Multivariate psrf
>>>>>>>>
>>>>>>>> 7.27
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Ruben
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26 Aug 2014, at 13:04, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>
>>>>>>>>> Hi Ruben,
>>>>>>>>>
>>>>>>>>> There are 400 liabilities in a zapoisson model (2 per  
>>>>>>>>> datum). This code should work:
>>>>>>>>>
>>>>>>>>> g <-sample(letters[1:10], size = 200, replace = T)
>>>>>>>>> pred <- rnorm(200)
>>>>>>>>>
>>>>>>>>> l1<-rnorm(200, -1, sqrt(1))
>>>>>>>>> l2<-rnorm(200, -1, sqrt(1))
>>>>>>>>>
>>>>>>>>> y<-VGAM::rzapois(200, exp(l1), exp(-exp(l2)))
>>>>>>>>>
>>>>>>>>> # generate zero-altered data with an intercept of -1  
>>>>>>>>> (because the intercept and variance are the same for both  
>>>>>>>>> processes this is just standard Poisson)
>>>>>>>>>
>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> start.1<-list(liab=c(l1,l2), R = list(R1=diag(2)),  
>>>>>>>>> G=list(G1=diag(2)))
>>>>>>>>> prior.1<-list(R=list(V=diag(2), nu=1.002, fix=2),  
>>>>>>>>> G=list(G1=list(V=diag(2), nu=2, alpha.mu=c(0,0),  
>>>>>>>>> alpha.V=diag(2)*1000)))
>>>>>>>>>
>>>>>>>>> m1<-MCMCglmm(y~trait + pred:trait, random=~us(trait):g,  
>>>>>>>>> family="zapoisson",rcov=~idh(trait):units, data=dat,  
>>>>>>>>> prior=prior.1, start= start.1)
>>>>>>>>>
>>>>>>>>> However, there are 2 bugs in the current version of MCMCglmm  
>>>>>>>>> that return an error message when the documentation implies  
>>>>>>>>> it should be fine:
>>>>>>>>>
>>>>>>>>> a) it should be possible to have R=diag(2) rather than R =  
>>>>>>>>> list(R1=diag(2)). This bug cropped up when I implemented  
>>>>>>>>> block-diagonal R structures. It can be fixed by inserting:
>>>>>>>>>
>>>>>>>>>      if(!is.list(start$R)){
>>>>>>>>>         start$R<-list(R1=start$R)
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> on L514 of MCMCglmm.R below
>>>>>>>>>
>>>>>>>>>      if(!is.list(prior$R[[1]])){
>>>>>>>>>         prior$R<-list(R1=prior$R)
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> b) rcov=~trait:units models for zi/za models will return an  
>>>>>>>>> error when passing starting values. To fix this insert
>>>>>>>>>
>>>>>>>>>     if(diagR==3){
>>>>>>>>>       if(dim(start)[1]!=1){
>>>>>>>>>         stop("V is the wrong dimension for some  
>>>>>>>>> strart$G/start$R elements")
>>>>>>>>>       }
>>>>>>>>>       start<-diag(sum(nfl))*start[1]
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> on L90 of priorfromat.R below
>>>>>>>>>
>>>>>>>>>     if(is.matrix(start)==FALSE){
>>>>>>>>>       start<-as.matrix(start)
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> I will put these in the new version.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Jarrod
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25 Aug  
>>>>>>>>> 2014 21:52:30 +0200:
>>>>>>>>>
>>>>>>>>>> Hi Jarrod,
>>>>>>>>>>
>>>>>>>>>> thanks for these pointers.
>>>>>>>>>>
>>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output of  
>>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>>
>>>>>>>>>> Oh, I would be happy with single chains, but since  
>>>>>>>>>> computation would take weeks this way, I wanted to  
>>>>>>>>>> parallelise and I would use the multi-chain convergence as  
>>>>>>>>>> a criterion that my parallelisation was proper
>>>>>>>>>> and is as informative as a single long chain. There don't  
>>>>>>>>>> seem to be any such checks built-in – I was analysing my 40  
>>>>>>>>>> chains for a bit longer than I like to admit until I  
>>>>>>>>>> noticed they were identical (effectiveSize
>>>>>>>>>> and summary.mcmc.list did not yell at me for this).
>>>>>>>>>>
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>> I get that these values are bad, but that is the goal for  
>>>>>>>>>> my multi-chain aim, right?
>>>>>>>>>>
>>>>>>>>>> I can apply this to my zero-truncated model, but am again  
>>>>>>>>>> getting stuck with the zero-altered one.
>>>>>>>>>> Maybe I need only specify the Liab values for this?
>>>>>>>>>> At least I'm getting nowhere with specifying R and G  
>>>>>>>>>> starting values here. When I got an error, I always
>>>>>>>>>> went to the MCMCglmm source to understand why the checks  
>>>>>>>>>> failed, but I didn't always understand
>>>>>>>>>> what was being checked and couldn't get it to work.
>>>>>>>>>>
>>>>>>>>>> Here's a failing example:
>>>>>>>>>>
>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>> g = sample(letters[1:10], size = 200, replace = T)
>>>>>>>>>> pred = rnorm(200)
>>>>>>>>>> y<-rpois(200,exp(l)-t)
>>>>>>>>>> y[1:40] = 0
>>>>>>>>>> # generate zero-altered data with an intercept of -1
>>>>>>>>>>
>>>>>>>>>> dat<-data.frame(y=y, g = g, pred = pred)
>>>>>>>>>> set.seed(1)
>>>>>>>>>> start_true = list(Liab=l, R = 1, G = 1 )
>>>>>>>>>> m1<-MCMCglmm(y~1 + pred,random = ~ g,  
>>>>>>>>>> family="zapoisson",rcov=~us(trait):units, data=dat, start=  
>>>>>>>>>> start_true)
>>>>>>>>>>
>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>> set.seed(1)
>>>>>>>>>> # use some very bad starting values
>>>>>>>>>> start_rand = list(Liab=rnorm(200), R = rpois(1, 1)+1, G =  
>>>>>>>>>> rpois(1, 1)+1 )
>>>>>>>>>> m2<-MCMCglmm(y~1 + pred,random = ~ g,rcov=~us(trait):units,  
>>>>>>>>>>  family="zapoisson", data=dat, start = start_rand)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Ruben
>>>>>>>>>>
>>>>>>>>>> On 25 Aug 2014, at 18:29, Jarrod Hadfield  
>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>
>>>>>>>>>>> Sorry  - I was wrong when I said that everything is Gibbs  
>>>>>>>>>>> sampled conditional on the latent variables. The location  
>>>>>>>>>>> effects (fixed and random effects) are also sampled  
>>>>>>>>>>> conditional on the (co)variance components so you should  
>>>>>>>>>>> add them to the starting values. In the case where the  
>>>>>>>>>>> true values are used:
>>>>>>>>>>>
>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>> start=list(Liab=l,R=1))
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jarrod
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Quoting Jarrod Hadfield <j.hadfield at ed.ac.uk> on Mon, 25  
>>>>>>>>>>> Aug 2014 17:14:14 +0100:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>
>>>>>>>>>>>> You will need to provide over-dispersed starting values  
>>>>>>>>>>>> for multiple-chain convergence diagnostics to be useful  
>>>>>>>>>>>> (GLMM are so simple I am generally happy if the output of  
>>>>>>>>>>>> a single run looks reasonable).
>>>>>>>>>>>>
>>>>>>>>>>>> With non-Gaussian data everything is Gibbs sampled  
>>>>>>>>>>>> conditional on the latent variables, so you only need to  
>>>>>>>>>>>> pass them:
>>>>>>>>>>>>
>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>
>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>>> start=list(Liab=l))
>>>>>>>>>>>> # use true latent variable as starting values
>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat,  
>>>>>>>>>>>> start=list(Liab=rnorm(200)))
>>>>>>>>>>>> # use some very bad starting values
>>>>>>>>>>>>
>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>> # not identical despite the same seed because of  
>>>>>>>>>>>> different starting values but clearly sampling the same  
>>>>>>>>>>>> posterior distribution:
>>>>>>>>>>>>
>>>>>>>>>>>> gelman.diag(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Jarrod
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>>> Aug 2014 18:00:08 +0200:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Jarrod,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the quick reply. Please, don't waste time  
>>>>>>>>>>>>> looking into doMPI – I am happy that I
>>>>>>>>>>>>> get the expected result, when I specify that  
>>>>>>>>>>>>> reproducible seed, whyever that may be.
>>>>>>>>>>>>> I'm pretty sure that is the deciding factor, because I  
>>>>>>>>>>>>> tested it explicitly, I just have no idea
>>>>>>>>>>>>> how/why it interacts with the choice of family.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, is setting up different RNG streams for my  
>>>>>>>>>>>>> workers (now that it works) __sufficient__
>>>>>>>>>>>>> so that I get independent chains and can use  
>>>>>>>>>>>>> gelman.diag() for convergence diagnostics?
>>>>>>>>>>>>> Or should I still tinker with the starting values myself?
>>>>>>>>>>>>> I've never found a worked example of supplying starting  
>>>>>>>>>>>>> values and am thus a bit lost.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for sending further questions, I hope someone else  
>>>>>>>>>>>>> takes pity while
>>>>>>>>>>>>> you're busy with lectures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 25 Aug 2014, at 17:29, Jarrod Hadfield  
>>>>>>>>>>>>> <j.hadfield at ed.ac.uk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ruben,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not think the issue is with the starting values,  
>>>>>>>>>>>>>> because even if the same starting values were used the  
>>>>>>>>>>>>>> chains would still differ because of the randomness in  
>>>>>>>>>>>>>> the Markov Chain (if I interpret your `identical' test  
>>>>>>>>>>>>>> correctly). This just involves a call to GetRNGstate()  
>>>>>>>>>>>>>> in the C++ code (L 871 ofMCMCglmm.cc) so I think for  
>>>>>>>>>>>>>> some reason doMPI/foreach is not doing what you expect.  
>>>>>>>>>>>>>> I am not familiar with doMPI and am in the middle of  
>>>>>>>>>>>>>> writing lectures so haven't got time to look into it  
>>>>>>>>>>>>>> carefully. Outside of the context of doMPI I get the  
>>>>>>>>>>>>>> behaviour I expect:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> l<-rnorm(200, -1, sqrt(1))
>>>>>>>>>>>>>> t<-(-log(1-runif(200)*(1-exp(-exp(l)))))
>>>>>>>>>>>>>> y<-rpois(200,exp(l)-t)+1
>>>>>>>>>>>>>> # generate zero-truncated data with an intercept of -1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> dat<-data.frame(y=y)
>>>>>>>>>>>>>> set.seed(1)
>>>>>>>>>>>>>> m1<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>> m2<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>> set.seed(2)
>>>>>>>>>>>>>> m3<-MCMCglmm(y~1, family="ztpoisson", data=dat)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> plot(mcmc.list(m1$Sol, m2$Sol))
>>>>>>>>>>>>>> # different, as expected
>>>>>>>>>>>>>> plot(mcmc.list(m2$Sol, m3$Sol))
>>>>>>>>>>>>>> # the same, as expected
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Mon, 25  
>>>>>>>>>>>>>> Aug 2014 16:58:06 +0200:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sorry for bumping my old post, I hope to elicit a  
>>>>>>>>>>>>>>> response with a more focused question:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When does MCMCglmm automatically start from different  
>>>>>>>>>>>>>>> values when using doMPI/foreach?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have done some tests with models of varying  
>>>>>>>>>>>>>>> complexity. For example, the script in my last
>>>>>>>>>>>>>>> post (using "zapoisson") yielded 40 identical chains:
>>>>>>>>>>>>>>>> identical(mcmclist[1], mcmclist[30])
>>>>>>>>>>>>>>> TRUE
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A simpler (?) model (using "ztpoisson" and no  
>>>>>>>>>>>>>>> specified prior), however, yielded different chains
>>>>>>>>>>>>>>> and I could use them to calculate gelman.diag()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changing my script to the version below, i.e. seeding  
>>>>>>>>>>>>>>> foreach using .options.mpi=list( seed= 1337)
>>>>>>>>>>>>>>> so as to make RNGstreams reproducible (or so I   
>>>>>>>>>>>>>>> thought), led to different chains even for the
>>>>>>>>>>>>>>> "zapoisson" model.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In no case have I (successfully) tried to supplant the  
>>>>>>>>>>>>>>> default of MCMCglmm's "start" argument.
>>>>>>>>>>>>>>> Is starting my models from different RNGsubstreams  
>>>>>>>>>>>>>>> inadequate compared to manipulating
>>>>>>>>>>>>>>> the start argument explicitly? If so, is there any  
>>>>>>>>>>>>>>> worked example of explicit starting value manipulation
>>>>>>>>>>>>>>> in parallel computation?
>>>>>>>>>>>>>>> I've browsed the MCMCglmm source to understand how the  
>>>>>>>>>>>>>>> default starting values are generated,
>>>>>>>>>>>>>>> but didn't find any differences with respect to RNG  
>>>>>>>>>>>>>>> for the two families "ztpoisson" and "zapoisson"
>>>>>>>>>>>>>>> (granted, I did not dig very deep).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ruben Arslan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # bsub -q mpi -W 12:00 -n 41 -R np20 mpirun -H  
>>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_children_parallel.R"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>> cl <-  
>>>>>>>>>>>>>>> startMPIcluster(verbose=T,workdir="/usr/users/rarslan/rpqa/rpqa_main/")
>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>> Children_mcmc1 =  
>>>>>>>>>>>>>>> foreach(i=1:clusterSize(cl),.options.mpi =  
>>>>>>>>>>>>>>> list(seed=1337) ) %dopar% {
>>>>>>>>>>>>>>> 	library(MCMCglmm);library(data.table)
>>>>>>>>>>>>>>> 	load("/usr/users/rarslan/rpqa/rpqa1.rdata")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 	nitt = 130000; thin = 100; burnin = 30000
>>>>>>>>>>>>>>> 	prior.m5d.2 = list(
>>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 	rpqa.1 = na.omit(rpqa.1[spouses>0, list(idParents,  
>>>>>>>>>>>>>>> children, male, urban, spouses, paternalage.mean,  
>>>>>>>>>>>>>>> paternalage.factor)])
>>>>>>>>>>>>>>> 	(m1 = MCMCglmm( children ~ trait * (male + urban +  
>>>>>>>>>>>>>>> spouses + paternalage.mean + paternalage.factor),
>>>>>>>>>>>>>>> 						rcov=~us(trait):units,
>>>>>>>>>>>>>>> 						random=~us(trait):idParents,
>>>>>>>>>>>>>>> 						family="zapoisson",
>>>>>>>>>>>>>>> 						prior = prior.m5d.2,
>>>>>>>>>>>>>>> 						data=rpqa.1,
>>>>>>>>>>>>>>> 						pr = F, saveX = F, saveZ = F,
>>>>>>>>>>>>>>> 						nitt=nitt,thin=thin,burnin=burnin))
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =  
>>>>>>>>>>>>>>> "/usr/users/rarslan/rpqa/rpqa_main/rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 04 Aug 2014, at 20:25, Ruben Arslan  
>>>>>>>>>>>>>>> <rubenarslan at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dear list,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> would someone be willing to share her or his efforts  
>>>>>>>>>>>>>>>> in parallelising a MCMCglmm analysis?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had something viable using harvestr that seemed to  
>>>>>>>>>>>>>>>> properly initialise
>>>>>>>>>>>>>>>> the starting values from different random number  
>>>>>>>>>>>>>>>> streams (which is desirable,
>>>>>>>>>>>>>>>> as far as I could find out), but I ended up being  
>>>>>>>>>>>>>>>> unable to use harvestr, because
>>>>>>>>>>>>>>>> it uses an old version of plyr, where parallelisation  
>>>>>>>>>>>>>>>> works only for multicore, not for
>>>>>>>>>>>>>>>> MPI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I pasted my working version, that does not do  
>>>>>>>>>>>>>>>> anything about starting values or RNG
>>>>>>>>>>>>>>>> at the end of this email. I can try to fumble further  
>>>>>>>>>>>>>>>> in the dark or try to update harvestr,
>>>>>>>>>>>>>>>> but maybe someone has gone through all this already.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd also appreciate any tips for elegantly  
>>>>>>>>>>>>>>>> post-processing such parallel data, as some of my usual
>>>>>>>>>>>>>>>> extraction functions and routines are hampered by the  
>>>>>>>>>>>>>>>> fact that some coda functions
>>>>>>>>>>>>>>>> do not aggregate results over chains. (What I get  
>>>>>>>>>>>>>>>> from a single-chain summary in MCMCglmm
>>>>>>>>>>>>>>>> is a bit more comprehensive, than what I managed to  
>>>>>>>>>>>>>>>> cobble together with my own extraction
>>>>>>>>>>>>>>>> functions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The reason I'm parallelising my analyses is that I'm  
>>>>>>>>>>>>>>>> having trouble getting a good effective
>>>>>>>>>>>>>>>> sample size for any parameter having to do with the  
>>>>>>>>>>>>>>>> many zeroes in my data.
>>>>>>>>>>>>>>>> Any pointers are very appreciated, I'm quite  
>>>>>>>>>>>>>>>> inexperienced with MCMCglmm.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ruben
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # bsub -q mpi-short -W 2:00 -n 42 -R np20 mpirun -H  
>>>>>>>>>>>>>>>> localhost -n 41 R --slave -f  
>>>>>>>>>>>>>>>> "rpqa/rpqa_main/rpqa_children_parallel.r"
>>>>>>>>>>>>>>>> library(doMPI)
>>>>>>>>>>>>>>>> cl <- startMPIcluster()
>>>>>>>>>>>>>>>> registerDoMPI(cl)
>>>>>>>>>>>>>>>> Children_mcmc1 = foreach(i=1:40) %dopar% {
>>>>>>>>>>>>>>>> 	library(MCMCglmm)
>>>>>>>>>>>>>>>> 	load("rpqa1.rdata")
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 	nitt = 40000; thin = 100; burnin = 10000
>>>>>>>>>>>>>>>> 	prior = list(
>>>>>>>>>>>>>>>> 		R = list(V = diag(c(1,1)), nu = 0.002),
>>>>>>>>>>>>>>>> 		G=list(list(V=diag(c(1,1e-6)),nu=0.002))
>>>>>>>>>>>>>>>> 	)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 	MCMCglmm( children ~ trait -1 +  
>>>>>>>>>>>>>>>> at.level(trait,1):male + at.level(trait,1):urban +  
>>>>>>>>>>>>>>>> at.level(trait,1):spouses +  
>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.mean +  
>>>>>>>>>>>>>>>> at.level(trait,1):paternalage.factor,
>>>>>>>>>>>>>>>> 		rcov=~us(trait):units,
>>>>>>>>>>>>>>>> 		random=~us(trait):idParents,
>>>>>>>>>>>>>>>> 		family="zapoisson",
>>>>>>>>>>>>>>>> 		prior = prior,
>>>>>>>>>>>>>>>> 		data=rpqa.1,
>>>>>>>>>>>>>>>> 		pr = F, saveX = T, saveZ = T,
>>>>>>>>>>>>>>>> 		nitt=nitt,thin=thin,burnin=burnin)
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> library(coda)
>>>>>>>>>>>>>>>> mcmclist =  
>>>>>>>>>>>>>>>> mcmc.list(lapply(Children_mcmc1,FUN=function(x) {  
>>>>>>>>>>>>>>>> x$Sol}))
>>>>>>>>>>>>>>>> save(Children_mcmc1,mcmclist, file =  
>>>>>>>>>>>>>>>> "rpqa_mcmc_kids_za.rdata")
>>>>>>>>>>>>>>>> closeCluster(cl)
>>>>>>>>>>>>>>>> mpi.quit()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ruben C. Arslan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Georg August University G�ttingen
>>>>>>>>>>>>>>>> Biological Personality Psychology and Psychological Assessment
>>>>>>>>>>>>>>>> Georg Elias M�ller Institute of Psychology
>>>>>>>>>>>>>>>> Go�lerstr. 14
>>>>>>>>>>>>>>>> 37073 G�ttingen
>>>>>>>>>>>>>>>> Germany
>>>>>>>>>>>>>>>> Tel.: +49 551 3920704
>>>>>>>>>>>>>>>> https://psych.uni-goettingen.de/en/biopers/team/arslan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 	[[alternative HTML version deleted]]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>
>>
>
>
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list