[R-sig-ME] Data frame size limits in MCMCglmm?

Jarrod Hadfield j.hadfield at ed.ac.uk
Fri Jan 25 11:36:05 CET 2013


Hi Stuart,

2.4 million records is bigger than anything I've tried, but in theory
it should run, or return an error if it can't allocate enough memory.
It definitely shouldn't be seg-faulting. If you could send a
reproducible example (preferably one where it fails quickly) I will
look into it.
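
As a starting point for such an example, here is a minimal sketch that simulates a scaled-down data set with the same layout as Joshua's test data quoted below, plus a short chain to fit it. It has not been run against the real data. The helper names make_testdat() and fit_small(), the group sizes, and the use of independent predictors are assumptions made for illustration; the prior is the one used in the thread.

```r
## Sketch only: a small simulated data set so a failure, if any, shows up fast.
## make_testdat() and fit_small() are hypothetical helpers, not part of MCMCglmm.
make_testdat <- function(n_groups = 200, n_per = 20, seed = 10) {
  set.seed(seed)
  n <- n_groups * n_per
  ID <- factor(rep(seq_len(n_groups), each = n_per))
  rint <- rep(rnorm(n_groups, 0, 4), each = n_per)  # one random intercept per group
  X <- matrix(rnorm(n * 3), ncol = 3)               # independent predictors (assumption)
  ycont <- as.vector(2 + rint + X %*% c(1.2, -0.5, 2) + rnorm(n))
  ## cut the continuous outcome at its quintiles to get 5 ordered levels
  yord <- cut(ycont, breaks = quantile(ycont, seq(0, 1, 0.2)),
              include.lowest = TRUE, ordered_result = TRUE)
  data.frame(yord, ID, x1 = X[, 1], x2 = X[, 2], x3 = X[, 3])
}

## Short chain with the same prior as in the thread; needs MCMCglmm installed.
fit_small <- function(dat, nitt = 300, burnin = 100) {
  MCMCglmm::MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
    family = "ordinal", data = dat,
    prior = list(R = list(V = 1, fix = 1),
                 G = list(G1 = list(V = 1, nu = 0))),
    nitt = nitt, thin = 1, burnin = burnin, verbose = FALSE)
}

small <- make_testdat()
```

Calling fit_small(small) and then increasing n_groups and n_per step by step should show roughly at what size, if any, the crash appears.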

Cheers,

Jarrod.






Quoting Joshua Wiley <jwiley.psych at gmail.com> on Fri, 18 Jan 2013 20:28:33 -0800:

> Hi Stuart,
>
> How many (if any) iterations completed before you got the seg fault?
> Also, how much memory does your system have?
>
> Currently at 4000 iterations, I have not reproduced the error with
> this made-up example (although I have practically finished an
> algorithm to sort any array in O(log log n) while waiting to get this
> far: roughly 1 hour per thousand iterations on a 6-core 3.9 GHz machine).
>
> require(MCMCglmm)
> require(MASS)
> set.seed(10)
> ## one random intercept per group, repeated over that group's rows
> rint <- rep(rnorm(14982, 0, 4), each = floor(2.44e6/14982))
> ID <- factor(rep(1:14982,  each = floor(2.44e6/14982)))
> ## three correlated predictors (pairwise correlation 0.3)
> X <- MASS::mvrnorm(length(rint), mu = c(0, 0, 0),
>   Sigma = matrix(c(1, .3, .3, .3, 1, .3, .3, .3, 1), 3))
> ## fixed-effect slopes
> b <- matrix(c(1.2, -.5, 2))
> ## continuous outcome, then cut at its quintiles into 5 ordered levels
> ycont <- rnorm(length(ID), mean = 2 + rint + X %*% b)
> yord <- cut(ycont, breaks = quantile(ycont, c(0, .2, .4, .6, .8, 1)),
>   include.lowest=TRUE, ordered_result=TRUE)
> testdat <- data.frame(yord, ID, x1 = X[,1], x2 = X[, 2], x3 = X[, 3])
> ## > gc()
> ##            used  (Mb) gc trigger  (Mb) max used  (Mb)
> ## Ncells  1117490  59.7    1710298  91.4  1368491  73.1
> ## Vcells 25878798 197.5   47836772 365.0 47753473 364.4
> m <- MCMCglmm(yord ~ x1 + x2 + x3, random = ~ ID,
>   family = "ordinal", data = testdat,
>   prior = list(
>     R = list(V = 1, fix = 1),
>     G = list(
>       G1 = list(V = 1, nu = 0))),
>   nitt = 13000, thin = 10, burnin = 3000)
>
> This is running on a Windows 8 Pro x64 system with 32 GB of memory and
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] MASS_7.3-22        ggplot2_0.9.3      MCMCglmm_2.17
>  [4] corpcor_1.6.4      ape_3.0-6          coda_0.16-1        Matrix_1.0-10
>  [8] lattice_0.20-10    tensorA_0.36
>
>
>
>
> On Fri, Jan 18, 2013 at 1:55 PM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:
>> Hello, I'm having problems running a simple ordinal outcome mixed
>> effects model, and I'm thinking it may be because of the size of the
>> dataset (or, it may very well be that I'm not specifying the model
>> correctly). I must confess to insecurity about how to specify the
>> priors. Here is the structure of the data frame (with columns not in
>> this model omitted). Note that there are more than 2.4 million rows. Is
>> that a problem?
>>
>>  str(all.subj)
>> 'data.frame':   2438922 obs. of  112 variables:
>>  $ gr10                  : num  0 0 0 0 0 0 1 0 1 0 ...
>>  $ gr11                  : num  0 0 0 1 1 1 0 0 0 0 ...
>>  $ gr12                  : num  1 1 1 0 0 0 0 1 0 0 ...
>>  $ tid                   : Factor w/ 14982 levels "........","A.D46607",..: 2 2 2 2 2 2 2 2 2 2 ...
>>  $ final.points          : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 4 4 4 2 3 3 2 2 3 2 ...
>>
>> Here are two attempts and their results:
>>
>> glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>>                          prior=list(R=list(V=1, fix=1),
>>                                     G=list(G1=list(V=1, nu=0))),
>>                          random = ~tid ,
>>                          family = "ordinal",
>>                          nitt=100000,
>>                          data = all.subj)
>>
>> Error: segfault from C stack overflow
>>
>>
>>  glmm.uncond <- MCMCglmm(final.points ~ gr10 + gr11 + gr12,
>>                          prior=list(R=list(V=1, nu=0),
>>                                     G=list(G1=list(V=1, nu=0))),
>>                          random = ~tid ,
>>                          family = "ordinal",
>>                          nitt=100000,
>>                          data = all.subj)
>>
>>
>> Process R segmentation fault (core dumped) at Fri Jan 18 12:53:49 2013
>>
>> Here is my sessionInfo
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] foreign_0.8-52  MCMCglmm_2.17   corpcor_1.6.4   ape_3.0-6
>> [5] coda_0.16-1     Matrix_1.0-10   lattice_0.20-13 tensorA_0.36
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_2.15.1 gee_4.13-18     grid_2.15.1     nlme_3.1-107
>> [5] tools_2.15.1
>>
>> and memory info.
>>
>>  gc()
>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>> Ncells   1196029   63.9    1835812    98.1    1710298    91.4
>> Vcells 678385919 5175.7 1777423119 13560.7 2114537169 16132.7
>>
>> Any help will be appreciated.
>>
>>
>> --
>> Stuart Luppescu <slu at ccsr.uchicago.edu>
>> University of Chicago
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


