[R-sig-ME] Fw: mcmcsamp(lme4): What is contained in $ST and $sigma?

Wed Oct 8 00:31:03 CEST 2008

On Tue, Oct 7, 2008 at 4:57 PM, David LeBlond <David.LeBlond at abbott.com> wrote:

> Prof. Bates,

> Thanks for the note. Hopefully I am responding properly to the list.

Indeed you are.

> Now I understand better. mcmcsamp is still under development. It will be
> really useful when it is available. I know this is eminently (if not
> imminently) doable. Having the posterior opens up all kinds of
> possibilities. I really encourage this project!!!

> While not yet "ready for prime time", I do have a question about generating
> posteriors from variance parameters when finally available. As I understand
> it mcmcsamp will produce a posterior sample directly for all fixed effects
> and for the error variance. I understand from your comment that to get the
> posterior sample for the other random effect variances, I will need to
> multiply the error variance times the appropriate vector in $ST for each
> draw (ie- a vector multiplication). The resulting vector would be a sample
> from the posterior of the random effect variance. That is - we are taking 2
> posterior samples (the $sigma and the $ST[?] and forming the desired
> posterior sample by their vector product. Do I have this right? Please
> ignore my confusing sigmas and variances. I like sigmas better too.

The plan is to have extractor function, actually a method for the
VarCorr generic, that will calculate the desired quantities, be they
variances and covariances or standard deviations and correlations or
logarithms of standard deviations and Fisher's z-transform of the
correlations, from the sigma and ST components.  (The name ST comes
from the fact that the relative covariance factor is the product of a
diagonal scale matrix S and a unit lower triangular matrix T.)

There are hints of that code in the lme4/R/lmer.R and
lme4/src/lmer.[ch] files but they have not yet been fleshed out.  I
figure I should get mcmcsamp working before trying to figure out what
to do with the results.

> "Douglas Bates" <bates at stat.wisc.edu>
> Sent by: dmbates at gmail.com
>
> 10/07/2008 04:35 PM
>
> To
> "David LeBlond" <David.LeBlond at abbott.com>
> cc
> r-sig-mixed-models at r-project.org
> Subject
> Re: [R-sig-ME] Fw: mcmcsamp(lme4): What is contained in $ST and $sigma?
>
>
>
>
> On Fri, Oct 3, 2008 at 1:55 PM, David LeBlond <David.LeBlond at abbott.com>
> wrote:
>> Dear List,
>>
>> I am new to lme4. I believe it will be very useful for analysis of
>> pharmaceutical stability data in which batches are considered to have
>> random slopes and intercepts. We now use SAS Proc Mixed for this, but I
>> like the lme4 approach and the other capabilities of R better.
>>
>> What attracts me to lme4 is the possibility of obtaining a posterior
>> sample using mcmcsamp(). How else can one obtain meaningful interval
>> estimates or make predictions about future data?
>>
>> However, I do not know how to interpret the contents of the mcmcsamp
>> output slots $ST and $sigma. I find that the samples contained in those
>> slots do not agree with the estimates provided by lmer(). I hope someone
>> more knowledgeable will be willing to explain this to me.
>>
>> To make things concrete, I provide below an example analysis of simulated
>> data.
>>
>> The following code generates 50 batches worth of stability tests. Batch
>> intercepts and slopes are drawn from N(100,,4^2) and N(-1,2^2). Analytical
>> results at each time point (0,3,...,36 months) are drawn from
>> N(intercept+slope*month, 1^2). So there are 3 random paramters (variance
>> for slope, intercept, and error) and 2 fixed effects (mean slope and
>> intercept). With such a large balanced data set, I expect rapid
>> convergence of the Gibbs sampler.
>>
>> B<-50 # number of batches
>> Months<-c(0,3,6,9,12,24,30,36) # pull points
>> n.Months<-length(Months)
>> Init.mean<-100 # initial level for the process
>> Init.SD<-4 # SD of the process initial
>> Slope.mean<- -1
>> Slope.SD<- 2
>> Error.SD<- 1
>> # Prepare Data Vectors
>> Batch<-rep(NA,B*n.Months)
>> Month<-rep(NA,B*n.Months)
>> Y<-rep(NA,B*n.Months)
>> for(i in 1:B){
>>  slope<-rnorm(1,Slope.mean,Slope.SD)
>>  intercept<-rnorm(1,Init.mean,Init.SD)
>>  for(j in 1:n.Months){
>>    k<-(i-1)*n.Months+j
>>    Batch[k]<-i
>>    Month[k]<-Months[j]
>>    Y[k]<-intercept+Month[k]*slope+rnorm(1,0,Error.SD)
>>  }
>> }
>> Stab<-cbind(Batch=factor(Batch),Month=Month,Y=Y);Stab
>>
>> I use the following to analyze these data:
>>
>> library(lme4)
>> anal<-lmer(Y~1+Month+(1|Batch)+(-1+Month|Batch))
>> anal
>>
>> The following output is obtained:
>>
>> Linear mixed model fit by REML
>> Formula: Y ~ 1 + Month + (1 | Batch) + (-1 + Month | Batch)
>>  AIC  BIC logLik deviance REMLdev
>>  1821 1841 -905.4     1811    1811
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  Batch    (Intercept) 19.5551  4.42212
>>  Batch    Month        4.4818  2.11704
>>  Residual              0.9780  0.98894
>> Number of obs: 400, groups: Batch, 50
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept) 100.0254     0.6302  158.73
>> Month        -1.3988     0.2994   -4.67
>>
>> Correlation of Fixed Effects:
>>      (Intr)
>> Month -0.001
>>
>> This is terrific. The estimates agree well with theory. Armed with
>> confidence (or rather credibility?) I obtain interval estimate as follows:
>>
>> HPDinterval(mcmcsamp(anal,n=10000))
>>
>> The following output is obtained:
>>
>> $fixef
>>                lower       upper
>> (Intercept) 99.447112 100.5605457
>> Month       -1.905921  -0.8496636
>> attr(,"Probability")
>> [1] 0.95
>>
>> $ST
>>         lower     upper
>> [1,] 0.7546098 0.9719147
>> [2,] 0.6913345 1.7690498
>> attr(,"Probability")
>> [1] 0.95
>>
>> $sigma
>>        lower    upper
>> [1,] 1.710004 2.169471
>> attr(,"Probability")
>> [1] 0.95
>>
>> While the fixed effect posteriors compare very well with truth and with
>> lmer point estimates, the $sigma is strange. I expect it to be close to 1
>> since this is the error SD used to generate the data and also the value
>> reported by lmer. Instead $sigma contains a values between 1.7 and 2.2.
>> Why?
>
> Most likely because there is a mistake in the mcmcsamp function at
> present.  I have seen the same thing when trying to get an MCMC sample
> from a model with two uncorrelated random effects per group.  I'm not
> sure if the problem is in the theory or in the implementation but
> there is definitely a problem.  (I hope it is in the implementation
> and not the theory.)
>
>> Also, I expect the $ST slot to bear some resemblance to the true SDs for
>> intercept and slope (the other 2 random parameters in the model).  I
>> expect values near 4 and 2 respectively, not near 0.8 and 1.1
>> respectively. I realize there may be some "common scale" factor involved
>> here, but I cannot rationalize how to transform $ST into something I can
>> use. This is particularly vexing since the "common scale factor" ($sigma)
>> bears little resemblence to expectation. Further, the common scale factor
>> is actually a paramter and therefore has a distribution; a simple
>> multiplication by a point estimate of sigma would not seem appropriate to
>> obtain posteriors for SD.slope and SD.intercept.
>
> Yes.  What is supposed to happen is that the product of an element of
> the ST array (these are the parameters that determine the relative
> standard deviations of the random effects) and the corresponding
> element of sigma would give you the value of the standard deviation of
> the random effects.
>
>> The kind readers who have followed me through this will realize I am quite
>> clueless. Possibly I have not found the right background material (though
>> I have searched hard. Prof Bates kindly provided me some useful
>> presentations). Or possibly, mcmcsamp() is simply not ready for prime
>> time. I can live with that - for now. Maybe I am using the functions
>> inappropriately.
>
> The current implementation of mcmcsamp is not ready for prime time.
>
>> I can obtain posterior samples using the PRIOR statement of SAS Proc Mixed
>> for a wide class of variance component models. The difficulty with SAS is
>> that technical descriptions of the methodology are not forthcoming, SAS
>> has not been very progressive, the modeling syntax is confusing and
>> non-intuitive, one is quickly limited by the very painful SAS graphics,
>> and - of course - SAS is not free.
>>
>> Any suggestions, comments, or insight would be greatly appreciated. This
>> lmeX library could be extremely useful to me in my nonclinical statistical
>> work and I really appreciate that it is under development. I hope to
>> understand it better and thank you in advance for your help!!
>>
>> Best regards,
>>
>> Dave LeBlond
>> Sr. Statistician, Non-Clinical Statistics
>> email: david.leblond at abbott.com
>> voice mail: 847-935-6031
>> Dept R436, Bldg AP9A-1
>> Abbott Laboratories
>> 100 Abbott Park Road
>> Abbott Park, IL 60064-3500
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>