[R-sig-ME] Fwd: RE: [R] Help with lsmeans

Ben Bolker bbolker at gmail.com
Tue Sep 23 00:21:43 CEST 2014


  (Discussion of possible interest to others on r-sig-mixed-models about
running lsmeans::ref.grid for large data sets ...)

-------- Forwarded Message --------
Subject: 	RE: [R] Help with lsmeans
Date: 	Mon, 22 Sep 2014 22:05:54 +0000
From: 	Lenth, Russell V <russell-lenth at uiowa.edu>
To: 	Ben Bolker <bbolker at gmail.com>, Dan Dillon <dgdillon at gmail.com>,
Søren Højsgaard <sorenh at math.aau.dk>



All:



Thanks for calling this to my attention.



There **is** a way to disable that time-consuming step… just set the
*lsmeans* option disable.pbkrtest to TRUE. Here’s Ben’s example, run both
ways, showing the timing and the covariance matrix obtained:



> system.time(rr <- ref.grid(m2))
   user  system elapsed
  13.39    1.66   15.05
> rr@V
              (Intercept)             x
(Intercept)  9.900968e-05 -0.0001482768
x           -1.482768e-04  0.0002955296

> lsm.options(disable.pbkrtest=TRUE)
> system.time(rrd <- ref.grid(m2))
   user  system elapsed
   0.11    0.00    0.10
> rrd@V
              (Intercept)             x
(Intercept)  9.900036e-05 -0.0001482582
x           -1.482582e-04  0.0002954926



In the second result, the V matrix is just the result of vcov(m2) instead
of vcovAdj(m2). Note also that you can set the degrees of freedom
to any desired value in the lsmeans or ref.grid call:



> lsmeans(m2, "x", at = list(x = c(.3,.5,.7)), options = list(df = 24))
   x      lsmean          SE df     lower.CL   upper.CL
 0.3 0.009617796 0.006053079 24 -0.002875145 0.02211074
 0.5 0.006692900 0.004961380 24 -0.003546885 0.01693269
 0.7 0.003768004 0.006019155 24 -0.008654921 0.01619093

Confidence level used: 0.95



(Setting df to NA will produce asymptotic results.)
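
A minimal sketch of that asymptotic variant (not part of the original
message; it reuses m2 and the grid from the example above):

lsmeans(m2, "x", at = list(x = c(.3,.5,.7)), options = list(df = NA))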



Hope this helps.



Russ





*From:* Ben Bolker [mailto:bbolker at gmail.com]
*Sent:* Monday, September 22, 2014 3:06 PM
*To:* Dan Dillon; Søren Højsgaard; Lenth, Russell V
*Subject:* Re: [R] Help with lsmeans



PS: a smaller example showing the same thing (and the equivalent
computation done directly with pbkrtest):



library("lme4")
library("lsmeans")
set.seed(101)
dd <- data.frame(y=rnorm(4e4),x=runif(4e4),f=gl(n=200,k=200))
m2 <- lmer(y~x+(1|f),data=dd)
m0 <- lmer(y~1+(1|f),data=dd)
rr <- ref.grid(m2)
library("pbkrtest")
kk <- KRmodcomp(m2,m0)
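
A further hedged sketch (not in the original message) isolating the
expensive piece: per Russ's note above, the slow step is the Kenward-Roger
adjusted covariance, which can be timed on its own with pbkrtest::vcovAdj().

system.time(Vadj <- vcovAdj(m2))   # the adjusted matrix ref.grid() stores in rr@V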



On Mon, Sep 22, 2014 at 11:09 AM, Dan Dillon <dgdillon at gmail.com> wrote:

    Hi Ben. Apologies if I should be sending this to the list instead of
    directly to you, but I am still having trouble with lsmeans and wonder
    if you can help. I have attached my data and list my code below. In
    a nutshell, lsmeans works fine when I direct it towards a logistic
    model where my DV is accuracy (coded 0 or 1), but it hangs for
    hours, uses up all the application memory on my machine, and does
    not generate a result when I direct it towards a linear model where
    my DV is RT (response time).



    I took your advice and tried working with subsets of the data, and
    found that lsmeans can handle the RT model if the data set is small
    enough. I have data from four sites: US, UK, JP, and SA. If I run a
    model on RT data from three cases from the US, lsmeans runs for ~1
    minute and generates a result. If I scale up and use all the data
    from just the US and UK sites, lsmeans runs for 20+ minutes but does
    give a result. However, if I throw all the RT data at it, lsmeans
    hangs for hours and uses up all my application memory without
    generating a result. By contrast, if I use all the data but accuracy
    is the DV (i.e., logistic model), I get an answer in seconds.



    If you have any other ideas or thoughts here, I would greatly
    appreciate it. I am new to R and lme4, and I come from a tradition
    (experimental psychology) where many reviewers will want to see
    follow-up tests for interactions. lsmeans looks like the right tool
    for the job, so I'd love to get it running for both my accuracy and
    RT data.



    Thanks for any help you can provide--code below.



    Dan Dillon





    # The data are from a flanker task, where stimuli are congruent or
    # incongruent (CON or INC). I have two groups of subjects.



    flk <- read.csv('my_data.csv')
    flk$group <- as.factor(flk$group)

    fm.acc <- glmer(accuracy ~ site + group*stimulus + (1|subject),
                    family = binomial, data = flk)
    acc.rg <- ref.grid(fm.acc)         # Works fine
    print(acc.rg)                      # Works fine
    lsmeans(fm.acc, ~group|stimulus)   # Works fine

    # I want to include accuracy as a factor in the RT model
    flk$accuracy <- as.factor(flk$accuracy)
    fm.rt <- lmer(rt ~ site + group*stimulus*accuracy + (1|subject),
                  data = flk)          # Works fine, model results look good
    rt.rg <- ref.grid(fm.rt)           # Does not work, hangs for hours
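
    A hedged sketch (not part of Dan's original code) of how Russ's
    suggestion at the top of this thread would apply here, assuming the
    same fm.rt object: disabling the pbkrtest step makes ref.grid() use
    vcov() instead of the Kenward-Roger adjusted covariance, so it should
    return quickly, at the cost of asymptotic or user-supplied degrees of
    freedom.

    lsm.options(disable.pbkrtest = TRUE)
    rt.rg <- ref.grid(fm.rt)                       # should now be fast
    lsmeans(fm.rt, ~ group | stimulus,
            options = list(df = NA))               # asymptotic intervals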





    On Tue, Aug 26, 2014 at 5:07 PM, Ben Bolker <bbolker at gmail.com> wrote:

        Dan Dillon <dgdillon at gmail.com> writes:

        >
        > Colleagues:
        >

         [snip]

        >
        > My data are from a behavioral experiment in which two groups
        of subjects
        > complete 200+ trials of a task with two conditions. Each
        subject is tested
        > in one of four separate locations. I record accuracy (0 or 1)
        and response
        > time (RT) on each trial--these are the DVs for the two
        regressions. Thus,
        > my dataframe has columns "location", "group", "subject", "trial",
        > "condition", "accuracy", and "RT".
        >
        > The regression model for accuracy looks like this:
        >
        > acc.fm = glmer(accuracy ~ location + group*condition + (1|subject),
        >                family=binomial, data=my_data)
        >
        > The results look as expected and I'm using lsmeans to do some
        follow-up
        > analyses. For example, to compare accuracy by group and
        condition, I'm
        > doing this:
        >
        > acc.lsm <- lsmeans(acc.fm, ~group|condition)
        >
        > pairs(acc.lsm)
        >
        >

         [snip]

        > Here is my model for the RT data
        > (RT is a continuous variable so no logistic regression here):
        >
        > rt.fm = lmer(rt ~ location + group*condition*accuracy + (1|subject),
        >              data=my_data)
        >
        > The results from this regression look fine, but if I try this
        . . .
        >
        > rt.lsm <- lsmeans(rt.fm, ~ group|condition)
        >
        > . . . or if I try to specify a reference grid like this . . .
        >
        > rt.rg <- ref.grid(rt.fm)
        >
        > . . . my machine hangs.
        >

          [snip]

          It's a little hard to say without a reproducible example, and
        this question would probably be slightly more appropriate for
        r-sig-mixed-models at r-project.org (although I can't actually tell
        for sure whether it is an lme4-specific problem or a more general
        lsmeans::ref.grid question), but: how big a reference grid is
        ref.grid() trying to construct?  Is it fairly
        high-resolution/high-dimensional?  I would probably try some
        experiments with small subsets of your data to see how the results
        scale.
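
        A hedged sketch of that kind of scaling experiment (hypothetical
        subset sizes; not part of the original message): refit the model on
        increasing subsets of the data and time ref.grid() on each fit.

        library("lme4")
        library("lsmeans")
        for (n in c(1000, 5000, 20000)) {
            sub <- my_data[seq_len(n), ]
            fit <- lmer(rt ~ location + group*condition*accuracy + (1|subject),
                        data = sub)
            print(system.time(ref.grid(fit)))
        }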

        ______________________________________________
        R-help at r-project.org mailing list
        https://stat.ethz.ch/mailman/listinfo/r-help
        PLEASE do read the posting guide
        http://www.R-project.org/posting-guide.html
        and provide commented, minimal, self-contained, reproducible code.


