[ESS] R session starts sending commands to wrong place
Ross Boylan
ross at biostat.ucsf.edu
Tue Jun 30 18:53:05 CEST 2009
On Tue, 2009-06-30 at 10:06 -0500, Rodney Sparapani wrote:
> Ross Boylan wrote:
> > Short version: I have an ess-remote session (under openMPI) and a
> > regular R session. Sometimes, typically after a long computation
> > (hours) in the ess-remote session, commands entered in the ess-remote
> > session get echoed to and executed by the regular R session. The
> > ess-remote session appears hung and unresponsive during this time.
> >
> > I worked around this by quitting R in the *R* window, responding yes to
> > several complaints about the R process disappearing, and then typing the
> > commands in the ess-remote buffer. This time, they stayed in the
> > buffer.
> >
> > Aside from not starting multiple R sessions, is there a good way to
> > avoid this problem? Is it a bug?
> >
> > DETAILS
> >
> > 11433 pts/4 S<s 0:00 /bin/sh
> > 11530 pts/4 S<+ 0:54 \_ emacs
> > 14250 pts/0 S<s 0:00 \_ /bin/sh -i
> > 30209 pts/0 S<+ 0:00 | \_ mpirun -np 32 --hostfile hosts RMPIInteractive
> > 30014 pts/6 S<s+ 0:09 \_ /usr/lib64/R/bin/exec/R --no-readline
> >
> > 30214 ? S<s 0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename n7 --universe ross@n7:default-universe-3
> > 30215 ? S< 0:00 \_ /bin/bash /home/ross/clean/OLTData/RMPIInteractive
> > 30219 ? S< 141:09 | \_ /usr/lib64/R/bin/exec/R --no-save
> >
> > 30219 is the only R process not at 100% CPU (running at 100% is
> > characteristic of openmpi slaves). So presumably that is where the
> > commands should be going.
> > strace showed only
> > 30219 15:49:37 read(0, <unfinished ...>
> >
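
A complementary check to strace is to look at what each R process actually has open as standard input. A minimal sketch, assuming a Linux /proc filesystem and that the processes belong to the same user; the helper name stdin_target is made up for illustration:

```shell
# stdin_target PID: print what file descriptor 0 of PID points to,
# e.g. /dev/pts/6 for a pty or pipe:[...] for a pipe.
# Assumes Linux /proc; stdin_target is a hypothetical helper name.
stdin_target() {
    readlink "/proc/$1/fd/0"
}
```

Running `stdin_target 30219` and `stdin_target 30014` and comparing the results with the pty of the emacs shell buffer (pts/0 in the listing above) would show whether the two R processes share an input source or not.
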
> > I wonder if the cause is some interaction between ESS and openmpi, which
> > does some input and output redirecting to wire one of the spawned
> > processes (almost certainly 30219) to my "terminal". My understanding
> > is that ess-remote is simply sending commands to that terminal, and
> > openmpi is taking care of getting them to the master R.
> >
> > In even more detail:
> > 1. start emacs
> > 2. open shell within emacs
> > 3. execute the mpirun command within the shell.
> > 4. The invoked script does
> > R --no-save $*
> > for rank 0 and
> > R --no-save $* > rmpi.$RANK 2>&1
> > for others.
> > 5. In emacs, invoke ess-remote with language r.
> > 6. In my terminal
> > options(error=recover)
> > in an effort to avoid death at the first error.
> > It does that, but R's machinery still seems to think it's
> > non-interactive.
> > 7. do long computation
> > 8. type a command, most commonly save.image()
> > 9. hit enter.
> > 10. cursor sits blinking on the "(" in save.image()
> > 11. terminal non-responsive to inputs, mostly (in particular, hitting
> > enter has no effect)
> > 12. hitting ctl-g seems to cause previous enters to print out, and
> > better response to keys (at least I can switch to another session).
> >
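
The launcher described in step 4 can be sketched roughly as follows. This is not the actual RMPIInteractive script: it assumes the rank is passed as the first argument (how $RANK is set in the real script is not shown here), and R_CMD is a hypothetical override so the branching can be tried without R installed. It also uses "$@" rather than $* so that arguments containing spaces survive.

```shell
#!/bin/sh
# Sketch of a rank-dispatching launcher like the one in step 4.
# Assumptions (not from the original script): the rank arrives as the
# first argument, and R_CMD is a hypothetical override for testing.
launch_r() {
    rank=$1
    shift
    if [ "$rank" -eq 0 ]; then
        # Rank 0 keeps the inherited stdin/stdout so it can be driven
        # interactively from the terminal.
        "${R_CMD:-R}" --no-save "$@"
    else
        # Every other rank logs stdout and stderr to a per-rank file.
        "${R_CMD:-R}" --no-save "$@" > "rmpi.$rank" 2>&1
    fi
}
```

For example, `R_CMD=echo launch_r 3 --vanilla` writes the command line that would have been run into rmpi.3 instead of starting R. Only rank 0 ever reads from the terminal, which is why that process is the one ess-remote input is expected to reach.
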
> > Variations:
> > Step 8 is sometimes preceded by some commands executing successfully.
> > It has hung up on commands involving no obvious disk I/O, e.g., print.
> > This, coupled with a sysadmin check that the disk is OK, suggests it's
> > not a disk problem.
> >
> > The workspace (.RData) is around 13MB. Its time stamp seems to match
> > when I issued the save.image() command, but that is a save from the
> > regular R process in the same directory.
> >
> > I am not sure about the relative timing of launching the ess-remote
> > process and the regular ess process.
> >
> > Debian Lenny
> > ess 5.3.8~svn3917-1
> > emacs 22.2+2-5
> > openmpi-bin 1.2.7~rc2-2
> > r-base-core 2.7.1-1+lenny1
> >
>
> Nice bug report. However, I have no idea :o) Can you upgrade these
> packages to the latest version and still reproduce the bug?
>
> Rodney
Since it's on a cluster, upgrades are non-trivial. I could probably
start with ess, possibly just invoking it from my own machine.
I think I've also seen the failure before running a long computation;
it's a little random. There might be a simple rule I haven't noticed,
like: start ess-remote, then launch esc-x R, and the ess-remote inputs get
redirected. It might matter that they are in the same directory.
I just tried the simple rule locally; it's not that simple :(
Ross