[ESS] R session starts sending commands to wrong place

Ross Boylan ross at biostat.ucsf.edu
Tue Jun 30 18:53:05 CEST 2009


On Tue, 2009-06-30 at 10:06 -0500, Rodney Sparapani wrote:
> Ross Boylan wrote:
> > Short version: I have an ess-remote session (under openMPI) and a
> > regular R session.  Sometimes, typically after a long computation
> > (hours) in the ess-remote session, commands entered in the ess-remote
> > session get echoed to and executed by the regular R session.  The
> > ess-remote session appears hung and unresponsive during this time.
> > 
> > I worked around this by quitting R in the *R* window, responding yes to
> > several complaints about the R process disappearing, and then typing the
> > commands in the ess-remote buffer.  This time, they stayed in the
> > buffer.
> > 
> > Aside from not starting multiple R sessions, is there a good way to
> > avoid this problem?  Is it a bug?
> > 
> > DETAILS
> > 
> > 11433 pts/4    S<s    0:00 /bin/sh
> > 11530 pts/4    S<+    0:54  \_ emacs
> > 14250 pts/0    S<s    0:00      \_ /bin/sh -i
> > 30209 pts/0    S<+    0:00      |   \_ mpirun -np 32 --hostfile hosts RMPIInteractive
> > 30014 pts/6    S<s+   0:09      \_ /usr/lib64/R/bin/exec/R --no-readline
> > 
> > 30214 ?        S<s    0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename n7 --universe ross at n7:default-universe-3
> > 30215 ?        S<     0:00  \_ /bin/bash /home/ross/clean/OLTData/RMPIInteractive
> > 30219 ?        S<   141:09  |   \_ /usr/lib64/R/bin/exec/R --no-save
> > 
> > 30219 is the only R process not at 100% CPU; running at 100% is
> > characteristic of openmpi slaves.  So presumably that is where the
> > commands should be going.
> > strace showed only
> > 30219 15:49:37 read(0,  <unfinished ...>
> > 
> > I wonder if the cause is some interaction between ESS and openmpi, which
> > does some input and output redirecting to wire one of the spawned
> > processes (almost certainly 30219) to my "terminal".  My understanding
> > is that ess-remote is simply sending commands to that terminal, and
> > openmpi is taking care of getting them to the master R.
> > 
> > In even more detail:
> > 1. start emacs
> > 2. open shell within emacs
> > 3. execute the mpirun command within the shell.
> > 4. The invoked script does
> >      R --no-save $*
> >    for rank 0 and 
> >      R --no-save $* > rmpi.$RANK 2>&1
> >    for others.
> > 5. In emacs, invoke ess-remote with language r.
> > 6. In my terminal
> > 	options(error=recover)
> >    in an effort to avoid death at the first error.
> >    It does that, but R's machinery still seems to think it's
> > non-interactive.
> > 7.  do long computation
> > 8.  type a command, most commonly save.image()
> > 9. hit enter.
> > 10. cursor sits blinking on the "(" in save.image()
> > 11. terminal non-responsive to inputs, mostly (in particular, hitting
> > enter has no effect)
> > 12.  hitting ctl-g seems to cause previous enters to print out, and
> > better response to keys (at least I can switch to another session).
> > 
> > Variations:
> > Step 8 is sometimes preceded by some commands executing successfully.
> > It has hung up on commands involving no obvious disk I/O, e.g., print.
> > This, coupled with a sysadmin check that the disk is OK, suggests
> > it's not a disk problem.
> > 
> > The workspace (.RData) is around 13MB.  Its time stamp seems to match
> > when I issued the save.image() command, but that is a save from the
> > regular R process in the same directory.
> > 
> > I am not sure about the relative timing of launching the ess-remote
> > process and the regular ess process.
> > 
> > Debian Lenny
> > ess                           5.3.8~svn3917-1
> > emacs                         22.2+2-5
> > openmpi-bin                   1.2.7~rc2-2
> > r-base-core                   2.7.1-1+lenny1
> > 
> 
> Nice bug report.  However, I have no idea :o)  Can you upgrade these
> packages to the latest version and still reproduce the bug?
> 
> Rodney
Since it's on a cluster, upgrades are non-trivial.  I could probably
start with ess, possibly just invoking it from my own machine.

I think I've also seen the failure before running a long computation;
it's a little random.  There might be a simple rule I haven't noticed,
like: start ess-remote, then launch esc-x R, and the ess-remote inputs
get redirected.  It might matter that they are in the same directory.

I just tried the simple rule locally; it's not that simple :(
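
For concreteness, the per-rank launch in step 4 looks roughly like the
sketch below.  This is not the actual RMPIInteractive script: it assumes
the rank is already in $RANK (how that gets set depends on the MPI
version and is omitted here), and R_BIN is a made-up override so the
branching can be exercised without starting R.  It also uses "$@" where
the description above has $*, to keep arguments containing spaces intact.

```shell
#!/bin/sh
# Sketch of a per-rank R launcher (NOT the actual RMPIInteractive script).
# Assumption: RANK holds this process's MPI rank; obtaining it is not shown.
# R_BIN is a hypothetical override used only to try the logic without R.

launch_r() {
    if [ "${RANK:-0}" -eq 0 ]; then
        # Rank 0 keeps stdin/stdout, so it is the process the terminal
        # (and hence ess-remote) ends up talking to.
        "${R_BIN:-R}" --no-save "$@"
    else
        # Every other rank sends stdout and stderr to its own log file.
        "${R_BIN:-R}" --no-save "$@" > "rmpi.$RANK" 2>&1
    fi
}
```

A real script would end with a call such as launch_r "$@".  The point is
only that rank 0 is the one process whose input and output are left
attached to whatever mpirun was started from.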

Ross


