[R-sig-hpc] [R] segmentation fault with Rmpi and OpenMPI on Ubuntu 9.04

Dirk Eddelbuettel edd at debian.org
Wed Jun 24 20:13:07 CEST 2009


Hi Mark,

On 24 June 2009 at 12:31, Mark Mueller wrote:
| Many thanks for your reply and initial look at the situation.

My pleasure. 

| I ran through the examples you listed below, and everything worked just
| fine (as I suspected it would).  Unfortunately, when I run my R
| program I still get the segmentation fault.  The R program itself uses
| the Text Miner package, which in turn relies on the Snow package (and
| hence Rmpi), so it is unlikely that anything specific to my R program
| is causing the issue (rather, it might be something in one of the
| packages?).  As I understand it, Text Miner simply calls routines in
| the Snow package, which then invokes Rmpi calls.

Aieee. Then it's between you and the author of TextMiner :) unless you find
something below.

| Sample lines from my R program --
| =======
| 
| library (tm)
| 
| activateCluster() <-- uses Snow
| ... [some R + text miner code (with text miner further invoking the Snow API)]
| deactivateCluster() <-- uses Snow
| 
| =======
| 
| We do not use SLURM (or any other resource allocation solutions) at
| this point in an effort to keep things simple (at first) as we build
| out our parallel environment.
| 
| I just wonder if the mix of a 64-bit master and a 32-bit slave is the
| cause, although I'm not sure how to uncover that.

Redefine your test setup to master/master only, and likewise slave/slave.  You
can perfectly well run Rmpi, snow, ... on a single machine.  I.e. do something like

	  orterun --host master -n 4 ./path/to/script

or edit the hostfile you used previously. Same for the slave(s).
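
For the snow layer in particular -- which is where your traceback dies -- a
minimal single-machine check can be run from a plain R session on the master.
Only a sketch (the two-worker count and the Sys.info() fields are arbitrary),
but having each worker report its 'machine' field will make a 32-bit / 64-bit
mix visible once the slave is back in the picture:

	  library(snow)
	  cl <- makeMPIcluster(2)   # spawn two local MPI workers via Rmpi
	  # each worker reports its hostname and architecture
	  print(clusterCall(cl, function() Sys.info()[c("nodename", "machine")]))
	  stopCluster(cl)           # shut the workers down
	  Rmpi::mpi.quit()          # finalise MPI and exit R cleanly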

Hth, Dirk


| 
| On Tue, Jun 23, 2009 at 8:00 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| >
| > Hi Mark,
| >
| > On 23 June 2009 at 19:38, Mark Mueller wrote:
| > | PROBLEM DEFINITION --
| > |
| > | Master:
| > |
| > | - AMD_64
| > | - Ubuntu 9.04 64-bit
| > | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| > | to the localhost via ssh to run local jobs) - manually downloaded source and
| > | compiled
| > | - Rmpi 0.5-7 - package installed using R install.packages()
| > | - R 2.9.0 - installed using apt-get
| >
| > Ok. [ I prefer to take the Debian sources and rebuild local packages on
| > Ubuntu for things like Open MPI, but otherwise it looks fine. ]
| >
| > | Slave:
| > |
| > | - Intel Pentium 4 32-bit
| > | - Ubuntu 9.04 32-bit
| > | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| > | to the localhost via ssh to run local jobs) - manually downloaded source and
| > | compiled
| > | - Rmpi 0.5-7 - package installed using R install.packages()
| > | - R 2.9.0 - installed using apt-get
| >
| > Same -- but I am cautious about the 32bit / 64bit mix. I have no experience
| > there.  At work everything is 64bit, at home everything is 32bit.
| >
| > | When executing the following command from the master:
| > |
| > | --> mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
| > |
| > | the following trace results on the master node (lines 18 and 19 are from my
| > | particular R program):
| > |
| > | *** caught segfault ***
| > | address 0x10333e4d8, cause 'memory not mapped'
| > |
| > | Traceback:
| > |  1: .Call("mpi_recv", x, as.integer(type), as.integer(source),
| > | as.integer(tag),     as.integer(comm), as.integer(status), PACKAGE = "Rmpi")
| > |  2: mpi.recv(x = raw(charlen), type = 4, srctag[1], srctag[2], comm,
| > | status)
| > |  3: typeof(connection)
| > |  4: unserialize(obj)
| > |  5: .mpi.unserialize(mpi.recv(x = raw(charlen), type = 4, srctag[1],
| > | srctag[2], comm, status))
| > |  6: mpi.recv.Robj(node$rank, node$RECVTAG, node$comm)
| > |  7: recvData.MPInode(con)
| > |  8: recvData(con)
| > |  9: FUN(X[[6L]], ...)
| > | 10: lapply(cl[1:jobs], recvResult)
| > | 11: staticClusterApply(cl, fun, length(x), argfun)
| > | 12: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
| > | 13: is.vector(X)
| > | 14: lapply(args, enquote)
| > | 15: do.call("fun", lapply(args, enquote))
| > | 16: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,     fun,
| > | ...))
| > | 17: snow::parLapply(snow::getMPIcluster(), object, FUN, ..., DMetaData =
| > | DMetaData(object))
| > | 18: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| > | "realestate")
| > | 19: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| > | "realestate")
| > | aborting ...
| > | Segmentation fault
| >
| > In a case like this I always prefer to step back and run simple scripts (as
| > from my "Intro to HPC with R" tutorials).  E.g. can you run
| >
| > a) a simple mpiHelloWorld C program with no other dependencies between
| >   master and slave nodes ?  This shows basic MPI functionality.
| >
| >   mpiHelloWorld.c is attached. Do
| >
| >   $ mpicc -o mpiHelloWorld mpiHelloWorld.c
| >   $ # cp and scp to /tmp on master and slave
| >   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld
| >
| > b) same for a simple Rmpi script doing the same ?  This shows R/MPI interaction.
| >
| >   Likewise, place mpiHelloWorld.r in /tmp on each machine, then
| >
| >   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld.r
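| >
| >   Not the original attachment, but a minimal sketch of what such a script
| >   could look like (it assumes littler's 'r' for the shebang; comm 0 is
| >   MPI_COMM_WORLD here since orterun launches all the ranks itself):
| >
| >   #!/usr/bin/env r
| >   library(Rmpi)    # loading Rmpi initialises MPI
| >   cat("Hello from rank", mpi.comm.rank(0), "of", mpi.comm.size(0),
| >       "on", mpi.get.processor.name(), "\n")
| >   mpi.quit()       # finalise MPI and exit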
| >
| > c) do the same for snow (by writing a simple snow/MPI file)
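| >
| >   A sketch of what such a file could contain, assuming the RMPISNOW
| >   startup script that ships with snow (so that getMPIcluster() -- the
| >   same call visible in your traceback -- picks up the workers that
| >   orterun started):
| >
| >   # start with: orterun -n 4 -H master,slave RMPISNOW
| >   library(snow)
| >   cl <- getMPIcluster()    # the workers orterun launched
| >   print(clusterEvalQ(cl, Sys.info()[["nodename"]]))
| >   stopCluster(cl)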
| >
| > d) if you care for slurm, do the same with slurm to allocate resources
| >   within which you then run orterun to launch R/MPI jobs.
| >
| > | CONFIGURATION STEPS TAKEN --
| > |
| > | - There is no common/shared file system mounted for the cluster.
| > |
| > | - All PATH and LD_LIBRARY_PATH environment variables for OpenMPI are
| > | properly set on each node (including the master).
| > |
| > | - OpenMPI was configured and built on each node with the
| > | --enable-heterogeneous configuration flag to account for the AMD-64 and
| > | Intel-32 architectures.
| > |
| > | - The R_SNOW_LIB environment variable is set properly, and the RunSnowNode
| > | and RunSnowWorker scripts are located in the PATH (and set to executable) on
| > | all nodes (including the master).
| > |
| > | - All of the OpenMPI settings as documented in the OpenMPI FAQs to allow for
| > | remote execution (i.e. rsh/ssh, .rhosts) are in place.
| > |
| > | Any insight or assistance will be greatly appreciated.
| >
| > As outlined above, I try to stick with the 'tested' configuration from the
| > Debian packages so I don't have to deal with all the env vars etc.  Also,
| > decomposing the problem from snow down to Rmpi to plain MPI may help.
| >
| > Best regards, Dirk
| >
| >
| > |
| > | Sincerely,
| > | Mark
| > |

-- 
Three out of two people have difficulties with fractions.


