[Rd] 'parallel' package changes '.Random.seed'
Henric Winell
nilsson.henric at gmail.com
Thu Mar 6 12:54:25 CET 2014
Comments below.
On 2014-03-06 11:17, Henric Winell wrote:
> Hi,
>
> I've implemented parallelization in one of my packages using the
> 'parallel' package -- many thanks for providing it!
>
> In my package I'm importing 'parallel' and so added it to the
> DESCRIPTION file's 'Imports:' field and also added an
> 'importFrom("parallel", ...)' statement in the NAMESPACE file.
>
> Parallelization works nicely, but my package no longer passes any parts
> of its (unparallelized) checks that depend on random number generation,
> e.g., the simulated data in the check suite are no longer the same as
> before parallelization was added. This seems to be due to 'parallel'
> changing '.Random.seed' when loading its name space:
>
> > set.seed(1)
> > rs1 <- .Random.seed
> > rnorm(1)
> [1] -0.6264538
> > set.seed(1)
> > rs2 <- .Random.seed
> > identical(rs1, rs2)
> [1] TRUE
> > loadNamespace("parallel")
> <environment: namespace:parallel>
> > rs3 <- .Random.seed
> > identical(rs1, rs3)
> [1] FALSE
> > rnorm(1)
> [1] -0.3262334
> > set.seed(1)
> > rs4 <- .Random.seed
> > identical(rs1, rs4)
> [1] TRUE
>
> I've taken a look at the 'parallel' source code, and in a few places a
> call to 'runif(1)' is issued. So, what effectively seems to happen when
> 'parallel' is loaded is
>
> > set.seed(1)
> > runif(1)
> [1] 0.2655087
> > rnorm(1)
> [1] -0.3262334
Some digging reveals the cause: by default no port number is set for the
socket connection, in which case 'parallel' picks a random port in the
11000-11999 range using 'runif(1L)'. So, by setting the R_PARALLEL_PORT
environment variable, the '.Random.seed' object is no longer touched:
> Sys.setenv(R_PARALLEL_PORT = 11500)
> set.seed(1)
> rs1 <- .Random.seed
> loadNamespace("parallel")
<environment: namespace:parallel>
> rs2 <- .Random.seed
> identical(rs1, rs2)
[1] TRUE
This is handled in the 'initDefaultClusterOptions' function in 'snow.R',
where line 88 has
port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300)%%1)
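To make the effect of that line concrete, here is a minimal sketch that evaluates the same expression on its own, outside of 'parallel'. The variable names 'before' and 'after' are only for illustration; nothing here is taken from the package beyond the expression itself.

```r
# Evaluate the port expression from snow.R in isolation (illustration only).
set.seed(1)
before <- .Random.seed

# One uniform draw is mixed with the current time and mapped into
# the interval [11000, 12000).
port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)

after <- .Random.seed

in_range <- port >= 11000 && port < 12000   # port lies in the expected range
seed_changed <- !identical(before, after)   # the single draw advanced the RNG
```

The time component only varies the port between sessions; the 'runif(1L)' call is the part that alters '.Random.seed'.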
It seems to me that we can tread more carefully here. I've attached a
trivial patch that
  1. Checks if '.Random.seed' exists
  2. If TRUE:  a) save '.Random.seed'
               b) make the call above
               c) reset '.Random.seed' to its state in a)
     If FALSE: a) make the call above
               b) remove '.Random.seed'
In due course I hope someone is interested enough to review it.
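For readers without access to the attachment, the steps above can be sketched roughly as follows. This is not the attached patch: 'with_preserved_seed' is a hypothetical helper name chosen for illustration, and the sketch assumes the RNG state lives in '.Random.seed' in the global environment, as it does in a regular R session.

```r
# Hypothetical helper (not the attached patch): evaluate 'expr', then put the
# global RNG state back exactly as it was found.
with_preserved_seed <- function(expr) {
    # Step 1: check whether '.Random.seed' exists.
    had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    # Step 2, TRUE branch, a): save the current state.
    if (had_seed)
        old_seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    on.exit({
        if (had_seed) {
            # TRUE branch, c): reset to the saved state.
            assign(".Random.seed", old_seed, envir = .GlobalEnv)
        } else if (exists(".Random.seed", envir = .GlobalEnv,
                          inherits = FALSE)) {
            # FALSE branch, b): remove the seed created by the call.
            rm(".Random.seed", envir = .GlobalEnv)
        }
    })
    # b) resp. a): make the call; the lazily evaluated argument runs here.
    force(expr)
}

# Usage: the port is still chosen at random, but the seed is left untouched.
set.seed(1)
rs1 <- .Random.seed
port <- with_preserved_seed(
    11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)
)
unchanged <- identical(rs1, .Random.seed)
```

Using 'on.exit' means the state is also restored if the wrapped call fails; whether that matters for the namespace loading code is a separate question.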
Henric Winell
>
> which reproduces the above. But is this really necessary? And more
> importantly (at least to me): Can it somehow be avoided?
>
> The current state of affairs is a bit unfortunate, since it implies that
> a user just by loading the new parallelized version of my package can no
> longer reproduce any subsequent results depending on random number
> generation (unless a call to 'set.seed' was issued *after* attaching my
> package).
>
> I'd be most grateful for any help that you're able to provide here. Many
> thanks!
>
> Kind regards,
> Henric Winell
>
>
> > sessionInfo()
> R Under development (unstable) (2014-01-26 r64897)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=sv_SE.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snow.R.patch
Type: text/x-patch
Size: 1138 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20140306/e83ca0fc/attachment.bin>