[R] sample(c(0, 1)...) vs. rbinom

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Thu May 23 18:38:18 CEST 2013


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Albyn Jones
> Sent: Thursday, May 23, 2013 8:30 AM
> To: r-help at r-project.org
> Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
> 
> After a bit of playing around, I discovered that
> sample() does something similar in other situations:
> 
> > set.seed(105021)
> > sample(1:5,1,prob=c(1,1,1,1,1))
> [1] 3
> > set.seed(105021)
> > sample(1:5,1)
> [1] 2
> 
> 
> > set.seed(105021)
> > sample(1:5,5,prob=c(1,1,1,1,1))
> [1] 3 4 2 1 5
> > set.seed(105021)
> > sample(1:5,5)
> [1] 2 5 1 4 3
> 
> albyn


What is the "something similar" you are referring to?  And I guess I still don't understand what it is that concerns you about the sample function.


Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



> 
> 
> On 2013-05-22 22:24, peter dalgaard wrote:
> > On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
> >
> >> You seem to be building an elaborate structure for testing the
> >> reproducibility of the random number generator. I suspect that
> rbinom
> >> is calling the random number generator a different number of times
> >> when you pass prob=0.5 than otherwise.
> >
> > Nope. It's switching 0 and 1:
> >
> >> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
> >> set.seed(1); rbinom(10,1,pp)
> >  [1] 1 1 0 0 1 0 0 0 0 1
> >  [1] 0 0 1 1 0 1 1 1 1 0
> >
> > which is curious, but of course has no implication for the
> > distributional properties. Curiouser, if you drop the prob= in
> > sample.
> >
> >> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
> >> rbinom(10,1,pp)
> >  [1] 0 0 1 1 0 1 1 1 1 0
> >  [1] 0 0 1 1 0 1 1 1 1 0
> >
> > However, it was never a design goal that two different random
> > functions (or even two code paths within the same function) should
> > give exactly the same values, even if they simulate the same
> > distribution, so this is nothing more than a curiosity.
> >
> >
> >>>
> >>> Appendix A: some R code that exhibits the problem
> >>> =================================================
> >>>
> >>> ppp <- seq(0, 1, by = 0.01)
> >>>
> >>> result <- do.call(rbind, lapply(ppp, function(p) {
> >>>     set.seed(1)
> >>>     sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
> >>>                         prob=c(1-p, p))
> >>>
> >>>     set.seed(1)
> >>>     rbinomRes <- rbinom(1, size = 1, prob = p)
> >>>
> >>>     data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
> >>>
> >>> }))
> >>>
> >>> result
> >>>
> >>>
> >>> Appendix B: the output from the R code
> >>> ======================================
> >>>
> >>>     prob equivalent
> >>> 1   0.00       TRUE
> >>> 2   0.01       TRUE
> >>> 3   0.02       TRUE
> >>> 4   0.03       TRUE
> >>> 5   0.04       TRUE
> >>> 6   0.05       TRUE
> >>> 7   0.06       TRUE
> >>> 8   0.07       TRUE
> >>> 9   0.08       TRUE
> >>> 10  0.09       TRUE
> >>> 11  0.10       TRUE
> >>> 12  0.11       TRUE
> >>> 13  0.12       TRUE
> >>> 14  0.13       TRUE
> >>> 15  0.14       TRUE
> >>> 16  0.15       TRUE
> >>> 17  0.16       TRUE
> >>> 18  0.17       TRUE
> >>> 19  0.18       TRUE
> >>> 20  0.19       TRUE
> >>> 21  0.20       TRUE
> >>> 22  0.21       TRUE
> >>> 23  0.22       TRUE
> >>> 24  0.23       TRUE
> >>> 25  0.24       TRUE
> >>> 26  0.25       TRUE
> >>> 27  0.26       TRUE
> >>> 28  0.27       TRUE
> >>> 29  0.28       TRUE
> >>> 30  0.29       TRUE
> >>> 31  0.30       TRUE
> >>> 32  0.31       TRUE
> >>> 33  0.32       TRUE
> >>> 34  0.33       TRUE
> >>> 35  0.34       TRUE
> >>> 36  0.35       TRUE
> >>> 37  0.36       TRUE
> >>> 38  0.37       TRUE
> >>> 39  0.38       TRUE
> >>> 40  0.39       TRUE
> >>> 41  0.40       TRUE
> >>> 42  0.41       TRUE
> >>> 43  0.42       TRUE
> >>> 44  0.43       TRUE
> >>> 45  0.44       TRUE
> >>> 46  0.45       TRUE
> >>> 47  0.46       TRUE
> >>> 48  0.47       TRUE
> >>> 49  0.48       TRUE
> >>> 50  0.49       TRUE
> >>> 51  0.50      FALSE
> >>> 52  0.51       TRUE
> >>> 53  0.52       TRUE
> >>> 54  0.53       TRUE
> >>> 55  0.54       TRUE
> >>> 56  0.55       TRUE
> >>> 57  0.56       TRUE
> >>> 58  0.57       TRUE
> >>> 59  0.58       TRUE
> >>> 60  0.59       TRUE
> >>> 61  0.60       TRUE
> >>> 62  0.61       TRUE
> >>> 63  0.62       TRUE
> >>> 64  0.63       TRUE
> >>> 65  0.64       TRUE
> >>> 66  0.65       TRUE
> >>> 67  0.66       TRUE
> >>> 68  0.67       TRUE
> >>> 69  0.68       TRUE
> >>> 70  0.69       TRUE
> >>> 71  0.70       TRUE
> >>> 72  0.71       TRUE
> >>> 73  0.72       TRUE
> >>> 74  0.73       TRUE
> >>> 75  0.74       TRUE
> >>> 76  0.75       TRUE
> >>> 77  0.76       TRUE
> >>> 78  0.77       TRUE
> >>> 79  0.78       TRUE
> >>> 80  0.79       TRUE
> >>> 81  0.80       TRUE
> >>> 82  0.81       TRUE
> >>> 83  0.82       TRUE
> >>> 84  0.83       TRUE
> >>> 85  0.84       TRUE
> >>> 86  0.85       TRUE
> >>> 87  0.86       TRUE
> >>> 88  0.87       TRUE
> >>> 89  0.88       TRUE
> >>> 90  0.89       TRUE
> >>> 91  0.90       TRUE
> >>> 92  0.91       TRUE
> >>> 93  0.92       TRUE
> >>> 94  0.93       TRUE
> >>> 95  0.94       TRUE
> >>> 96  0.95       TRUE
> >>> 97  0.96       TRUE
> >>> 98  0.97       TRUE
> >>> 99  0.98       TRUE
> >>> 100 0.99       TRUE
> >>> 101 1.00       TRUE
> >>>
> >>> Appendix C: Session information
> >>> ===============================
> >>>
> >>>> sessionInfo()
> >>> R version 3.0.0 (2013-04-03)
> >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> >>>
> >>> locale:
> >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >>>  [7] LC_PAPER=C                 LC_NAME=C
> >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods
> >>> base
> >>>
> >>>>
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.


More information about the R-help mailing list