[R] simulated data using empirical distribution
Greg Snow
Greg.Snow at intermountainmail.org
Thu Oct 11 17:58:29 CEST 2007
Try the logspline package:
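> library(logspline)
>
> # "true" data, used here only to build the example
> x1 <- rgamma(1000, 3)
>
> # bin edges; cut() assigns each observation to a bin
> br <- c(0,1,2,4,6,8,12,15)
> h1 <- cut( x1, br, include.lowest=TRUE )
>
> # one row per observation: (lower, upper) limits of its bin
> int1 <- embed(br,2)[ as.integer(h1), 2:1 ]
>
> # fit to the raw data (ls1) and to the binned intervals only (ls2)
> ls1 <- oldlogspline(x1, lbound=0)
> ls2 <- oldlogspline( interval=int1, lbound=0 )
>
> # simulate new data from the fit that saw only the bins
> x2 <- roldlogspline( 1000, ls2 )
>
> # compare histograms of the original and simulated data
> par(mfrow=c(3,1))
> hist(x1, xlim=c(0,15))
> hist(x2, xlim=c(0,15))
>
> # true density (black), fit to raw data (blue), fit to bins (green)
> xx <- seq(0,15, length.out=250)
> plot(xx, dgamma(xx,3), type='l')
> lines(xx, doldlogspline(xx,ls1), col='blue')
> lines(xx, doldlogspline(xx,ls2), col='green')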
>
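In the question only the bin counts are available, not the raw data, so the interval matrix has to be built from the counts rather than from cut(). A minimal sketch of that step, assuming the same bin edges as above; the counts are made-up numbers standing in for the real table, and the names bins, fit and sim are only for illustration:

```r
# Bin edges as in the example above; counts are hypothetical stand-ins
# for the real histogram (one count per bin).
br     <- c(0, 1, 2, 4, 6, 8, 12, 15)
counts <- c(80, 243, 439, 176, 48, 13, 1)

# Lower and upper limit of each bin, one row per bin
bins <- cbind(lower = head(br, -1), upper = tail(br, -1))

# Repeat each bin's row once per observation counted in that bin
int1 <- bins[rep(seq_along(counts), counts), ]

# Fit and simulate only if logspline is installed
if (requireNamespace("logspline", quietly = TRUE)) {
  fit <- logspline::oldlogspline(interval = int1, lbound = 0)
  sim <- logspline::roldlogspline(sum(counts), fit)
}
```

The resulting int1 has the same layout as the one built with embed() above, so it can be passed to oldlogspline() in the same way. Whether the fit behaves well on the real income bins would need to be checked against the data.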
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of tom sgouros
> Sent: Thursday, October 11, 2007 5:30 AM
> To: r-help at r-project.org
> Subject: Re: [R] simulated data using empirical distribution
>
>
> Hello all:
>
> Many thanks to the people who have responded to my question,
> on and off-list. My problem isn't completely solved, though,
> and perhaps you can help again.
>
> The problem, again, is that I have what is essentially a
> histogram, but not the underlying data, and I want to
> simulate data that would have created that histogram. That
> is, I have counts for the number of data points in a dozen
> bins. The bins are not of uniform size. (It's income data,
> reported as incomes from 0-10k, 10k-25k, 25k-50k, and so on.)
>
> The best suggestion I had yesterday was to simulate the data
> with uniform distributions in each bin, and an exponential
> one on the rightmost bin, and I did that and superficially it
> looks good.
> Unfortunately, now that I am trying to calibrate the model, I
> have discovered a high bias. The way the bins are chosen, I
> would expect that 9 out of 12 bins have a downward slope,
> meaning that approximating them with a square top gives me
> more along the high border of the bin, and I currently
> suspect that this is at least part of the bias.
>
> Is there a way to ask for a not-quite uniform distribution of
> random data? I imagine a density function with a linear, but
> not flat, top. I admit that the standard selection of
> distributions in R is more than I am familiar with, but I
> can't find one that does what I think I need.
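Base R does not appear to ship a distribution of exactly this shape, but a density with a linear top is easy to sample by inverting its CDF. The following is only a sketch under stated assumptions: qlinbin, rlinbin and the slope parameter are hypothetical names, not functions from any package. On the bin rescaled to [0, 1] the density is taken to be 1 + slope * (2t - 1), so slope = 0 is uniform, slope = -1 falls linearly to zero at the upper edge, and slope = 1 rises from zero at the lower edge.

```r
# Quantile function of a density that is linear on [lower, upper].
# slope is a hypothetical shape parameter in [-1, 1]:
#   0 = flat (uniform), negative = downward slope, positive = upward.
# The CDF on the unit interval is (1 - slope) * t + slope * t^2;
# solving it for t gives the expression below, written in a form
# that also works when slope is 0.
qlinbin <- function(p, lower, upper, slope = 0) {
  stopifnot(upper > lower, abs(slope) <= 1)
  t <- 2 * p / ((1 - slope) + sqrt((1 - slope)^2 + 4 * slope * p))
  lower + (upper - lower) * t
}

# Random draws by inverse-CDF sampling
rlinbin <- function(n, lower, upper, slope = 0) {
  qlinbin(runif(n), lower, upper, slope)
}

# Example: 1000 incomes in the 10k-25k bin with a downward-sloping top
x <- rlinbin(1000, 10, 25, slope = -0.5)
```

How to pick the slope for each bin is left open here; one option would be to base it on the heights of the neighbouring bins, which is an assumption on my part and not something the data dictate.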
>
> Any advice (R advice or statistics advice) is welcome. Thanks again,
>
> -tom
>
> --
> ------------------------
> tomfool at as220 dot org
> http://sgouros.com
> http://whatcheer.net
>