[R] simulated data using empirical distribution

Thu Oct 11 17:58:29 CEST 2007

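Try the logspline package.  oldlogspline() can fit a smooth density to
interval-censored data (only the bin each observation fell in is
known), and roldlogspline() then simulates from the fitted density.
The example below bins some gamma data, fits both the raw and the
binned version, and compares the two fits with the true density: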

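> library(logspline)
> 
> # Example data: 1000 draws from a gamma distribution with shape 3
> x1 <- rgamma(1000, 3)
> 
> # Unequal bin boundaries, standing in for the income brackets
> br <- c(0,1,2,4,6,8,12,15)
> 
> # Assign each observation to its bin (a value above 15 would become
> # NA, which is very unlikely for these data)
> h1 <- cut( x1, br, include.lowest=TRUE )
> 
> # One row per observation: (lower, upper) limits of the bin it fell in
> int1 <- embed(br,2)[ as.integer(h1), 2:1 ]
> 
> # Fit to the raw data (ls1) and to the bin limits only (ls2)
> ls1 <- oldlogspline(x1, lbound=0)
> ls2 <- oldlogspline( interval=int1, lbound=0 )
> 
> # Simulate new data from the fit that saw only the bins
> x2 <- roldlogspline( 1000, ls2 )
> 
> # Histograms of the original and the simulated data
> par(mfrow=c(3,1))
> hist(x1, xlim=c(0,15))
> hist(x2, xlim=c(0,15))
> 
> # True density (black), fit to raw data (blue), fit to bins (green)
> xx <- seq(0,15, length.out=250)
> plot(xx, dgamma(xx,3), type='l')
> lines(xx, doldlogspline(xx,ls1), col='blue')
> lines(xx, doldlogspline(xx,ls2), col='green')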
> 
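
When only the bin counts are available (no raw data to pass to cut()),
the same interval matrix can be built by repeating each bin's limits
according to its count.  The sketch below assumes made-up edges and
counts (edges, counts and n_top are hypothetical placeholders, not the
real income table) and assumes the open-ended top bin can be passed
through the right= argument of oldlogspline(), which is documented for
right-censored observations.  Whether the fit converges for a given
table is worth checking.

```r
# Hypothetical bin edges (in $1000s) and counts; replace with the real table.
edges  <- c(0, 10, 25, 50, 75, 100, 150)   # finite bin boundaries
counts <- c(120, 260, 310, 170, 90, 40)    # one count per finite bin
n_top  <- 10                               # count in the open bin above 150

# One (lower, upper) row per observation, repeated according to the counts.
lower <- head(edges, -1)
upper <- tail(edges, -1)
int   <- cbind(rep(lower, counts), rep(upper, counts))

# Observations in the open-ended top bin are known only to exceed the
# last edge, so they are treated as right-censored at that edge.
right <- rep(max(edges), n_top)

# The fit itself needs the logspline package; skip it if not installed.
if (requireNamespace("logspline", quietly = TRUE)) {
  fit <- logspline::oldlogspline(interval = int, right = right, lbound = 0)
  n   <- sum(counts) + n_top
  sim <- logspline::roldlogspline(n, fit)

  # Compare simulated bin proportions with the observed ones.
  sim_bins <- cut(sim, c(edges, Inf), include.lowest = TRUE)
  print(round(rbind(observed  = c(counts, n_top) / n,
                    simulated = as.vector(table(sim_bins)) / n), 3))
}
```

If the simulated proportions drift from the observed ones, the nknots
or penalty arguments of oldlogspline() are the first things to try
adjusting.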

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of tom sgouros
> Sent: Thursday, October 11, 2007 5:30 AM
> To: r-help at r-project.org
> Subject: Re: [R] simulated data using empirical distribution
> 
> 
> Hello all:
> 
> Many thanks to the people who have responded to my question, 
> on and off-list.  My problem isn't completely solved, though, 
> and perhaps you can help again.
> 
> The problem, again, is that I have what is essentially a 
> histogram, but not the underlying data, and I want to 
> simulate data that would have created that histogram.  That 
> is, I have counts for the number of data points in a dozen 
> bins.  The bins are not of uniform size.  (It's income data, 
> reported as incomes from 0-10k, 10k-25k, 25k-50k, and so on.)
> 
> The best suggestion I had yesterday was to simulate the data 
> with uniform distributions in each bin, and an exponential 
> one on the rightmost bin, and I did that and superficially it 
> looks good.
> Unfortunately, now that I am trying to calibrate the model, I 
> have discovered a high bias.  The way the bins are chosen, I 
> would expect that 9 out of 12 bins have a downward slope, 
> meaning that approximating them with a square top gives me 
> more along the high border of the bin, and I currently 
> suspect that this is at least part of the bias.
> 
> Is there a way to ask for a not-quite uniform distribution of 
> random data?  I imagine a density function with a linear, but 
> not flat, top.  I admit that the standard selection of 
> distributions in R is more than I am familiar with, but I 
> can't find one that does what I think I need.
> 
> Any advice (R advice or statistics advice) is welcome.  Thanks again,
> 
>  -tom
> 
> --
>  ------------------------
>  tomfool at as220 dot org
>  http://sgouros.com
>  http://whatcheer.net
> 
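
On the "linear, but not flat, top" asked about in the quoted message:
base R has no ready-made generator for that, as far as I know, but such
a density is easy to sample by inverting its CDF.  The sketch below is
one way to do it; rlinear() is a hypothetical helper name, and h_a and
h_b are assumed relative heights of the density at the two bin edges
(how to choose them, for example from the densities of neighbouring
bins, is left open).

```r
# Draw n values from a density on [a, b] whose top is a straight line,
# with relative heights h_a at a and h_b at b (both >= 0, not both zero).
# rlinear() is a hypothetical helper, not a function from any package.
rlinear <- function(n, a, b, h_a, h_b) {
  stopifnot(a < b, h_a >= 0, h_b >= 0, h_a + h_b > 0)
  u <- runif(n)
  # With t = (x - a) / (b - a), the CDF is
  #   F(t) = (2*h_a*t + (h_b - h_a)*t^2) / (h_a + h_b),  t in [0, 1].
  # Solve F(t) = u, using the form of the quadratic root that also
  # works when h_a == h_b (where it reduces to t = u, the uniform case).
  t <- u * (h_a + h_b) / (h_a + sqrt(h_a^2 + u * (h_b^2 - h_a^2)))
  a + (b - a) * t
}

set.seed(1)
# A downward-sloping bin from 10 to 25, three times as dense at the
# lower edge as at the upper edge.
x <- rlinear(1e5, a = 10, b = 25, h_a = 3, h_b = 1)

# Theoretical mean: a + (b - a) * (h_a + 2*h_b) / (3 * (h_a + h_b)) = 16.25,
# compared with 17.5 for a flat top on the same bin.
mean(x)
```

Drawing exactly the observed count from each bin this way keeps the
histogram unchanged while shifting mass toward the lower edge of the
downward-sloping bins.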