[R] simulated data using empirical distribution
tom sgouros
tomfool at as220.org
Thu Oct 11 13:30:16 CEST 2007
Hello all:
Many thanks to the people who have responded to my question, on and
off-list. My problem isn't completely solved, though, and perhaps you
can help again.
The problem, again, is that I have what is essentially a histogram, but
not the underlying data, and I want to simulate data that would have
created that histogram. That is, I have counts for the number of data
points in a dozen bins. The bins are not of uniform size. (It's income
data, reported as incomes from 0-10k, 10k-25k, 25k-50k, and so on.)
The best suggestion I had yesterday was to simulate the data with a
uniform distribution in each bin and an exponential one in the
rightmost bin. I did that, and superficially it looks good.
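For concreteness, here is a minimal sketch of that approach. The bin edges,
counts, and exponential rate below are made up for illustration (they are
not the real income data), and simulate_flat is just a hypothetical helper
name.

```r
set.seed(1)

# Hypothetical bin edges (in $1000s) and counts; NOT the real data.
# The last bin is open-ended (100 and up).
breaks <- c(0, 10, 25, 50, 75, 100)
counts <- c(120, 200, 260, 150, 90, 60)

simulate_flat <- function(breaks, counts, tail_rate = 1 / 50) {
  k <- length(counts)
  # Closed bins: uniform draws between consecutive edges.
  closed <- unlist(lapply(seq_len(k - 1), function(i) {
    runif(counts[i], min = breaks[i], max = breaks[i + 1])
  }))
  # Open top bin: lower edge plus an exponential excess.
  # tail_rate is an assumed value and would need calibrating.
  top <- breaks[k] + rexp(counts[k], rate = tail_rate)
  c(closed, top)
}

x <- simulate_flat(breaks, counts)

# Re-binning the simulated values should reproduce the input counts.
table(cut(x, c(breaks, Inf), right = FALSE))
```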
Unfortunately, now that I am trying to calibrate the model, I have
discovered that the simulated incomes are biased high. Given the way
the bins are chosen, I would expect the true density to slope downward
in 9 of the 12 bins, so approximating those bins with a flat top puts
too much of the data near the upper edge of each bin. I currently
suspect that this is at least part of the bias.
Is there a way to ask for a not-quite uniform distribution of random
data? I imagine a density function with a linear, but not flat, top. I
admit that the standard selection of distributions in R is more than I
am familiar with, but I can't find one that does what I think I need.
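To make the question concrete, here is a rough sketch of the kind of thing I
have in mind, written by inverting the CDF of a density with a straight,
sloped top. The function name rtilted and the tilt parameter s are
hypothetical, not from any package, and how to choose s for each bin is
left open.

```r
set.seed(1)

# Draw n values on [a, b] from a density with a straight, sloped top:
#   f(x) = (1 + s * (2 * u - 1)) / (b - a),  u = (x - a) / (b - a)
# s is a hypothetical tilt parameter in [-1, 1]: s = 0 is uniform,
# s < 0 slopes downward (more mass near a), s > 0 slopes upward.
rtilted <- function(n, a, b, s = 0) {
  stopifnot(b > a, abs(s) <= 1)
  p <- runif(n)
  # Invert the CDF F(u) = s * u^2 + (1 - s) * u. Writing the quadratic
  # root this way avoids dividing by s, so s = 0 also works.
  u <- 2 * p / ((1 - s) + sqrt((1 - s)^2 + 4 * s * p))
  a + (b - a) * u
}

# Example: the 25k-50k bin with an assumed downward tilt of -0.6.
x <- rtilted(1e5, a = 25, b = 50, s = -0.6)
mean(x)                      # simulated mean
25 + 25 * (1 / 2 - 0.6 / 6)  # theoretical mean: a + (b - a) * (1/2 + s/6)
```

With s < 0 the mean falls below the bin midpoint, which is the direction
needed to reduce the high bias from flat-topped bins.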
Any advice (R advice or statistics advice) is welcome. Thanks again,
-tom
--
------------------------
tomfool at as220 dot org
http://sgouros.com
http://whatcheer.net