[R] Automating binning for chisq.test()
Duncan Murdoch
murdoch at stats.uwo.ca
Fri Oct 12 19:54:05 CEST 2007
On 10/12/2007 1:16 PM, D. R. Evans wrote:
> The standard chisq.test() and fisher.test() functions, when applied to
> two distributions (to determine whether the same underlying
> distribution applies to both), require one to pre-bin the
> distributions.
>
> Is there a library function (either built-in or in a package) that
> acts more like the ks.test() function, in that one can simply pass the
> two distributions and have it do the necessary binning as well as the
> actual statistical test?
>
> (Yes, you can accuse me of laziness: I just don't fancy trying to
> figure out a routine that would make sure that there are more than 5
> samples in each of the expected bins before applying the chi-squared
> test. It seems too much like re-inventing an elementary wheel that
> must have been invented by someone else.)
If you have a quantile function q() for the distribution, a sample size
of N, and want an expected count of at least 5 in each bin, just
calculate the cutpoints as

  nbins <- floor(N/5)
  cutpoints <- c(-Inf, q((1:(nbins-1))/nbins), Inf)

The bins are equally probable under the distribution, so each one has an
expected count of N/nbins.
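
As a rough sketch of how those cutpoints could feed into chisq.test(), the
example below assumes the reference distribution is a standard normal, so
qnorm() stands in for q(). The simulated sample x, the seed, and the sample
size of 200 are made up for illustration and are not from the original
question.

```r
# Hypothetical data: 200 draws to be compared against a standard normal.
set.seed(1)
x <- rnorm(200)
N <- length(x)

# Equal-probability bins with an expected count of N/nbins (here 5) each.
nbins <- floor(N / 5)
cutpoints <- c(-Inf, qnorm((1:(nbins - 1)) / nbins), Inf)

# Count observations per bin; table() on the factor keeps empty bins.
observed <- table(cut(x, breaks = cutpoints))

# Under the null hypothesis every bin has probability 1/nbins.
result <- chisq.test(observed, p = rep(1 / nbins, nbins))
print(result)
```

Note that chisq.test() uses nbins - 1 degrees of freedom here. If the
parameters of the distribution were estimated from the same data, the
degrees of freedom would need to be reduced by hand, which this sketch does
not do.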
Duncan Murdoch