[R] Randomization tests, grouped data

Charles C. Berry cberry at tajo.ucsd.edu
Fri Jan 11 23:50:34 CET 2008


On Fri, 11 Jan 2008, Johannes Hüsing wrote:

> Tom Backer Johnsen <backer at psych.uib.no> [Fri, Jan 11, 2008 at 06:57:41PM CET]:
> [...]
>>> Is there something that can handle this in R?
>>
>
> Have you considered the coin package?
>
>> After a few hours thinking on and off about the problem, I suspect
>> that the question may be stupid or silly (or both).  If that is the
>> case, I would very much like to know why.
>>
>
> I am not quite clear in my thinking anymore, but there are 2^2n
> permutations, of which (2n choose n) happen to yield the same
> effect. These cases are "part of life" and should be counted in
> the permutation test just as well. You might save a little bit of
> computation time by singling these group-preserving permutations
> out, but this is not worth the while at all.
>

It depends (as always...)

Suppose you have two samples with n1 and n2 independent observations, 
respectively. You wish to do a two-sample test on each of M variables, 
where M is quite large, and you wish to account for multiplicity in 
testing. So you construct a permutation test.

If n1 == n2 == 4, there are choose(8,4) == 70 arrangements. By enumerating 
them all you can get the p-value of your test statistic, and often this is 
practical.
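
For what it is worth, here is a minimal sketch of that enumeration for a 
single variable. The data values and the choice of statistic (difference 
in means) are made up for illustration; only combn() and friends from 
base R are used.

```r
# Hypothetical data: 4 observations per group (values invented for illustration)
x <- c(12.1, 14.3, 11.8, 15.0)   # group 1
y <- c(9.7, 10.2, 13.1, 8.9)     # group 2
z <- c(x, y)
n1 <- length(x)

# All choose(8, 4) == 70 ways to choose which observations form group 1
idx <- combn(length(z), n1)

# Difference in means for every arrangement, and for the observed grouping
stat <- apply(idx, 2, function(i) mean(z[i]) - mean(z[-i]))
obs  <- mean(x) - mean(y)

# Exact two-sided p-value: fraction of arrangements at least as extreme
# (small tolerance guards against floating point ties)
p.exact <- mean(abs(stat) >= abs(obs) - 1e-12)
p.exact
```

With M variables you would apply the same idx to every variable, so the 
70 arrangements are enumerated only once.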

But if you sample (say) 70 at random from the factorial(8) orderings, you 
will likely miss some arrangements and repeat others. The number 0.632, 
i.e. 1 - exp(-1), comes to mind as the fraction of distinct arrangements 
that will actually show up (see Efron and Tibshirani, An Introduction to 
the Bootstrap, to check if this is right).
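
A quick way to check that figure, assuming each random ordering lands in 
one of the 70 arrangements with equal probability and the draws are 
independent. The helper label() and the number of simulation replicates 
are my own choices for this sketch.

```r
N <- choose(8, 4)   # 70 distinct arrangements
B <- 70             # number of random draws

# Expected fraction of the N arrangements seen at least once in B draws
frac.expected <- 1 - (1 - 1/N)^B
frac.limit    <- 1 - exp(-1)      # large-N limit when B == N

# Simulation check: shuffle the group labels and record which
# observations ended up in group 1
set.seed(1)
grp   <- rep(1:2, each = 4)
label <- function(g) paste(which(g == 1), collapse = "-")
frac.sim <- replicate(500, {
  draws <- replicate(B, label(sample(grp)))
  length(unique(draws)) / N
})

c(expected = frac.expected, limit = frac.limit, simulated = mean(frac.sim))
```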

To get an accurate p-value via sampling from the factorial(8) orderings, 
you would need a much larger sample than the number of distinct 
arrangements.
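
As a rough illustration of "much larger", the usual binomial standard 
error of a Monte Carlo p-value shrinks only as 1/sqrt(B). The true 
p-value of 0.05 and the values of B below are assumptions picked for the 
example.

```r
p <- 0.05                     # assumed true permutation p-value
B <- c(70, 1000, 10000)       # number of random orderings sampled

# Binomial standard error of the estimated p-value
se <- sqrt(p * (1 - p) / B)
data.frame(B = B, se = signif(se, 2))
```

Enumerating the 70 arrangements gives the p-value with no sampling error 
at all.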

OTOH, if the number of distinct arrangements is too large to enumerate 
and is much larger than the number you could afford to sample, then 
sampling from the factorial(n1+n2) orderings and sampling without 
repeats from the choose(n1+n2,n2) arrangements are nearly equivalent. 
You could use the finite population correction to ascertain just how 
different they are, I think.
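
A sketch of that comparison, with n1 == n2 == 20 and B == 10000 draws 
chosen arbitrarily for illustration. It assumes equally likely 
arrangements and independent draws, and uses the textbook finite 
population correction sqrt((N - B)/(N - 1)).

```r
n1 <- 20; n2 <- 20
N <- choose(n1 + n2, n2)    # about 1.4e11 distinct arrangements
B <- 10000                  # number of draws one can afford

# Expected number of distinct arrangements among B independent draws,
# written with expm1/log1p to avoid cancellation when 1/N is tiny
expected.distinct <- -N * expm1(B * log1p(-1 / N))
expected.repeats  <- B - expected.distinct

# Finite population correction: ratio of the standard error when drawing
# B arrangements without repeats to that when drawing with repeats
fpc <- sqrt((N - B) / (N - 1))

c(repeats = expected.repeats, fpc = fpc)
```

Here repeats are essentially never seen and the correction is 
indistinguishable from 1, so the two sampling schemes give the same 
answer for practical purposes.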

HTH,

Chuck

> -- 
> Johannes Hüsing               There is something fascinating about science.
>                              One gets such wholesale returns of conjecture
> mailto:johannes at huesing.name  from such a trifling investment of fact.
> http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

