[R] significance test interquartile ranges
Rui Barradas
ruipbarradas at sapo.pt
Sat Jul 14 19:36:02 CEST 2012
Hello,
Em 14-07-2012 13:08, peter dalgaard escreveu:
>
> On Jul 14, 2012, at 12:25 , Rui Barradas wrote:
>
>> Hello,
>>
>> There's a test for iqr equality, of Westenberg (1948), that can be found on-line if one really looks. It starts creating a 1 sample pool from the two samples and computing the 1st and 3rd quartiles. Then a three column table where the rows correspond to the samples is built. The middle column is the counts between the quartiles and the side ones to the outsides. These columns are collapsed into one and a Fisher exact test is conducted on the 2x2 resulting table.
>
>
> That's just wrong, is it not? Just because things were suggested by someone semi-famous, it doesn't mean that they actually work...
>
> Take two normal distributions, equal in size, with a sufficiently large difference between the means, so that there is no material overlap. The quartiles of the pooled sample will then be the medians of the original samples, and the test will be that one sample has the same number above its median as the other has below its median.
>
> If it weren't for the pooling business, I'd say that it was a sane test for equality of quartiles, but not for the IQR.
>
Right, thank you! It forced me to pay more attention to what I was
reading. The "test is aimed at differences in scale only, presuming no
difference in location"
http://www.stat.ncsu.edu/information/library/mimeo.archive/ISMS_1986_1499.pdf
The original can be found at
http://www.dwc.knaw.nl/DL/publications/PU00018486.pdf
If we subtract the median of each sample to each of them, the medians
become zero but the IQRs remain as they were. In my simulation I had
chosen samples from distributions with equal mean, and that point passed
unnoticed.
The code should then be slightly revised. I'll repost it because there
was a typo in the 'method' member of the returned list
iqr.test <- function(x, y){
data.name <- deparse(substitute(x))
data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
x <- x - median(x)
y <- y - median(y)
qq <- quantile(c(x, y), prob = c(0.25, 0.75))
a <- sum(qq[1] < x & x < qq[2])
b <- length(x) - a
c <- sum(qq[1] < y & y < qq[2])
d <- length(y) - b
m <- matrix(c(a, c, b, d), ncol = 2)
numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
p.value <- 2*exp(numer - denom)
method <- "Westenberg-Mood test for IQR equality"
alternative <- "the IQRs are not equal"
ht <- list(
p.value = p.value,
method = method,
alternative = alternative,
data.name = data.name
)
class(ht) <- "htest"
ht
}
Rui Barradas
>>
>> R code could be:
>>
>>
>> iqr.test <- function(x, y){
>> qq <- quantile(c(x, y), prob = c(0.25, 0.75))
>> a <- sum(qq[1] < x & x < qq[2])
>> b <- length(x) - a
>> c <- sum(qq[1] < y & y < qq[2])
>> d <- length(y) - b
>> m <- matrix(c(a, c, b, d), ncol = 2)
>> numer <- sum(lfactorial(c(margin.table(m, 1), margin.table(m, 2))))
>> denom <- sum(lfactorial(c(a, b, c, d, sum(m))))
>> p.value <- 2*exp(numer - denom)
>> data.name <- deparse(substitute(x))
>> data.name <- paste(data.name, ", ", deparse(substitute(y)), sep="")
>> method <- "Westenberg-Mood test for IQR range equality"
>> alternative <- "the IQRs are not equal"
>> ht <- list(
>> p.value = p.value,
>> method = method,
>> alternative = alternative,
>> data.name = data.name
>> )
>> class(ht) <- "htest"
>> ht
>> }
>>
>> n <- 1e3
>> pv <- numeric(n)
>> set.seed(2319)
>> for(i in 1:n){
>> x <- rnorm(sample(20:30, 1), 4, 1)
>> y <- rchisq(sample(20:40, 1), df=4)
>> pv[i] <- iqr.test(x, y)$p.value
>> }
>>
>> sum(pv < 0.05)/n # 0.8
>>
>>
>
>
> To wit:
>
>> iqr.test(rnorm(100), rnorm(100,10,3))
>
> Westenberg-Mood test for IQR range equality
>
> data: rnorm(100), rnorm(100, 10, 3)
> p-value = 0.2248
> alternative hypothesis: the IQRs are not equal
>
>> replicate(10,iqr.test(rnorm(100), rnorm(100,10,3))$p.value)
> [1] 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312 0.2248312
> [8] 0.2248312 0.2248312 0.2248312
>
>
>
More information about the R-help
mailing list