[R] test for whether dataset comes from a known MVN
Ben Bolker
bolker at ufl.edu
Fri Oct 12 19:33:04 CEST 2007
Desmond Campbell wrote:
>
> Dear Ben Bolker,
>
> Thanks for replying and offering advice, unfortunately it doesn't solve my
> problem.
>
> 1) The mshapiro.test() in the mvnormtest package appears only applicable
> for datasets containing 3-5000 samples, whereas my dataset contains
> 100,000 samples.
>
> 2) As you said in your email if my data is from the real world then any
> test is likely to reject the null hypothesis, because of the power of such
> a large dataset.
>
> However my data is not from the real world. I am conducting validation
> studies, and if the program I am testing is working correctly then the
> dataset will be perfectly normally distributed.
>
> Thanks anyway.
>
>
I would be tempted in this case to contact the package author and find
out what limits the size of the input data set. The method does appear to
require a matrix inversion, in which case you might be in big trouble
(if the matrix were sparse you could try substituting SparseM functions,
but I kind of doubt it would be ...).
Do you know if anyone has come up with a method that will do this
test for a data set of this size? I.e., is this a problem of developing a
statistical method or a problem of implementation in R? (Are the methods
discussed in http://support.sas.com/ctx/samples/index.jsp?sid=480 or
http://interstat.statjournals.net/YEAR/2003/articles/0301001.pdf, such
as Mardia's multivariate skewness or kurtosis, appropriate and less
numerically intensive? I don't know how to calculate MV skew, and an R site
search brings up a lot about the MV skew-normal distribution but not a lot
about MV skew itself. I found an SPSS macro,
http://www.columbia.edu/~ld208/Mardia.sps, but that's as far as I got.)
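
For reference, here is a rough base-R sketch of Mardia's two statistics, using the usual textbook definitions: skewness b1 is the average over all pairs (i, j) of the cubed cross-product (x_i - xbar)' S^-1 (x_j - xbar), and kurtosis b2 is the average squared Mahalanobis distance squared. The function name mardia_stats is made up for illustration, and the p-values rely on the standard large-sample approximations (n*b1/6 roughly chi-squared with p(p+1)(p+2)/6 df; b2 roughly normal with mean p(p+2) and variance 8p(p+2)/n). Note that it estimates the mean and covariance from the data, so it checks "some MVN" rather than a specific known one. The skewness is computed from the p x p x p array of third moments instead of the n x n matrix of cross-products, which should keep it feasible for 100,000 rows.

```r
# Sketch only: Mardia's multivariate skewness and kurtosis (hypothetical helper).
mardia_stats <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  xc <- sweep(x, 2, colMeans(x))        # centre each column
  S  <- crossprod(xc) / n               # ML covariance estimate (divisor n)

  # Whiten the data: with S = R'R (Cholesky), z = xc %*% R^-1 satisfies
  # z_i' z_j = (x_i - xbar)' S^-1 (x_j - xbar).
  R <- chol(S)
  z <- xc %*% backsolve(R, diag(p))

  # Kurtosis: mean of squared Mahalanobis distances, squared.
  d2 <- rowSums(z^2)
  b2 <- mean(d2^2)

  # Skewness: sum of squared third moments of z. This equals
  # (1/n^2) * sum_ij (z_i' z_j)^3 without forming the n x n matrix.
  j  <- rep(seq_len(p), each = p)
  k  <- rep(seq_len(p), times = p)
  zz <- z[, j, drop = FALSE] * z[, k, drop = FALSE]   # n x p^2 pairwise products
  m3 <- crossprod(zz, z) / n                          # p^2 x p third moments
  b1 <- sum(m3^2)

  # Large-sample reference distributions under multivariate normality.
  skew_stat <- n * b1 / 6
  df        <- p * (p + 1) * (p + 2) / 6
  kurt_z    <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)

  list(b1 = b1, b2 = b2,
       skew_stat = skew_stat, df = df,
       p_skew = pchisq(skew_stat, df, lower.tail = FALSE),
       kurt_z = kurt_z,
       p_kurt = 2 * pnorm(-abs(kurt_z)))
}
```

Calling mardia_stats(x) on an n x p numeric matrix returns both statistics and their approximate p-values; the intermediate n x p^2 matrix is the main memory cost, so this only makes sense for a modest number of variables.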
Do you have to test the whole data set at once? Could you hack it
by testing subsets of the data and (e.g.) combining the resulting p-values
with Fisher's method?
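
The subset idea could look something like the sketch below. The helper names chunked_pvalues and fisher_combine are invented for illustration, and the per-chunk test is passed in as a function so that any test returning a p-value can be plugged in. It assumes the rows are independent draws, so that disjoint random chunks give independent p-values, which Fisher's method needs. Fisher's statistic is -2 * sum(log(p)), referred to a chi-squared distribution with 2k degrees of freedom for k p-values.

```r
# Sketch only: test disjoint random chunks, then combine with Fisher's method.

# Split the rows of x into disjoint random chunks of chunk_size rows and
# apply test_fun (a function returning a single p-value) to each chunk.
chunked_pvalues <- function(x, test_fun, chunk_size = 5000) {
  x <- as.matrix(x)
  n <- nrow(x)
  n_chunks <- n %/% chunk_size           # an incomplete final chunk is dropped
  stopifnot(n_chunks >= 1)
  idx <- sample(n)                       # shuffle rows so chunks are random subsets
  groups <- split(idx[seq_len(n_chunks * chunk_size)],
                  rep(seq_len(n_chunks), each = chunk_size))
  vapply(groups, function(i) test_fun(x[i, , drop = FALSE]), numeric(1))
}

# Fisher's combined probability test for independent p-values.
# A p-value of exactly 0 gives an infinite statistic and a combined p of 0.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Possible use with mvnormtest (not run here; requires the package).
# As far as I can tell from its help page, mshapiro.test() wants variables
# in rows, hence the t():
# p <- chunked_pvalues(x, function(m) mvnormtest::mshapiro.test(t(m))$p.value)
# fisher_combine(p)
```

With 100,000 rows and chunks of 5,000 this gives 20 p-values to combine. The combined test will still be very powerful, so the earlier caveat about large data sets rejecting on tiny departures applies here too.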
cheers
Ben Bolker