[R] Error on distance matrix

Jari Oksanen jari.oksanen at oulu.fi
Thu Jan 10 14:45:23 CET 2008


Marc Moragues <Marc.Moragues <at> scri.ac.uk> writes:

> 
> Hi,
> 
> I am trying to calculate a distance matrix on a binary data frame using
> dist.binary() {ade4}. This is the code I run and the error I get:
> 
> > sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / (a+b+c+d)
> Error in if (any(df < 0)) stop("non negative value expected in df") :
>  missing value where TRUE/FALSE needed
> 
> I don't know if the problem is the missing values in my data. If so, how
> can I handle them?
> 
Dear Marc Moragues,

Adding an NA to a data.frame gave me the same error message that you report
above, so NA is the likely culprit (but we cannot know: we can only guess).
Further, it seems that ade4::dist.binary has no option for handling NA input.
The real problem is deciding what should be done with NA. Should you get an NA
result? Should the whole observation be removed because of NA? Or should the
comparisons be based on pairwise omission of NA, meaning that entries in the
same matrix are based on different data? Or should you impute values for the
missing entries (which is fun but tricky)?
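
To make the first options concrete, here is a minimal base R sketch. The toy
matrix x and the helper smc_pairwise are made up for illustration and are not
part of ade4 or vegan. The helper assumes the index you want is
sqrt(1 - (a+d)/(a+b+c+d)), computed only over the variables observed in both
rows of a pair.

```r
# Hypothetical toy data: 4 observations x 5 binary variables, two NAs.
x <- matrix(c(1,  0, 1, 1,  0,
              1, NA, 1, 0,  0,
              0,  0, 1, 1, NA,
              1,  1, 0, 0,  1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("obs", 1:4), paste0("v", 1:5)))

# Why the reported error appears: with NA present and no negative values,
# any(x < 0) is NA, and if (NA) stops with
# "missing value where TRUE/FALSE needed".
check <- any(x < 0)

# Option 1: remove every observation that contains an NA.
x_complete <- x[complete.cases(x), , drop = FALSE]

# Option 2: pairwise omission. For each pair of rows, use only the columns
# observed in both, so different pairs may rest on different variables.
smc_pairwise <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      ok <- !is.na(m[i, ]) & !is.na(m[j, ])
      # proportion of matching entries, i.e. (a + d) / (a + b + c + d)
      s <- if (any(ok)) mean(m[i, ok] == m[j, ok]) else NA_real_
      d[i, j] <- d[j, i] <- sqrt(1 - s)
    }
  }
  as.dist(d)
}

d_pairwise <- smc_pairwise(x)
```

Option 1 is the safest but can throw away a lot of data. Option 2 keeps all
observations, at the price that the entries of the resulting matrix are not
strictly comparable with each other.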

One solution is to use function designdist in vegan, where you can, with some
acrobatics, design your own dissimilarity indices. Function designdist uses a
different notation, because its author hates that misleading and dangerous 2x2
contingency table notation. The following, however, seems to define the same
index as ade4:

designdist(data, "sqrt(1-(2*J+P-A-B)/P)")

See the documentation of vegan::designdist to see how to define things there
(and the sqrt(1-x) part comes from the way ade4 changes similarities to
dissimilarities).
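
As a sanity check on the translation between the two notations, the following
base R sketch compares them on one made-up pair of observations. It assumes
the designdist meanings of the symbols as I understand them from its
documentation: J is the number of variables present in both observations, A
and B are the totals for each observation, and P is the number of columns.
The vectors x1 and x2 are hypothetical.

```r
# Two hypothetical binary observations over P = 6 variables.
x1 <- c(1, 1, 0, 0, 1, 0)
x2 <- c(1, 0, 0, 1, 1, 0)

# 2x2 contingency table notation (a, b, c, d), named n11 ... n00 here
# to avoid masking base R's c().
n11 <- sum(x1 == 1 & x2 == 1)  # a: present in both
n10 <- sum(x1 == 1 & x2 == 0)  # b: present only in the first
n01 <- sum(x1 == 0 & x2 == 1)  # c: present only in the second
n00 <- sum(x1 == 0 & x2 == 0)  # d: absent from both
s_table <- (n11 + n00) / (n11 + n10 + n01 + n00)

# designdist-style notation (assumed meanings, see the text above).
J <- sum(x1 == 1 & x2 == 1)
A <- sum(x1)
B <- sum(x2)
P <- length(x1)
s_design <- (2 * J + P - A - B) / P

# Both similarities agree because d = P - A - B + J, so a + d = 2J + P - A - B.
same <- isTRUE(all.equal(s_table, s_design))

# Dissimilarity after the sqrt(1 - x) transformation.
d12 <- sqrt(1 - s_design)
```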

BTW, don't call your data 'data'. R wisdom (see fortunes) tells you that you do
not call your dog 'dog', but I'm not quite sure of this. At least in yesterday's
horse races in national betting, one of the winning horses was called 'Animal',
so why not...

cheers, jari oksanen



