[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
    Martin Maechler
    maechler at stat.math.ethz.ch
    Mon Dec  9 11:36:04 CET 2013

>>>>> Gundala Viswanath <gundalav at gmail.com>
>>>>>     on Sun, 8 Dec 2013 16:11:12 +0900 writes:
    > Hi, According to daisy function from cluster
    > documentation, it can compute dissimilarity when NA
    > (missing) value(s) is present.
    > http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html
    > But why when I tried this code
    > library(cluster)
    > x <- c(1.115,NA,NA,0.971,NA)
    > y <- c(NA,1.006,NA,NA,0.645)
    > df <- as.data.frame(rbind(x,y))
    > daisy(df,metric="gower")
    > It gave this message:
    > Dissimilarities :
    > x
    > y NA
    > Metric :  mixed ;  Types = I, I, I, I, I
    > Number of objects : 2
    > Warning messages:
    > 1: In min(x) : no non-missing arguments to min; returning Inf
    > 2: In max(x) : no non-missing arguments to max; returning -Inf
    > I welcome other alternative than gower.
    > I expect the dissimilarity output gives a non-NA value e.g. 0. What's
    > the right way to do it?
Thank you, Gundala, for using a simple reproducible example.
Had you read the documentation about Gower's distance a bit further,
you'd have found that it works by basically giving weight zero
to *pairs* of variable values where at least one of the two values is
missing.
In your example, *every* pair has at least one missing value
(x is observed only in columns 1 and 4, y only in columns 2 and 5,
and column 3 is missing in both),
so there's no way to get a non-NA distance.
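
To make the point concrete, here is a minimal sketch in base R of how
the zero weights play out. Note that gower_pair() is a hypothetical
helper written for this illustration, not a function of the cluster
package, and it is not how daisy() is implemented internally; the
ranges argument (all set to 1 here) and the vector y2 are made up for
the example as well.

```r
x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)

# Variables for which *both* objects have an observed value
shared <- !is.na(x) & !is.na(y)
print(any(shared))   # FALSE: not a single usable pair

# Hypothetical helper: Gower-type dissimilarity between two numeric
# rows, where each variable k gets weight 1 if both values are
# observed and weight 0 otherwise.
gower_pair <- function(xi, xj, ranges) {
  both <- !is.na(xi) & !is.na(xj)
  if (!any(both)) return(NA_real_)   # all weights are zero
  d <- abs(xi - xj) / ranges         # per-variable contribution
  sum(d[both]) / sum(both)           # weighted mean over usable pairs
}

# Ranges fixed at 1 purely for illustration (an assumption)
no_overlap <- gower_pair(x, y, ranges = rep(1, 5))

# Made-up variant of y that shares variable 4 with x
y2 <- c(NA, 1.006, NA, 0.471, 0.645)
one_overlap <- gower_pair(x, y2, ranges = rep(1, 5))

print(no_overlap)    # NA
print(one_overlap)   # about 0.5, from variable 4 alone
```

As soon as one variable is observed in both objects, the result is a
number; with no such variable there is nothing to average over.
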
*AND* the documentation already says this, at the very end
of the section 'Details':
  If all weights w_k delta(ij;k) are zero, the dissimilarity is set to ‘NA’.
I.e., we have
> install.packages("fortunes")
> library(fortunes)
> fortune("WTFM")
This is all documented in TFM. Those who WTFM don't want to have to WTFM again
on the mailing list. RTFM.
   -- Barry Rowlingson
      R-help (October 2003)
... which I have now done, in spite of Barry's excellent point
... let's say it's because Christmas is approaching!
Martin Maechler,
ETH Zurich
    
    