[R] kmeans and incomplete distance matrix concern
    Christian Hennig 
    chrish at stats.ucl.ac.uk
       
    Mon Aug  7 16:46:40 CEST 2006
    
    
  
First of all, kmeans doesn't work on distance matrices. It expects the data matrix itself, with one row per observation.
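
To illustrate, a minimal sketch on made-up data (the matrix x and its values are assumptions, not the poster's data). kmeans() is given the data matrix, while the dist object goes to a method that takes dissimilarities, here hclust() from the stats package:

```r
# Toy data: 6 observations on 2 variables, in two well-separated groups
# (hypothetical values, for illustration only).
set.seed(1)
x <- rbind(matrix(rnorm(6, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(6, mean = 5, sd = 0.3), ncol = 2))

# kmeans() works on the data matrix: rows are observations.
km <- kmeans(x, centers = 2, nstart = 10)

# A distance matrix belongs in a dissimilarity-based method instead,
# for example hierarchical clustering.
d  <- dist(x)
hc <- cutree(hclust(d, method = "average"), k = 2)

# Cross-tabulate the two partitions.
table(kmeans = km$cluster, hclust = hc)
```

Feeding as.matrix(d) to kmeans() would run, but each row of the distance matrix would then be treated as a point in 6-dimensional space, which is a different analysis.
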
On Mon, 7 Aug 2006, Ffenics wrote:
> Hi there
> I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
>
> i.e:
>
> mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
>               dimnames = list(levels(DF$V1), levels(DF$V2)))
> mat[cbind(DF$V1, DF$V2)] <- DF$V3
>
> This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
>
> My query is this: not all the data for the initial matrix (x) exists, and therefore the matrix is not fully populated - empty cells are populated with '0's.
>
> Could someone please tell me how this may affect the result from the dist() command - because a '0' in a distance matrix means that the two variables are identical, doesn't it(?) - but I don't want things clustered together simply because there was no information.
>
> Is this a problem, and are there ways to work around it? Thanks
>
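
On the zeros themselves, a small sketch of how dist() treats a zero-filled cell versus an NA (the two rows and their values are hypothetical). Note that the zeros here sit in the data matrix, not in the distance matrix. As documented for dist(), pairs involving NA are left out and the sum is scaled up to the full number of columns:

```r
# Two rows whose first two cells were never observed (made-up values).
x_na <- rbind(A = c(NA, NA, 3),
              B = c(NA, NA, 7))

# Zero-filled version, as in the construction quoted above.
x_zero <- x_na
x_zero[is.na(x_zero)] <- 0

# Zeros are treated as real, agreeing observations: sqrt(0 + 0 + 16).
d_zero <- dist(x_zero)

# NA pairs are dropped and the sum rescaled: sqrt(16 * 3 / 1).
d_na <- dist(x_na)

c(zero_filled = as.numeric(d_zero), with_NA = as.numeric(d_na))
```

So the zero-filled rows look closer than the observed column alone suggests. As far as I know, kmeans() itself does not accept NA in its input, so this only helps with methods that start from the dissimilarities.
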
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche