[R] R and Clusters

Lorenzo Isella lorenzo.isella at gmail.com
Mon Jan 7 15:26:57 CET 2008


Dear All,
I hope I am not asking a FAQ. I am dealing with a problem of graph
theory [connected components in an undirected graph] and I do not
want to reinvent the wheel.
I have seen many R packages implementing, for instance, k-means or
hierarchical clustering for spatially distributed data, and my problem
is basically a similar one.
I am given a set of data consisting of the positions of particles in 3
dimensions; I define two particles A and B to be directly connected if
their Euclidean distance is below a certain threshold d. If A and B
are directly connected and B and C are directly connected, then A, B
and C belong to the same connected component (physically, this means
that they are members of the same cluster).
My N particles therefore split into k disjoint clusters, each
containing a certain number of particles, and this is what I want to
investigate.
I do not know a priori how many clusters there are (this is my problem
with, e.g., k-means, since k is an output for me rather than an input);
the only inputs are the set of 3-dimensional particle positions and a
threshold distance.
The algorithm/package I am looking for should return the number of
clusters and the composition of each cluster, e.g. the fact that the
second cluster is made up of particles {R, T, L}.
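One way to get both outputs without fixing k in advance is a breadth-first search over the threshold graph. A minimal base-R sketch; threshold_components() is a hypothetical helper name, and the four positions are made up. It builds the full n x n distance matrix, so memory grows as N^2:

```r
# Hypothetical helper: label the connected components of the threshold
# graph by breadth-first search. pos is an n x p matrix of positions.
threshold_components <- function(pos, d) {
  adj <- as.matrix(dist(pos)) < d   # direct connections (distance below d)
  diag(adj) <- FALSE
  n <- nrow(adj)
  membership <- integer(n)          # 0 means "not yet assigned"
  k <- 0L
  for (start in seq_len(n)) {
    if (membership[start] != 0L) next
    k <- k + 1L                     # start a new cluster
    membership[start] <- k
    queue <- start
    while (length(queue) > 0L) {
      i <- queue[1L]
      queue <- queue[-1L]
      # unassigned particles directly connected to particle i
      nb <- which(adj[i, ] & membership == 0L)
      membership[nb] <- k
      queue <- c(queue, nb)
    }
  }
  membership
}

pos <- rbind(c(0, 0, 0), c(0.3, 0, 0), c(0.6, 0, 0), c(5, 5, 5))
m <- threshold_components(pos, d = 0.5)
n_clusters <- max(m)                         # 2
composition <- split(seq_len(nrow(pos)), m)  # list(`1` = 1:3, `2` = 4)
```

The number of clusters is the largest label, and split() returns the particle indices belonging to each cluster.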
Consider for instance:

# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

How can I then find out how many connected components I have when my
threshold distance is d=0.5?
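Connected components of a distance-threshold graph coincide with single-linkage clusters cut at height d, so base R's hclust(method = "single") and cutree(h = ) give the same partition. A sketch on the example above, assuming an added set.seed() for reproducibility. One caveat: cutree also merges pairs at distance exactly d, whereas the rule above says "below d"; the two differ only on exact ties, which practically never occur with continuous random data.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

d <- 0.5
# Single-linkage tree cut at height d: one label per connected component.
cl <- cutree(hclust(dist(x), method = "single"), h = d)
n_components <- length(unique(cl))
cat("connected components at d =", d, ":", n_components, "\n")
print(table(cl))  # number of particles in each component
```

The same three lines (dist, hclust, cutree) work unchanged on an N x 3 matrix of particle positions.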

Many thanks

Lorenzo