[R] R and Clusters

Gabor Csardi csardi at rmki.kfki.hu
Mon Jan 7 15:45:53 CET 2008


Lorenzo, why can't you actually generate the graph to find the 
connection components? With the 'igraph' package this is something like:

g <- graph.adjacency( DIST < 0.5, mode="undirected" )
g <- simplify(g)
no.clusters(g)

assuming you have your distance matrix in 'DIST'. If N is too big
then you don't really want to create the distance matrix, but use 
a more sophisticated approach to find the points that are close 
to each other than your threshold; then create the graph.
So why using clustering algorithms? 

Btw. from your vector of points you can create the distance matrix
by using the 'outer' function.

Btw2. your algorithm for graph generation is similar to geometric
random graphs, see function 'grg.game' in 'igraph'.

Gabor

On Mon, Jan 07, 2008 at 03:26:57PM +0100, Lorenzo Isella wrote:
> Dear All,
> I hope I am not asking a FAQ. I am dealing with a problem of graph
> theory [connected components in a non-directed graph] and I do not
> want to rediscover the wheel.
> I saw a large number of R packages dealing for instance with the
> k-means method or hierarchical clustering for spatially distributed
> data and I am basically facing a similar problem.
> I am given a set of data which are the positions of particles in 3
> dimensions; I define two particles A and B to be directly connected if
> their Euclidean distance is below a certain threshold d. If A and B
> are directly connected and B and C are directly connected, then A,B
> and C are connected components (physically it means that they are
> members of the same cluster).
> All my N particles then split into k disjointed clusters, each with a
> certain number of connected components, and this is what I want to
> investigate.
> I do not know a priori how many clusters I have (this is my problem
> with e.g. k-means since k is an output for me); the only input is the
> set of 3-dimensional particle positions and a threshold distance.
> The algorithm/package I am looking should return the number of
> clusters and the composition of each cluster, e.g. the fact that the
> second cluster is made up of particles {R,T,L}.
> Consider for instance:
> 
> # a 2-dimensional example
> x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> colnames(x) <- c("x", "y")
> 
> How can I then find out how many connected components I have when my
> threshold distance is d=0.5?
> 
> Many thanks
> 
> Lorenzo
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Csardi Gabor <csardi at rmki.kfki.hu>    MTA RMKI, ELTE TTK




More information about the R-help mailing list