[R] R and Clusters

Lorenzo Isella lorenzo.isella at gmail.com
Mon Jan 7 17:25:44 CET 2008


Thanks for both the replies.
I am now giving a try to the suggestion by Gabor since it looks easier
(for me) to implement.
I am testing it, but so far it does what I have in mind.
I am going now through the documentation of the igraph package. I can
count the cluster number, but I also want to make sure that I can
retrieve the info about which components are in which cluster i.e.
find out the cluster composition.
And second question: at the moment, my simulations involve a
relatively low particle number (of the order of the hundred), but this
may increase by 1-2 orders of magnitude.
Which alternative methods to the distance matrix can I use then to
find the spacing between my particles?
Cheers

Lorenzo

On 07/01/2008, Gabor Csardi <csardi at rmki.kfki.hu> wrote:
> Lorenzo, why can't you actually generate the graph to find the
> connection components? With the 'igraph' package this is something like:
>
> g <- graph.adjacency( DIST < 0.5, mode="undirected" )
> g <- simplify(g)
> no.clusters(g)
>
> assuming you have your distance matrix in 'DIST'. If N is too big
> then you don't really want to create the distance matrix, but use
> a more sophisticated approach to find the points that are close
> to each other than your threshold; then create the graph.
> So why using clustering algorithms?
>
> Btw. from your vector of points you can create the distance matrix
> by using the 'outer' function.
>
> Btw2. your algorithm for graph generation is similar to geometric
> random graphs, see function 'grg.game' in 'igraph'.
>
> Gabor
>
> On Mon, Jan 07, 2008 at 03:26:57PM +0100, Lorenzo Isella wrote:
> > Dear All,
> > I hope I am not asking a FAQ. I am dealing with a problem of graph
> > theory [connected components in a non-directed graph] and I do not
> > want to rediscover the wheel.
> > I saw a large number of R packages dealing for instance with the
> > k-means method or hierarchical clustering for spatially distributed
> > data and I am basically facing a similar problem.
> > I am given a set of data which are the positions of particles in 3
> > dimensions; I define two particles A and B to be directly connected if
> > their Euclidean distance is below a certain threshold d. If A and B
> > are directly connected and B and C are directly connected, then A,B
> > and C are connected components (physically it means that they are
> > members of the same cluster).
> > All my N particles then split into k disjointed clusters, each with a
> > certain number of connected components, and this is what I want to
> > investigate.
> > I do not know a priori how many clusters I have (this is my problem
> > with e.g. k-means since k is an output for me); the only input is the
> > set of 3-dimensional particle positions and a threshold distance.
> > The algorithm/package I am looking should return the number of
> > clusters and the composition of each cluster, e.g. the fact that the
> > second cluster is made up of particles {R,T,L}.
> > Consider for instance:
> >
> > # a 2-dimensional example
> > x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
> >            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> > colnames(x) <- c("x", "y")
> >
> > How can I then find out how many connected components I have when my
> > threshold distance is d=0.5?
> >
> > Many thanks
> >
> > Lorenzo
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Csardi Gabor <csardi at rmki.kfki.hu>    MTA RMKI, ELTE TTK
>




More information about the R-help mailing list