[R] R and Clusters

Christian Hennig chrish at stats.ucl.ac.uk
Mon Jan 7 15:51:54 CET 2008


Dear Lorenzo,

if I understand your posting correctly, this is exactly what Single 
Linkage clustering does if you cut the dendrogram tree at your threshold 
distance.

Therefore you can use hclust with method="single" (which produces the full 
dendrogram; you have to generate the Euclidean distances first) and then 
you feed the output into cutree with h=0.5 (your 
threshold), which produces a partition by cutting the 
dendrogram at the height 0.5.

Hope this helps,
Christian

On Mon, 7 Jan 2008, Lorenzo Isella wrote:

> Dear All,
> I hope I am not asking a FAQ. I am dealing with a problem of graph
> theory [connected components in a non-directed graph] and I do not
> want to rediscover the wheel.
> I saw a large number of R packages dealing for instance with the
> k-means method or hierarchical clustering for spatially distributed
> data and I am basically facing a similar problem.
> I am given a set of data which are the positions of particles in 3
> dimensions; I define two particles A and B to be directly connected if
> their Euclidean distance is below a certain threshold d. If A and B
> are directly connected and B and C are directly connected, then A,B
> and C are connected components (physically it means that they are
> members of the same cluster).
> All my N particles then split into k disjointed clusters, each with a
> certain number of connected components, and this is what I want to
> investigate.
> I do not know a priori how many clusters I have (this is my problem
> with e.g. k-means since k is an output for me); the only input is the
> set of 3-dimensional particle positions and a threshold distance.
> The algorithm/package I am looking should return the number of
> clusters and the composition of each cluster, e.g. the fact that the
> second cluster is made up of particles {R,T,L}.
> Consider for instance:
>
> # a 2-dimensional example
> x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> colnames(x) <- c("x", "y")
>
> How can I then find out how many connected components I have when my
> threshold distance is d=0.5?
>
> Many thanks
>
> Lorenzo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche




More information about the R-help mailing list