[R] Clustering techniques using R

ngottlieb at marinercapital.com ngottlieb at marinercapital.com
Wed Oct 10 21:36:30 CEST 2007


Maura:

I looked at the scatter plots you sent.

A few thoughts:

1- Patient 3 has a lot of missing data. This will make it hard to
   group that case reliably against your other cases.
   Missing data is a very common problem, and much work has been done in this area.

The trivial approach is to forward fill and backward fill the sample
data, so that you have the same amount of data for all cases.
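
To make the fill idea concrete, here is a minimal base R sketch. The
helper names (fill_forward, fill_backward, fill_both) are made up for
illustration and are not from any package; the zoo package offers
na.locf() for the same job if you prefer a tested implementation.

```r
# Carry the last observed value forward over NAs.
# Leading NAs stay NA because there is nothing to carry yet.
fill_forward <- function(x) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)            # position of the last observed value (0 = none yet)
  c(NA, x[ok])[idx + 1]
}

# Carry the next observed value backward: reverse, fill forward, reverse again.
fill_backward <- function(x) rev(fill_forward(rev(x)))

# Forward fill first, then backward fill to cover any leading NAs.
fill_both <- function(x) fill_backward(fill_forward(x))

x <- c(NA, 2, NA, NA, 5, NA)
filled <- fill_both(x)
print(filled)
```

Note that this only makes sense when the rows are ordered (e.g. in
time), and it invents no new information; it just repeats neighbours.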

More advanced approaches include the Expectation-Maximization (EM)
algorithm; a Google search on "EM algorithm" will turn up a lot of info.
Another approach is Multiple Imputation
(http://www.multiple-imputation.com/).
EM appears to be a good solution for your type of data.
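
As a rough illustration of what EM does with missing values, here is a
base R sketch that assumes the rows are independent draws from a
multivariate normal. That assumption, the function name em_impute, and
the simulated data are mine, not something taken from your files, so
treat this as a sketch of the idea rather than a recommended
implementation.

```r
# EM for a multivariate normal with missing entries (illustrative sketch).
# E-step: replace each missing block by its conditional mean given the
#         observed entries in the same row, and track the conditional covariance.
# M-step: re-estimate the mean vector and covariance matrix.
em_impute <- function(X, n_iter = 100, tol = 1e-6) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  miss <- is.na(X)

  # Start from column means and the covariance of the mean-filled data
  mu <- colMeans(X, na.rm = TRUE)
  Xhat <- X
  for (j in seq_len(p)) Xhat[miss[, j], j] <- mu[j]
  Sigma <- cov(Xhat)

  for (iter in seq_len(n_iter)) {
    C <- matrix(0, p, p)   # accumulated conditional covariance of the missing parts
    for (i in which(rowSums(miss) > 0)) {
      m <- miss[i, ]; o <- !m
      if (!any(o)) {       # nothing observed in this row
        Xhat[i, ] <- mu
        C <- C + Sigma
        next
      }
      B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
      Xhat[i, m] <- as.vector(mu[m] + B %*% (X[i, o] - mu[o]))
      C[m, m] <- C[m, m] + Sigma[m, m, drop = FALSE] -
        B %*% Sigma[o, m, drop = FALSE]
    }
    mu_new    <- colMeans(Xhat)
    Xc        <- sweep(Xhat, 2, mu_new)
    Sigma_new <- (crossprod(Xc) + C) / n
    delta <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new; Sigma <- Sigma_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, imputed = Xhat)
}

# Simulated example: three variables, 10% of the entries knocked out
set.seed(42)
n <- 200
z <- rnorm(n)
X <- cbind(a = z + rnorm(n, sd = 0.5),
           b = 2 * z + rnorm(n, sd = 0.5),
           c = rnorm(n))
X_obs <- X
X_obs[sample(length(X), 60)] <- NA
fit <- em_impute(X_obs)
print(fit$mu)
```

The observed values are left untouched; only the NA entries are filled
in, using the correlation between columns.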

2- Looking at your data, Principal Component Analysis (PCA) appears to
   be your best starting point before clustering. There are many books
   on this subject, but start with these simple links:
	http://en.wikipedia.org/wiki/Karhunen-Lo%C3%A8ve_transform
	http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
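
For PCA in R, prcomp() from the base stats package is probably the
simplest entry point. A minimal sketch follows; the coefs matrix is a
simulated stand-in (one row per case, one column per regression
coefficient), since I do not have your actual data.

```r
set.seed(1)

# Hypothetical stand-in for the real data: 20 cases, 4 regression coefficients each
coefs <- matrix(rnorm(20 * 4), nrow = 20,
                dimnames = list(paste0("case", 1:20), paste0("b", 0:3)))

# Center and scale so that no coefficient dominates purely because of its units
pca <- prcomp(coefs, center = TRUE, scale. = TRUE)

print(summary(pca))        # proportion of variance explained by each component

# Scores on the first two components, usable as input to a clustering step
scores <- pca$x[, 1:2]
```

How many components to keep is a judgment call; the cumulative
proportion of variance in the summary is the usual guide.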

All the methods mentioned above (PCA, EM) are available in R.

Finally, there is no one right answer for clustering: single
linkage, complete linkage, Ward's method, and others each have their place.
The choice always depends on the type of data one is analyzing.
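
One way to see how much the linkage choice matters is to run several
on the same distance matrix and compare the groupings. A small sketch
with hclust() from base R, on simulated two-group data (the data and
the choice of k = 2 are assumptions for illustration only):

```r
set.seed(2)

# Simulated data: two loose groups of 10 points each in two dimensions
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

d <- dist(scale(x))   # Euclidean distances on standardized columns

# Same distances, three linkage rules; cut each tree into k = 2 groups
methods <- c("single", "complete", "ward.D2")
groups <- sapply(methods, function(m) cutree(hclust(d, method = m), k = 2))

# Cross-tabulate two of the solutions to see where they disagree
print(table(single = groups[, "single"], ward = groups[, "ward.D2"]))
```

On well-separated data the methods tend to agree; on real data they
often do not, which is why it is worth looking at more than one.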

Naturally, our fellow R community members might have more and better
insights and suggestions! :)

Hope this helps.

Regards,
Neil


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Prof Brian Ripley
Sent: Monday, October 01, 2007 1:37 PM
To: Maura E Monville
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Clustering techniques using R

On Mon, 1 Oct 2007, Maura E Monville wrote:

> Now that I've loaded a file into an R data.frame and played with
> linear regression until I got a good model, my next step is clustering
> using the coefficients of the regression model (I have many files).
> Thanks to some R experts' guidelines I could find plenty of
> documentation on regression analysis in the "contributed" section.
> Some touch on the concepts of the underlying theory and then show some
> worked out examples (extremely useful).
> I found nothing so nicely explained and laid out about cluster
> analysis with R.
> I would appreciate some suggestion about reading on techniques for
> clustering using R. Some application examples are very welcome.

Have you looked at MASS (the book, see the FAQ)?
Or the CRAN task views at

http://cran.r-project.org/src/contrib/Views/Cluster.html
http://cran.r-project.org/src/contrib/Views/Multivariate.html
(Clustering is 'unsupervised classification')?

There is a lot of information there.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.