[R] Clustering problem
    Abhishek Pratap 
    abhishek.vit at gmail.com
       
    Mon Mar 21 18:48:11 CET 2011
    
    
  
Hi guys,
I want to apply a clustering algorithm to my dataset in order to find
regions of points (X, Y) that have similar values of percent_GC and
mean_phred_quality. Details below.
I have sampled 1% of the points from my main data set of 85 million
points. The result is still fairly large (roughly 800K points) and looks
like the following.
      X   Y  percent_GC  mean_phred_quality
1  4286 930        0.50                0.13
2  4825 947        0.50               20.33
3  8207 932        0.32               26.50
4  8451 940        0.48               24.81
5  9331 931        0.38               16.93
6 11501 949        0.49               31.28
What I want to do is find local regions in which these 4 values are
associated, i.e. regions of points (X, Y) within which percent_GC and
mean_phred_quality are closely correlated.
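
To make the goal concrete, here is a minimal sketch in R of one possible
way to group points that are close in position and similar in both
measurements. It is only an illustration under assumptions: the data frame
pts is simulated to mimic the four columns above (it is not the real data),
k-means is just one candidate algorithm, and the choice of 5 clusters is
arbitrary.

```r
# Simulated stand-in for the sampled data (hypothetical values, same column names).
set.seed(1)
n <- 1000
pts <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 2000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = runif(n, 0, 40)
)

# Standardise the columns so that no single one (e.g. X, with its large
# range) dominates the Euclidean distance used by k-means.
scaled <- scale(pts)

# centers = 5 is an arbitrary choice for illustration only.
km <- kmeans(scaled, centers = 5, nstart = 10, iter.max = 50)
pts$cluster <- km$cluster

# Per-cluster means in the original units, to see what each group looks like.
cluster_means <- aggregate(. ~ cluster, data = pts, FUN = mean)
print(cluster_means)
```

Scaling matters here because X, Y, percent_GC and mean_phred_quality are on
very different ranges; without it the clusters would mostly reflect X.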
PS: I did calculate the overall Pearson correlation coefficient between
percent_GC and mean_phred_quality, and it is not statistically
significant, which got me interested in finding local regions where
it may be.
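
The idea of looking for a local correlation where the global one is absent
could be sketched as follows. This is a rough illustration, not a
recommended method: the data are simulated, the 10 x 5 grid of tiles and
the minimum of 10 points per tile are arbitrary assumptions, and the
p-values are adjusted because many tiles are tested at once.

```r
# Simulated stand-in for the sampled data (hypothetical values).
set.seed(42)
n <- 5000
pts <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 2000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = rnorm(n, mean = 25, sd = 5)
)

# Assign every point to a rectangular tile (10 bins in X, 5 bins in Y).
tile <- interaction(cut(pts$X, breaks = 10), cut(pts$Y, breaks = 5), drop = TRUE)

# Pearson correlation test within each tile; tiles with too few points are skipped.
per_tile <- lapply(split(pts, tile), function(d) {
  if (nrow(d) < 10) return(NULL)
  ct <- cor.test(d$percent_GC, d$mean_phred_quality, method = "pearson")
  data.frame(n = nrow(d), r = unname(ct$estimate), p = ct$p.value)
})
local_cor <- do.call(rbind, Filter(Negate(is.null), per_tile))

# Benjamini-Hochberg adjustment for testing many tiles simultaneously.
local_cor$p_adj <- p.adjust(local_cor$p, method = "BH")

# Tiles with the smallest adjusted p-values come first.
print(head(local_cor[order(local_cor$p_adj), ]))
```

With 800K real points the same approach should still be feasible, since
each tile is processed independently; the tile size would need tuning to
the scale at which a local effect is expected.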
I would really appreciate your help as I am still a rookie in applying
clustering algorithms.
Thanks!
-Abhi
    
    