[R] silhouette: clustering labels have to be consecutive intergers starting

Tao Shi shitao at hotmail.com
Wed Oct 10 08:15:53 CEST 2007


Thank you very much,  Benilton and Prof. Ripley, for the speedy replies!  
Looking forward to the fix!

....Tao


>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: Benilton Carvalho <bcarvalh at jhsph.edu>
>CC: Tao Shi <shitao at hotmail.com>, maechler at stat.math.ethz.ch,        
>r-help at r-project.org
>Subject: Re: [R] silhouette: clustering labels have to be consecutive 
>intergers starting from 1?
>Date: Wed, 10 Oct 2007 05:33:03 +0100 (BST)
>
>It is a C-level problem in package cluster: valgrind gives
>
>==11377== Invalid write of size 8
>==11377==    at 0xA4015D3: sildist (sildist.c:35)
>==11377==    by 0x4706D8: do_dotCode (dotcode.c:1750)
>
>This is a matter for the package maintainer (Cc:ed here), not R-help.
>
>On Tue, 9 Oct 2007, Benilton Carvalho wrote:
>
>>that happened to me with R-2.4.0 (alpha) and was fixed on R-2.4.0
>>(final)...
>>
>>http://tolstoy.newcastle.edu.au/R/e2/help/06/11/5061.html
>>
>>then i stopped using... now, the problem seems to be back. The same
>>examples still apply.
>>
>>This fails:
>>
>>require(cluster)
>>set.seed(1)
>>x <- rnorm(100)
>>g <- sample(2:4, 100, rep=T)
>>for (i in 1:100){
>>   print(i)
>>   tmp <- silhouette(g, dist(x))
>>}
>>
>>and this works:
>>
>>require(cluster)
>>set.seed(1)
>>x <- rnorm(100)
>>g <- sample(2:4, 100, rep=T)
>>for (i in 1:100){
>>   print(i)
>>   tmp <- silhouette(as.integer(factor(g)), dist(x))
>>}
>>
>>and here's the sessionInfo():
>>
>> > sessionInfo()
>>R version 2.6.0 (2007-10-03)
>>x86_64-unknown-linux-gnu
>>
>>locale:
>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.U
>>TF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-
>>8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_ID
>>ENTIFICATION=C
>>
>>attached base packages:
>>[1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>other attached packages:
>>[1] cluster_1.11.9
>>
>>
>>(Red Hat EL 2.6.9-42 smp - AMD opteron 848)
>>
>>b
>>
>>On Oct 9, 2007, at 8:35 PM, Tao Shi wrote:
>>
>>>Hi list,
>>>
>>>When I was using 'silhouette' from the 'cluster' package to
>>>calculate clustering performances, R crashed.  I traced the problem
>>>to the fact that my clustering labels only have 2's and 3's.  when
>>>I replaced them with 1's and 2's, the problem was solved.  Is the
>>>function purposely written in this way so when I have clustering
>>>labels, "2" and "3", for example, the function somehow takes the
>>>'missing' cluster "2" into account when it calculates silhouette
>>>widths?
>>>
>>>Thanks,
>>>
>>>....Tao
>>>
>>>##============================================
>>>## sorry about the long attachment
>>>
>>>>R.Version()
>>>$platform
>>>[1] "i386-pc-mingw32"
>>>
>>>$arch
>>>[1] "i386"
>>>
>>>$os
>>>[1] "mingw32"
>>>
>>>$system
>>>[1] "i386, mingw32"
>>>
>>>$status
>>>[1] ""
>>>
>>>$major
>>>[1] "2"
>>>
>>>$minor
>>>[1] "5.1"
>>>
>>>$year
>>>[1] "2007"
>>>
>>>$month
>>>[1] "06"
>>>
>>>$day
>>>[1] "27"
>>>
>>>$`svn rev`
>>>[1] "42083"
>>>
>>>$language
>>>[1] "R"
>>>
>>>$version.string
>>>[1] "R version 2.5.1 (2007-06-27)"
>>>
>>>>library(cluster)
>>>>cl1   ## clustering labels
>>>  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2
>>>[30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[59] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[88] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[117] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[146] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[175] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[204] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>>x1  ## 1-d input vector
>>>  [1] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>  [6] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[11] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[16] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[21] 1.0163758 0.7657763 0.7370084 0.6999689 0.7366476
>>>[26] 0.7883921 0.6925395 0.7729240 0.7202391 0.7910149
>>>[31] 0.7397698 0.7958092 0.6978596 0.7350255 0.7294362
>>>[36] 0.6125713 0.7174000 0.7413046 0.7044205 0.7568104
>>>[41] 0.7048469 0.7334515 0.7143170 0.7002311 0.7540981
>>>[46] 0.7627527 0.7712762 0.8193611 0.7801148 0.9061762
>>>[51] 0.8248195 0.7932630 0.7248037 0.7423547 0.6419314
>>>[56] 0.6001092 0.7572272 0.7631742 0.7085384 0.8710853
>>>[61] 0.6589563 0.7464943 0.7487340 0.7751280 0.7946542
>>>[66] 0.7666081 0.8508109 0.8314308 0.7442471 0.8006093
>>>[71] 0.7949156 0.7852447 0.7630048 0.7104764 0.6768218
>>>[76] 0.6806351 0.7255355 0.7431389 0.7523627 0.7670515
>>>[81] 0.8118214 0.7215615 0.8186164 0.6941610 0.8285453
>>>[86] 0.8395170 0.8088044 0.8182706 0.7550723 0.7948639
>>>[91] 0.7204830 0.7109068 0.7756949 0.6837856 0.7055604
>>>[96] 0.6126666 0.7201964 0.6849890 0.7779753 0.7845284
>>>[101] 0.9370788 0.8242935 0.6908860 0.6446151 0.7660386
>>>[106] 0.8141526 0.8111984 0.8624186 0.7865335 0.8213035
>>>[111] 0.8059171 0.6735751 0.7815353 0.6972508 0.6699396
>>>[116] 0.6293971 0.7475913 0.7700821 0.8258339 0.8096144
>>>[121] 0.7058171 0.7516635 0.7323909 0.7229136 0.8344846
>>>[126] 0.7205433 0.8287774 0.8322097 0.7767547 0.7402277
>>>[131] 0.7939879 0.7797308 0.7112453 0.7091554 0.6417382
>>>[136] 0.6369171 0.7059020 0.7496380 0.7298359 0.8202566
>>>[141] 0.7331830 0.7344492 0.8316894 0.7323979 0.7977615
>>>[146] 0.7841205 0.7587060 0.8056685 0.7895643 0.8140731
>>>[151] 0.7890221 0.8016008 0.7381577 0.6936453 0.7133525
>>>[156] 0.7121459 0.6851448 0.7946275 0.8077618 0.7899059
>>>[161] 0.7128826 0.7546289 0.7042451 0.6606403 0.7525233
>>>[166] 0.7527548 0.8098887 0.8254190 0.7873064 0.8139340
>>>[171] 0.7903462 0.8377651 0.6709983 0.7423632 0.6632082
>>>[176] 0.5676717 0.6925125 0.7077083 0.7488877 0.7630604
>>>[181] 0.7843001 0.7524471 0.6871823 0.7144443 0.7692206
>>>[186] 0.8690710 0.9282786 0.7844991 0.7094671 0.7578409
>>>[191] 0.8026643 0.7759241 0.6997376 0.6167209 0.6682289
>>>[196] 0.6572018 0.7615807 0.7415752 0.7659161 0.7040360
>>>[201] 0.6874460 0.7052109 0.8290970 0.6915149 0.7173107
>>>[206] 0.7848961 0.7943846 0.8437946 0.7817344 0.8867006
>>>[211] 0.7575857 0.8390473 0.7382348 0.6789859 0.7129010
>>>[216] 0.6938173 0.7384170 0.6747648 0.7203337 0.7278963
>>>>  silhouette(cl1, dist(x1)^2)  #####  CRASHED! ######
>>>>silhouette(ifelse(cl1==3,2,1), dist(x1)^2)
>>>       cluster neighbor sil_width
>>>  [1,]       2        1 1.0000000
>>>  [2,]       2        1 1.0000000
>>>  [3,]       2        1 1.0000000
>>>  [4,]       2        1 1.0000000
>>>  [5,]       2        1 1.0000000
>>>  [6,]       2        1 1.0000000
>>>  [7,]       2        1 1.0000000
>>>  [8,]       2        1 1.0000000
>>>  [9,]       2        1 1.0000000
>>>[10,]       2        1 1.0000000
>>>[11,]       2        1 1.0000000
>>>[12,]       2        1 1.0000000
>>>[13,]       2        1 1.0000000
>>>[14,]       2        1 1.0000000
>>>[15,]       2        1 1.0000000
>>>[16,]       2        1 1.0000000
>>>[17,]       2        1 1.0000000
>>>[18,]       2        1 1.0000000
>>>[19,]       2        1 1.0000000
>>>[20,]       2        1 1.0000000
>>>[21,]       1        2 0.7592857
>>>[22,]       1        2 0.9934455
>>>[23,]       1        2 0.9937880
>>>[24,]       1        2 0.9909544
>>>[25,]       1        2 0.9937769
>>>[26,]       1        2 0.9912442
>>>[27,]       1        2 0.9900156
>>>[28,]       1        2 0.9929499
>>>[29,]       1        2 0.9929125
>>>[30,]       1        2 0.9908637
>>>[31,]       1        2 0.9938610
>>>[32,]       1        2 0.9900958
>>>[33,]       1        2 0.9906993
>>>[34,]       1        2 0.9937227
>>>[35,]       1        2 0.9934823
>>>[36,]       1        2 0.9740954
>>>[37,]       1        2 0.9926948
>>>[38,]       1        2 0.9938924
>>>[39,]       1        2 0.9914623
>>>[40,]       1        2 0.9938250
>>>[41,]       1        2 0.9915088
>>>[42,]       1        2 0.9936633
>>>[43,]       1        2 0.9924367
>>>[44,]       1        2 0.9909855
>>>[45,]       1        2 0.9938891
>>>[46,]       1        2 0.9936028
>>>[47,]       1        2 0.9930799
>>>[48,]       1        2 0.9848568
>>>[49,]       1        2 0.9922685
>>>[50,]       1        2 0.9371272
>>>[51,]       1        2 0.9832647
>>>[52,]       1        2 0.9905154
>>>[53,]       1        2 0.9932217
>>>[54,]       1        2 0.9939101
>>>[55,]       1        2 0.9810071
>>>[56,]       1        2 0.9708675
>>>[57,]       1        2 0.9938131
>>>[58,]       1        2 0.9935827
>>>[59,]       1        2 0.9918943
>>>[60,]       1        2 0.9628701
>>>[61,]       1        2 0.9844965
>>>[62,]       1        2 0.9939491
>>>[63,]       1        2 0.9939495
>>>[64,]       1        2 0.9927610
>>>[65,]       1        2 0.9902895
>>>[66,]       1        2 0.9933968
>>>[67,]       1        2 0.9734481
>>>[68,]       1        2 0.9811285
>>>[69,]       1        2 0.9939341
>>>[70,]       1        2 0.9892304
>>>[71,]       1        2 0.9902461
>>>[72,]       1        2 0.9916649
>>>[73,]       1        2 0.9935909
>>>[74,]       1        2 0.9920846
>>>[75,]       1        2 0.9876779
>>>[76,]       1        2 0.9882868
>>>[77,]       1        2 0.9932665
>>>[78,]       1        2 0.9939213
>>>[79,]       1        2 0.9939182
>>>[80,]       1        2 0.9933699
>>>[81,]       1        2 0.9868129
>>>[82,]       1        2 0.9930074
>>>[83,]       1        2 0.9850624
>>>[84,]       1        2 0.9902300
>>>[85,]       1        2 0.9820895
>>>[86,]       1        2 0.9781906
>>>[87,]       1        2 0.9875197
>>>[88,]       1        2 0.9851569
>>>[89,]       1        2 0.9938688
>>>[90,]       1        2 0.9902547
>>>[91,]       1        2 0.9929304
>>>[92,]       1        2 0.9921257
>>>[93,]       1        2 0.9927096
>>>[94,]       1        2 0.9887702
>>>[95,]       1        2 0.9915856
>>>[96,]       1        2 0.9741195
>>>[97,]       1        2 0.9929094
>>>[98,]       1        2 0.9889500
>>>[99,]       1        2 0.9924910
>>>[100,]       1        2 0.9917552
>>>[101,]       1        2 0.9047049
>>>[102,]       1        2 0.9834247
>>>[103,]       1        2 0.9897916
>>>[104,]       1        2 0.9815845
>>>[105,]       1        2 0.9934304
>>>[106,]       1        2 0.9862375
>>>[107,]       1        2 0.9869624
>>>[108,]       1        2 0.9677353
>>>[109,]       1        2 0.9914973
>>>[110,]       1        2 0.9843076
>>>[111,]       1        2 0.9881568
>>>[112,]       1        2 0.9871393
>>>[113,]       1        2 0.9921114
>>>[114,]       1        2 0.9906240
>>>[115,]       1        2 0.9865148
>>>[116,]       1        2 0.9781846
>>>[117,]       1        2 0.9939511
>>>[118,]       1        2 0.9931681
>>>[119,]       1        2 0.9829519
>>>[120,]       1        2 0.9873341
>>>[121,]       1        2 0.9916130
>>>[122,]       1        2 0.9939273
>>>[123,]       1        2 0.9936196
>>>[124,]       1        2 0.9930999
>>>[125,]       1        2 0.9800620
>>>[126,]       1        2 0.9929347
>>>[127,]       1        2 0.9820138
>>>[128,]       1        2 0.9808614
>>>[129,]       1        2 0.9926103
>>>[130,]       1        2 0.9938711
>>>[131,]       1        2 0.9903987
>>>[132,]       1        2 0.9923097
>>>[133,]       1        2 0.9921578
>>>[134,]       1        2 0.9919558
>>>[135,]       1        2 0.9809652
>>>[136,]       1        2 0.9799023
>>>[137,]       1        2 0.9916220
>>>[138,]       1        2 0.9939454
>>>[139,]       1        2 0.9935022
>>>[140,]       1        2 0.9846059
>>>[141,]       1        2 0.9936526
>>>[142,]       1        2 0.9937017
>>>[143,]       1        2 0.9810402
>>>[144,]       1        2 0.9936199
>>>[145,]       1        2 0.9897557
>>>[146,]       1        2 0.9918058
>>>[147,]       1        2 0.9937665
>>>[148,]       1        2 0.9882099
>>>[149,]       1        2 0.9910776
>>>[150,]       1        2 0.9862575
>>>[151,]       1        2 0.9911553
>>>[152,]       1        2 0.9890393
>>>[153,]       1        2 0.9938209
>>>[154,]       1        2 0.9901624
>>>[155,]       1        2 0.9923515
>>>[156,]       1        2 0.9922418
>>>[157,]       1        2 0.9889731
>>>[158,]       1        2 0.9902939
>>>[159,]       1        2 0.9877542
>>>[160,]       1        2 0.9910280
>>>[161,]       1        2 0.9923092
>>>[162,]       1        2 0.9938784
>>>[163,]       1        2 0.9914431
>>>[164,]       1        2 0.9848184
>>>[165,]       1        2 0.9939159
>>>[166,]       1        2 0.9939125
>>>[167,]       1        2 0.9872706
>>>[168,]       1        2 0.9830805
>>>[169,]       1        2 0.9913937
>>>[170,]       1        2 0.9862925
>>>[171,]       1        2 0.9909633
>>>[172,]       1        2 0.9788584
>>>[173,]       1        2 0.9866989
>>>[174,]       1        2 0.9939102
>>>[175,]       1        2 0.9853007
>>>[176,]       1        2 0.9617883
>>>[177,]       1        2 0.9900120
>>>[178,]       1        2 0.9918102
>>>[179,]       1        2 0.9939489
>>>[180,]       1        2 0.9935882
>>>[181,]       1        2 0.9917836
>>>[182,]       1        2 0.9939170
>>>[183,]       1        2 0.9892708
>>>[184,]       1        2 0.9924478
>>>[185,]       1        2 0.9932287
>>>[186,]       1        2 0.9640487
>>>[187,]       1        2 0.9150126
>>>[188,]       1        2 0.9917589
>>>[189,]       1        2 0.9919865
>>>[190,]       1        2 0.9937946
>>>[191,]       1        2 0.9888295
>>>[192,]       1        2 0.9926884
>>>[193,]       1        2 0.9909269
>>>[194,]       1        2 0.9751339
>>>[195,]       1        2 0.9862132
>>>[196,]       1        2 0.9841566
>>>[197,]       1        2 0.9936557
>>>[198,]       1        2 0.9938973
>>>[199,]       1        2 0.9934375
>>>[200,]       1        2 0.9914201
>>>[201,]       1        2 0.9893087
>>>[202,]       1        2 0.9915481
>>>[203,]       1        2 0.9819092
>>>[204,]       1        2 0.9898774
>>>[205,]       1        2 0.9926876
>>>[206,]       1        2 0.9917091
>>>[207,]       1        2 0.9903339
>>>[208,]       1        2 0.9764847
>>>[209,]       1        2 0.9920887
>>>[210,]       1        2 0.9526866
>>>[211,]       1        2 0.9938025
>>>[212,]       1        2 0.9783714
>>>[213,]       1        2 0.9938230
>>>[214,]       1        2 0.9880267
>>>[215,]       1        2 0.9923108
>>>[216,]       1        2 0.9901850
>>>[217,]       1        2 0.9938279
>>>[218,]       1        2 0.9873388
>>>[219,]       1        2 0.9929195
>>>[220,]       1        2 0.9934017
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl1 == 3, 2, 1), dist = dist(x1)^2)
>>>attr(,"class")
>>>[1] "silhouette"
>>>
>>>## other examples
>>>>set.seed(1234)
>>>>cl.tmp <- rep(2:3, each=5)
>>>>x.tmp <- c(rep(-1,5), abs(rnorm(5)+3))
>>>>silhouette(cl.tmp, dist(x.tmp))
>>>      cluster neighbor  sil_width
>>>[1,]       2        1        NaN
>>>[2,]       2        1        NaN
>>>[3,]       2        1        NaN
>>>[4,]       2        1        NaN
>>>[5,]       2        1        NaN
>>>[6,]       3        2 -0.5736515
>>>[7,]       3        2 -0.1557143
>>>[8,]       3        2 -0.2922523
>>>[9,]       3        2 -0.8340174
>>>[10,]       3        2 -0.1511875
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = cl.tmp, dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>>silhouette(ifelse(cl.tmp==2,1,2), dist(x.tmp))
>>>      cluster neighbor  sil_width
>>>[1,]       1        2  1.0000000
>>>[2,]       1        2  1.0000000
>>>[3,]       1        2  1.0000000
>>>[4,]       1        2  1.0000000
>>>[5,]       1        2  1.0000000
>>>[6,]       2        1  0.4136253
>>>[7,]       2        1  0.7038917
>>>[8,]       2        1  0.6467668
>>>[9,]       2        1 -0.3360695
>>>[10,]       2        1  0.7054709
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl.tmp == 2, 1, 2), dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>>silhouette(ifelse(cl.tmp==2,1,3), dist(x.tmp))
>>>      cluster neighbor  sil_width
>>>[1,]       1        2        NaN
>>>[2,]       1        2        NaN
>>>[3,]       1        2        NaN
>>>[4,]       1        2        NaN
>>>[5,]       1        2        NaN
>>>[6,]       3        1 -0.7694686
>>>[7,]       3        1 -0.8167313
>>>[8,]       3        1 -0.6054665
>>>[9,]       3        1 -0.9037412
>>>[10,]       3        1  0.1875360
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl.tmp == 2, 1, 3), dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>
>>>_________________________________________________________________
>>>
>>>It?s free. http://im.live.com/messenger/im/home/?source=TAGHM
>>>
>>><mime-attachment.txt>
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide 
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>
>--
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list