[R] silhouette: clustering labels have to be consecutive integers starting from 1?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Oct 10 06:33:03 CEST 2007
It is a C-level problem in package cluster: valgrind gives
==11377== Invalid write of size 8
==11377== at 0xA4015D3: sildist (sildist.c:35)
==11377== by 0x4706D8: do_dotCode (dotcode.c:1750)
This is a matter for the package maintainer (Cc:ed here), not R-help.
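Until that is fixed, a defensive check on the R side may help. The following is only a sketch, assuming the invalid write arises because the C code uses each label as an index into storage sized by the number of distinct clusters (the valgrind trace gives the location, not the cause). The helper names labels_are_consecutive() and relabel_consecutive() are made up for illustration and are not part of cluster; only base R is used.

```r
# Hypothetical helpers, not part of the cluster package.

# TRUE if the distinct labels are exactly 1, 2, ..., k.
labels_are_consecutive <- function(labels) {
  ids <- sort(unique(labels))
  length(ids) > 0 && isTRUE(all(ids == seq_along(ids)))
}

# Map arbitrary labels onto 1..k, keeping the sorted order of the originals.
relabel_consecutive <- function(labels) {
  as.integer(factor(labels))
}

set.seed(1)
g <- sample(2:4, 100, replace = TRUE)
k <- length(unique(g))        # number of distinct clusters

labels_are_consecutive(g)     # FALSE: no label equals 1
max(g) > k                    # TRUE: the largest label exceeds k, so a k-slot
                              # buffer indexed by label would overflow (assumption)

g2 <- relabel_consecutive(g)
labels_are_consecutive(g2)    # TRUE
table(g, g2)                  # shows which original label maps to which new one
```

The relabelled vector can then be passed to silhouette() in place of the original labels, which is what the second example quoted below does with as.integer(factor(g)).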
On Tue, 9 Oct 2007, Benilton Carvalho wrote:
> That happened to me with R-2.4.0 (alpha) and was fixed in R-2.4.0
> (final).
>
> http://tolstoy.newcastle.edu.au/R/e2/help/06/11/5061.html
>
> Then I stopped using it. Now the problem seems to be back; the same
> examples still apply.
>
> This fails:
>
> require(cluster)
> set.seed(1)
> x <- rnorm(100)
> g <- sample(2:4, 100, replace = TRUE)
> for (i in 1:100) {
>   print(i)
>   tmp <- silhouette(g, dist(x))
> }
>
> and this works:
>
> require(cluster)
> set.seed(1)
> x <- rnorm(100)
> g <- sample(2:4, 100, replace = TRUE)
> for (i in 1:100) {
>   print(i)
>   tmp <- silhouette(as.integer(factor(g)), dist(x))
> }
>
> and here's the sessionInfo():
>
> > sessionInfo()
> R version 2.6.0 (2007-10-03)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] cluster_1.11.9
>
>
> (Red Hat EL 2.6.9-42 smp - AMD opteron 848)
>
> b
>
> On Oct 9, 2007, at 8:35 PM, Tao Shi wrote:
>
>> Hi list,
>>
>> When I was using 'silhouette' from the 'cluster' package to
>> calculate clustering performance, R crashed. I traced the problem
>> to the fact that my clustering labels only have 2's and 3's. When
>> I replaced them with 1's and 2's, the problem was solved. Is the
>> function purposely written this way, so that when I have clustering
>> labels "2" and "3", for example, the function somehow takes the
>> 'missing' cluster "1" into account when it calculates silhouette
>> widths?
>>
>> Thanks,
>>
>> ....Tao
>>
>> ##============================================
>> ## sorry about the long attachment
>>
>>> R.Version()
>> $platform
>> [1] "i386-pc-mingw32"
>>
>> $arch
>> [1] "i386"
>>
>> $os
>> [1] "mingw32"
>>
>> $system
>> [1] "i386, mingw32"
>>
>> $status
>> [1] ""
>>
>> $major
>> [1] "2"
>>
>> $minor
>> [1] "5.1"
>>
>> $year
>> [1] "2007"
>>
>> $month
>> [1] "06"
>>
>> $day
>> [1] "27"
>>
>> $`svn rev`
>> [1] "42083"
>>
>> $language
>> [1] "R"
>>
>> $version.string
>> [1] "R version 2.5.1 (2007-06-27)"
>>
>>> library(cluster)
>>> cl1 ## clustering labels
>> [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2
>> [30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [59] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [88] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [117] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [146] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [175] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>> [204] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>> x1 ## 1-d input vector
>> [1] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>> [6] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>> [11] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>> [16] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>> [21] 1.0163758 0.7657763 0.7370084 0.6999689 0.7366476
>> [26] 0.7883921 0.6925395 0.7729240 0.7202391 0.7910149
>> [31] 0.7397698 0.7958092 0.6978596 0.7350255 0.7294362
>> [36] 0.6125713 0.7174000 0.7413046 0.7044205 0.7568104
>> [41] 0.7048469 0.7334515 0.7143170 0.7002311 0.7540981
>> [46] 0.7627527 0.7712762 0.8193611 0.7801148 0.9061762
>> [51] 0.8248195 0.7932630 0.7248037 0.7423547 0.6419314
>> [56] 0.6001092 0.7572272 0.7631742 0.7085384 0.8710853
>> [61] 0.6589563 0.7464943 0.7487340 0.7751280 0.7946542
>> [66] 0.7666081 0.8508109 0.8314308 0.7442471 0.8006093
>> [71] 0.7949156 0.7852447 0.7630048 0.7104764 0.6768218
>> [76] 0.6806351 0.7255355 0.7431389 0.7523627 0.7670515
>> [81] 0.8118214 0.7215615 0.8186164 0.6941610 0.8285453
>> [86] 0.8395170 0.8088044 0.8182706 0.7550723 0.7948639
>> [91] 0.7204830 0.7109068 0.7756949 0.6837856 0.7055604
>> [96] 0.6126666 0.7201964 0.6849890 0.7779753 0.7845284
>> [101] 0.9370788 0.8242935 0.6908860 0.6446151 0.7660386
>> [106] 0.8141526 0.8111984 0.8624186 0.7865335 0.8213035
>> [111] 0.8059171 0.6735751 0.7815353 0.6972508 0.6699396
>> [116] 0.6293971 0.7475913 0.7700821 0.8258339 0.8096144
>> [121] 0.7058171 0.7516635 0.7323909 0.7229136 0.8344846
>> [126] 0.7205433 0.8287774 0.8322097 0.7767547 0.7402277
>> [131] 0.7939879 0.7797308 0.7112453 0.7091554 0.6417382
>> [136] 0.6369171 0.7059020 0.7496380 0.7298359 0.8202566
>> [141] 0.7331830 0.7344492 0.8316894 0.7323979 0.7977615
>> [146] 0.7841205 0.7587060 0.8056685 0.7895643 0.8140731
>> [151] 0.7890221 0.8016008 0.7381577 0.6936453 0.7133525
>> [156] 0.7121459 0.6851448 0.7946275 0.8077618 0.7899059
>> [161] 0.7128826 0.7546289 0.7042451 0.6606403 0.7525233
>> [166] 0.7527548 0.8098887 0.8254190 0.7873064 0.8139340
>> [171] 0.7903462 0.8377651 0.6709983 0.7423632 0.6632082
>> [176] 0.5676717 0.6925125 0.7077083 0.7488877 0.7630604
>> [181] 0.7843001 0.7524471 0.6871823 0.7144443 0.7692206
>> [186] 0.8690710 0.9282786 0.7844991 0.7094671 0.7578409
>> [191] 0.8026643 0.7759241 0.6997376 0.6167209 0.6682289
>> [196] 0.6572018 0.7615807 0.7415752 0.7659161 0.7040360
>> [201] 0.6874460 0.7052109 0.8290970 0.6915149 0.7173107
>> [206] 0.7848961 0.7943846 0.8437946 0.7817344 0.8867006
>> [211] 0.7575857 0.8390473 0.7382348 0.6789859 0.7129010
>> [216] 0.6938173 0.7384170 0.6747648 0.7203337 0.7278963
>>> silhouette(cl1, dist(x1)^2) ##### CRASHED! ######
>>> silhouette(ifelse(cl1==3,2,1), dist(x1)^2)
>> cluster neighbor sil_width
>> [1,] 2 1 1.0000000
>> [2,] 2 1 1.0000000
>> [3,] 2 1 1.0000000
>> [4,] 2 1 1.0000000
>> [5,] 2 1 1.0000000
>> [6,] 2 1 1.0000000
>> [7,] 2 1 1.0000000
>> [8,] 2 1 1.0000000
>> [9,] 2 1 1.0000000
>> [10,] 2 1 1.0000000
>> [11,] 2 1 1.0000000
>> [12,] 2 1 1.0000000
>> [13,] 2 1 1.0000000
>> [14,] 2 1 1.0000000
>> [15,] 2 1 1.0000000
>> [16,] 2 1 1.0000000
>> [17,] 2 1 1.0000000
>> [18,] 2 1 1.0000000
>> [19,] 2 1 1.0000000
>> [20,] 2 1 1.0000000
>> [21,] 1 2 0.7592857
>> [22,] 1 2 0.9934455
>> [23,] 1 2 0.9937880
>> [24,] 1 2 0.9909544
>> [25,] 1 2 0.9937769
>> [26,] 1 2 0.9912442
>> [27,] 1 2 0.9900156
>> [28,] 1 2 0.9929499
>> [29,] 1 2 0.9929125
>> [30,] 1 2 0.9908637
>> [31,] 1 2 0.9938610
>> [32,] 1 2 0.9900958
>> [33,] 1 2 0.9906993
>> [34,] 1 2 0.9937227
>> [35,] 1 2 0.9934823
>> [36,] 1 2 0.9740954
>> [37,] 1 2 0.9926948
>> [38,] 1 2 0.9938924
>> [39,] 1 2 0.9914623
>> [40,] 1 2 0.9938250
>> [41,] 1 2 0.9915088
>> [42,] 1 2 0.9936633
>> [43,] 1 2 0.9924367
>> [44,] 1 2 0.9909855
>> [45,] 1 2 0.9938891
>> [46,] 1 2 0.9936028
>> [47,] 1 2 0.9930799
>> [48,] 1 2 0.9848568
>> [49,] 1 2 0.9922685
>> [50,] 1 2 0.9371272
>> [51,] 1 2 0.9832647
>> [52,] 1 2 0.9905154
>> [53,] 1 2 0.9932217
>> [54,] 1 2 0.9939101
>> [55,] 1 2 0.9810071
>> [56,] 1 2 0.9708675
>> [57,] 1 2 0.9938131
>> [58,] 1 2 0.9935827
>> [59,] 1 2 0.9918943
>> [60,] 1 2 0.9628701
>> [61,] 1 2 0.9844965
>> [62,] 1 2 0.9939491
>> [63,] 1 2 0.9939495
>> [64,] 1 2 0.9927610
>> [65,] 1 2 0.9902895
>> [66,] 1 2 0.9933968
>> [67,] 1 2 0.9734481
>> [68,] 1 2 0.9811285
>> [69,] 1 2 0.9939341
>> [70,] 1 2 0.9892304
>> [71,] 1 2 0.9902461
>> [72,] 1 2 0.9916649
>> [73,] 1 2 0.9935909
>> [74,] 1 2 0.9920846
>> [75,] 1 2 0.9876779
>> [76,] 1 2 0.9882868
>> [77,] 1 2 0.9932665
>> [78,] 1 2 0.9939213
>> [79,] 1 2 0.9939182
>> [80,] 1 2 0.9933699
>> [81,] 1 2 0.9868129
>> [82,] 1 2 0.9930074
>> [83,] 1 2 0.9850624
>> [84,] 1 2 0.9902300
>> [85,] 1 2 0.9820895
>> [86,] 1 2 0.9781906
>> [87,] 1 2 0.9875197
>> [88,] 1 2 0.9851569
>> [89,] 1 2 0.9938688
>> [90,] 1 2 0.9902547
>> [91,] 1 2 0.9929304
>> [92,] 1 2 0.9921257
>> [93,] 1 2 0.9927096
>> [94,] 1 2 0.9887702
>> [95,] 1 2 0.9915856
>> [96,] 1 2 0.9741195
>> [97,] 1 2 0.9929094
>> [98,] 1 2 0.9889500
>> [99,] 1 2 0.9924910
>> [100,] 1 2 0.9917552
>> [101,] 1 2 0.9047049
>> [102,] 1 2 0.9834247
>> [103,] 1 2 0.9897916
>> [104,] 1 2 0.9815845
>> [105,] 1 2 0.9934304
>> [106,] 1 2 0.9862375
>> [107,] 1 2 0.9869624
>> [108,] 1 2 0.9677353
>> [109,] 1 2 0.9914973
>> [110,] 1 2 0.9843076
>> [111,] 1 2 0.9881568
>> [112,] 1 2 0.9871393
>> [113,] 1 2 0.9921114
>> [114,] 1 2 0.9906240
>> [115,] 1 2 0.9865148
>> [116,] 1 2 0.9781846
>> [117,] 1 2 0.9939511
>> [118,] 1 2 0.9931681
>> [119,] 1 2 0.9829519
>> [120,] 1 2 0.9873341
>> [121,] 1 2 0.9916130
>> [122,] 1 2 0.9939273
>> [123,] 1 2 0.9936196
>> [124,] 1 2 0.9930999
>> [125,] 1 2 0.9800620
>> [126,] 1 2 0.9929347
>> [127,] 1 2 0.9820138
>> [128,] 1 2 0.9808614
>> [129,] 1 2 0.9926103
>> [130,] 1 2 0.9938711
>> [131,] 1 2 0.9903987
>> [132,] 1 2 0.9923097
>> [133,] 1 2 0.9921578
>> [134,] 1 2 0.9919558
>> [135,] 1 2 0.9809652
>> [136,] 1 2 0.9799023
>> [137,] 1 2 0.9916220
>> [138,] 1 2 0.9939454
>> [139,] 1 2 0.9935022
>> [140,] 1 2 0.9846059
>> [141,] 1 2 0.9936526
>> [142,] 1 2 0.9937017
>> [143,] 1 2 0.9810402
>> [144,] 1 2 0.9936199
>> [145,] 1 2 0.9897557
>> [146,] 1 2 0.9918058
>> [147,] 1 2 0.9937665
>> [148,] 1 2 0.9882099
>> [149,] 1 2 0.9910776
>> [150,] 1 2 0.9862575
>> [151,] 1 2 0.9911553
>> [152,] 1 2 0.9890393
>> [153,] 1 2 0.9938209
>> [154,] 1 2 0.9901624
>> [155,] 1 2 0.9923515
>> [156,] 1 2 0.9922418
>> [157,] 1 2 0.9889731
>> [158,] 1 2 0.9902939
>> [159,] 1 2 0.9877542
>> [160,] 1 2 0.9910280
>> [161,] 1 2 0.9923092
>> [162,] 1 2 0.9938784
>> [163,] 1 2 0.9914431
>> [164,] 1 2 0.9848184
>> [165,] 1 2 0.9939159
>> [166,] 1 2 0.9939125
>> [167,] 1 2 0.9872706
>> [168,] 1 2 0.9830805
>> [169,] 1 2 0.9913937
>> [170,] 1 2 0.9862925
>> [171,] 1 2 0.9909633
>> [172,] 1 2 0.9788584
>> [173,] 1 2 0.9866989
>> [174,] 1 2 0.9939102
>> [175,] 1 2 0.9853007
>> [176,] 1 2 0.9617883
>> [177,] 1 2 0.9900120
>> [178,] 1 2 0.9918102
>> [179,] 1 2 0.9939489
>> [180,] 1 2 0.9935882
>> [181,] 1 2 0.9917836
>> [182,] 1 2 0.9939170
>> [183,] 1 2 0.9892708
>> [184,] 1 2 0.9924478
>> [185,] 1 2 0.9932287
>> [186,] 1 2 0.9640487
>> [187,] 1 2 0.9150126
>> [188,] 1 2 0.9917589
>> [189,] 1 2 0.9919865
>> [190,] 1 2 0.9937946
>> [191,] 1 2 0.9888295
>> [192,] 1 2 0.9926884
>> [193,] 1 2 0.9909269
>> [194,] 1 2 0.9751339
>> [195,] 1 2 0.9862132
>> [196,] 1 2 0.9841566
>> [197,] 1 2 0.9936557
>> [198,] 1 2 0.9938973
>> [199,] 1 2 0.9934375
>> [200,] 1 2 0.9914201
>> [201,] 1 2 0.9893087
>> [202,] 1 2 0.9915481
>> [203,] 1 2 0.9819092
>> [204,] 1 2 0.9898774
>> [205,] 1 2 0.9926876
>> [206,] 1 2 0.9917091
>> [207,] 1 2 0.9903339
>> [208,] 1 2 0.9764847
>> [209,] 1 2 0.9920887
>> [210,] 1 2 0.9526866
>> [211,] 1 2 0.9938025
>> [212,] 1 2 0.9783714
>> [213,] 1 2 0.9938230
>> [214,] 1 2 0.9880267
>> [215,] 1 2 0.9923108
>> [216,] 1 2 0.9901850
>> [217,] 1 2 0.9938279
>> [218,] 1 2 0.9873388
>> [219,] 1 2 0.9929195
>> [220,] 1 2 0.9934017
>> attr(,"Ordered")
>> [1] FALSE
>> attr(,"call")
>> silhouette.default(x = ifelse(cl1 == 3, 2, 1), dist = dist(x1)^2)
>> attr(,"class")
>> [1] "silhouette"
>>
>> ## other examples
>>> set.seed(1234)
>>> cl.tmp <- rep(2:3, each=5)
>>> x.tmp <- c(rep(-1,5), abs(rnorm(5)+3))
>>> silhouette(cl.tmp, dist(x.tmp))
>> cluster neighbor sil_width
>> [1,] 2 1 NaN
>> [2,] 2 1 NaN
>> [3,] 2 1 NaN
>> [4,] 2 1 NaN
>> [5,] 2 1 NaN
>> [6,] 3 2 -0.5736515
>> [7,] 3 2 -0.1557143
>> [8,] 3 2 -0.2922523
>> [9,] 3 2 -0.8340174
>> [10,] 3 2 -0.1511875
>> attr(,"Ordered")
>> [1] FALSE
>> attr(,"call")
>> silhouette.default(x = cl.tmp, dist = dist(x.tmp))
>> attr(,"class")
>> [1] "silhouette"
>>> silhouette(ifelse(cl.tmp==2,1,2), dist(x.tmp))
>> cluster neighbor sil_width
>> [1,] 1 2 1.0000000
>> [2,] 1 2 1.0000000
>> [3,] 1 2 1.0000000
>> [4,] 1 2 1.0000000
>> [5,] 1 2 1.0000000
>> [6,] 2 1 0.4136253
>> [7,] 2 1 0.7038917
>> [8,] 2 1 0.6467668
>> [9,] 2 1 -0.3360695
>> [10,] 2 1 0.7054709
>> attr(,"Ordered")
>> [1] FALSE
>> attr(,"call")
>> silhouette.default(x = ifelse(cl.tmp == 2, 1, 2), dist = dist(x.tmp))
>> attr(,"class")
>> [1] "silhouette"
>>> silhouette(ifelse(cl.tmp==2,1,3), dist(x.tmp))
>> cluster neighbor sil_width
>> [1,] 1 2 NaN
>> [2,] 1 2 NaN
>> [3,] 1 2 NaN
>> [4,] 1 2 NaN
>> [5,] 1 2 NaN
>> [6,] 3 1 -0.7694686
>> [7,] 3 1 -0.8167313
>> [8,] 3 1 -0.6054665
>> [9,] 3 1 -0.9037412
>> [10,] 3 1 0.1875360
>> attr(,"Ordered")
>> [1] FALSE
>> attr(,"call")
>> silhouette.default(x = ifelse(cl.tmp == 2, 1, 3), dist = dist(x.tmp))
>> attr(,"class")
>> [1] "silhouette"
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595