[R] silhouette: clustering labels have to be consecutive integers starting from 1?
Tao Shi
shitao at hotmail.com
Wed Oct 10 08:15:53 CEST 2007
Thank you very much, Benilton and Prof. Ripley, for the speedy replies!
Looking forward to the fix!
....Tao
>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: Benilton Carvalho <bcarvalh at jhsph.edu>
>CC: Tao Shi <shitao at hotmail.com>, maechler at stat.math.ethz.ch,
>r-help at r-project.org
>Subject: Re: [R] silhouette: clustering labels have to be consecutive
>integers starting from 1?
>Date: Wed, 10 Oct 2007 05:33:03 +0100 (BST)
>
>It is a C-level problem in package cluster: valgrind gives
>
>==11377== Invalid write of size 8
>==11377== at 0xA4015D3: sildist (sildist.c:35)
>==11377== by 0x4706D8: do_dotCode (dotcode.c:1750)
>
>This is a matter for the package maintainer (Cc:ed here), not R-help.
>
>On Tue, 9 Oct 2007, Benilton Carvalho wrote:
>
>>That happened to me with R-2.4.0 (alpha) and was fixed in R-2.4.0
>>(final)...
>>
>>http://tolstoy.newcastle.edu.au/R/e2/help/06/11/5061.html
>>
>>Then I stopped using it... now the problem seems to be back. The same
>>examples still apply.
>>
>>This fails:
>>
>>require(cluster)
>>set.seed(1)
>>x <- rnorm(100)
>>g <- sample(2:4, 100, replace=TRUE)
>>for (i in 1:100){
>> print(i)
>> tmp <- silhouette(g, dist(x))
>>}
>>
>>and this works:
>>
>>require(cluster)
>>set.seed(1)
>>x <- rnorm(100)
>>g <- sample(2:4, 100, replace=TRUE)
>>for (i in 1:100){
>> print(i)
>> tmp <- silhouette(as.integer(factor(g)), dist(x))
>>}
>>
>>and here's the sessionInfo():
>>
>> > sessionInfo()
>>R version 2.6.0 (2007-10-03)
>>x86_64-unknown-linux-gnu
>>
>>locale:
>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>>attached base packages:
>>[1] stats graphics grDevices utils datasets methods base
>>
>>other attached packages:
>>[1] cluster_1.11.9
>>
>>
>>(Red Hat EL 2.6.9-42 smp - AMD opteron 848)
>>
>>b
>>
>>On Oct 9, 2007, at 8:35 PM, Tao Shi wrote:
>>
>>>Hi list,
>>>
>>>When I was using 'silhouette' from the 'cluster' package to
>>>calculate clustering performance, R crashed. I traced the problem
>>>to the fact that my clustering labels only have 2's and 3's. When
>>>I replaced them with 1's and 2's, the problem was solved. Is the
>>>function purposely written this way, so that when I have clustering
>>>labels "2" and "3", for example, the function somehow takes the
>>>'missing' cluster "1" into account when it calculates silhouette
>>>widths?
>>>
>>>Thanks,
>>>
>>>....Tao
>>>
>>>##============================================
>>>## sorry about the long attachment
>>>
>>>>R.Version()
>>>$platform
>>>[1] "i386-pc-mingw32"
>>>
>>>$arch
>>>[1] "i386"
>>>
>>>$os
>>>[1] "mingw32"
>>>
>>>$system
>>>[1] "i386, mingw32"
>>>
>>>$status
>>>[1] ""
>>>
>>>$major
>>>[1] "2"
>>>
>>>$minor
>>>[1] "5.1"
>>>
>>>$year
>>>[1] "2007"
>>>
>>>$month
>>>[1] "06"
>>>
>>>$day
>>>[1] "27"
>>>
>>>$`svn rev`
>>>[1] "42083"
>>>
>>>$language
>>>[1] "R"
>>>
>>>$version.string
>>>[1] "R version 2.5.1 (2007-06-27)"
>>>
>>>>library(cluster)
>>>>cl1 ## clustering labels
>>> [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2
>>>[30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[59] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[88] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[117] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[146] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[175] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>[204] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>>>x1 ## 1-d input vector
>>> [1] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>> [6] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[11] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[16] 1.5707963 1.5707963 1.5707963 1.5707963 1.5707963
>>>[21] 1.0163758 0.7657763 0.7370084 0.6999689 0.7366476
>>>[26] 0.7883921 0.6925395 0.7729240 0.7202391 0.7910149
>>>[31] 0.7397698 0.7958092 0.6978596 0.7350255 0.7294362
>>>[36] 0.6125713 0.7174000 0.7413046 0.7044205 0.7568104
>>>[41] 0.7048469 0.7334515 0.7143170 0.7002311 0.7540981
>>>[46] 0.7627527 0.7712762 0.8193611 0.7801148 0.9061762
>>>[51] 0.8248195 0.7932630 0.7248037 0.7423547 0.6419314
>>>[56] 0.6001092 0.7572272 0.7631742 0.7085384 0.8710853
>>>[61] 0.6589563 0.7464943 0.7487340 0.7751280 0.7946542
>>>[66] 0.7666081 0.8508109 0.8314308 0.7442471 0.8006093
>>>[71] 0.7949156 0.7852447 0.7630048 0.7104764 0.6768218
>>>[76] 0.6806351 0.7255355 0.7431389 0.7523627 0.7670515
>>>[81] 0.8118214 0.7215615 0.8186164 0.6941610 0.8285453
>>>[86] 0.8395170 0.8088044 0.8182706 0.7550723 0.7948639
>>>[91] 0.7204830 0.7109068 0.7756949 0.6837856 0.7055604
>>>[96] 0.6126666 0.7201964 0.6849890 0.7779753 0.7845284
>>>[101] 0.9370788 0.8242935 0.6908860 0.6446151 0.7660386
>>>[106] 0.8141526 0.8111984 0.8624186 0.7865335 0.8213035
>>>[111] 0.8059171 0.6735751 0.7815353 0.6972508 0.6699396
>>>[116] 0.6293971 0.7475913 0.7700821 0.8258339 0.8096144
>>>[121] 0.7058171 0.7516635 0.7323909 0.7229136 0.8344846
>>>[126] 0.7205433 0.8287774 0.8322097 0.7767547 0.7402277
>>>[131] 0.7939879 0.7797308 0.7112453 0.7091554 0.6417382
>>>[136] 0.6369171 0.7059020 0.7496380 0.7298359 0.8202566
>>>[141] 0.7331830 0.7344492 0.8316894 0.7323979 0.7977615
>>>[146] 0.7841205 0.7587060 0.8056685 0.7895643 0.8140731
>>>[151] 0.7890221 0.8016008 0.7381577 0.6936453 0.7133525
>>>[156] 0.7121459 0.6851448 0.7946275 0.8077618 0.7899059
>>>[161] 0.7128826 0.7546289 0.7042451 0.6606403 0.7525233
>>>[166] 0.7527548 0.8098887 0.8254190 0.7873064 0.8139340
>>>[171] 0.7903462 0.8377651 0.6709983 0.7423632 0.6632082
>>>[176] 0.5676717 0.6925125 0.7077083 0.7488877 0.7630604
>>>[181] 0.7843001 0.7524471 0.6871823 0.7144443 0.7692206
>>>[186] 0.8690710 0.9282786 0.7844991 0.7094671 0.7578409
>>>[191] 0.8026643 0.7759241 0.6997376 0.6167209 0.6682289
>>>[196] 0.6572018 0.7615807 0.7415752 0.7659161 0.7040360
>>>[201] 0.6874460 0.7052109 0.8290970 0.6915149 0.7173107
>>>[206] 0.7848961 0.7943846 0.8437946 0.7817344 0.8867006
>>>[211] 0.7575857 0.8390473 0.7382348 0.6789859 0.7129010
>>>[216] 0.6938173 0.7384170 0.6747648 0.7203337 0.7278963
>>>> silhouette(cl1, dist(x1)^2) ##### CRASHED! ######
>>>>silhouette(ifelse(cl1==3,2,1), dist(x1)^2)
>>> cluster neighbor sil_width
>>> [1,] 2 1 1.0000000
>>> [2,] 2 1 1.0000000
>>> [3,] 2 1 1.0000000
>>> [4,] 2 1 1.0000000
>>> [5,] 2 1 1.0000000
>>> [6,] 2 1 1.0000000
>>> [7,] 2 1 1.0000000
>>> [8,] 2 1 1.0000000
>>> [9,] 2 1 1.0000000
>>>[10,] 2 1 1.0000000
>>>[11,] 2 1 1.0000000
>>>[12,] 2 1 1.0000000
>>>[13,] 2 1 1.0000000
>>>[14,] 2 1 1.0000000
>>>[15,] 2 1 1.0000000
>>>[16,] 2 1 1.0000000
>>>[17,] 2 1 1.0000000
>>>[18,] 2 1 1.0000000
>>>[19,] 2 1 1.0000000
>>>[20,] 2 1 1.0000000
>>>[21,] 1 2 0.7592857
>>>[22,] 1 2 0.9934455
>>>[23,] 1 2 0.9937880
>>>[24,] 1 2 0.9909544
>>>[25,] 1 2 0.9937769
>>>[26,] 1 2 0.9912442
>>>[27,] 1 2 0.9900156
>>>[28,] 1 2 0.9929499
>>>[29,] 1 2 0.9929125
>>>[30,] 1 2 0.9908637
>>>[31,] 1 2 0.9938610
>>>[32,] 1 2 0.9900958
>>>[33,] 1 2 0.9906993
>>>[34,] 1 2 0.9937227
>>>[35,] 1 2 0.9934823
>>>[36,] 1 2 0.9740954
>>>[37,] 1 2 0.9926948
>>>[38,] 1 2 0.9938924
>>>[39,] 1 2 0.9914623
>>>[40,] 1 2 0.9938250
>>>[41,] 1 2 0.9915088
>>>[42,] 1 2 0.9936633
>>>[43,] 1 2 0.9924367
>>>[44,] 1 2 0.9909855
>>>[45,] 1 2 0.9938891
>>>[46,] 1 2 0.9936028
>>>[47,] 1 2 0.9930799
>>>[48,] 1 2 0.9848568
>>>[49,] 1 2 0.9922685
>>>[50,] 1 2 0.9371272
>>>[51,] 1 2 0.9832647
>>>[52,] 1 2 0.9905154
>>>[53,] 1 2 0.9932217
>>>[54,] 1 2 0.9939101
>>>[55,] 1 2 0.9810071
>>>[56,] 1 2 0.9708675
>>>[57,] 1 2 0.9938131
>>>[58,] 1 2 0.9935827
>>>[59,] 1 2 0.9918943
>>>[60,] 1 2 0.9628701
>>>[61,] 1 2 0.9844965
>>>[62,] 1 2 0.9939491
>>>[63,] 1 2 0.9939495
>>>[64,] 1 2 0.9927610
>>>[65,] 1 2 0.9902895
>>>[66,] 1 2 0.9933968
>>>[67,] 1 2 0.9734481
>>>[68,] 1 2 0.9811285
>>>[69,] 1 2 0.9939341
>>>[70,] 1 2 0.9892304
>>>[71,] 1 2 0.9902461
>>>[72,] 1 2 0.9916649
>>>[73,] 1 2 0.9935909
>>>[74,] 1 2 0.9920846
>>>[75,] 1 2 0.9876779
>>>[76,] 1 2 0.9882868
>>>[77,] 1 2 0.9932665
>>>[78,] 1 2 0.9939213
>>>[79,] 1 2 0.9939182
>>>[80,] 1 2 0.9933699
>>>[81,] 1 2 0.9868129
>>>[82,] 1 2 0.9930074
>>>[83,] 1 2 0.9850624
>>>[84,] 1 2 0.9902300
>>>[85,] 1 2 0.9820895
>>>[86,] 1 2 0.9781906
>>>[87,] 1 2 0.9875197
>>>[88,] 1 2 0.9851569
>>>[89,] 1 2 0.9938688
>>>[90,] 1 2 0.9902547
>>>[91,] 1 2 0.9929304
>>>[92,] 1 2 0.9921257
>>>[93,] 1 2 0.9927096
>>>[94,] 1 2 0.9887702
>>>[95,] 1 2 0.9915856
>>>[96,] 1 2 0.9741195
>>>[97,] 1 2 0.9929094
>>>[98,] 1 2 0.9889500
>>>[99,] 1 2 0.9924910
>>>[100,] 1 2 0.9917552
>>>[101,] 1 2 0.9047049
>>>[102,] 1 2 0.9834247
>>>[103,] 1 2 0.9897916
>>>[104,] 1 2 0.9815845
>>>[105,] 1 2 0.9934304
>>>[106,] 1 2 0.9862375
>>>[107,] 1 2 0.9869624
>>>[108,] 1 2 0.9677353
>>>[109,] 1 2 0.9914973
>>>[110,] 1 2 0.9843076
>>>[111,] 1 2 0.9881568
>>>[112,] 1 2 0.9871393
>>>[113,] 1 2 0.9921114
>>>[114,] 1 2 0.9906240
>>>[115,] 1 2 0.9865148
>>>[116,] 1 2 0.9781846
>>>[117,] 1 2 0.9939511
>>>[118,] 1 2 0.9931681
>>>[119,] 1 2 0.9829519
>>>[120,] 1 2 0.9873341
>>>[121,] 1 2 0.9916130
>>>[122,] 1 2 0.9939273
>>>[123,] 1 2 0.9936196
>>>[124,] 1 2 0.9930999
>>>[125,] 1 2 0.9800620
>>>[126,] 1 2 0.9929347
>>>[127,] 1 2 0.9820138
>>>[128,] 1 2 0.9808614
>>>[129,] 1 2 0.9926103
>>>[130,] 1 2 0.9938711
>>>[131,] 1 2 0.9903987
>>>[132,] 1 2 0.9923097
>>>[133,] 1 2 0.9921578
>>>[134,] 1 2 0.9919558
>>>[135,] 1 2 0.9809652
>>>[136,] 1 2 0.9799023
>>>[137,] 1 2 0.9916220
>>>[138,] 1 2 0.9939454
>>>[139,] 1 2 0.9935022
>>>[140,] 1 2 0.9846059
>>>[141,] 1 2 0.9936526
>>>[142,] 1 2 0.9937017
>>>[143,] 1 2 0.9810402
>>>[144,] 1 2 0.9936199
>>>[145,] 1 2 0.9897557
>>>[146,] 1 2 0.9918058
>>>[147,] 1 2 0.9937665
>>>[148,] 1 2 0.9882099
>>>[149,] 1 2 0.9910776
>>>[150,] 1 2 0.9862575
>>>[151,] 1 2 0.9911553
>>>[152,] 1 2 0.9890393
>>>[153,] 1 2 0.9938209
>>>[154,] 1 2 0.9901624
>>>[155,] 1 2 0.9923515
>>>[156,] 1 2 0.9922418
>>>[157,] 1 2 0.9889731
>>>[158,] 1 2 0.9902939
>>>[159,] 1 2 0.9877542
>>>[160,] 1 2 0.9910280
>>>[161,] 1 2 0.9923092
>>>[162,] 1 2 0.9938784
>>>[163,] 1 2 0.9914431
>>>[164,] 1 2 0.9848184
>>>[165,] 1 2 0.9939159
>>>[166,] 1 2 0.9939125
>>>[167,] 1 2 0.9872706
>>>[168,] 1 2 0.9830805
>>>[169,] 1 2 0.9913937
>>>[170,] 1 2 0.9862925
>>>[171,] 1 2 0.9909633
>>>[172,] 1 2 0.9788584
>>>[173,] 1 2 0.9866989
>>>[174,] 1 2 0.9939102
>>>[175,] 1 2 0.9853007
>>>[176,] 1 2 0.9617883
>>>[177,] 1 2 0.9900120
>>>[178,] 1 2 0.9918102
>>>[179,] 1 2 0.9939489
>>>[180,] 1 2 0.9935882
>>>[181,] 1 2 0.9917836
>>>[182,] 1 2 0.9939170
>>>[183,] 1 2 0.9892708
>>>[184,] 1 2 0.9924478
>>>[185,] 1 2 0.9932287
>>>[186,] 1 2 0.9640487
>>>[187,] 1 2 0.9150126
>>>[188,] 1 2 0.9917589
>>>[189,] 1 2 0.9919865
>>>[190,] 1 2 0.9937946
>>>[191,] 1 2 0.9888295
>>>[192,] 1 2 0.9926884
>>>[193,] 1 2 0.9909269
>>>[194,] 1 2 0.9751339
>>>[195,] 1 2 0.9862132
>>>[196,] 1 2 0.9841566
>>>[197,] 1 2 0.9936557
>>>[198,] 1 2 0.9938973
>>>[199,] 1 2 0.9934375
>>>[200,] 1 2 0.9914201
>>>[201,] 1 2 0.9893087
>>>[202,] 1 2 0.9915481
>>>[203,] 1 2 0.9819092
>>>[204,] 1 2 0.9898774
>>>[205,] 1 2 0.9926876
>>>[206,] 1 2 0.9917091
>>>[207,] 1 2 0.9903339
>>>[208,] 1 2 0.9764847
>>>[209,] 1 2 0.9920887
>>>[210,] 1 2 0.9526866
>>>[211,] 1 2 0.9938025
>>>[212,] 1 2 0.9783714
>>>[213,] 1 2 0.9938230
>>>[214,] 1 2 0.9880267
>>>[215,] 1 2 0.9923108
>>>[216,] 1 2 0.9901850
>>>[217,] 1 2 0.9938279
>>>[218,] 1 2 0.9873388
>>>[219,] 1 2 0.9929195
>>>[220,] 1 2 0.9934017
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl1 == 3, 2, 1), dist = dist(x1)^2)
>>>attr(,"class")
>>>[1] "silhouette"
>>>
>>>## other examples
>>>>set.seed(1234)
>>>>cl.tmp <- rep(2:3, each=5)
>>>>x.tmp <- c(rep(-1,5), abs(rnorm(5)+3))
>>>>silhouette(cl.tmp, dist(x.tmp))
>>> cluster neighbor sil_width
>>>[1,] 2 1 NaN
>>>[2,] 2 1 NaN
>>>[3,] 2 1 NaN
>>>[4,] 2 1 NaN
>>>[5,] 2 1 NaN
>>>[6,] 3 2 -0.5736515
>>>[7,] 3 2 -0.1557143
>>>[8,] 3 2 -0.2922523
>>>[9,] 3 2 -0.8340174
>>>[10,] 3 2 -0.1511875
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = cl.tmp, dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>>silhouette(ifelse(cl.tmp==2,1,2), dist(x.tmp))
>>> cluster neighbor sil_width
>>>[1,] 1 2 1.0000000
>>>[2,] 1 2 1.0000000
>>>[3,] 1 2 1.0000000
>>>[4,] 1 2 1.0000000
>>>[5,] 1 2 1.0000000
>>>[6,] 2 1 0.4136253
>>>[7,] 2 1 0.7038917
>>>[8,] 2 1 0.6467668
>>>[9,] 2 1 -0.3360695
>>>[10,] 2 1 0.7054709
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl.tmp == 2, 1, 2), dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>>silhouette(ifelse(cl.tmp==2,1,3), dist(x.tmp))
>>> cluster neighbor sil_width
>>>[1,] 1 2 NaN
>>>[2,] 1 2 NaN
>>>[3,] 1 2 NaN
>>>[4,] 1 2 NaN
>>>[5,] 1 2 NaN
>>>[6,] 3 1 -0.7694686
>>>[7,] 3 1 -0.8167313
>>>[8,] 3 1 -0.6054665
>>>[9,] 3 1 -0.9037412
>>>[10,] 3 1 0.1875360
>>>attr(,"Ordered")
>>>[1] FALSE
>>>attr(,"call")
>>>silhouette.default(x = ifelse(cl.tmp == 2, 1, 3), dist = dist(x.tmp))
>>>attr(,"class")
>>>[1] "silhouette"
>>>
>>
>
>--
>Brian D. Ripley, ripley at stats.ox.ac.uk
>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>University of Oxford, Tel: +44 1865 272861 (self)
>1 South Parks Road, +44 1865 272866 (PA)
>Oxford OX1 3TG, UK Fax: +44 1865 272595
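
The workaround used in this thread (recoding the labels to consecutive
integers 1..k before calling silhouette()) can be sketched as below. This
is only an illustration under stated assumptions: the helper name
relabel_consecutive is hypothetical and not part of the 'cluster' package,
and the silhouette() call is attempted only if 'cluster' is installed.

```r
# Hypothetical helper (not part of 'cluster'): map arbitrary cluster labels
# to consecutive integers 1..k, following the sort order of the labels.
relabel_consecutive <- function(cl) {
  as.integer(factor(cl))
}

cl.tmp <- rep(2:3, each = 5)           # labels 2 and 3, as in the report
cl.ok  <- relabel_consecutive(cl.tmp)  # recoded to 1 and 2

# Sanity check before handing labels to silhouette(): every value in 1..k
# must occur at least once, so there is no 'missing' cluster.
stopifnot(setequal(cl.ok, seq_len(max(cl.ok))))

# Run the silhouette call only if the cluster package is available.
if (requireNamespace("cluster", quietly = TRUE)) {
  set.seed(1234)
  x.tmp <- c(rep(-1, 5), abs(rnorm(5) + 3))
  sil <- cluster::silhouette(cl.ok, dist(x.tmp))
  print(summary(sil))
}
```

Recoding through factor() keeps the relative order of the original labels,
so the original label for each new code can be recovered with
sort(unique(cl.tmp)).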