ROCK clustering example

Assignment Clustering ROCK Clustering Algorithm 

Input:

A set S of data points

The number k of clusters to be found

The similarity threshold θ


Output: Groups of clustered data

The ROCK algorithm is divided into three major parts:
1. Draw a random sample from the data set
2. Perform a hierarchical agglomerative clustering algorithm
3. Label data on disk



In our case, we do not deal with a very large data set. So we consider the whole data set when forming clusters, i.e., we skip step 1 and step 3.

1. Draw a random sample from the data set:

Sampling is used to ensure scalability to very large data sets. The initial sample is used to form clusters, then the remaining data on disk is assigned to these clusters.

In our case, we will consider the whole data set in the process of forming clusters.

2. Perform a hierarchical agglomerative clustering algorithm:

ROCK performs the following steps, which are common to all hierarchical agglomerative clustering algorithms, but with a different definition of the similarity measure:
a. Place each single data point into a separate cluster.
b. Compute the similarity measure for all pairs of clusters.
c. Merge the two clusters with the highest similarity (goodness measure).
d. Verify a stop condition. If it is not met, go to step b.

3. Label data on disk:

Finally, the remaining data points on disk are assigned to the generated clusters.



This is done by selecting a random sample Li from each cluster Ci, then we assign each point p to the cluster for which it has the strongest linkage with Li.



As we said, we will consider the whole data set in the process of forming clusters.



Computation of links:

#$ usin the similarity threshold θ' (e can con!ert the similarity matri0 into an ad"acency matri0 +A, %$ Then (e obtain a matri0 indicatin the number of links by calculatin +A 0 A , ' i$e$' by multiplyin the ad"acency matri0 A (ith itself &$ Suppose we have four verses contains some subjects , as follows: P1= judgment, faith, pra!er, fair" #ilk, butter, bread, eggs" P$= fasting, faith, pra!er" P%= fair, fasting, faith"

hone!, butter, bread" eggs, hone!, butter"

P&= fasting, pra!er, pilgrimage" hone!, bread, 'am" 1$ the similarit! threshold = ()%, and number of re*uired cluster is $) 2$ using 'accard coe+cient as a similarit! measure, we obtain the following similarit! table :



Since we have a similarity threshold equal to 0.3, we derive the adjacency table:
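A minimal sketch of the thresholding step follows. It assumes that two points are neighbors when their Jaccard similarity is at least θ, and that every point counts as its own neighbor (similarity 1); that second assumption is what makes the link counts used later in the example come out as stated. The variable names are illustrative.

```python
# Example sets in the order P1, P2, P3, P4.
sets = [
    {"judgment", "faith", "prayer", "fair"},
    {"fasting", "faith", "prayer"},
    {"fair", "fasting", "faith"},
    {"fasting", "prayer", "pilgrimage"},
]
theta = 0.3  # similarity threshold from the example


def jaccard(a, b):
    return len(a & b) / len(a | b)


# adjacency[i][j] is 1 when points i and j are neighbors, else 0.
# Assumption: the diagonal is 1, since jaccard(a, a) == 1 >= theta.
adjacency = [
    [1 if jaccard(a, b) >= theta else 0 for b in sets]
    for a in sets
]

for row in adjacency:
    print(row)
```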



5y multiplyin the ad"acency table (ith itself' (e deri!e the follo(in table (hich sho(s the number of links +or common neihbors, : 



(e compute the oodness measure for all ad"acent points 'assumin that f θ - =1.θ / 10θ

g ( P i , P j ) =

link [ P i , P j ] ( n + m)1

+

2 f (θ )

−

n1

+

2 f (θ )

−

m1

+

2 f (θ )



6e obtain the follo(in table

:



+7#'7%,8 & 9 +##,%$4; < #%$4; < #%$4; 8 #$&=



+7#'71,8 # 9 +##,%$4; < #%$4; < #%$4; 8 4$1;



We have an equal goodness measure for merging (P1,P2), (P2,P3), and (P3,P1).



Now we start the hierarchical algorithm by merging, say, P1 and P2.



A new cluster (let's call it C(P1,P2)) is formed.



It should be noted that some other hierarchical clustering techniques would not start the clustering process by merging P1 and P2, since Sim(P1,P2) = 0.4, which is not the highest similarity. But ROCK uses the number of links as the similarity measure rather than distance.



No(' after merin 7# and 7%' (e ha!e only three clusters$ The follo(in table sho(s the number of common neihbors for these clusters:



etc.





Then we can obtain the following goodness measures for all adjacent clusters:



The entry for (P3,P4) should be 0.45.

g((P1,P2),P3) = 6 / ((2+1)^2.06 − 2^2.06 − 1^2.06) ≈ 1.35 (or 1.31 with the exponent 1 + 2f(0.3) ≈ 2.08)

g((P1,P2),P4) = 3 / ((2+1)^2.06 − 2^2.06 − 1^2.06) ≈ 0.68 (or 0.66 with the exponent 1 + 2f(0.3) ≈ 2.08)



Since the number of required clusters is 2, we finish the clustering algorithm by merging C(P1,P2) and P3, obtaining a new cluster C(P1,P2,P3) which contains {P1,P2,P3}, leaving P4 alone in a separate cluster.
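The whole walk-through can be reproduced with a compact sketch. This is a simplified illustration under stated assumptions, not a full ROCK implementation: it skips sampling and labeling (as the example does), treats each point as its own neighbor, uses the exponent 1 + 2f(θ), breaks ties by taking the first best pair, and stops only when k clusters remain. The function name `rock` and its parameters are hypothetical.

```python
from itertools import combinations


def rock(points, k, theta):
    """Merge clusters by ROCK goodness until only k clusters remain."""
    names = list(points)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    # Neighbor sets; a point is its own neighbor because jaccard(a, a) == 1.
    neighbors = {
        p: {q for q in names if jaccard(points[p], points[q]) >= theta}
        for p in names
    }
    # link[p, q] = number of common neighbors (same as the A x A entry).
    link = {(p, q): len(neighbors[p] & neighbors[q]) for p in names for q in names}

    exponent = 1 + 2 * (1 - theta) / (1 + theta)

    def goodness(c1, c2):
        cross_links = sum(link[p, q] for p in c1 for q in c2)
        n, m = len(c1), len(c2)
        return cross_links / ((n + m) ** exponent - n ** exponent - m ** exponent)

    clusters = [frozenset([p]) for p in names]
    while len(clusters) > k:
        # max() returns the first best pair, so ties resolve in listing order.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: goodness(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


points = {
    "P1": {"judgment", "faith", "prayer", "fair"},
    "P2": {"fasting", "faith", "prayer"},
    "P3": {"fair", "fasting", "faith"},
    "P4": {"fasting", "prayer", "pilgrimage"},
}
result = rock(points, k=2, theta=0.3)
print([sorted(c) for c in result])
```

With these inputs the sketch first merges P1 and P2, then adds P3, and leaves P4 in its own cluster, matching the result derived by hand.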
