## 2014年6月12日 星期四

### [ML in DH] Clustering 之 Basic Agglomerative Clustering

Preface:

• Input: D = {d1, d2, ..., dn}
• Output: DE // dendrogram as a set of clustering results in each step

Algorithm:

Example:

Smallest distance between an element in one cluster and an element in the other. Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

Largest distance between an element in one cluster and an element in the other. Mathematically, the complete linkage function:

Average distance between an element in one cluster and an element in the other. Mathematically, the average linkage function:

* Centroid (center of cluster)
If clusters have a representative centroid, then the cluster distance is defined to be the distance between their centroids.

* Medoid (a centrally located point in the cluster)
Distance between the cluster medoids. Medoids are representative objects of a data set or a cluster with a data set whose average dissimilarity to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always members of the data set.

Toolkit Usage:

1. // 1) Prepare data
2. def datas = [[11], [22], [31], [24],
3.              [11,4], [12,4], [13,4], [14,4],
4.              [2,14], [3,13], [3,14],
5.              [12,13], [13,12], [13,13], [14,14]]
6.
7. // 2) Run Basic Agglomerative Clustering algorithm
8. printf "\t[Info] Running Basic Agglomerative Clustering...\n"
9. BAgg ba = new BAgg()
10. def rt = ba.run(datas,                  /*Input data vector list*/
11.                 4,                      /*How many clusters being output*/
13.                 )
14. printf "\n"
15.
16. // 3) Output Cluster Result
17. printf "\t[Info] Output Clustering Result:\n"
18. def K = rt[0]
19. def I = rt[1]
20. K.eachWithIndex{ v, p->
21.     printf "\t\tC%d: [%s]\n", p, I[p].sort().join(",")
22.     v.each{
23.         printf "\t\t\t[%s]\n", it.join(",")
24.     }
25. }

## 關於我自己

Where there is a will, there is a way!