程式扎記: [ML in DH] Clustering 之 Basic Agglomerative Clustering

2014年6月12日星期四

[ML in DH] Clustering 之 Basic Agglomerative Clustering

內容來自課程這裡
Preface:
想法：一開始，將每個點都視為一個群集。接下來，則找出最相近的兩個群集，將其合併為一個新群集。重覆以上的合併動作，直到結束.(Example for Agglomerative Clustering):

• Input: D = {d1, d2, ..., dn}
• Output: DE // dendrogram as a set of clustering results in each step

Algorithm:

Example:
考慮 4 個點: P = {(1, 1), (2, 2), (3, 5), (4, 4)}, k=2. 在此，我們將兩個群集的距離，定義為它們重心的距離:

底下為套用 Basic Agglomerative Clustering 演算法的過程:

幾種 Clusters 的距離測度方式:
兩個點的距離很容易定義，但如何定義兩個群集的距離但如何定義兩個群集的距離但如何定義兩個群集的距離但如何定義兩個群集的距離？
* Single Link

Smallest distance between an element in one cluster and an element in the other. Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

* Complete Link

Largest distance between an element in one cluster and an element in the other. Mathematically, the complete linkage function:

* Average Link

Average distance between an element in one cluster and an element in the other. Mathematically, the average linkage function:

* Centroid (center of cluster)

If clusters have a representative centroid, then the cluster distance is defined to be the distance between their centroids.

* Medoid (a centrally located point in the cluster)

Distance between the cluster medoids. Medoids are representative objects of a data set or a cluster with a data set whose average dissimilarity to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always members of the data set.

Toolkit Usage:
這個演算法可以使用工具 "BAgg.groovy" 來實現, 使用範例如下:

view plaincopy to clipboardprint?
// 1) Prepare data  
def datas = [[1, 1], [2, 2], [3, 1], [2, 4],  
             [11,4], [12,4], [13,4], [14,4],  
             [2,14], [3,13], [3,14],  
             [12,13], [13,12], [13,13], [14,14]]  
  
// 2) Run Basic Agglomerative Clustering algorithm  
printf "\t[Info] Running Basic Agglomerative Clustering...\n"  
BAgg ba = new BAgg()  
def rt = ba.run(datas,                  /*Input data vector list*/   
                4,                      /*How many clusters being output*/    
                ba.completeLinkDistClu  /*Choose Complete Linkage*/  
                )   
printf "\n"  
  
// 3) Output Cluster Result  
printf "\t[Info] Output Clustering Result:\n"  
def K = rt[0]  
def I = rt[1]  
K.eachWithIndex{ v, p->  
    printf "\t\tC%d: [%s]\n", p, I[p].sort().join(",")  
    v.each{  
        printf "\t\t\t[%s]\n", it.join(",")  
    }  
}  

執行結果:

程式扎記

標籤

2014年6月12日星期四

[ML in DH] Clustering 之 Basic Agglomerative Clustering

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2014年6月12日 星期四

[ML in DH] Clustering 之 Basic Agglomerative Clustering

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

2014年6月12日星期四