Content adapted from the course material here.
Preface:
Given a set of data points, clustering uses the distances (or similarities) between them to group similar points together into clusters.
Since there is no need to specify "which class each point belongs to", clustering is an unsupervised learning method:
k-Means Clustering Algorithm:
• Input: D (a set of data points), k (# of desired clusters)
• Output: C (set of clusters)
• Algorithm:
1. Pick k points as the initial cluster centroids.
2. Assign each point in D to the cluster whose centroid is closest.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2–3 until the assignments no longer change.
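The algorithm above can be sketched as a short Java program (an illustrative re-implementation of the standard Lloyd iteration, not the course's "KMeans.groovy"; the class and method names here are made up for the example):

```java
// Minimal 2-D k-means sketch: repeat assign/update until assignments stabilize.
public class KMeansSketch {

    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // 'init' supplies the initial centroids (one per cluster).
    static double[][] kMeans(double[][] points, double[][] init) {
        int k = init.length;
        double[][] centroids = init;
        int[] assign = new int[points.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            // Step 2: assign each point to the closest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist(points[i], centroids[c]) < dist(points[i], centroids[best]))
                        best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            // Step 3: recompute each centroid as the mean of its cluster.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assign[i]][0] += points[i][0];
                sums[assign[i]][1] += points[i][1];
                counts[assign[i]]++;
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    centroids[c] = new double[]{ sums[c][0] / counts[c], sums[c][1] / counts[c] };
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] P = {{1, 1}, {2, 2}, {3, 5}, {4, 4}};
        double[][] init = {{1, 1}, {2, 2}};
        double[][] c = kMeans(P, init);
        System.out.printf("C1=(%.1f, %.1f)  C2=(%.1f, %.1f)%n",
            c[0][0], c[0][1], c[1][0], c[1][1]);
    }
}
```

Running this on the example input P = {(1, 1), (2, 2), (3, 5), (4, 4)} with the initial centroids (1, 1) and (2, 2) converges to C1 = (1.5, 1.5) and C2 = (3.5, 4.5).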
Example: k-Means Clustering
Consider the input P = {(1, 1), (2, 2), (3, 5), (4, 4)} with k = 2:
The 1st Iteration
* Randomly choose (1, 1), (2, 2) as the single points of initial clusters #1, #2
* Assign points in P to the closest cluster: Cluster1 = {(1, 1)}, Cluster2 = {(2, 2), (3, 5), (4, 4)}
* Calculate new cluster means: C1 = (1, 1), C2 = (3, 3.67)
The 2nd Iteration
* Re-assign points in P to the closest cluster: Cluster1 = {(1, 1), (2, 2)}, Cluster2 = {(3, 5), (4, 4)}
* Calculate new means: C1 = (1.5, 1.5), C2 = (3.5, 4.5)
The 3rd Iteration
* Re-assign points in P to the closest cluster: no point changes cluster
* Calculate new means: C1 = (1.5, 1.5), C2 = (3.5, 4.5) (unchanged)
* Meet convergence criteria (assignments stable) → DONE.
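The first assignment step above can be sanity-checked by computing the Euclidean distance from each point to the two initial centroids (a small illustrative snippet, not the course toolkit):

```java
// Distances from each point in P to the initial centroids (1,1) and (2,2).
public class Iter1Check {

    public static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    public static void main(String[] args) {
        double[][] P = {{1, 1}, {2, 2}, {3, 5}, {4, 4}};
        double[][] c = {{1, 1}, {2, 2}};   // initial centroids C1, C2
        for (double[] p : P) {
            double d1 = dist(p, c[0]), d2 = dist(p, c[1]);
            System.out.printf("(%.0f,%.0f): d(C1)=%.2f d(C2)=%.2f -> Cluster%d%n",
                p[0], p[1], d1, d2, d1 <= d2 ? 1 : 2);
        }
    }
}
```

Only (1, 1) stays in Cluster1; (2, 2), (3, 5), and (4, 4) are all strictly closer to (2, 2), giving the first-iteration assignment shown above.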
Toolkit Usage:
An implementation of this algorithm is available as the tool "KMeans.groovy". Usage is as follows:
// 1) Prepare data
def datas = [[1, 1], [2, 2], [3, 5], [4, 4]]
// 2) Run k-Means clustering algorithm
KMeans km = new KMeans()
JTuple rt = km.kMeans(datas, 2, [[1, 1], [2, 2]])
// 3) Print clustering result
def clusterAssment = rt.get(0)  // [Cluster Num, Distance]
def centroids = rt.get(1)       // [Key: Cluster Num, Value: Centroid]
printf "\t[Info] Cluster Result:\n"
clusterAssment.eachWithIndex { v, i ->
    printf "\t\t%s is assigned to Cluster%d (D=%.02f; C=%s)...\n", datas[i],
        v[0],
        v[1],
        centroids[v[0]]
}
Supplement:
* Unsupervised learning : The k-means clustering algorithm (1)
* Unsupervised learning : The k-means clustering algorithm (2)
This message was edited 29 times. Last update was at 12/06/2014 10:02:26