## 2014年6月11日 星期三

### [ML in DH] Clustering 之 k-Means Clustering Algorithm

Preface:

k-Means Clustering Algorithm:
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

• Input: D (a set of documents), k (# of desired clusters)
• Output: C (set of clusters)
• Algorithm:

Example: k-Means Clustering

The 1'st Iteration
* Randomly choose (1, 1), (2, 2) as the single points of initial clusters #1, #2
* Assign points in P to the closest cluster:

* Calculate new cluster means:

The 2'nd Iteration
* Re-assign points in P to the closest cluster

* Calculate new means:

The 3'nd Iteration
* Re-assign points in P to the closest cluster

* Calculate new means:

* Meet convergence criteria ﬂ DONE.
Since the cluster means (or clusters) do not change in this iteration. Cluster means before and after the 3'nd iteration: {(1.5, 1.5), (3.5, 4.5)}

Toolkit Usage:

1. // 1) Prepare data
2. def datas = [[11], [22], [35], [44]]
3.
4. // 2) Run k-Means clustering algorithm
5. KMeans km = new KMeans()
6. JTuple rt = km.kMeans(datas, 2, [[11], [22]]);
7.
8. // 3) Print clustering result
9. def clusterAssment = rt.get(0)  // [Cluster Num, Distance]
10. def centroids = rt.get(1)       // [key:ClusterNum,Value:Centroid]
11. printf "\t[Info] Cluster Result:\n"
12. clusterAssment.eachWithIndex{ v, i->
13.     printf "\t\t%s is assigned to Cluster%d (D=%.02f; C=%s)...\n", datas[i],
14.                                                                    v[0],
15.                                                                    v[1],
16.                                                                    centroids[v[0]]
17. }

[Info] KMeans with K=2...
[Info] Randomly pickup 2 centroid at:
[1, 1]
[2, 2]
***** Round 1 *****
[1, 1] to Centroid0 has dist=0.000...
[1, 1] to Centroid1 has dist=1.414...
Assign [1, 1] to Centroid0...
[2, 2] to Centroid0 has dist=1.414...
[2, 2] to Centroid1 has dist=0.000...
Assign [2, 2] to Centroid1...
[3, 5] to Centroid0 has dist=4.472...
[3, 5] to Centroid1 has dist=3.162...
Assign [3, 5] to Centroid1...
[4, 4] to Centroid0 has dist=4.243...
[4, 4] to Centroid1 has dist=2.828...
Assign [4, 4] to Centroid1...
***** Recalculate centroids *****
...
[Info] Done! 139 ms
[Info] Cluster Result:
[1, 1] is assigned to Cluster0 (Dist=0.71; Centroid=[1.5, 1.5])...
[2, 2] is assigned to Cluster0 (Dist=0.71; Centroid=[1.5, 1.5])...
[3, 5] is assigned to Cluster1 (Dist=0.71; Centroid=[3.5, 4.5])...
[4, 4] is assigned to Cluster1 (Dist=0.71; Centroid=[3.5, 4.5])...

Supplement:
Unsupervised learning : The k-means clustering algorithm (1)

Unsupervised learning : The k-means clustering algorithm (2)

This message was edited 29 times. Last update was at 12/06/2014 10:02:26

### [Linux 常見問題] What's the best way to send a signal to all members of a process group?

Source From  Here   Question   I want to  kill a whole process tree.  What is the best way to do this using any common scripting languages? ...