在資訊理論中,兩個等長字元串之間的 漢明距離 是兩個字元串對應位置的不同字元的個數。換句話說,它就是將一個字元串變換成另外一個字元串所需要替換的字元個數. 例如:
漢明重量 是字元串相對於同樣長度的零字元串的漢明距離,也就是說,它是字元串中非零的元素個數:對於二進制字元串來說,就是 1 的個數,所以 11101 的漢明重量是 4.
當你的比較的兩個字串是不同長度時, 漢明距離 便無法使用, 這時你可以考慮另一種方法叫 "編輯距離". 底下的範例代碼提供你計算 "漢明距離", 當兩個字串長度不同時, 就會替換成 "編輯距離".
Implementation:
這邊使用類別 HammingUtil 上的靜態方法 getDistance(String str1, String str2) 來計算 "漢明距離"; 靜態方法 getWeight(int i) 用來計算 "漢明重量"; 而靜態方法 LevenshteinDistance(String s, String t) 則是用來計算 "編輯距離". 完整代碼如下:
- package flib.util.coding;
- /**
- * BD: Hamming Toolkit
- * Reference:
- * - Hamming distance
- * http://en.wikipedia.org/wiki/Hamming_distance
- * - Levenshtein distance
- * http://en.wikipedia.org/wiki/Levenshtein_distance
- * - Java實例9 - 漢明距離Hamming Distance
- * http://blog.csdn.net/kindterry/article/details/6581344
- *
- * @author John
- *
- */
- public class HammingUtil {
- /**
- * BD: Calculate Hamming Distance between str1 and str2.
- * @param str1: Input string1
- * @param str2: Input string2
- * @return Hamming distance
- */
- public static int getDistance(String str1, String str2)
- {
- if(str1.length()==str2.length())
- {
- int distance=0;
- for ( int i = 0 ; i < str1.length(); i++) {
- if (str1.charAt(i) != str2.charAt(i)) {
- distance++;
- }
- }
- return distance;
- }
- else if(str1.length()>str2.length())
- {
- if(str1.indexOf(str2)>=0) return str1.length()-str2.length();
- else
- {
- return LevenshteinDistance(str1, str2);
- }
- }
- else
- {
- if(str2.indexOf(str1)==0) return str2.length()-str1.length();
- else
- {
- return LevenshteinDistance(str1, str2);
- }
- }
- }
- public static int LevenshteinDistance(String s, String t)
- {
- if(s.length()==0) return t.length();
- else if(t.length()==0) return s.length();
- int cost = 0;
- if(s.charAt(0) != t.charAt(0)) cost = 1;
- return Math.min(LevenshteinDistance(s.substring(1), t)+1,
- Math.min(LevenshteinDistance(s, t.substring(1))+1,
- LevenshteinDistance(s.substring(1), t.substring(1))+cost));
- //return LevenshteinDistance(s.substring(1), t.substring(1))+cost;
- }
- public static int getWeight(int i)
- {
- int n;
- for (n = 0 ; i > 0 ; n++) {
- i &= (i - 1 );
- }
- return n;
- }
- public static void main(String[] args) {
- String str1 = "www.facebook.com" ;
- String str2 = "www.faecbok.com" ;
- int distance = HammingUtil.getDistance(str1, str2);
- System.out.println( "distance is " + distance);
- int weight = HammingUtil.getWeight( 255 );
- System.out.println( "weight is " + weight);
- }
- }
接著你可以使用下面範例代碼計算 "漢明距離", "漢明重量":
- public static void main(String[] args) {
- String str1 = "facebook" ;
- String str2 = "facboak" ;
- String str3 = "toned";
- String str4 = "roses";
- int distance1 = HammingUtil.getDistance(str1, str2);
- int distance2 = HammingUtil.getDistance(str3, str4);
- System.out.printf("\t[Info] Hamming distance between '%s' and '%s' is %d...\n",str1, str2, distance1);
- System.out.printf("\t[Info] Hamming distance between '%s' and '%s' is %d...\n",str3, str4, distance2);
- int strInBinary = 255;
- int weight = HammingUtil.getWeight( strInBinary );
- System.out.printf("\t[Info] Hamming weight of '%s' is %d...\n", Integer.toBinaryString(strInBinary), weight);
- }
Supplement:
* Java example of Hamming distance
* Java實例9 - 漢明距離Hamming Distance
沒有留言:
張貼留言