2015年11月29日 星期日

[ Algorithm ] Radix Sorts - LSD Radix Sort

Source From Here 
Introduction 
Least-significant-digit-first string sort (LSD Sort) is a fast stable sorting algorithm which can be used to sort keys in integer representation order. Keys may be a string of characters, or numerical digits in a given 'radix'. The processing of the keys begins at the least significant digit (i.e., the rightmost digit), and proceeds to the most significant digit (i.e., the leftmost digit). The sequence in which digits are processed by an LSD radix sort is the opposite of the sequence in which digits are processed by a most significant digit (MSD) radix sort. An LSD radix sort operates in O(nk) time, where n is the number of keys, and k is the average key length. 

Key-indexed Counting 
So far we have known below sorting algorithms which have frequency of operations = key compares. 



which has Lower bound. ~ N lg N compares required by any compare-based algorithm. Can we do better (despite the lower bound)? Yes, if we don't depend on key compares. Before we talk about Key-indexed counting, we have below assumption and implication: 

Assumption. Keys are integers between 0 and R - 1.
Implication. Can use key as an array index.
Remark. Keys may have associated data.

Below is a sorting example: 



Key-indexed counting demo 
Goal. Sort an array a[] of N integers between 0 and R - 1. The algorithm includes below steps: 

* Count frequencies of each letter using key as index.
* Compute frequency cumulates which specify destinations.
* Access cumulates using key as index to move items.
* Copy back into original array.

Below is the pseudo code and the testing data used in this demonstration: 



Firstly we count frequencies of each letter using key as index: 


Then we compute frequency cumulatively which specify destinations which help us in moving element from original list to sorted list: 


Next we start to move element from original list a to temple list aux


Finally, replace original list with sorted list: 


Analysis 
Proposition. Key-indexed counting uses ~ 11 N + 4 R array accesses to sort N items whose keys are integers between 0 and R - 1. Key-indexed counting uses extra space proportional to N + R and it is a stable sorting algorithm. 

LSD string (radix) sort 
Here explain how to use LSD algorithm to sorting string: 

* Consider characters from right to left.
* Stably sort using dth character as the key (using key-indexed counting).

Below is a simple example: 



Java Implementation 


What if strings do not have same length? Create a customized charAt API to handle out-of-boundary access. For example: 

  1. /** 
  2. * Return character as Integer of input String at specific position 
  3.  
  4. * @param val: String value 
  5. * @param d: Position of character 
  6.  
  7. * @return character at specific position of input string as integer. 
  8. */  
  9. int charAt(String val, d)  
  10. {  
  11.     if( d >= val.length()) return 0  
  12.     else return (int)val[d]  
  13. }  
Groovy Implementation 
The groovy implementation is from Git hub project AlgorithmCA
- LSDSortStr.groovy 
  1. package sort  
  2.   
  3. import java.util.List;  
  4.   
  5. class LSDSortStr extends SortAlgImp {  
  6.     int R = 10  
  7.        
  8.     /** 
  9.      * Return character as Integer of input String at specific position 
  10.      *  
  11.      * @param val: String value 
  12.      * @param d: Position of character 
  13.      *  
  14.      * @return character at specific position of input string as integer. 
  15.      */  
  16.     int charAt(String val, d)  
  17.     {  
  18.         if( d >= val.length()) return 0  
  19.         else return (int)val[d]  
  20.     }  
  21.       
  22.     void sort(List list, int mlen)  
  23.     {  
  24.         int N = list.size()  
  25.         def aux = []; N.times{aux[it]=null}  
  26.         def count = new int[R+1] as int[]; (R+1).times{count[it]=0}  
  27.         mlen.times{ d->  
  28.             d = (mlen-d-1)  
  29.             // Count frequencies of each letter using key as index.  
  30.             for(String str:list) {                
  31.                 count[charAt(str, d)+1]++                 
  32.             }  
  33.               
  34.             // Compute frequency cumulates which specify destinations.  
  35.             int p=0  
  36.             R.times{ r->               
  37.                 count[r+1]+=count[r]                  
  38.             }  
  39.               
  40.             // Access cumulates using key as index to move items.  
  41.             N.times{ n->               
  42.                 aux[count[charAt(list[n], d)]++] = list[n]  
  43.             }  
  44.               
  45.             // Copy back into original array.  
  46.             N.times{ n->  
  47.                 list[n] = aux[n]  
  48.             }  
  49.               
  50.             // Clean  
  51.             (R+1).times{count[it]=0}  
  52.         }  
  53.     }  
  54.   
  55.     @Override  
  56.     public void sort(List list) {  
  57.         int mlen=0  
  58.           
  59.         // Find the longest length among all strings -> mlen  
  60.         list.each{if(it.length()>mlen) mlen=it.length()}  
  61.           
  62.         // Start sorting with LSD  
  63.         sort(list, mlen)      
  64.     }  
  65.   
  66.     static void main(args)  
  67.     {  
  68.         def data = ["John""Joseph""Ken""Mary""Peter""johnson",  
  69.                     "Tim""Tom""Anderson""Andy""Bob""Forry",  
  70.                     "Carter""Derek"]  
  71.         // Java Character has 16 bits (2 bytes) = 2^16 = 0~65535   
  72.         LSDSortStr lsdSort = new LSDSortStr(R:65536)  
  73.         lsdSort.test(data)  
  74.     }  
  75. }  
Summary of the performance of sorting algorithms 



Supplement 
[ 資料結構 小學堂 ] 排序 : 基數排序法

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

  Source From  Here 方案1: // x -----删除忽略文件已经对 git 来说不识别的文件 // d -----删除未被添加到 git 的路径中的文件 // f -----强制运行 #   git clean -d -fx 方案2: 今天在服务器上  gi...