Introduction
Least-significant-digit-first string sort (LSD Sort) is a fast stable sorting algorithm which can be used to sort keys in integer representation order. Keys may be a string of characters, or numerical digits in a given 'radix'. The processing of the keys begins at the least significant digit (i.e., the rightmost digit), and proceeds to the most significant digit (i.e., the leftmost digit). The sequence in which digits are processed by an LSD radix sort is the opposite of the sequence in which digits are processed by a most significant digit (MSD) radix sort. An LSD radix sort operates in O(nk) time, where n is the number of keys, and k is the average key length.
Key-indexed Counting
So far we have known below sorting algorithms which have frequency of operations = key compares.
which has Lower bound. ~ N lg N compares required by any compare-based algorithm. Can we do better (despite the lower bound)? Yes, if we don't depend on key compares. Before we talk about Key-indexed counting, we have below assumption and implication:
Below is a sorting example:
Key-indexed counting demo
Goal. Sort an array a[] of N integers between 0 and R - 1. The algorithm includes below steps:
Below is the pseudo code and the testing data used in this demonstration:
Firstly we count frequencies of each letter using key as index:
Then we compute frequency cumulatively which specify destinations which help us in moving element from original list to sorted list:
Next we start to move element from original list a to temple list aux:
Finally, replace original list with sorted list:
Analysis
Proposition. Key-indexed counting uses ~ 11 N + 4 R array accesses to sort N items whose keys are integers between 0 and R - 1. Key-indexed counting uses extra space proportional to N + R and it is a stable sorting algorithm.
LSD string (radix) sort
Here explain how to use LSD algorithm to sorting string:
Below is a simple example:
Java Implementation
What if strings do not have same length? Create a customized charAt API to handle out-of-boundary access. For example:
- /**
- * Return character as Integer of input String at specific position
- *
- * @param val: String value
- * @param d: Position of character
- *
- * @return character at specific position of input string as integer.
- */
- int charAt(String val, d)
- {
- if( d >= val.length()) return 0
- else return (int)val[d]
- }
The groovy implementation is from Git hub project AlgorithmCA:
- LSDSortStr.groovy
- package sort
- import java.util.List;
- class LSDSortStr
{ - int R = 10
- /**
- * Return character as Integer of input String at specific position
- *
- * @param val: String value
- * @param d: Position of character
- *
- * @return character at specific position of input string as integer.
- */
- int charAt(String val, d)
- {
- if( d >= val.length()) return 0
- else return (int)val[d]
- }
- void sort(List
list, int mlen) - {
- int N = list.size()
- def aux = []; N.times{aux[it]=null}
- def count = new int[R+1] as int[]; (R+1).times{count[it]=0}
- mlen.times{ d->
- d = (mlen-d-1)
- // Count frequencies of each letter using key as index.
- for(String str:list) {
- count[charAt(str, d)+1]++
- }
- // Compute frequency cumulates which specify destinations.
- int p=0
- R.times{ r->
- count[r+1]+=count[r]
- }
- // Access cumulates using key as index to move items.
- N.times{ n->
- aux[count[charAt(list[n], d)]++] = list[n]
- }
- // Copy back into original array.
- N.times{ n->
- list[n] = aux[n]
- }
- // Clean
- (R+1).times{count[it]=0}
- }
- }
- @Override
- public void sort(List
list) { - int mlen=0
- // Find the longest length among all strings -> mlen
- list.each{if(it.length()>mlen) mlen=it.length()}
- // Start sorting with LSD
- sort(list, mlen)
- }
- static void main(args)
- {
- def data = ["John", "Joseph", "Ken", "Mary", "Peter", "johnson",
- "Tim", "Tom", "Anderson", "Andy", "Bob", "Forry",
- "Carter", "Derek"]
- // Java Character has 16 bits (2 bytes) = 2^16 = 0~65535
- LSDSortStr lsdSort = new LSDSortStr(R:65536)
- lsdSort.test(data)
- }
- }
Supplement
* [ 資料結構 小學堂 ] 排序 : 基數排序法