## 2011年1月19日 星期三

### [ Data Structures with Java ] Section 7.2 : Divide-and-Conquer Sort Algorithms (Merge Sort)

Preface :
Divide and conquer is a problem-solving technique that makes use of recursion. Here is the idea when applied to sorting algorithm :

* Divide : Partition the elements into two groups of roughly equal size. Recursively repeat the process by dividing each of two groups into pairs of groups of smaller size. At some point in the partitioning process, a group is so small (stopping condition) that it ca be sorted.
* Conquer : Combine each pair of sorted groups into one large sorted list.

The divide-and conquer strategy is the basic for the mergesort, which literally partitions the elements in half by splitting them at the midpoint. Recursive steps divide the list into two parts, each holding one-half of the elements, then into four parts, each holding one-forth of the elements, and so forth. The process continues as long as a sublist has two or more elements. When executing in reverse order, the recursive steps merge the sorted half-lists back into larger and larger ordered list until the algorithm has rebuilt the original sequence in ascending order.

MergeSort :
Before we talk about MsergeSort, let's discuss on how MsergeSort merge two ordered sublist into one larger ordered sublist.
The mergesort algorithm assumes the existence of an original array arr and a temporary arr tempArr, both containing n elements. The merging process focuses on a sequence of elements in array arr having index range [first, last). The sequence consists of two ordered sublists separated by an intermediate index, called mid. A merge involves copying the elements from the two sublists in array arr to the index range [first, last) in array tempArr so that the new sequence in tempArr is ordered. Consider the following example that describes a sequence of nine integer elements with index range [first, last). The sequence consists of the four-element sorted sublist A and the five-element sublist sublist B, with index ranges [first, mid) and [mid, last) respectively :

After all, the merge algorithm concludes by copying the elements from tempArr back to the original array, starting at index first.

The msort() Method :
The msrot() method is a recursive algorithm that takes two Object arrays arr and tempArr along with integers first and last that specifies the index range for the sublist which is to be sorted. The method creates two half-lists by computing the index midpt, representing the midpoint of the index range :
int midpt = (last + first) /2 ;

Each recursive step makes two calls to msort(). The first one uses indices first and mid to define the index range [first, mid) for the lower half-list of the original sequence. The second call uses indices mid and last to define the index range [mid, last) for the upper half-list. The process sets in motion a chain of recursive calls that partitions the original list into smaller and smaller sublist (half-list) until their size is 1 (stopping condition). Sublists of size 1 are obviously ordered. When revisiting the chain of recursive calls in reverse order, msort() uses the merge algorithm to build successively larger ordered sublist. The final merging of sublists corresponds to the first recursive step and arranges the original array in sorted order.
Tracing an example is a good way to understand the msrot() algorithm. Below figure illustrates the algorithm for the 11-elements Integer array :

The following is the implementation of msort(). The index range for a singleton sublist is [first, first+1), where first+1=last. Thus, the recursive partitioning process continues only as long as first + 1 < last. The concluding merge phase for each recursive step builds an ordered list with range [first, last) from the ordered lower sublist with range [first, mid) and the ordered upper sublist with range [mid, last). The method checks whether the list is already sorted by comparing the last element in the lower sublist (arr[mid-1]) with the first element in the upper sublist (arr[mid]). If arr[mid-1] is less than arr[mid], the list is sorted and no merging of sublist is required. The test would indicate that the following sublist from first to last is already ordered. The test provides optimization that results in faster sorts for nearly ordered lists.

- 函式 msort() 範例代碼 :
1. public static extends Comparablesuper T>> void msort(T[] tmp, T[] arr, int first, int last) {
2.     if((first+1) < last) {
3.         int mid = (first+last)/2;
4.         msort(tmp, arr, first, mid);
5.         msort(tmp, arr, mid, last);
6.         int indexA=first, indexB=mid, indexC=first;
7.         if(arr[mid-1].compareTo(arr[mid])>0) {
8.             for(int i=first; i
9.                 if(indexA>=mid && indexC// sublistA is done
10.                     tmp[indexC] = arr[indexB];
11.                     indexC++;
12.                     indexB++;
13.                 } else if(indexB>=last && indexC// sublistB is done
14.                     tmp[indexC] = arr[indexA];
15.                     indexC++;
16.                     indexA++;
17.                 } else {
18.                     if(arr[indexA].compareTo(arr[indexB]) <= 0) {
19.                         tmp[indexC] = arr[indexA];
20.                         indexC++;
21.                         indexA++;
22.                     } else {
23.                         tmp[indexC] = arr[indexB];
24.                         indexC++;
25.                         indexB++;
26.                     }
27.                 }
28.             }
29.         }
30.         for(int i=first; i
31.             arr[i] = tmp[i];
32.         }
33.     }
34. }

Efficiency of MergeSort :
An intuitive analysis of the process indicates that the worst-case running time for mSort() is O(n log2(n)). Assume the array has n=2^k elements. Below figure is a tree illustrating the recursive partitioning into progressively smaller and smaller sublists. At each level in the tree, we want to evaluate the amount of work required to merge the half-lists into a single sorted list :

The first call to msort() at level 0 generates two recursive calls that produce half-lists of size n/2, and the merge() method combines the half-list to create the ordered n-element list. At level 1, there are two calls to msot(); each produce two additional recursive calls with lists of size n/4. Each merge joins two sublists of size n/4 to create an ordered list of size n/2. In general, at level i, there are 2^i calls to merge(), and each call orders n/(2^i) elements :

At each level i in the tree, the merge involves n/(2^i) elements with linear running time that requires less than n/(2^i) comparisons. The combined 2^i merge operations at level i require less than (2^i)*(n/(2^i)) = n comparisons. Assuming that n=2^k, the partition process terminates at level K with sublists of size n/(2^k)=1. The total work done on all of the levels is no more than :

## 關於我自己

Where there is a will, there is a way!