程式扎記: [ Data Structures with Java ] Section 21.5 : Hash Class Implementation

標籤

2011年4月7日 星期四

[ Data Structures with Java ] Section 21.5 : Hash Class Implementation


Preface :
We provide a detailed implementation of the Hash class. You will see how chaining with separate lists works. By understanding the Hash class, we cna focus on design principles for the implementation of the HashMap class.
The Hash class uses a hash table as its storage structure. The table is an array whose elements are the first node in a singly linked list. For the Hash class, we define a new node structure called Entry with an integer field hashValue that stores the hash code value . Adding the field simplified the process of rehashing the table. We avoid having to recompute :
item.hashCode() & Integer.MAX_VALUE

during the rehash. The other fields in the node are value and next :


The following is a declaration of the genetic static inner class Entry. A constructor creates an Entry object with specified valuenext node and hashValueparameters :

- Entry inner class in Hash class :
  1. ...  
  2.     private static class Entry{  
  3.         public T value;  
  4.         public int hashValue;  
  5.         public Entry next;  
  6.           
  7.         Entry(T value, int hashValue, Entry next){  
  8.             this.value = value;  
  9.             this.hashValue = hashValue;  
  10.             this.next = next;  
  11.         }  
  12.     }  
  13. ...  

The Hash class private variables describe the hash table. The Entry array, called table, defines the singly sinked lists (buckets) that store the elements. The integer variable hashTableSize specifies the number of entries in the table. The variable tableThreshold has the value :
(int)(table.length * MAX_LOAD_FACTOR)

The constant MAX_LOAD_FACTOR is a value that specifies the maximum allowed ration of the elements in the table and the table size. The value of 0.75 (number of hash table entries is 75% of the table size) is generally a good value. When the number of elements in the table equals tableThreshold, a rehash occurs. The variable modCount is used by iterators to determine whether external updates may have invalidated an iterator scan of the elements. Note that table is an array of the row type Entry, not generic type Entry. It is necessary to declare table in this fashion, because arrays of generic types are not allowed.
The Hash class constructor creates the 17-element array table, whose elements default to null. This has the effect of creating 17 empty lists. A rehash will occur when the hash collection size equals 12 :

- Hash class partly implementation :
  1. public class Hash implements Collection{  
  2.     private static final int InitHashSize = 17;  
  3.     private final double MAX_LOAD_FACTOR = .75;  
  4.     private int tableThreshold;  
  5.     private int hashTableSize;  
  6.     private Entry[] table;  
  7.       
  8.       
  9.     public Hash(){  
  10.         table = new Entry[InitHashSize];  
  11.         hashTableSize = 0;  
  12.         tableThreshold = (int)(table.length * MAX_LOAD_FACTOR);  
  13.     }  
  14. ...  
  15. }  

Hash add() and rehash() Methods :
The algorithm for add() computes the hash index for the parameter item. The index identifies the linked list (bucket) whose first node is at table[index]. A reference variable scans the list to see if item is currently in the hash table. If so, the scan terminates and the method returns the boolean value false. Otherwise, a new Entry object is created with value item and inserted at the front of the list. Note that hashValue is assigned to the entry so it will not have to be computed when rehashing occurs. The variables hashTableSize and modCount are incremented. If the size of the hash table has reached the table threshold (at least 75% full), the private method rehash() locates the table items in a larger hash table. The size of the new table is :
2 * table.length + 1

Table sizes are always odd, being one more than twice the previous size. Using this approach, table sizes follow the sequence, 17, 35, 71... Below is the implementation of method add() :
- Method add() in Hash class :
  1. ...  
  2.     @Override  
  3.     public boolean add(T item) {  
  4.         int hashValue = item.hashCode() & Integer.MAX_VALUE;  
  5.         int index = hashValue % table.length;  
  6.         Entry entry = table[index];  
  7.         while(entry!=null) {  
  8.             if(entry.value.equals(item)) return false;  
  9.             entry = entry.next;  
  10.         }  
  11.         modCount++;  
  12.         entry = new Entry(item, hashValue, (Entry)table[index]);  
  13.         table[index] = entry;  
  14.         hashTableSize++;  
  15.         if(hashTableSize >= tableThreshold) rehash(2*table.length+1);  
  16.         return true;  
  17.     }  
  18. ...  

- Rehashing the Table
The method rehash() takes the size of the new hash table as an argument and performs rehashing. After creating a new table with the specified size, it uses a nested for-loop to cycle through the nodes in the original table. For each node, use the hasValue field modulo the new table size to hash to the new index. Simply insert the node at the front of the linked list. Saving the hashValue optimizes the rehashing process :
- Method rehash() in class Hash :
  1. ...  
  2.     protected void rehash(int size){  
  3.         Entry[] newTable = new Entry[size], oldTable = table;  
  4.         Entry entry, nextEntry;  
  5.         int index;  
  6.         for(int i=0; i
  7.             entry = table[i];  
  8.             if(entry!=null) {  
  9.                 do{  
  10.                     nextEntry = entry.next;  
  11.                     index = entry.hashValue % size;  
  12.                     entry.next = newTable[index];  
  13.                     newTable[index] = entry;  
  14.                     entry = nextEntry;  
  15.                 }while(entry!=null);  
  16.             }  
  17.         }  
  18.         table = newTable;  
  19.         tableThreshold = (int)(table.length * MAX_LOAD_FACTOR);  
  20.         oldTable = null;  
  21.     }  
  22. ...  

Hash remove() Method :
The algorithm for remove() computes the hash index for the parameter item. This provides the table index for the linked list which may contain item. Removing an element from a singly linked list requires two reference variables, curr and prev. Initially, curr is the front of the list and prev is null. The variables move in tandem down the list until they find a match or reach the end of the list. In the latter case, item is not found and remove() simply returns with a value false. Otherwise, the prev reference next is updated to point at the successor of curr. If a match occurs at the first element, the front of the list, table[index] must also be updated. The remove() method decrements hashTableSize, increments modCount, and return true :
- Method remove in class Hash :
  1. ...  
  2.     @Override  
  3.     public boolean remove(Object item) {  
  4.         int index = (item.hashCode() & Integer.MAX_VALUE) % table.length;  
  5.         Entry curr=table[index], prev = null;  
  6.         while(curr!=null) {  
  7.             if(curr.value.equals(item)) {  
  8.                 modCount++;  
  9.                 if(prev==null)  
  10.                     table[index] = curr.next;  
  11.                 else  
  12.                     prev.next = curr.next;  
  13.                 hashTableSize--;  
  14.                 return true;  
  15.             }  
  16.             prev = curr;  
  17.             curr = curr.next;  
  18.         }  
  19.         return false;  
  20.     }  
  21. ...  

Implementing the Hash Iterator :
The design of an iterator class focuses on a strategy to scan the elements in the corresponding collection. For the Hash class, the strategy involves searching the hash table for the first non-empty bucket in the array of linked lists. Once the bucket is located, the iterator traverses all of the elements in the corresponding linked list and then continues the process by looking for the next non-empty bucket. The iterator reaches the end of the table when it reaches the end of the list for the last non-empty bucket.
Iterator objects are instances of the inner class IteratorImpl. The instance variables in the inner class maintain information on the current state of the iterator relative to the elements in the outer class collection. These variables include the integer index that identifies the current bucket (table[index]) scanned by the iterator. The Entry reference next pointed by the most recent call to next(). A program can update a Hash collection by calling the outer class add() and remove() methods. These operations place the interator in an inconsistent state. The iterator variable expectedModCount and the collection variable modCount are used to check the consistency.
Below figure displays a Hash collection with five buckets and four elements. The elements enter the collection in the order (19, 32, 11, 27) using the identify hash function. An Iterator object, hIter, scans the collection. The initial position is provided by the Hash method iterator() and points to element 11. This is the first element in the linked list table[1]. The figure includes the initial iterator variables and the updated variables after hIter.next() returns 32. In the figure, the dotted lines indicate traversal from one bucket to another. The iterator visits the elements in the order (11, 32, 27, 19) :

- Hash Iterator Constructor
The constructor positions the Entry reference next at the front of the first none-empty linked list (bucket). A loop iterates up the list of buckets until it locates this first none-empty bucket. The loop variable i becomes the initial value for index and table references the front of the list. This is the initial value for next :
- Constructor of inner class IteratorImpl inside class Hash :
  1. ...  
  2.     private class IteratorImpl implements Iterator {  
  3.         Entry next;  
  4.         int expectedModCount;  
  5.         int index;  
  6.         T lastReturned;  
  7.   
  8.         public IteratorImpl(){  
  9.             int i=0;  
  10.             Entry n = null;  
  11.             expectedModCount = modCount;  
  12.             if(hashTableSize!=0)  
  13.                 while(i < table.length && ((n = table[i])==null))  
  14.                     i++;  
  15.             next = n;  
  16.             index = i;  
  17.             lastReturned = null;  
  18.         }  
  19. ...  
  20. }  

- Implementing next()
The iterator is currently positioned at some node in a linked list. The method next() first determines the operation is valid by checking that modCount andexpectedModCount are equal and that we are not at the end of the hash table. If the iterator is in a consistent state, next() uses a look index i and an Entry reference called entry to perform the iteration scan. Initialize i and entry with the instance variables index and next, respectively. The method extracts the value at entry as the object lastReturned and then moves entry to the next position in the linked list. In case the position is at the end of the current list, we must search up the list of buckets beginning at index i+1 to locate the next none-empty bucket. The front of this list becomes the value for the instance variable next, and the bucket index is the new value for index :
- Method next() in inner class IteratorImpl :
  1. @Override  
  2. public T next() {  
  3.     if(expectedModCount!=modCount) throw new ConcurrentModificationException();  
  4.     if(next==nullthrow new NoSuchElementException();  
  5.     lastReturned = next.value;  
  6.     Entry entry = next.next;  
  7.     if(entry==null) {                             
  8.         int i = index+1;  
  9.         while(inull) i++;  
  10.         index = i;  
  11.     }  
  12.     next = entry;  
  13.     return lastReturned;  
  14. }  

- Implementing remove()
The remove() method first determines that the operation is valid by checking that lastReturned is not null and that modCount and expectedModCount are equal. If all is well, the iterator remove() method calls the Hash class remove() method with lastReturned as the argument. By assigning to expectedmodCountthe current value of modCount , the iterator remains consistent.
- Method remove in inner class IteratorImpl :
  1. ...  
  2.         @Override  
  3.         public void remove() {  
  4.             if(expectedModCount!=modCount) throw new ConcurrentModificationException();  
  5.             if(lastReturned == nullthrow new IllegalStateException();  
  6.             Hash.this.remove(lastReturned);  
  7.             expectedModCount = modCount;  
  8.             lastReturned = null;  
  9.         }  
  10. ...  

沒有留言:

張貼留言

網誌存檔

關於我自己

我的相片
Where there is a will, there is a way!