程式扎記: [ Data Structures with Java ] Section 21.1 : Hashing

標籤

2011年4月5日 星期二

[ Data Structures with Java ] Section 21.1 : Hashing

Preface : 
A binary search tree can access data stored by value with O(log2n) average search time. Ideally, we would like to design a storage structure that approximates O(1) average retrieval time. In this way, access to an item is independent of the number of other items in the collection. A hash table is such a structure. The table is an array of references. Associated with the table is a hash function that takes a key as an argument and returns an integer value that leads to a table index. To gain some understanding of a hash function, you can view a bash table as a storage array. Later we will describe a more general table consisting of an array of references to linked list. 
A hash function returns an integer value, referred to as a hash value. By using the remainder after dividing the hash value by the table size, we have a mapping of the key to an index in the table. In the below figure, the table size is n. A two-step process uses the hash function hf() to convert the key to an integer value and the "%" operator to telescope the value to an index in the range [0, n) : 
 

Using a Hash Function : 
Let us look at the behavior of a simple hash function in which the key is a nonnegative integer. The hash function hf(x) = x is the identity function. The remainder after dividing x by n is the hash table index. Assume the table is the array tableEntry with n=7 elements. The hash function takes key=4 and maps it to tableEntry[4]. Key = 22 is mapped to tableEntry[1] : 
 

The need to divide the hash value by the table size can create a many-to-one association between items and a table entry. If two items have keys that differ by a multiple of 7, the hash process maps the items to the same table location. For instance, items with keys 22 and 36 maps to index 1 in the table, and acollision occurs. Similarly, keys 5 and 33 map to the same index 5. : 
 

The presence of collisions does not indicate an error in the hash function. Treat it as a fact of life. Collisions will occur, and attempting to avoid them is often unrealistic. The problem is to design the hashing process so it efficiently handles collisions. We discuss this issue in the next two sessions, where we design hash functions and the corresponding hash table.

沒有留言:

張貼留言

網誌存檔

關於我自己

我的相片
Where there is a will, there is a way!