程式扎記: [ Data Structures with Java ] Section 8.2 : Overview of Collections

標籤

2011年1月26日 星期三

[ Data Structures with Java ] Section 8.2 : Overview of Collections

Preface : 
The Collection interface describes basic collection operations. There is no reference to any ordering that items may have in the collection. These are features that are important in some applications. For instance, we may want to store elements in a natural ordering and create a sorted collection. For some types of applications, the order in which items enter and exist a collection is important. For instance, calls to a computer vendor for technical support are handled on a first-come first-served basis. 
Collections are distinguished by features by features that are over and above those defined by the Collection interface. The features take into account how elements are stored and accessed. They deal with the ordering of the elements, restrictions on duplicate values, and other conditions. We use the features to organize collections into categories. The organization includes a variety of interfaces that extend the Collection interface and add operations that take advantage of the underlying storage structures. 

List Collection : 
The List collection type describes an object that stores elements by position. The first element is at position 0, the second element at position 1 and so forth. The List interface extends the Collection interface by adding index-based operations. The methods allow a program to access and update elements in random order. The index provides sequential access to the elements. 
Here we will have ArrayList and the LinkedList collection classes as implementations of the List interface. The data structures offer the same index-based methods but differ significantly in the runtime efficiency of the operations. The efficiency issue often determines which data structure is used. 

- The ArrayList Collection 
An ArrayList collection is a generized array that stores elements in a contiguous block of memory. The underlying storage structure is an array that allows for direct access to the elements. An ArrayList grows dynamically to meet the needs of an application. As elements are added at the back of the list, the array fills up and is then automatically expanded to accept more elements. 
Assuming there is room to grow, adding an element at the back of the array is an O(1) operation. Using an array to store elements affects the efficiency of an ArrayList to insert and delete elements at intermediate positions in the list. Consider the problem of adding element at position 2 when the list currently contains the elements {9, 12, 3, 8, 6, 14, 22}. Below figure illustrates a before-and-after view of the sequence. 
 
The insertion requires shifting the tail of the list to the right to open a gap for the new element. It is an O(n) operation. Deleting an elements is also an O(n) operation because it requires shifting the tail of the list to the left one position to fill in the gap left by the existing element. 
Programmers use an ArrayList as the data structure of choice when an application needs direct access to list elements and inserts data at the back of a list that can grow dynamically. Addition and removal of data from the back of the list is very efficient. However, the list should require relatively few insertions and deletions in the interior of the list. 

- LinkedList Collection 
A LinkedList is a sequence whose elements have a value and links that identify adjacent elements in the sequence. In order to access a specific value in the list, you must start at the first position (front) and follow the links from element to element until you locate the item. Hence, a LinkedList is not a direct-access structure but rather a sequential-access structure. 
The power of a LinkedList collection is its ability to add and remove an element efficiently at any position in the sequence. You can think of the list as a chain consisting of a series of connected links. The algorithm to add an element follows the model of inserting a new link in the chain and it is an O(1) operation as well as the deleting operations. 
The LinkedList class can be extended to create collections that store elements in order. The derived class OrderedList overrides the add() method. Rather than inserting a new element at the end of the list, the new method searches the existing sorted list to find the location point. 
Features of the ArrayList and the LinkedList data structures point out an important application design issue. Both classes implement the List interface and so share common methods. For some methods, the choice of the data structure significantly affects runtime performance. For instance, an ArrayList allows direct access to an element. Executing get(i) has O(1) running time. The same method for a linked list has O(n) running time because the algorithm must scan the list from the front in order to locate the element. Adding or removing an element at the first position in the list is an O(1) operation for a linked list collection. The same operation has running time O(n) in an ArrayList because the tail of the list must be shifted. In an application design, we may determine that a list collection is appropriate because we want to store elements by position. Once that decision is made, we must evaluate whether an ArrayList or a LinkedList collection has better runtime performance. 

Set Collection : 
A Set is a collection that resembles a Bag collection but with the provision that no duplicate values are allowed. The Set interface extends the Collectioninterface by stipulating that the add() method should add a specified element to the set only if it is not already present. Set operations include union, intersection, and difference. 

Map Collection : 
A Map is a collection that stores key-value pairs. The key and the value may be of any object type, although some implementations requires that the key must implement comparison methods. The key field uniquely identifies the element while the value field typically has associated information. Access to an element requires only the key parameter and returns the value component. 
Arrays and maps have an obvious analogy. An array uses an index to access an element, while a map uses a key to access the associated value of an element. Thus, a map is often referred to as an associative array. Here we will develop tree and hash table implementations for the Set and Map interfaces. The tree classes TreeSet and TreeMap use a binary search tree for the underlying storage structure. As a by-product, an iterator scans the elements in order. This feature allows us to identify the minimum and maximum elements in the collection. We formalize this fact by creating two additional interfaces, OrderedSet and OrderedMap, that extend the corresponding Set and Map interfaces. The new interfaces define the methods first() and last(). The hash table classes HashSet and HashMap maintain unordered collections. Their main feature is a function, called a hash function, that converts a key to an integer, which serves as an index into a table. The result can be a very efficient lookup strategy. We introduce trees and hash tables as separate topics so that you gain hands-on experience with the concepts. They are then used to implement collection classes. 

Adapter Collections : 
List, Set and Map interfaces have methods that can add, remove or update the value of a separated element. They are data-storage and data-handling devices. We introduce other collections called adapter collections. An adapter contains another collection as its underlying storage structure; however, the interface for a adapter provides only a restricted set of operations from the structure. Examples are Stack, Queue and PriorityQueue. 

- Stack Collection 
A stack is a collection that resembles a rack of serving trays in a cafeteria. It has a single point of reference called the top of the stack. An element is added at the top of the stack. All other items currently on the stack are "pushed lower" to make room. The operation is referred to as pushing the item onto the stack. The operation of removing an item from a stack is called popping the stack. It deletes the element currently on the top. Below figure shows objects A,B and C are pushed onto the stack. A series of pop operations clear the stack by successively removing the top element : 
 
Note that items come off a stack in the reverse order of their original insertion into the stack. We refer to this as last-in first-out (LIFO) ordering. A stack collection allows a program to access the value of the element at the top without removing it. We continue our discussion of stacks in Chap 14 and use them to illustrate the runtime handling of recursive method calls and algorithms to evaluate arithmetic and relational expressions. 

- Queue Collection 
A queue is a collection that allows access only at the front and back. Items enter at the back and exit from the front. This is the model for a waiting line at a grocery store or a bank. Below figure provides a view of a queue after adding elements A,B,C and D in that order. An insertion adds E to the back of the queue and a deletion removes the first element A from the queue : 
 
Items leave the queue in the same order as they arrive; hence the collection provides a first-in first-out (FIFO) ordering. We develop the queue collection in Chap 15 and illustrate its use with the radix sort algorithm and an event-driven simulation. 

- Priority Queue Collection 
A priority queue is a collection that has restricted access operations similar to a stack or queue. Elements can enter the priority queue in any order. Once in the collection, a delete operation removes the maximum (or minimum) value, depending on the specified type of priority. You can visualize a priority queue as a filtering system that takes in elements and then releases them in priority order. We develop the priority queue in Chap15 and its implementation using a heap in Chap22. 

Graph Collection : 
A graph is a set of vertices connected by links, called edges. Depending on the applications, edges may or may not have a direction. An edge connecting vertices A and B has a direction when A is identified as the source and B is identified as the destination. Illustrations use an arrow from A to B for a directed edge. An edge may also have a weight defining a property of the edge; for instance, the length, the amount of available traffic or signal flow, and so forth. A graph whose edges have direction is called a directed graph or digraph. If weights are included, the graph is a weighted digraph. Below figure is a weighted digraph containing five vertices and seven edges : 
 
Graphs are involved in many proglems of practical interest. For instance, directed graphs are used to represent finite state machines. Optimizing the flow of data in a network is solved using weighted graphs. 
The Graph interface defines methods that access and update vertices, edges and weights for a graph. Access includes global properties such as the number of vertices or edges as well as specific information about individual components. For instance, methods return the number of edges that emanate from a vertex as well as the set of vertices (neighbors) that are the destination for the edges. 
The DiGraph class implements the Graph interface. Edges have associated weights, which can all be set to 1 when the graph object represents a simple digraph. We will develop a wide range of graph algorithms and present solutions using DiGraph methods. 

Java Collections Framework : 
The Java software development system presents data structure as a collections framework in the package java.util. The collections framework is an integrated system for describing and defining collections. It is build on size collection interfaces and a hierarchy of abstract classes and collection classes that implement the different data structure. 
We also present data structures using a framework. Our approach designs interfaces to include the key operations that describe the properties and behaviors of a collection type. All of our collection classes are implemented directly without using an inheritance hierarchy with abstract classes. They simply implement an interface and include other methods that take advantage of the underlying storage structure. We developed our framework to promote your understanding of data structures. As professionals, you will probably use the collections in the Java Collections Framework. Our framework is similar to the Java framework and will allow you to make a easy transition to the professional software. 
The following is a summary of the collection interfaces and collection classes that are developed in this book. The unshaded regions are interfaces and the shaded regions are collection classes. 

沒有留言:

張貼留言

網誌存檔

關於我自己

我的相片
Where there is a will, there is a way!