程式扎記: [ Data Structures with Java ] Section 19.2 : Set Operators

Preface :
A mathematical set is a good model of a Set collection. For applications, we often need the classic set operations union, intersection and difference along with the logical operation subset. These enables us to identify how elements are shared between two sets. Let us review these operations using mathematic notation. Venn diagrams describe the results of the operations on two sets, A and B. For example, assume :
set A = {1, 3, 8, 9, 10}
set B = {2, 3, 6, 9}

- Union(A, B) - Operation A U B

The set of all elements x such that x is an element in set A or x is an element in set B.
Example: A U B = {1, 2, 3, 6, 8, 9, 10}

- Intersection(A, B) - Operation A ∩ B

The set of all elements x such that x is an element in set A and x is an element in set B.
Example: A ∩ B = {3, 9}

- Difference(A, B) - Operation A-B

The set of all elements x such that x is an element in set A but x is not an element in set B.
Example: A - B = {1, 8, 10}

- Subset(A, B) - Operation A ⊆ B

A is a subset of B, provided each element x in A is also an element in B.
An equivalent definition is A ⊆ B iff A ∩ B Ξ A.
Example: {3, 8, 10 } ⊆ A (true) {2, 5, 6} ⊆ B (false)

Implementing Set Operators :
The mathematical set operations of union, intersection, difference and subset are not part of the Set interface or the concrete TreeSet and HashSet classes. A programmer must provide them as independent methods. They have important applications, however, so we take this opportunity to implement them. We treat them as binary operations and pass the operands setA and setB as Set arguments. In this way, the methods apply to both TreeSet and HashSet arguments. We assume that both arguments have the same collection type, either TreeSet or HashSet. The subset() method returns a boolean value that indicates whether the first argument is a subset of the second argument. For the other set operations, the instanceof() operator determines the type of the arguments and create a return set collection of that type. The method implementations are given in the Sets class and use a standard design :

- Method setOp() implementation :

view plain copy to clipboard print ?
private static Set setOp(Set setA, Set setB) {

    Set returnSet;

    if(setA instanceof TreeSet)

        returnSet = new TreeSet();

    else

        returnSet = new HashSet();

    return returnSet;

}

- Set Union
The method union() uses iterators to build the set union. The task is to scan each of the sets and use add() to insert elements in the return set, setUnion. The add() operation doesn't not allow duplicates and so the union doesn't not contain elements common to both sets :

- union(setA, setB) :

view plain copy to clipboard print ?
public static Set union(Set setA, Set setB) {

    Set setUnion = setOp(setA, setB);

    Iterator iter = setA.iterator();

    while(iter.hasNext())

        setUnion.add(iter.next());

    iter = setB.iterator();

    while(iter.hasNext())

        setUnion.add(iter.next());

    return setUnion;

}

- Set Intersection
To find the elements in the intersection of setA and setB, use an iterator to scan the first set. For each element, use contains() to check whether it is also in the second set. If so, the element is added to the setIntersection collection :

- Method intersection(setA, setB) :

view plain copy to clipboard print ?
public static Set intersection(Set setA, Set setB) {

    Set setIntersection = setOp(setA, setB);

    Iterator iter = setA.iterator();

    while(iter.hasNext()) {

        T item = iter.next();

        if(setB.contains(item)) setIntersection.add(item);

    }

    return setIntersection;

}

- Set Difference
The algorithm to find elements in the intersection extends to an algorithm for set difference. Use an iterator to scan the elements in set A. For each element, use contains() to check whether it is also in the second set. If not, the element is added to the setDifference collection :

- difference(setA, setB) :

view plain copy to clipboard print ?
public static Set difference(Set setA, Set setB) {

    Set setDifference = setOp(setA, setB);

    Iterator iter = setA.iterator();

    while(iter.hasNext()) {

        T item = iter.next();

        if(!setB.contains(item)) setDifference.add(item);

    }

    return setDifference;

}

- Subset Relation
The usual definition of subset (A ⊆ B) determines whether each element in set A is also in set B. For the sebset() method, we use the equivalent definition, A ⊆ B iff A ∩ B Ξ A. The algorithm returns true when intersection(setA, setB) is identical with setA. An implementation of subset() verifies the condition by comparing the size of setA with the size of the intersection :

- subset(setA, setB) :

view plain copy to clipboard print ?
public static  boolean subset(Set setA, Set setB) {

    return intersection(setA, setB).size()==setA.size();

    /* Second way to do it

               Iterator iter = setA.iterator();

    while(iter.hasNext()){

        if(setB.contains(iter.next())) return false;

    }

    return true;*/

}

Operations with Ordered Sets :
In the previous discussion, we implemented the general set methods using "brute-force" algorithms. For instance, with set union, iterators scan each of the two operand sets and use add() to insert all of the elements into the return set. The behavior of the add() method assures us that the union doesn't have duplicate values.
For TreeSet collections, we can take advantage of the ordering provided by the iterators to more efficiently implement the set operations. Let us look at an algorithm for set intersection. Algorithm for set union, difference and subset are left as exercise.
The ordered set-intersection algorithm uses iterators to make a pairwise scan of th elements in the two sets. At each step, a comparison is made between elements and if a match occurs, the value belongs to the intersection. An example illustrates how iterators traverse the two sets. Assume that lhsIter andrhsIter are iterators for the two operands and that the corresponding values are lhsValue and rhsValue. Initially, the iterators reference the first element in their respective sets :
* If lhsValue < rhsValue, then lhsValue is not an element in the intersection, and we can move the iterator lhsIter forward to the next element in its set. For instance, 3<7 and so lhsIter proceeds to element 9 :

* If rhsValue < lhsValue, then rhsValue is not an element in the intersection. Move the iterator rhsIter forward to the next element in its set. For instance, 7 < 9 and so rhsIter proceeds to element 12 :

* if lhsValue == rhsValue, the two sets have a common value. After inserting the value into the intersection, move both iterators forward to the next element in their respective sets. For instance, iterators lhsIter and rhsIter find a match at 12 :

The algorithm terminates when we exhaust the values in one of the lists. In the example, lhsIter completes the scan first after identifying that 20 is also a value in the intersection. The static method orderedIntersection() in the Sets class implements the algorithm. The private method advance() takes an Iterator argument and, if another value exists in the corresponding set collection, moves the iterator forward and returns the next value. The method return null whether no elements remain.
The method orderedIntersetion() takes two TreeSet arguments and has the compiler determine whether the generic type implements Comparable. The method advance() handles most of the details :

- Methods orderedIntersection() & advance() :

view plain copy to clipboard print ?
public staticextends Comparablesuper T>> TreeSet

          orderedIntersection(TreeSet lhs, TreeSet rhs) {

    TreeSet setIntersection = new TreeSet();

    Iterator lhsIter = lhs.iterator(),

                rhsIter = rhs.iterator();

    T lhsValue = advance(lhsIter);

    T rhsValue = advance(rhsIter);

    while(lhsValue!=null && rhsValue!=null) {

        int rst = lhsValue.compareTo(rhsValue);

        if(rst==0){

            setIntersection.add(lhsValue);

            lhsValue = advance(lhsIter);

            rhsValue = advance(rhsIter);

        } else if(rst>0) {

            rhsValue = advance(rhsIter);

        } else {

            lhsValue = advance(lhsIter);

        }

    }

    return setIntersection;

}



private static T advance(Iterator iter) {

    T value = null;

    if(iter.hasNext()) value = iter.next();

    return value;

}

Complexity of orderedIntersection() :
Assume that n1(lhs) and n2(rhs) are the number of elements in lhs and rhs. Each iteration of the loop makes one or two comparisons which we assume occur in O(1) running time. We must make at most n1+n2 comparisons, so the algorithm has worst-case running time O(n1+n2). The algorithm is linear with respect to the total number of elements in the two sets.

程式扎記

標籤

2011年3月27日星期日

[ Data Structures with Java ] Section 19.2 : Set Operators

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2011年3月27日 星期日