程式扎記: [ FP with Java ] Ch8 - Advanced list handling

Preface
This chapter covers

* Speeding list processing with memoization
* Composing List and Result
* Implementing indexed access on lists
* Unfolding lists
* Automatic parallel list processing

In chapter 5, you created your first data structure, the singly linked list. At that point, you didn’t have at your disposal all the techniques needed to make it a complete tool for data handling. One particularly useful tool you were missing was some way to represent operations producing optional data, or operations that can produce an error. In chapters 6 and 7, you learned how to represent optional data and errors. In this chapter, you’ll learn how to compose operations that produce optional data or errors with lists.

You also developed some functions that were far from optimal, such as length, and I said that you’d eventually learn more-efficient techniques for these operations. In this chapter, you’ll learn how to implement these techniques. You’ll also learn how to automatically parallelize some list operations in order to benefit from the multicore architecture of today’s computers.

The problem with length
Folding a list involves starting with a value and composing it successively with each element of the list. This obviously takes an amount of time proportional to the length of the list. Is there any way to make this operation faster? Or, at least, is there a way to make it appear faster? As an example of a fold application, you created a length method in List in exercise 5.9 with the following implementation:

view plaincopy to clipboardprint?
public int length() {  
  return foldRight(this, 0, x -> y -> y + 1);  
}  

In this implementation, the list is folded using an operation that consists of adding 1 to the result. The starting value is 0, and the value of each element of the list is simply ignored. This is what allows you to use the same definition for all lists. Because the list elements are ignored, the list element’s type is irrelevant.

You can compare the preceding operation with one that computes the sum of a list of integers:

view plaincopy to clipboardprint?
public static Integer sum(List<Integer> list) {  
  return list.foldRight(0, x -> y -> x + y);  
}  

The main difference here is that the sum method can only work with integers, whereas the length method works for any type. Notice that foldRight is only a way to abstract recursion. The length of a list can be defined as 0 for an empty list and 1 plus the length of the tail for a non-empty list. In the same way, the sum of a list of integers can be defined recursively as 0 for an empty list, and head plus the sum of the tail for a non-empty one.

There are other operations that can be applied to lists in this way, and, among them, several for which the type of the list elements is irrelevant:

* The hash code of a list can be computed by simply adding the hash codes of its elements. Because the hash code is an integer (at least for Java objects), this operation doesn’t depend on the object’s type.
* The string representation of a list, as returned by the toString method, can be computed by composing the toString representation of the list elements. Once again, the actual type of the elements is irrelevant.

Some operations may depend on some characteristics of the element’s type, but not on the specific type itself. For example, a max method that returns the maximum element of a list will only need the type to be Comparable or a Comparator.

The performance problem
All these methods can be implemented using a fold, but such implementations have a major drawback: the time needed to compute the result is proportional to the length of the list. Imagine you have a list of about a million elements, and you want to check the length. Counting the elements may seem the only way to go (this is what the fold-based length method does). But if you were adding elements to the list until it reaches a million, you surely wouldn’t count the elements after adding each one.

In such a situation, you’d keep a count of the elements somewhere, and add one to this count each time you added an element to the list. Maybe you’d have to count once if you were starting with a non-empty list, but that’s it. This technique is what you learned in chapter 4: memoization. The question is, where can you store the memoized value? The answer is obvious: in the list itself.

The benefit of memoization
Maintaining a count of the elements in a list will take some time, so adding an element to a list will be slightly slower than if you didn’t keep the count. It might look like you’re trading time against time. If you build a list of 1,000,000 elements, you’ll lose 1,000,000 times the amount of time needed to add one to the count. In compensation, however, the time needed to get the length of the list will be near 0 (and obviously constant). Maybe the total time lost in incrementing the count will equal the gain when calling length. But as soon as you call length more than once, the gain is absolutely obvious.

The drawbacks of memoization
Memoization can turn a function that works in O(n) time (time proportional to the number of elements) into O(1) time (constant time). This is a huge benefit, although it has a time cost, because it makes the insertion of elements slightly slower. But slowing insertion is generally not a big problem.

A much more important problem is the increase in memory space. Data structures implementing in-place mutation don’t have this problem. In a mutable list, nothing keeps you from memoizing the list length as a mutable integer, which takes only 32 bits. But with an immutable list, you have to memoize the length in each element. It’s difficult to know the exact increase in size, but if the size of a singly linked list is around 40 bytes per node (for the nodes themselves), plus two 32-bit references for the head and the tail (on a 32-bit JVM), this would result in about 100 bytes per element. In this case, adding the length would cause an increase of slightly over 30%. The result would be the same if the memoized values were references, such as memoizing the max or min of a list of Comparable objects. On a 64-bit JVM, it’s even more difficult to calculate due to some optimization in the size of the references, but you get the idea.

Sizes of object references
For more information about the size of object references in Java 7 and Java 8, see Oracle’s documentation on compressed oops (http://mng.bz/TjY9) and JVM performance enhancements (http://mng.bz/8X0o).

It’s up to you to decide whether you want to use memoization in your data structures. It may be a valid option for functions that are often called and don’t create new objects for their results. For example, the length and hashCode functions return integers, and the max and min functions return references to already existing objects, so they may be good candidates. On the other hand, the toString function creates new strings that would have to be memoized, so that would probably be a huge waste of memory space. The other factor to take into consideration is how often the function is used. The length function may be used more often than hashCode, because using lists as map keys is not a common practice.

Let's create a memoized version of the length method. Its signature in the List class will be (Exercise 8.1):

view plaincopy to clipboardprint?
public abstract int lengthMemoized();  

The implementation in the Nil class is exactly the same as for the nonmemoized length method:

view plaincopy to clipboardprint?
public int lengthMemoized() {  
  return 0;  
}  

To implement the Cons version, you must first add the memoizing field to the class and initialize it in the constructor:

view plaincopy to clipboardprint?
public static class Cons<A> extends List<A>{  
        private final A head;  
        private final List<A> tail;  
        private final int length;  // New  
          
        public Cons(A head, List<A> t) {  
            this.head = head;   
            this.tail = t;    
            this.length = tail.length() + 1;  // New  
        }  
         ...  
}  

Then you can implement the lengthMemoized method to simply return the length:

view plaincopy to clipboardprint?
@Override  
public int lengthMemoized() {  
    return length;  
}  

This version will be much faster than the original one. One interesting thing to note is the relationship between the length and isEmpty methods. You might tend to think that isEmpty is equivalent to length == 0, but although this is true from the logical point of view, there can be a huge difference in implementation, and thus in performance.

Note that memoizing the maximum or minimum value in a list of Comparable could be done the same way (although with a static method), but it wouldn’t help in the case where you want to remove the max or min value from the list. Min or max elements are often accessed to retrieve elements by priority. In that case, the elements’ compareTo method would compare their priorities. Memoizing priority would let you know immediately which element has the maximum priority, but it wouldn’t help much because what you often need is to remove the corresponding element. For such use cases, you’ll need a different data structure, which you’ll learn to create in chapter 11.

Actual performance
As I said, it’s up to you to decide if you should memoize some functions of the List class. A few experiments should help you make your decision. Measuring the available memory size just before and after the creation of a list of 1,000,000 integers shows a very small increase when using memoization. Although this measurement method isn’t very precise, the average decrease in available memory is about 22 MB in both cases (with or without memoization), varying between 20 MB and 25 MB. This shows that the theoretical increase of 4 MB (1,000,000 x 4 bytes) isn’t as significant as you’d expected. On the other hand, the increase in performance is huge. Asking for the length ten times might cost more than 200 milliseconds without memoization. With memoization, the time is 0 (too short a time to be measured in milliseconds).

Note that although adding an element increases the cost (adding one to the tail length and storing the result), removing an element has zero cost, because the tail length is already memoized. Another way to go, if memoization isn’t desirable, is to optimize the length method. Instead of using a fold, you can resort to imperative style, with a loop and a local mutable variable. Here’s the length implementation borrowed from the Scala List class:

view plaincopy to clipboardprint?
public int length() {  
  List%lt;A> these = this;  
  int len = 0;  
  while (!these.isEmpty()) {  
    len += 1;  
    these = these.tail();  
  }  
  return len;  
}  

Although it doesn’t look very functional in style, this implementation is perfectly compatible with the definition of functional programming. It’s a pure function without any observable effect from the outside world. The main problem is that it’s only five times faster than the fold-based implementation, where the memoized implementation can be millions of times faster for very large lists.

Composing List and Result
In the previous chapter, you saw that Result and List are very similar data structures, mainly differing in their cardinality but sharing some of their most important methods, such as map, flatMap, and even foldLeft and foldRight. You saw how lists could be composed with lists, and results with results. Now, you’re going to see how results can be composed with lists.

Methods on List returning Result
At this point, you’ve noticed that I try to avoid accessing the elements of results and lists directly. Accessing the head or the tail of a list will throw an exception if the list is Nil, and throwing an exception is one of the worst things that can happen in functional programming. But you saw that you could safely access the value in a Result by providing a default value to be used in the case of a failure or empty result. Can you do the same when accessing the head of a list? Not exactly, but you can return a Result.

Let's implement a headOption method in List<A> that will return a Result<A>. Use the following abstract method declaration in List, and implement it in each subclass (Exercise 8.2):

view plaincopy to clipboardprint?
public abstract Result<A> headOption();  

Note that the method is called headOption to indicate that a value is optional, although you’ll use Result for the type. The implementation of the Nil class returns Empty:

view plaincopy to clipboardprint?
@Override  
public Result<A> headOption() {  
    return Result.empty();  
}  

The Cons implementation returns a Success holding the head value:

view plaincopy to clipboardprint?
@Override  
public Result<A> headOption() {  
    return Result.success(head);  
}  

Let's create a lastOption method returning a Result of the last element in the list (Exercise 8.3). Don’t use explicit recursion, but try to build on the methods you developed in chapter 5. You should be able to define a single method in the List class. Below is A trivial solution is to use explicit recursion:

view plaincopy to clipboardprint?
public Result<A> lastOption() {  
  return isEmpty()  
      ? Result.empty()  
      : tail().isEmpty()  
          ? Result.success(head())  
          : tail().lastOption();  
}  

This solution has several problems. It’s stack-based recursive, so you should transform it to make it heap-based, plus you have to handle the case of the empty list, where tail().lastOption() would throw an NPE. But you can simply use a fold, which abstracts recursion for you! All you need to do is create the right function for folding. You need to always keep the last value if it exists. This might be the function to use:

view plaincopy to clipboardprint?
Function<Result<A>, Function<A, Result<A>>> f =  
                                   x -> y -> Result.success(y);  

Or use a method reference:

view plaincopy to clipboardprint?
Function<Result<A>, Function<A, Result<A>>> f =  
                                   x -> Result::success;  

Then you just have to foldLeft the list using Result.Empty as the identity:

view plaincopy to clipboardprint?
public Result<A> lastOption() {  
  return foldLeft(Result.empty(), x -> Result::success);  
}  

Can you replace the headOption method with a single implementation in the List class? What would be the benefits and drawbacks of such an implementation (Exercise 8.4)? It’s possible to create such an implementation:

view plaincopy to clipboardprint?
public Result<A> headOption() {  
  return foldRight(Result.empty(), x -> y -> Result.success(x));  
}  

The only benefit is that it’s more fun if you like it that way. When devising the last-Option implementation, you knew you had to traverse the list in order to find the last element. To find the first element, you don’t need to traverse the list. Using fold-Right here is exactly the same as reversing the list and then traversing the result to find the last element (which is the first element of the original list). Not very efficient! And by the way, this is exactly what the lastOption method does to find the last element: reverses the list and takes the first element of the result. So except for the fun, there’s really no reason to use this implementation.

Converting from List%lt;Result> to Result<List>
When a list contains the results of some computations, it will often be a List<Result>. For example, mapping a function from T to Result on a list of T will produce a list of Result. Such values will often have to be composed with functions taking a List<T> as their argument. This means you’ll need a way to convert the resulting List<Result> into a List, which is the same kind of flattening involved in the flatMap method, with the huge difference that two different data types are involved: List and Result. You can apply several strategies to this conversion:

* Throw away all failures or empty results and produce a list of U from the remaining list of successes. If there’s no success in the list, the result could simply contain an empty List.
* Throw away all failures or empty results and produce a list of U from the remaining list of successes. If there’s no success in the list, the result would be a Failure.
* Decide that all elements must be successes for the whole operation to succeed. Construct a list of U with the values if all are successes and return it as a Success<List>, or return a Failure<List> otherwise.

The first solution would correspond to a list of results where all results are optional. The second solution means that there should be at least one success in the list for the result to be a success. The third solution corresponds to the case where all results are mandatory. Let's write a method called flattenResult that takes a List<Result<A>> as its argument and returns a List<A> containing all the success values in the original list, ignoring the failures and empty values (Exercise 8.5). This will be a static method in List<A> with the following signature:

view plaincopy to clipboardprint?
public static <A> List<A> flattenResult(List<Result<A>> list)  

Try not to use explicit recursion but to compose methods from the List and Result classes. To solve this exercise, you can use the foldRight method to fold the list with a function producing a list of lists. Each Success will be transformed into a list of one element containing the value, whereas each Failure or Empty will be transformed into an empty list. Here’s the function:

view plaincopy to clipboardprint?
Function<Result<A>, Function<List<List<A>>, List<List<A>>>> f =  
                    x -> y -> y.cons(x.map(List::list).getOrElse(list()));  

Once you have this function, you can use it to fold the list to the right, producing a list of lists of values, with some elements being empty lists:

view plaincopy to clipboardprint?
list.foldRight(list(), f)  

All that’s left to do is to flatten the result. The complete method is as follows:

view plaincopy to clipboardprint?
public static <A> List<A> flattenResult(List<Result<A>> list) {  
  return flatten(list.foldRight(list(), x -> y ->  
                y.cons(x.map(List::list).getOrElse(list()))));  
  }  

Please note that this is not the most efficient way to do it. Take this mostly as an exercise.

Write a sequence function that combines a List<Result<T>> into a Result<List<T>>. It will be a Success<List<T>> if all values in the original list were Success instances, or a Failure<List<T>> otherwise. Here’s its signature:

view plaincopy to clipboardprint?
public static <A> Result<List<A>> sequence(List<Result<A>> list)  

Here’s the implementation using foldRight and map2:

view plaincopy to clipboardprint?
public static  Result> sequence(List> list) {  
  return list.foldRight(Result.success(List.list()),  
                  x -> y -> Result.map2(x, y, a -> b -> b.cons(a)));  
}  

Note that this implementation handles an empty Result as if it were a Failure and returns the first failing case it encounters, which can be a Failure or an Empty. This may or may not be what you need. To stick with the idea that Empty means optional data, you’d need to first filter the list to remove the Empty elements:

view plaincopy to clipboardprint?
public static  Result> sequence2(List> list) {  
  return list.filter(a -> a.isSuccess() || a.isFailure())  
      .foldRight(Result.success(List.list()),  
                 x -> y -> Result.map2(x, y, a -> b -> b.cons(a)));  
}  

Ultimately you should abstract the removal of empty elements into a separate method in the List class. But for the rest of this book, we’ll continue considering Empty as a Failure in the context of the sequence method.

Define a more generic traverse method that traverses a list of A while applying a function from A to Result and producing a Result>. Here’s its signature:

view plaincopy to clipboardprint?
public static  Result> traverse(List list,  
                                           Function> f)  

First define the traverse method:

view plaincopy to clipboardprint?
public static  Result> traverse(List list,  
                                              Function> f) {  
  return list.foldRight(Result.success(List.list()),  
      x -> y -> Result.map2(f.apply(x), y, a -> b -> b.cons(a)));  
}  

Then you can redefine the sequence method in terms of traverse:

view plaincopy to clipboardprint?
public static  Result> sequence(List> list) {  
  return traverse(list, x -> x);  
}  

Abstracting common List use cases
Many common use cases of the List data type deserve to be abstracted so you don’t have to repeat the same code again and again. You’ll regularly find yourself discovering new use cases that can be implemented by combining basic functions. You should never hesitate to incorporate these use cases as new functions in the List class. The following exercises show several of the most common use cases.

Zipping and unzipping lists
Zipping is the process of assembling two lists into one by combining the elements of the same index. Unzipping is the reverse procedure, consisting of making two lists out of one by “deconstructing” the elements, such as producing two lists of x and y coordinates from one list of points. Let's write a zipWith method that combines the elements of two lists of different types to produce a new list, given a function argument. Here’s the signature (Exercise 8.8):

view plaincopy to clipboardprint?
public static  List zipWith(List list1, List list2,  
                                        Function> f)  

This method takes a List and a List and produces a List with the help of a function from A to B to C (The zipping should be limited to the length of the shortest list.). For this exercise, you must use explicit recursion because recursion must be done on both lists simultaneously. You don’t have any abstraction at your disposal for this. Here’s the solution:

view plaincopy to clipboardprint?
public static  List zipWith(List list1, List list2, Function> f) {  
    return zipWith_(list(), list1, list2, f).eval().reverse();  
}  
  
private static  TailCall> zipWith_(List acc, List list1, List list2,  
        Function> f) {  
    return list1.isEmpty() || list2.isEmpty() ? ret(acc)  
            : sus(() -> zipWith_(new Cons<>(f.apply(list1.head()).apply(list2.head()), acc), list1.tail(),  
                    list2.tail(), f));  
}  

The zipWith_ helper method is called with an empty list as the starting accumulator. If one of the two argument lists is empty, recursion is stopped and the current accumulator is returned. Otherwise, a new value is computed by applying the function to the head value of both lists, and the helper function is called recursively with the tails of both argument lists.

The previous exercise consisted of creating a list by matching elements of both lists by their indexes. Write a product method that will produce a list of all possible combinations of elements taken from both lists (Exercise 8.9). In other words, given the two lists list("a", "b", "c") and list("d", "e", "f") and string concatenation, the product of the two lists should be List("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf").

The solution is similar to the comprehension pattern you used to compose Result in chapter 7. The only difference here is that it produces as many combinations as the product of the number of elements in the lists, whereas for combining Result, the number of combinations was always limited to one.

view plaincopy to clipboardprint?
public static  List product(List list1, List list2,  
                                        Function> f) {  
  return list1.flatMap(a -> list2.map(b -> f.apply(a).apply(b)));  
}  

Note that it’s possible to compose more than two lists this way. The only problem is that the number of combinations will grow exponentially. One of the common use cases for product and zipWith is to use a constructor for the combination function. Here’s an example using the Tuple constructor:

view plaincopy to clipboardprint?
List.product(List.list(1, 2, 3), List.list(4, 5, 6),  
                                    x -> y -> new Tuple<>(x, y));  
List.zipWith(List.list(1, 2, 3), List.list(4, 5, 6),  
                                    x -> y -> new Tuple<>(x, y));  

The first line will produce a list of all possible tuples constructed from the elements of both lists:

[(1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6), NIL]

The second line will only produce the list of tuples built from elements with the same index:

[(1,4), (2,5), (3,6), NIL]

Of course, you may use any constructor of any class. (Java objects are in fact tuples with special names.)

Now let's write an unzip static method to transform a list of tuples into a tuple of lists. Here’s its signature (Exercise 8.10):

view plaincopy to clipboardprint?
 Tuple, List> unzip(List> list)  

You need to foldRight the list using a tuple of two empty lists as the identity:

view plaincopy to clipboardprint?
public static  Tuple, List> unzip(List> list) {  
  return list.foldRight(new Tuple<>(list(), list()),  
               t -> tl -> new Tuple<>(tl._1.cons(t._1), tl._2.cons(t._2)));  
}  

Generalize the unzip function so it can transform a list of any type into a tuple of lists, given a function that takes an object of the list type as its argument, and produces a tuple. For example, given a list of Payment instances, you should be able to produce a tuple of lists: one containing the credit cards used to make the payments, and the other containing payment amounts. Implement this method as an instance method in List with the following signature: (Exercise 8.11)

view plaincopy to clipboardprint?
 Tuple, List> unzip(Function> f)  

One important thing is that the result of the function is to be used twice. In order not to apply the function twice, you must use a multiline lambda:

view plaincopy to clipboardprint?
public  Tuple, List> unzip(Function
                                                Tuple> f) {  
  return this.foldRight(new Tuple<>(list(), list()), a -> tl -> {  
    Tuple t = f.apply(a);  
    return new Tuple<>(tl._1.cons(t._1), tl._2.cons(t._2));  
  });  
}  

Accessing elements by their index
The singly linked list isn’t the best structure for indexed access to its elements, but sometimes it’s necessary to use indexed access. As usual, you should abstract such a procedure into List methods.

Let's write a getAt method that takes an index as its argument and returns the corresponding element. The method should not throw an exception in the case of the index being out of bounds (Exercise 8.12). This time, start with an explicitly recursive version. Then try to answer the following questions:

* Is it possible to do it with a fold? Right or left?
* Why is the explicit recursive version better?
* Can you see a way to solve the problem?

The explicitly recursive solution is easy:

view plaincopy to clipboardprint?
public Result getAt(int index) {  
    return index < 0 || index >= length()  
            ? Result.failure("Index out of bound")  
            : getAt_(this, index).eval();  
}  
  
private static  TailCall> getAt_(List list, int index) {  
    return index == 0  
              ? TailCall.ret(Result.success(list.head()))  
              : TailCall.sus(() -> getAt_(list.tail(), index - 1));  
}  

First, you can check the index to see if it’s positive and less than the list length. If it isn’t, just return a Failure. Otherwise, call the helper method to process the list recursively. This method checks whether the index is 0. If it is, it returns the head of the list. Otherwise, it calls itself recursively on the tail of the list with a decremented index.

This looks like the best possible recursive solution. Is it possible to use a fold? Yes, it is, and it should be a left fold. But the solution is tricky:

view plaincopy to clipboardprint?
public Result getAt(int index) {  
    Tuple, Integer> identity = new Tuple<>(Result.failure("Index out of bound"), index);  
    Tuple, Integer> rt = index < 0 || index >= length()  
            ? identity  
            : foldLeft(identity, ta -> a -> ta._2 < 0  
                    ? ta  
                    : new Tuple<>(Result.success(a), ta._2 - 1));  
    return rt._1;  
}  

First you have to define the identity value. Because this value must hold both the result and the index, it will be a Tuple holding the Failure case. Then you can check the index for validity. If it’s found invalid, make the temporary result (rt) equal to identity. Otherwise, fold to the left with a function returning either the already computed result (ta) if the index is less than 0, or a new Success otherwise. This solution might seem smarter, but it’s not, for three reasons:

* It’s far less legible. This may be subjective, so it’s up to you to decide.
* You have to use an intermediate result (rt) because Java can’t infer the right type. Try replacing rt with its value in the last line if you don’t believe me.
* It’s less efficient because it will continue folding the whole list even after it finds the searched-for value.

Let's find a solution that makes the fold-based version terminate as soon as the result is found (Exercise 8.13). You’ll need a special version of foldLeft for this, and also a special version of Tuple.

First, you need a special version of foldLeft in which you can escape the fold when the absorbing element (or “zero” element) of the folding operation is found. Think of a list of integers that you want to fold by multiplying them. The absorbing element for the multiplication is 0. Here’s the declaration of a short-circuiting (or escaping) version of foldLeft in the List class:

view plaincopy to clipboardprint?
public abstract  B foldLeft(B identity, B zero, Function> f);  

The zero element
It’s by analogy that the absorbing element of any operation is sometimes called “zero,” but remember that it’s not always equal to 0. The 0 value is only the absorbing element for multiplication. For the addition of positive integers, it would be infinity. And here’s the Cons implementation:

view plaincopy to clipboardprint?
@Override  
public  B foldLeft(B identity, B zero, Function> f) {  
    return foldLeft(identity, zero, this, f).eval();  
}  
  
private  TailCall foldLeft(B acc, B zero, List list, Function> f) {  
    return list.isEmpty() || acc.equals(zero)  
            ? ret(acc)  
            : sus(() -> foldLeft(f.apply(acc).apply(list.head()), zero, list.tail(), f));  
}  

As you can see, the only difference is that if the accumulator value is found to be “zero,” recursion is stopped and the accumulator is returned. Now you need a zero value for your fold. The zero value is a Tuple with the Integer value equal to -1 (the first value smaller than 0). Can you use a standard Tuple for this? No, you can’t, because it must have a special equals method, returning true when the integer values are equal, whatever the Result is. The complete method is as follows:

view plaincopy to clipboardprint?
public Result getAt(int index) {  
  
  class Tuple {  
  
    public final T _1;  
    public final U _2;  
  
    public Tuple(T t, U u) {  
      this._1 = Objects.requireNonNull(t);  
      this._2 = Objects.requireNonNull(u);  
    }  
  
    @Override  
    public boolean equals(Object o) {  
      if (!(o.getClass() == this.getClass()))  
        return false;  
      else {  
        @SuppressWarnings("rawtypes")  
        Tuple that = (Tuple) o;  
        return _2.equals(that._2);  
      }  
    }  
  }  
  
  Tuple, Integer> zero =  
              new Tuple<>(Result.failure("Index out of bound"), -1);  
  Tuple, Integer> identity =  
              new Tuple<>(Result.failure("Index out of bound"), index);  
  Tuple, Integer> rt = index < 0 || index >= length()  
      ? identity  
      : foldLeft(identity, zero, ta -> a -> ta._2 < 0  
                    ? ta  
                    : new Tuple<>(Result.success(a), ta._2 - 1));  
  return rt._1;  
}  

Note that I’ve omitted the hashCode and toString methods to make the code shorter. Now the fold will automatically stop as soon as the searched-for element is found. Of course, you can use the new foldLeft method for escaping any computation with a zero element. (Remember: zero, not 0.)

Splitting lists
Sometimes you need to split a list into two parts at a specific position. Although the singly linked list is far from ideal for this kind of operation, it’s relatively simple to implement. Splitting a list has several useful applications, among which is processing its parts in parallel using several threads.

Let's write a splitAt method that takes an int as its parameter and returns two lists by splitting the list at the given position. There shouldn’t be any IndexOutOfBoundExceptions. Instead, an index below 0 should be treated as 0, and an index above max should be treated as the maximum value for the index. (Exercise 8.14)

An explicitly recursive solution is easy to design:

view plaincopy to clipboardprint?
public Tuple<List<A>, List<A>> splitAt(int index) {  
      return index < 0  
          ? splitAt(0)  
          : index > length()  
              ? splitAt(length())  
              : splitAt(list(), this.reverse(), this.length() - index).eval();  
}  
  
private TailCall<Tuple<List<A>, List<A>>> splitAt(List<A> acc, List<A> list, int i) {  
    return i == 0 || list.isEmpty()  
            ? ret(new Tuple<>(list.reverse(), acc))  
            : sus(() -> splitAt(acc.cons(list.head()), list.tail(), i - 1));  
}  

Note that the first method uses recursion to adjust the value of the index. There’s no need for using TailCall, however, because this method will recurse at most once. The second method is very similar to the getAt method, with the difference that the list is first reversed. The method accumulates the elements until the index position is reached, so the accumulated list is in the correct order, but the remaining list has to be reversed back.

Searching for sublists
One common use case for lists is searching to find out whether a list is contained in another (longer) list. In other words, you want to know whether a list is a sublist of another list.

Let's implement a hasSubList method to check whether a list is a sublist of another. For example, the list (3, 4, 5) is a sublist of (1, 2, 3, 4, 5) but not of (1, 2, 4, 5, 6). Implement it as a static method with the following signature: (Exercise 8.16)

view plaincopy to clipboardprint?
public static <A> boolean hasSubsequence(List<A> list, List<A> sub)  

You’ll first have to implement a startsWith method to determine whether a list starts with a sublist. Once this is done, you’ll test this method recursively, starting from each element of the list. An explicitly recursive startsWith method can be implemented as follows:

view plaincopy to clipboardprint?
public static <A> Boolean startsWith(List<A> list, List<A> sub) {  
  return sub.isEmpty()  
      ? true  
      : list.isEmpty()  
          ? false  
          : list.head().equals(sub.head())  
              ? startsWith(list.tail(), sub.tail())  
              : false;  
}  

This is a stack-based version that can be transformed into a heap-based one using TailCall:

view plaincopy to clipboardprint?
public static  Boolean startsWith(List list, List sub) {  
    return startsWith_(list, sub).eval();  
}  
  
public static  TailCall startsWith_(List list, List sub) {  
    return sub.isEmpty()  
            ? ret(Boolean.TRUE)  
            : list.isEmpty()  
                ? ret(Boolean.FALSE)  
                : list.head().equals(sub.head())  
                    ? sus(() -> startsWith_(list.tail(), sub.tail()))  
                    : ret(Boolean.FALSE);  
}  

From there, implementing hasSubList is straightforward:

view plaincopy to clipboardprint?
public static  boolean hasSubList(List list, List sub) {  
    return hasSubList_(list, sub).eval();  
}  
  
public static  TailCall hasSubList_(List list, List sub){  
    return list.isEmpty()  
            ? ret(sub.isEmpty())  
            : startsWith(list, sub)  
                ? ret(true)  
                : sus(() -> hasSubList_(list.tail(), sub));  
}  

Miscellaneous functions for working with lists
Many other useful functions can be developed to work with lists. The following exercises will give you some practice in this domain. Note that the proposed solutions are certainly not the only ones. Feel free to invent your own.

Create a groupBy method taking a function from A to B as a parameter and returning a Map, where keys are the result of the function applied to each element of the list and values are lists of elements corresponding to each key. In other words, given a list of Payments such as these (Exercise 8.17):

view plaincopy to clipboardprint?
public class Payment {  
  
  public final String name;  
  public final int amount;  
  
  public Payment(String name, int amount) {  
    this.name = name;  
    this.amount = amount;  
  }  
}  

the following code should create a Map containing (key/value) pairs where each key is a name and the corresponding value is the list of Payments made by the corresponding person:

view plaincopy to clipboardprint?
Map> map = list.groupBy(x -> x.name);  

Here’s an imperative version. There’s not much to say about it, because it’s just traditional imperative code with a local mutable state:

view plaincopy to clipboardprint?
public  Map> groupByImperative(Function f) {  
    List workList = this;  
    Map> m = Map.empty();  
    while (!workList.isEmpty()) {  
        final B k = f.apply(workList.head());  
        List rt = m.get(k).getOrElse(()->List.list()).cons(workList.head());  
        m = m.put(k, rt);  
        workList = workList.tail();  
    }  
    return m;  
}  

Note that this implementation is perfectly functional because no state mutation is visible from outside the method. But the style is quite imperative, with a while loop and local variables. Here’s a version in a more functional style, using a fold:

view plaincopy to clipboardprint?
public  Map> groupBy(Function f) {  
    return foldRight(Map.empty(), t -> mt -> {  
        final B k = f.apply(t);  
        return mt.put(k, mt.get(k).getOrElse(()->list()).cons(t));  
    });  
}  

It’s up to you to choose the style you prefer. Obviously, the second version is more compact. But the main advantage is that it better expresses the intent. groupBy is a fold. Choosing the imperative style is re-implementing the fold, whereas choosing the functional style is reusing the abstraction.

Now let's write an unfold method that takes a starting element S and a function f from S to Result> and produces a List by successively applying f to the S value as long as the result is a Success. In other words, the following code should produce a list of integers from 0 to 9: (Exercise 8.18)

view plaincopy to clipboardprint?
List.unfold(0, i -> i < 10  
    ? Result.success(new Tuple<>(i, i + 1))  
    : Result.empty());  

A simple non-stack-safe recursive version is easy to implement:

view plaincopy to clipboardprint?
public static  List unfold_(S z, Function>> f) {  
    return f.apply(z).map(x ->  
                   unfold_(x._2, f).cons(x._1)).getOrElse(list());  
}  

Unfortunately, although this solution is smart, it will blow the stack for a little more than 1,000 recursion steps. To solve this problem, you can create a tail recursive version and use the TailCall class to make recursion happen on the heap:

view plaincopy to clipboardprint?
public static  List unfold(S z, Function>> f) {  
    return unfold(list(), z, f).eval().reverse();  
}  
  
private static  TailCall> unfold(List acc, S z, Function>> f) {  
    Result> r = f.apply(z);  
    Result>> result = r.map(rt -> sus(() -> unfold(acc.cons(rt._1), rt._2, f)));  
    return result.getOrElse(ret(acc));  
}  

Note, however, that this reverses the list. This might not be a big problem for small lists, but it could be for huge ones. In such cases, reverting to imperative style might be an option.

To continue, let's write a range method that takes two integers as its parameters and produces a list of all integers greater than or equal to the first and less than the second (Exercise 8.19). This is very simple if you reuse the method from exercise 8.18:

view plaincopy to clipboardprint?
public static List range(int start, int end) {v  
  return List.unfold(start, i -> i < end  
      ? Result.success(new Tuple<>(i, i + 1))  
      : Result.empty());  
}  

Create an exists method that takes a function from A to Boolean representing a condition, and that returns true if the list contains at least one element satisfying this condition. Don’t use explicit recursion, but try to build on the methods you’ve already defined (Exercise 8.20). A recursive solution could be defined as follows:

view plaincopy to clipboardprint?
public boolean exists(Function p) {  
  return p.apply(head()) || tail().exists(p);  
}  

Because the || operator lazily evaluates its second argument, the recursive process will stop as soon as an element is found that satisfies the condition expressed by the predicate p. But this is a non-tail-recursive stack-based method, and it will blow the stack if the list is long and no satisfying element is found in the first 1,000 or 2,000 elements. Incidentally, it will also throw an exception if the list is empty, so you’d have to define an abstract method in the List class with a specific implementation for the Nil subclass. A much better solution consists of reusing the foldLeft method with a zero parameter:

view plaincopy to clipboardprint?
public boolean exists(Function p) {  
  return foldLeft(false, true, x -> y -> x || p.apply(y))._1;  
}  

Finally, let's create a forAll method that takes a function from A to Boolean representing a condition, and that returns true if all the elements in the list satisfy this condition (Exercise 8.21). The solution is very close to the exists method with two differences: the identity and zero values are inverted, and the Boolean operator is && instead of ||:

view plaincopy to clipboardprint?
public boolean forAll(Function p) {  
  return foldLeft(true, false, x -> y -> x && p.apply(y))._1;  
}  

Note that another possibility is to reuse the exists method:

view plaincopy to clipboardprint?
public boolean forAll(Function p) {  
  return !exists(x -> !p.apply(x));  
}  

This methods checks whether an element exists that doesn’t meet the inverse of the condition.

程式扎記

標籤

2018年2月24日星期六

[ FP with Java ] Ch8 - Advanced list handling - Part1

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2018年2月24日 星期六