程式扎記: [ FP with Java ] Ch9 - Working with laziness

Preface
This chapter covers

* Understanding the importance of laziness
* Implementing laziness in Java
* Creating a lazy list data structure: the Stream
* Optimizing lazy lists by memoizing evaluated values
* Handling infinite streams

Some languages are said to be lazy, while others are not. Does this mean that some languages work harder than others? Not at all. Laziness is opposed to strictness. It has nothing to do with how hard a language can work, although you could sometimes think of lazy languages as languages that don’t require the programmer to work as hard as they must with strict ones.

Laziness, as you’ll see, has many advantages for some specific problems, such as composing infinite data structures and evaluating error conditions.

Understanding strictness and laziness
When applied to method arguments, strictness means that arguments are evaluated as soon as they’re received by the method. Laziness means that arguments are evaluated only when they’re needed. Of course, strictness and laziness apply not only to method arguments, but to everything. For example, consider the following declaration:

view plaincopy to clipboardprint?
int x = 2 + 3;  

Here, x is immediately evaluated to 5 because Java is a strict language; it performs the addition immediately. Let’s look at another example:

view plaincopy to clipboardprint?
int x = getValue();  

In Java, as soon as the x variable is declared, the getValue method is called to provide the corresponding value. On the other hand, with a lazy language, the getValue method is only called if and when the x variable is to be used. This can make a huge difference. For example, look at the following Java program:

view plaincopy to clipboardprint?
public static void main(String... args) {  
  int x = getValue();  
}  
  
public static int getValue() {  
  System.out.println("Returning 5");  
  return 5;  
}  

This program will print Returning 5 on the console because the getValue method will be called, although the returned value will never be used. In a lazy language, nothing would be evaluated, so nothing would be printed on the console.

Java is a strict language
Java, in principle, has no option concerning laziness. Java is strict. Everything is evaluated immediately. Method arguments are said to be passed by value, which means first they’re evaluated, and then the evaluated value is passed. On the other hand, in lazy languages, arguments are said to be passed by name, which means unevaluated. Don’t be confused by the fact that method arguments in Java are often references. References are addresses, and these addresses are passed by value.

Some languages are strict (like Java); others are lazy; some are strict by default and are optionally lazy; and others are lazy by default and are optionally strict. Java, however, isn’t always strict. These are some lazy constructs in Java:

* Boolean operators || and &&
* Ternary operator ?:
* if ... else
* for loop
* while loop
* Java 8 streams

If you think about it, you’ll soon realize that not much could be done if Java weren’t sometimes lazy. Can you imagine an if ... else structure where both branches were systematically evaluated? Or can you imagine a loop from which it was impossible to escape? All languages have to be lazy sometimes. This said, standard Java is often not lazy enough for functional programming.

The problem with strictness
Strictness is so fundamental in languages like Java that it’s seen by many programmers as the only possibility for evaluating expressions, even if, in reality, nothing would be possible with a totally strict language. Moreover, Java’s documentation doesn’t use the words non-strict or lazy when describing lazy constructs. For example, the Boolean operators || and && aren’t called lazy, but short-circuiting. But the simple reality is that these operators are non-strict regarding their arguments. We can easily show how this is different from a “strict” evaluation of method arguments.

Imagine that you wanted to simulate Boolean operators with a function. The following listing shows what you could do.
- Listing 9.1. The and and or logical methods

view plaincopy to clipboardprint?
public class BooleanMethods {  
  
  public static void main(String[] args) {  
    System.out.println(or(true, true));  
    System.out.println(or(true, false));  
    System.out.println(or(false, true));  
    System.out.println(or(false, false));  
  
    System.out.println(and(true, true));  
    System.out.println(and(true, false));  
    System.out.println(and(false, true));  
    System.out.println(and(false, false));  
  }  
  
  public static boolean or(boolean a, boolean b) {  
    return a ? true : b ? true : false;  
  }  
  
  public static boolean and(boolean a, boolean b) {  
    return a ? b ? true : false : false;  
  }  
}  

There are, of course, simpler ways to do this using the Boolean operators, but your goal here is to avoid these operators. Are you done? Running this program will display the following result on the console:

true
true
true
false
true
false
false
false

So far, so good. But now try running the following program.
- Listing 9.2. The problem with strictness

view plaincopy to clipboardprint?
public class BooleanMethods {  
  
  public static void main(String[] args) {  
    System.out.println(getFirst() || getSecond());  
    System.out.println(or(getFirst(), getSecond()));  
  }  
  
  public static boolean getFirst() {  
    return true;  
  }  
  
  public static boolean getSecond() {  
    throw new IllegalStateException();  
  }  
  
  public static boolean or(boolean a, boolean b) {  
    return a ? true : b ? true : false;  
  }  
  
  public static boolean and(boolean a, boolean b) {  
    return a ? b ? true : false : false;  
  }  
}  

This programs prints the following:

true
Exception in thread "main" java.lang.IllegalStateException

Obviously, the or method isn’t equivalent to the || operator. The difference is that || evaluates its operand lazily, which means the second operand isn’t evaluated if the first one is true, because it’s not necessary for computing the result. But the or method evaluates its arguments strictly, which means that the second argument is evaluated even if its value isn’t needed, so the IllegalStateException is always thrown. In chapters 6 and 7 you encountered this problem with the getOrElse method because its argument was always evaluated, even if the computation was successful.

Implementing laziness
Laziness is necessary on many occasions. Java does in fact use laziness for constructs like if ... else, loops, and try ... catch blocks. Without laziness, a catch block, for example, would be evaluated even in the absence of an exception. Implementing laziness is a must when it comes to providing behavior for errors, as well as when you need to manipulate infinite data structures. Implementing laziness in Java isn’t fully possible, but you can produce a good approximation using the Supplier class you used in previous chapters:

view plaincopy to clipboardprint?
public interface Supplier<T> {  
  T get();  
}  

Note that you created your own class, but Java 8 also offers a Supplier class. Which one you use is up to you. They are completely equivalent. Using the Supplier class, you can rewrite the BooleanMethods example as follows.
- Listing 9.3. Using laziness to emulate Boolean operators

view plaincopy to clipboardprint?
public class BooleanMethods {  
  
  public static void main(String[] args) {  
    System.out.println(getFirst() || getSecond());  
    System.out.println(or(() -> getFirst(), () -> getSecond()));  
  }  
  
  public static boolean getFirst() {  
    return true;  
  }  
  
  public static boolean getSecond() {  
    throw new IllegalStateException();  
  }  
  
  public static boolean or(Supplier<Boolean> a, Supplier<Boolean> b) {  
    return a.get() ? true : b.get() ? true : false;  
  }  
  
  public static boolean and(Supplier<Boolean> a, Supplier<Boolean> b) {  
    return a.get() ? b.get() ? true : false : false;  
  }  
}  

This programs prints out the following:

true
true

The problem of laziness is nearly solved, although you’ve been forced to change the signature of your method. This is a low price to pay for using laziness. Of course, it might be overkill if the parameters are very quick to evaluate, or if they’re already evaluated, such as when using literal values. But it may save a great deal of time when evaluation requires a long computation. And if that evaluation isn’t free of side effects, it may completely change the outcome of the program.

Things you can’t do without laziness
So far, it may seem that the absence of laziness in evaluating expressions in Java isn’t a big deal. After all, why should you bother rewriting Boolean methods when you can use Boolean operators? There are, however, other cases where laziness would be useful. There are even several algorithms that can’t be implemented without resorting to laziness. I’ve already talked about how useless a strict version of if ... else would be. Think about the following algorithm:

1. Take the list of positive integers.
2. Filter the primes.
3. Return the list of the first ten results.

This is an algorithm for finding the first ten primes, but this algorithm can’t be implemented without laziness. If you don’t believe me, just try it. Start with the first line. If you’re strict, you’ll first evaluate the list of positive integers. You’ll never have the opportunity to go to the second line, because the list of integers is infinite, and you’ll exhaust available memory before reaching the (nonexistent) end. Clearly, this algorithm can’t be implemented without laziness, but you know how to replace it with a different algorithm. The preceding algorithm was functional. If you want to find the result without resorting to laziness, you’ll have to replace it with an imperative algorithm, like this:

1. Take the first integer.
2. Check whether it’s a prime.
3. If it is, store it in a list.
4. Check whether this resulting list has ten elements.
5. If it has ten elements, return it as the result.
6. If not, increment the integer by 1.
7. Go to line 2.

Sure, it works. But what a mess! First, it’s a bad recipe. Shouldn’t you increment the tested integer by 2 rather than by 1, in order to not test even numbers? And why test multiples of 3, 5, and so on? But more importantly, it doesn’t express the nature of the problem. It’s only a recipe for computing the result. This isn’t to say that the implementation details (such as not testing even numbers) aren’t important for getting good performance. But these implementation details should be clearly separated from the problem definition. The imperative description isn’t a description of the problem—it’s a description of another problem giving the same result.

In functional programming, you generally solve this kind of problem with a special structure: the lazy list, called Stream.

Why not use the Java 8 Stream?
Java 8 introduced a new structure called Stream. Can you use it for this type of computation? Well, you could, but there are several reasons not to do this:

* Defining your own structure is far more rewarding. In doing so, you’ll learn and understand many things that you wouldn’t even have thought of if you were using Java 8 streams.
* Java streams are a very powerful tool, but not the tool you need. Java 8 streams were designed with the idea of automatic parallelization in mind. To allow for automatic parallelization, many compromises were made. Many functional methods are missing because they would have made automatic parallelization more difficult.
* Java 8 streams are stateful. Once they’ve been used for some operations, they will have changed their state and are no longer usable.
* Folding Java 8 streams is a strict operation that causes the evaluation of all elements.

For all these reasons, you’ll define your own streams in this chapter. After you’ve finished this chapter, you may prefer to use the Java 8 streams, but you’ll do so fully understanding what’s missing in the Java 8 implementation.

Creating a lazy list data structure
Now that you know how to represent non-evaluated data as instances of Supplier, you can easily define a lazy list data structure. It will be called Stream and will be very similar to the singly linked list you developed in chapter 5, with some subtle but very important differences. The following listing shows the starting point of your Stream data type.
- Listing 9.4. The Stream data type

view plaincopy to clipboardprint?
package fp.stream;  
  
import fp.compose.ch3.Supplier;  
  
public abstract class Stream<A> {  
    @SuppressWarnings("rawtypes")  
    private static Stream EMPTY = new Empty();  
    public abstract A head();  
    public abstract Stream<A> tail();  
    public abstract Boolean isEmpty();  
    private Stream(){}  
      
    private static class Empty<A> extends Stream<A>{  
        @Override  
        public Stream<A> tail(){ throw new IllegalStateException("tail called on empty"); }  
        @Override  
        public A head(){ throw new IllegalStateException("head called on empty"); }  
        @Override  
        public Boolean isEmpty(){ return true; }  
    }  
      
    private static class Cons<A> extends Stream<A>{  
        private final Supplier<A> head;  
        private final Supplier<Stream<A>> tail;  
          
        private Cons(Supplier<A> h, Supplier<Stream<A>> t)  
        {  
            head = h; tail = t;  
        }  
          
        @Override  
        public A head(){ return head.get(); }  
        @Override  
        public Stream<A> tail(){ return tail.get(); }  
        @Override  
        public Boolean isEmpty(){ return false; }  
    }  
      
    static <A> Stream<A> cons(Supplier<A> hd, Supplier<Stream<A>> tl){ return new Cons<>(hd, tl); }  
    @SuppressWarnings("unchecked")  
    public static <A> Stream<A> empty(){ return EMPTY; }  
    public static Stream<Integer> from(int i){ return cons(()->i, ()->from(i + 1));}  
}  

Here’s an example of how to use this Stream type:

view plaincopy to clipboardprint?
Stream<Integer> stream = Stream.from(1);  
System.out.println(stream.head());  
System.out.println(stream.tail().head());  
System.out.println(stream.tail().tail().head());  

This probably doesn’t seem very useful. To make Stream a valuable tool, you’ll need to add some methods to it. But first you must optimize it slightly.

Memoizing evaluated values
The idea behind laziness is that you can save time by evaluating data only when it’s needed. This implies that you must evaluate data when it’s first accessed. But reevaluating it on subsequent accesses is a waste of time. Because you’re writing functional programs, multiple evaluation won’t harm anything, but it will slow the program. One solution is to memoize the evaluated value.

To do this, you’ll have to add fields for evaluated values in the Cons class:

view plaincopy to clipboardprint?
private final Supplier<A> head;  
private A h;  
private final Supplier<Stream<A>> tail;  
private Stream<A> t;  

Then change the getters as follows:

view plaincopy to clipboardprint?
public A head() {  
  if (h == null) {  
    h = head.get();  
  }  
  return h;  
}  
  
public Stream<A> tail() {  
  if (t == null) {  
    t = tail.get();  
  }  
  return t;  
}  

This well-known technique isn’t specific to functional programming. It’s sometimes called evaluation on demand, or evaluation as needed, or lazy evaluation. When the value is asked for the first time, the evaluated field is null, so the value is evaluated. On subsequent access, the value won’t be evaluated again, and the previously evaluated value will be returned.

Some languages offer lazy evaluation as a standard feature, whether by default or optionally. With such languages, you don’t need to resort to null references and mutable fields. Unfortunately, Java isn’t one of these languages. In Java, the most frequent approach when a value is to be initialized later is to first assign it the null reference if it’s an object type, or a sentinel value if it’s a primitive. This is risky because there’s no guarantee that the value will indeed be initialized to a significant value when needed. A null reference will probably cause a NullPointerException to be thrown, which at least will be noticed if exception handling has been implemented correctly, but a zero value could be an acceptable business value, leading to a program silently using this acceptable but incorrect value.

Alternatively, you could use a Result<A> to represent the value. This would avoid the use of the null reference, but you’d still have to use mutable fields. Because all this stuff is private, it’s acceptable to use null. But if you prefer, you can use a Result (or an Option) to represent the h and t fields.

Note that although the h and t fields must be mutable, they don’t need synchronization. The worst thing that may happen is that one thread will test the field and find it null, and then a second thread might also test the field before it has been initialized by the first one. The end result is that the field will have been initialized twice with potentially different (although equal) values. By itself, this isn’t a big problem; writing references is atomic, so the data can’t be corrupted. However, this could cause two instances of the corresponding object to coexist in memory. This won’t be a problem if you only test objects for equality, but it could be if you test them for identity (which, of course, you never do).

Also note that it’s possible to completely avoid null references and mutable fields at the cost of slight modifications in other places. Try to figure out how to do this. If you don’t know how, keep this idea in mind. We’ll come back to it near the end of this chapter. Now let's write a headOption method that returns the evaluated head of the stream. This method will be declared in the Stream parent class with the following signature (Exercise 9.1):

view plaincopy to clipboardprint?
public abstract Result<A> headOption();  

The Empty implementation returns an empty Result:

view plaincopy to clipboardprint?
@Override  
public Result<A> headOption() {  
    return Result.empty();  
}  

The Cons implementation returns a Success of the evaluated head:

view plaincopy to clipboardprint?
@Override  
public Result<A> headOption() {  
    return Result.success(head());  
}  

Manipulating streams
In the remainder of this chapter, you’ll learn how to compose streams while making the most of the fact that the data is unevaluated. But in order to look at the streams, you’ll need a method to evaluate them. Evaluating all the elements of a stream can be done by converting it to a List. Or you can process a stream by evaluating only the first n elements, or by evaluating elements as long as a condition is met.

Firstly, let's create a toList method to convert a Stream into a List (Exercise 9.2). A recursive version will simply cons the head of the stream to the result of the toList method applied to the tail. Of course, you’ll need to make this process tail recursive in order to use TailCall to get a stack-safe implementation:

view plaincopy to clipboardprint?
public List<A> toList() {  
    return toList(this, List.list()).eval().reverse();  
}  
  
private TailCall<List<A>> toList(Stream<A> s, List<A> acc) {  
    return s.isEmpty()  
            ? ret(acc)  
            : sus(() -> toList(s.tail(), acc.cons(s.head())));  
}  

Beware that calling toList on an infinite stream, such as the stream created by Stream.from(1), will create an infinite list. Unlike the stream, the list is eagerly evaluated, so it will result, in theory, in a never-ending program. (In reality, it will end with an OutOfMemoryError.) Be sure to create a condition that will truncate the list before running the program, as you’ll see in the next exercise.

To continue, let's write a take(n) method that returns the first n elements of a stream, and a drop(n) method that returns the remaining stream after removing the first n elements. Note that you have to ensure that no evaluation occurs while calling these methods. Here are the signatures in the Stream parent class (Exercise 9.3):

view plaincopy to clipboardprint?
public abstract Stream<A> take(int n);  
public abstract Stream<A> drop(int n);  

Both implementations in the Empty class return this. For the take method in the Cons class, you need to create a new Stream<A> by calling the cons method with the non-evaluated head of the stream (which means a reference to the head field and not a call to the head() method) and making a recursive call to take(n - 1) on the tail of the stream until n == 1. The drop method is even simpler. You just have to call drop(n – 1) recursively on the tail while n > 0. Note that the take method doesn’t need to be made stack-safe, because the recursive call to take is already lazy:

view plaincopy to clipboardprint?
@Override  
public Stream<A> take(int n){  
    return n <= 0  
              ? empty()  
              : cons(head, () -> tail().take(n - 1));  
}  

The take method allows you to work on an infinite stream by truncating it after a number of elements. Beware, however, that this method must be called on the stream before converting it to a list:

view plaincopy to clipboardprint?
List<Integer> list = Stream.from(1).take(10).toList();  

Calling the equivalent method on the resulting list will instead hang until memory is exhausted, causing an OutOfMemoryError:

view plaincopy to clipboardprint?
List<Integer> list = Stream.from(1).toList().takeAtMost(10);  

By contrast, the drop method must be made stack-safe:

view plaincopy to clipboardprint?
@Override  
public Stream<A> drop(int n) {  
    return drop(this, n).eval();  
}  
  
public TailCall<Stream<A>> drop(Stream<A> acc, int n) {  
    return n <= 0   
            ? ret(acc)   
            : sus(() -> drop(acc.tail(), n - 1));  
}  

Write a takeWhile method that will return a Stream containing all starting elements as long as a condition is matched. Here’s the method signature in the Stream parent class (Exercise 9.4):

view plaincopy to clipboardprint?
public abstract Stream<A> takeWhile(Function<A, Boolean> p)  

This method is very similar to the take method. The main difference is that the terminal condition is no longer n <= 0 but the provided function returning false:

view plaincopy to clipboardprint?
public Stream<A> takeWhile(Function<A, Boolean> f) {  
    return f.apply(head())  
            ? cons(head, () -> tail().takeWhile(f))  
            : empty();  
}  

Once again, you don’t need to make the method stack-safe because the recursive call is unevaluated. The Empty implementation returns this.

Write a dropWhile method that returns a stream with the front elements removed as long as they satisfy a condition. Here’s the signature in the Stream parent class (Exercise 9.5):

view plaincopy to clipboardprint?
public Stream<A> dropWhile(Function<A, Boolean> p);  

As in previous recursive methods, the solution will include a main method calling a stack-safe recursive helper method and evaluating its result:

view plaincopy to clipboardprint?
public Stream<A> dropWhile(Function<A, Boolean> p) {  
    return dropWhile(this, p).eval();  
}  
  
private TailCall<Stream<A>> dropWhile(Stream<A> acc, Function<A, Boolean> p) {  
    return acc.isEmpty()   
            ? ret(acc)   
            : p.apply(acc.head())   
                ? sus(() -> dropWhile(acc.tail(), p))   
                : ret(acc);  
}  

Because this method uses a helper method, it can be implemented in the Stream parent class.

程式扎記

標籤

2018年3月8日星期四

[ FP with Java ] Ch9 - Working with laziness - Part1

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2018年3月8日 星期四