In typical imperative Python programs—including those that make use of classes and methods to hold their imperative code—a block of code generally consists of some outside loops (for or while), assignment of state variables within those loops, modification of data structures like dicts, lists, and sets (or various other structures, either from the standard library or from third-party packages), and some branch statements (if/elif/else or try/except/finally). All of this is both natural and seems at first easy to reason about. The problems often arise, however, precisely with those side effects that come with state variables and mutable data structures; they often model our concepts from the physical world of containers fairly well, but it is also difficult to reason accurately about what state data is in at a given point in a program.
One solution is to focus not on constructing a data collection but rather on describing “what” that data collection consists of. When one simply thinks, “Here’s some data, what do I need to do with it?” rather than the mechanism of constructing the data, more direct reasoning is often possible. The imperative flow control described in the last paragraph is much more about the “how” than the “what” and we can often shift the question.
Encapsulation
One obvious way of focusing more on “what” than “how” is simply to refactor code, and to put the data construction in a more isolated place—i.e., in a function or method. For example, consider an existing snippet of imperative code that looks like this:
- # configure the data to start with
- collection = get_initial_state()
- state_var = None
- for datum in data_set:
- if condition(state_var):
- state_var = calculate_from(datum)
- new = modify(datum, state_var)
- collection.add_to(new)
- else:
- new = modify_differently(datum)
- collection.add_to(new)
- # Now actually work with the data
- for thing in collection:
- process(thing)
- # tuck away construction of data
- def make_collection(data_set):
- collection = get_initial_state()
- state_var = None
- for datum in data_set:
- if condition(state_var):
- state_var = calculate_from(datum, state_var)
- new = modify(datum, state_var)
- collection.add_to(new)
- else:
- new = modify_differently(datum)
- collection.add_to(new)
- return collection
- # Now actually work with the data
- for thing in make_collection(data_set):
- process(thing)
Comprehensions
Using comprehensions is often a way both to make code more compact and to shift our focus from the “how” to the “what.” A comprehension is an expression that uses the same keywords as loop and conditional blocks, but inverts their order to focus on the data rather than on the procedure. Simply changing the form of expression can often make a surprisingly large difference in how we reason about code and how easy it is to understand. The ternary operator also performs a similar restructuring of our focus, using the same keywords in a different order. For example, if our original code was:
- collection = list()
- for datum in data_set:
- if condition(datum):
- collection.append(datum)
- else:
- new = modify(datum)
- collection.append(new)
- collection = [d if condition(d) else modify(d) for d in data_set]
List comprehensions have been in Python the longest, and are in some ways the simplest. We now also have generator comprehensions, set comprehensions, and dict comprehensions available in Python syntax. As a caveat though, while you can nest comprehensions to arbitrary depth, past a fairly simple level they tend to stop clarifying and start obscuring. For genuinely complex construction of a data collection, refactoring into functions remains more readable.
Generators
Generator comprehensions have the same syntax as list comprehensions—other than that there are no square brackets around them (but parentheses are needed syntactically in some contexts, in place of brackets)—but they are also lazy. That is to say that they are merely a description of “how to get the data” that is not realized until one explicitly asks for it, either by calling .next() on the object, or by looping over it. This often saves memory for large sequences and defers computation until it is actually needed. For example:
- log_lines = (line for line in read_line(huge_log_file) if complex_condition(line))
- def get_log_lines(log_file):
- line = read_line(log_file)
- while True:
- try:
- if complex_condition(line):
- yield line
- line = read_line(log_file)
- except StopIteration:
- raise
- log_lines = get_log_lines(huge_log_file)
- class GetLogLines(object):
- def __init__(self, log_file):
- self.log_file = log_file
- self.line = None
- def __iter__(self):
- return self
- def __next__(self):
- if self.line is None:
- self.line = read_line(log_file)
- while not complex_condition(self.line):
- self.line = read_line(self.log_file)
- return self.line
- log_lines = GetLogLines(huge_log_file)
Dicts and Sets
In the same fashion that lists can be created in comprehensions rather than by creating an empty list, looping, and repeatedly calling .append(), dictionaries and sets can be created “all at once” rather than by repeatedly calling .update() or .add() in a loop. For example:
The imperative versions of these comprehensions would look very similar to the examples shown earlier for other built-in datatypes.
Recursion
Functional programmers often put weight in expressing flow control through recursion rather than through loops. Done this way, we can avoid altering the state of any variables or data structures within an algorithm, and more importantly get more at the “what” than the “how” of a computation. However, in considering using recursive styles we should distinguish between the cases where recursion is just “iteration by another name” and those where a problem can readily be partitioned into smaller problems, each approached in a similar way.
There are two reasons why we should make the distinction mentioned. On the one hand, using recursion effectively as a way of marching through a sequence of elements is, while possible, really not “Pythonic.” It matches the style of other languages like Lisp, definitely, but it often feels contrived in Python. On the other hand, Python is simply comparatively slow at recursion, and has a limited stack depth limit. Yes, you can change this with [b]sys.setrecursionlimit() to more than the default 1000; but if you find yourself doing so it is probably a mistake[/b]. Python lacks an internal feature called tail call elimination that makes deep recursion computationally efficient in some languages. Let us find a trivial example where recursion is really just a kind of iteration:
- def running_sum(numbers, start=0):
- if len(numbers) == 0:
- print()
- return
- total = numbers[0] + start
- print(total, end=" ")
- running_sum(numbers[1:], total)
- def factorialR(N):
- "Recursive factorial function"
- assert isinstance(N, int) and N >= 1
- return 1 if N <= 1 else N * factorialR(N-1)
- def factorialI(N):
- "Iterative factorial function"
- assert isinstance(N, int) and N >= 1
- product = 1
- while N >= 1:
- product *= N
- N -= 1
- return product
As a footnote, the fastest version I know of for factorial() in Python is in a functional programming style, and also expresses the “what” of the algorithm well once some higher-order functions are familiar:
- from functools import reduce
- from operator import mul
- def factorialHOF(n):
- return reduce(mul, range(1, n+1), 1)
- def quicksort(lst):
- "Quicksort over a list-like sequence"
- if len(lst) == 0:
- return lst
- pivot = lst[0]
- pivots = [x for x in lst if x == pivot]
- small = quicksort([x for x in lst if x < pivot])
- large = quicksort([x for x in lst if x > pivot])
- return small + pivots + large
As general advice, it is good practice to look for possibilities of recursive expression—and especially for versions that avoid the need for state variables or mutable data collections—whenever a problem looks partitionable into smaller problems. It is not a good idea in Python—most of the time—to use recursion merely for “iteration by other means.”
Eliminating Loops
Just for fun, let us take a quick look at how we could take out all loops from any Python program. Most of the time this is a bad idea, both for readability and performance, but it is worth looking at how simple it is to do in a systematic fashion as background to contemplate those cases where it is actually a good idea. If we simply call a function inside a for loop, the built-in higherorder function map() comes to our aid:
- for e in it: # statement-based loop
- func(e)
- map(func, it) # map()-based "loop"
- # let f1, f2, f3 (etc) be functions that perform actions
- # an execution utility function
- do_it = lambda f, *args: f(*args)
- # map()-based action sequence
- map(do_it, [f1, f2, f3])
Of course, looking at the example, one suspects the result one really wants is actually to pass all the arguments to each of the functions rather than one argument from each list to each function. Expressing that is difficult without using a list comprehension, but easy enough using one:
In general, the whole of our main program could, in principle, be a map() expression with an iterable of functions to execute to complete the program.
Translating while is slightly more complicated, but is possible to do directly using recursion:
- # statement-based while loop
- while
: -
- if
: - break
- else:
-
- # FP-style recursive while loop
- def while_block():
-
- if
: - return 1
- else:
-
- return 0
- while_FP = lambda: (
and while_block()) or while_FP() - while_FP()
It is hard for
- # imperative version of "echo()"
- def echo_IMP():
- while 1:
- x = input("IMP -- ")
- if x == 'quit':
- break
- else:
- print(x)
- echo_IMP()
- # FP version of "echo()"
- def identity_print(x): # "identity with side-effect"
- print(x)
- return x
- echo_FP = lambda: identity_print(input("FP -- "))=='quit' or echo_FP()
- echo_FP()
Eliminating Recursion
As with the simple factorial example given above, sometimes we can perform “recursion without recursion” by using functools.reduce() or other folding operations (other “folds” are not in the Python standard library, but can easily be constructed and/or occur in third-party libraries). A recursion is often simply a way of combining something simpler with an accumulated intermediate result, and that is exactly what reduce() does at heart. A slightly longer discussion of functools.reduce() occurs in the chapter on higher-order functions.
沒有留言:
張貼留言