程式扎記: [ FP In Python ] Ch2. Callables

Preface
The emphasis in functional programming is, somewhat tautologously, on calling functions. Python actually gives us several different ways to create functions, or at least something very function-like (i.e., that can be called). They are:

• Regular functions created with def and given a name at definition time
• Anonymous functions created with lambda
• Instances of classes that define a __call__ method
• Closures returned by function factories
• Static methods of instances, either via the @staticmethod decorator or via the class __dict__
• Generator functions

This list is probably not exhaustive, but it gives a sense of the numerous slightly different ways one can create something callable. Of course, a plain method of a class instance is also a callable, but one generally uses those where the emphasis is on accessing and modifying mutable state. Python is a multiple paradigm language, but it has an emphasis on object-oriented styles. When one defines a class, it is generally to generate instances meant as containers for data that change as one calls methods of the class. This style is in some ways opposite to a functional programming approach, which emphasizes immutability and pure functions.

Any method that accesses the state of an instance (in any degree) to determine what result to return is not a pure function. Of course, all the other types of callables we discuss also allow reliance on state in various ways. The author of this report has long pondered whether he could use some dark magic within Python explicitly to declare a function as pure—say by decorating it with a hypothetical @purefunction decorator that would raise an exception if the function can have side effects—but consensus seems to be that it would be impossible to guard against every edge case in Python’s internal machinery.

The advantage of a pure function and side-effect-free code is that it is generally easier to debug and test. Callables that freely intersperse statefulness with their returned results cannot be examined independently of their running context to see how they behave, at least not entirely so. For example, a unit test (using doctest or unittest, or some third-party testing framework such as py.test or nose) might succeed in one context but fail when identical calls are made within a running, stateful program. Of course, any program that does anything useful must produce some kind of output (whether to console, a file, a database, over the network, or whatever), so side effects cannot be entirely eliminated, only isolated to a degree when thinking in functional programming terms.
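To make the testing point concrete, here is a minimal sketch. The names `total_stateful`, `total_pure`, and `_running_total` are invented for illustration and do not come from the original text. The stateful version answers differently depending on what ran before it, while the pure version can be checked in isolation.

```python
# Hypothetical example: all names here are illustrative assumptions.
_running_total = 0  # module-level state shared between calls


def total_stateful(x):
    """Impure: the result depends on every earlier call."""
    global _running_total
    _running_total += x
    return _running_total


def total_pure(previous, x):
    """Pure: the result depends only on the arguments."""
    return previous + x


first = total_stateful(5)        # first call in a fresh interpreter
second = total_stateful(5)       # identical argument, different result
pure_first = total_pure(0, 5)
pure_second = total_pure(0, 5)   # identical arguments, identical result
```

A test asserting `total_stateful(5) == 5` passes only if nothing has touched `_running_total` beforehand; the equivalent assertion on `total_pure(0, 5)` holds in any context.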

Named Functions and Lambdas
The most obvious ways to create callables in Python are, in definite order of obviousness, named functions and lambdas. The only in-principle difference between them is whether their .__qualname__ attribute holds a meaningful name (for a lambda it is just the placeholder '<lambda>'), since both can very well be bound to one or more names. In most cases, lambda expressions are used within Python only for callbacks and other uses where a simple action is inlined into a function call. But as we have shown in this report, flow control in general can be incorporated into single-expression lambdas if we really want. Let’s define a simple example to illustrate:

>>> def hello1(name):  
...     print("Hello {}".format(name))  
...  
>>> hello2 = lambda name: print("Hello {}".format(name))  
>>> hello1('David')  
Hello David  
>>> hello2('David')
Hello David
>>> hello1.__qualname__  
'hello1'  
>>> hello2.__qualname__
'<lambda>'
>>> hello3 = hello2
>>> hello3.__qualname__
'<lambda>'
>>> hello3.__qualname__ = 'hello3'  
>>> hello3.__qualname__  
'hello3'  

One of the reasons that functions are useful is that they isolate state lexically, and avoid contamination of enclosing namespaces. This is a limited form of nonmutability in that (by default) nothing you do within a function will bind state variables outside the function. Of course, this guarantee is very limited in that both the global and nonlocal statements explicitly allow state to “leak out” of a function. Moreover, many data types are themselves mutable, so if they are passed into a function that function might change their contents. Furthermore, doing I/O can also change the “state of the world” and hence alter results of functions (e.g., by changing the contents of a file or a database that is itself read elsewhere).
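The three cases in this paragraph can be sketched briefly. The function and variable names below are assumptions made up for this illustration: a plain assignment inside a function stays local, a `global` statement lets the binding leak out, and a mutable argument can be altered without any rebinding at all.

```python
counter = 0  # module-level name


def local_only():
    counter = 99           # binds a new local name; the global is untouched
    return counter


def leaks_global():
    global counter         # explicitly opt in to rebinding the outer name
    counter += 1


def mutates_argument(items):
    items.append("changed")  # no rebinding, yet the caller's list is altered


local_only()
after_local = counter      # unchanged by local_only()
leaks_global()
after_global = counter     # rebound by leaks_global()
data = []
mutates_argument(data)     # data now holds one element
```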

Notwithstanding all the caveats and limits mentioned above, a programmer who wants to focus on a functional programming style can intentionally decide to write many functions as pure functions to allow mathematical and formal reasoning about them. In most cases, one only leaks state intentionally, and creating a certain subset of all your functionality as pure functions allows for cleaner code. They might perhaps be broken up by “pure” modules, or annotated in the function names or docstrings.

Closures and Callable Instances
There is a saying in computer science that a class is “data with operations attached” while a closure is “operations with data attached.” In some sense they accomplish much the same thing of putting logic and data in the same object. But there is definitely a philosophical difference in the approaches, with classes emphasizing mutable or rebindable state, and closures emphasizing immutability and pure functions. Neither side of this divide is absolute—at least in Python—but different attitudes motivate the use of each.

Let us construct a toy example that shows this, something just past a “hello world” of the different styles:

# A class that creates callable adder instances  
class Adder(object):  
  def __init__(self, n):  
    self.n = n  
  def __call__(self, m):  
    return self.n + m  
  
add5_i = Adder(5) # "instance" or "imperative"  

We have constructed something callable that adds five to an argument passed in. Seems simple and mathematical enough. Let us also try it as a closure:

def make_adder(n):  
  def adder(m):  
    return m + n  
  return adder  
  
add5_f = make_adder(5) # "functional"  

So far these seem to amount to pretty much the same thing, but the mutable state in the instance provides an attractive nuisance:

>>> add5_i(10)  
15  
>>> add5_f(10) # only argument affects result  
15  
>>> add5_i.n = 10 # state is readily changeable  
>>> add5_i(10) # result is dependent on prior flow  
20  

The behavior of an “adder” created by either Adder() or make_adder() is, of course, not determined until runtime in general. But once the object exists, the closure behaves in a pure functional way, while the class instance remains state dependent. One might simply settle for “don’t change that state”—and indeed that is possible (if no one else with poorer understanding imports and uses your code)—but one is accustomed to changing the state of instances, and a style that prevents abuse programmatically encourages better habits.

There is a little “gotcha” about how Python binds variables in closures. It does so by name rather than value, and that can cause confusion, but also has an easy solution. For example, what if we want to manufacture several related closures encapsulating different data:

# almost surely not the behavior we intended!  
>>> adders = []  
>>> for n in range(5):
...     adders.append(lambda m: m+n)
...
>>> [adder(10) for adder in adders]  
[14, 14, 14, 14, 14]  
>>> n = 10  
>>> [adder(10) for adder in adders]  
[20, 20, 20, 20, 20]  

Fortunately, a small change brings behavior that probably better meets our goal:

>>> adders = []  
>>> for n in range(5):
...     adders.append(lambda m, n=n: m+n)
...
>>> [adder(10) for adder in adders]  
[10, 11, 12, 13, 14]  
>>> n = 10  
>>> [adder(10) for adder in adders]  
[10, 11, 12, 13, 14]  
>>> add4 = adders[4]  
>>> add4(10, 100) # Can override the bound value  
110  

Notice that using the keyword argument scope-binding trick allows you to change the closed-over value; but this poses much less of a danger for confusion than in the class instance. The overriding value for the named variable must be passed explicitly in the call itself, not rebound somewhere remote in the program flow. Yes, the name add4 is no longer accurately descriptive for “add any two numbers,” but at least the change in result is syntactically local.
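Another way to get per-closure values, offered here as an alternative sketch rather than something the text prescribes, is to build each closure through a factory such as the make_adder() function shown earlier. Each call to the factory creates a fresh enclosing scope, so every closure sees its own n and there is no extra parameter that a caller could override.

```python
def make_adder(n):
    def adder(m):
        return m + n   # n is looked up in this call's own enclosing scope
    return adder


adders = [make_adder(n) for n in range(5)]
results = [adder(10) for adder in adders]

n = 10  # rebinding the module-level name has no effect on the closures
results_after = [adder(10) for adder in adders]
```

Both `results` and `results_after` come out as the adders for 0 through 4 applied to 10, because the late-binding lookup happens in the factory's scope rather than the loop's.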

Methods of Classes
All methods of classes are callables. For the most part, however, calling a method of an instance goes against the grain of functional programming styles. Usually we use methods because we want to reference mutable data that is bundled in the attributes of the instance, and hence each call to a method may produce a different result that varies independently of the arguments passed to it.

Accessors and Operators
Even accessors, whether created with the @property decorator or otherwise, are technically callables, albeit accessors are callables with a limited use (from a functional programming perspective) in that they take no arguments as getters, and return no value as setters:

class Car(object):  
  def __init__(self):  
    self._speed = 100  
  
  @property  
  def speed(self):  
    print("Speed is", self._speed)  
    return self._speed  
  
  @speed.setter  
  def speed(self, value):  
    print("Setting to", value)  
    self._speed = value  
  
# >>> car = Car()
# >>> car.speed = 80 # Odd syntax to pass one argument  
# Setting to 80  
# >>> x = car.speed  
# Speed is 80  

In an accessor, we co-opt the Python syntax of assignment to pass an argument instead. Co-opting syntax in this way is fairly easy for much of Python, though; for example:

class TalkativeInt(int):  
  def __lshift__(self, other):  
    print("Shift", self, "by", other)  
    return int.__lshift__(self, other)  
  
>>> t = TalkativeInt(8)  
>>> t << 3  
Shift 8 by 3  
64  

Every operator in Python is basically a method call “under the hood” (see “Standard operators as functions” in the documentation for the operator module). But while occasionally producing a more readable “domain specific language” (DSL), defining special callable meanings for operators adds no improvement to the underlying capabilities of function calls.
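The equivalence can be checked with the standard library's operator module, which exposes the operators as plain functions. The short sketch below (variable names are illustrative) shows the same left shift written three ways, and an operator function passed to reduce() like any other callable.

```python
import operator
from functools import reduce

a, b = 8, 3
via_syntax = a << b                    # operator syntax
via_method = int.__lshift__(a, b)      # the special method behind it
via_function = operator.lshift(a, b)   # the same operation as a function

# Operator functions are ordinary callables, so they can be passed around.
product = reduce(operator.mul, [1, 2, 3, 4])
```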

Static Methods of Instances
One use of classes and their methods that is more closely aligned with a functional style of programming is to use them simply as namespaces to hold a variety of related functions:

import math  
  
class RightTriangle(object):  
  "Class used solely as namespace for related functions"  
  @staticmethod  
  def hypotenuse(a, b):  
    return math.sqrt(a**2 + b**2)  
  
  @staticmethod  
  def sin(a, b):  
    return a / RightTriangle.hypotenuse(a, b)  
  
  @staticmethod  
  def cos(a, b):  
    return b / RightTriangle.hypotenuse(a, b)  

Keeping this functionality in a class avoids polluting the global (or module, etc.) namespace, and lets us name either the class or an instance of it when we make calls to pure functions. For example:

>>> RightTriangle.hypotenuse(3,4)
5.0
>>> rt = RightTriangle()
>>> rt.sin(3,4)
0.6
>>> rt.cos(3,4)
0.8

By far the most straightforward way to define static methods is with the decorator named in the obvious way. If your namespace is entirely a bag for pure functions, there is no reason not to call via the class rather than the instance. But if you wish to mix some pure functions with some other stateful methods that rely on instance mutable state, you should use the @staticmethod decorator.
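As a rough sketch of the mixed case, the hypothetical class below (Triangle, scale, and current_hypotenuse are names invented for this example) keeps one pure static method alongside methods that read and change instance state. The decorator is what lets hypotenuse be called through an instance without receiving self.

```python
import math


class Triangle:
    """Hypothetical class mixing a pure static method with stateful methods."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    @staticmethod
    def hypotenuse(a, b):
        # Pure: depends only on the arguments, never on an instance.
        return math.sqrt(a**2 + b**2)

    def scale(self, factor):
        # Stateful: mutates the instance.
        self.a *= factor
        self.b *= factor

    def current_hypotenuse(self):
        # Stateful: the result depends on the current attributes.
        return self.hypotenuse(self.a, self.b)


t = Triangle(3, 4)
before = t.current_hypotenuse()
t.scale(2)
after = t.current_hypotenuse()        # changed because the state changed
static_result = t.hypotenuse(3, 4)    # unaffected by the state of t
```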

Generator Functions
A special sort of function in Python is one that contains a yield statement, which turns it into a generator. What is returned from calling such a function is not a regular value, but rather an iterator that produces a sequence of values as you call the next() function on it or loop over it. This is discussed in more detail in the chapter entitled “Lazy Evaluation.”

While, as with any Python object, there are many ways to introduce statefulness into a generator, in principle a generator can be “pure” in the sense of a pure function. It is merely a pure function that produces a (potentially infinite) sequence of values rather than a single value, but still based only on the arguments passed into it. Notice, however, that generator functions typically have a great deal of internal state; it is at the boundaries of call signature and return value that they act like a side-effect-free “black box.” A simple example:

>>> def get_primes():  
...   "Simple lazy Sieve of Eratosthenes"  
...   candidate = 2  
...   found = []  
...   while True:  
...      if all(candidate % prime != 0 for prime in found):  
...         yield candidate  
...         found.append(candidate)  
...      candidate += 1
...  
>>> primes = get_primes()  
>>> next(primes), next(primes), next(primes)  
(2, 3, 5)  
>>> for _, prime in zip(range(10), primes):  
...   print(prime, end=" ")  
...
7 11 13 17 19 23 29 31 37 41  

Every time you create a new object with get_primes() the iterator is the same infinite lazy sequence—another example might pass in some initializing values that affected the result—but the object itself is stateful as it is consumed incrementally.
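A small sketch of a generator that does take initializing values may help here. The function name multiples and its parameters are assumptions for this illustration. Two iterators created with the same arguments yield the same sequence, while each iterator object separately remembers how far it has been consumed.

```python
from itertools import islice


def multiples(step, start=0):
    """Lazily yield start, start + step, start + 2*step, ... without end."""
    value = start
    while True:
        yield value
        value += step


a = multiples(3)
b = multiples(3)
first_a = list(islice(a, 4))   # consumes four items from a only
first_b = list(islice(b, 4))   # b starts from the beginning independently
next_a = next(a)               # a continues where it left off

shifted = list(islice(multiples(3, start=1), 3))  # different arguments, different sequence
```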

Multiple Dispatch
A very interesting approach to programming multiple paths of execution is a technique called “multiple dispatch” or sometimes “multimethods.” The idea here is to declare multiple signatures for a single function and call the actual computation that matches the types or properties of the calling arguments. This technique often allows one to avoid or reduce the use of explicitly conditional branching, and instead substitute the use of more intuitive pattern descriptions of arguments.
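The core idea can be sketched by hand with a dictionary keyed on tuples of argument types. This is a deliberately naive illustration, not how any particular library works: the names _registry, register, and describe are invented here, and the lookup matches exact types only, with no handling of inheritance or ambiguity.

```python
# Hypothetical, minimal dispatch table: exact type matches only.
_registry = {}


def register(*types):
    """Decorator recording an implementation for a tuple of argument types."""
    def decorator(func):
        _registry[types] = func
        return func
    return decorator


def describe(x, y):
    """Call whichever implementation was registered for (type(x), type(y))."""
    key = (type(x), type(y))
    try:
        impl = _registry[key]
    except KeyError:
        raise TypeError("No implementation for {}".format(key)) from None
    return impl(x, y)


@register(int, int)
def _two_ints(x, y):
    return "two ints"


@register(int, str)
def _int_then_str(x, y):
    return "int then str"


@register(str, str)
def _two_strs(x, y):
    return "two strs"
```

Calling `describe(1, 2)` or `describe(1, "a")` selects a different body with no `if`/`elif` chain in sight; the branching has moved into the table of signatures.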

A long time ago, this author wrote a module called multimethods that was quite flexible in its options for resolving “dispatch linearization” but is also so old as only to work with Python 2.x, and was even written before Python had decorators for more elegant expression of the concept. Matthew Rocklin’s more recent multipledispatch is a modern approach for recent Python versions, albeit it lacks some of the theoretical arcana I explored in my ancient module. Ideally, in this author’s opinion, a future Python version would include a standardized syntax or API for multiple dispatch (but more likely the task will always be the domain of third-party libraries).

To explain how multiple dispatch can make more readable and less bug-prone code, let us implement the game of rock/paper/scissors in three styles. Let us create the classes to play the game for all the versions:

class Thing(object): pass  
class Rock(Thing): pass  
class Paper(Thing): pass  
class Scissors(Thing): pass  

Many Branches
First a purely imperative version. This is going to have a lot of repetitive, nested, conditional blocks that are easy to get wrong:
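The listing for this version is missing from this copy of the post. What follows is a reconstruction under stated assumptions, not the author's original code: the function name beats is hypothetical (chosen by analogy with beats2 and beats3), and the return convention (the winning object, or None for a tie) and error messages are taken from the other two versions. The four classes are repeated only so the block runs on its own.

```python
# Repeated from above so this sketch is self-contained.
class Thing(object): pass
class Rock(Thing): pass
class Paper(Thing): pass
class Scissors(Thing): pass


def beats(x, y):
    """Return the winner of x versus y, or None when there is no winner."""
    if isinstance(x, Rock):
        if isinstance(y, Rock):
            return None  # No winner
        elif isinstance(y, Paper):
            return y
        elif isinstance(y, Scissors):
            return x
        else:
            raise TypeError("Unknown second thing")
    elif isinstance(x, Paper):
        if isinstance(y, Rock):
            return x
        elif isinstance(y, Paper):
            return None  # No winner
        elif isinstance(y, Scissors):
            return y
        else:
            raise TypeError("Unknown second thing")
    elif isinstance(x, Scissors):
        if isinstance(y, Rock):
            return y
        elif isinstance(y, Paper):
            return x
        elif isinstance(y, Scissors):
            return None  # No winner
        else:
            raise TypeError("Unknown second thing")
    else:
        raise TypeError("Unknown first thing")
```

Nine of the thirteen branches differ only in which of x, y, or None they return, which is exactly the kind of repetition where a single swapped name goes unnoticed.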

Delegating to the Object
As a second try we might try to eliminate some of the fragile repetition with Python’s “duck typing”; that is, maybe we can have different things share a common method that is called as needed:

class DuckRock(Rock):  
    def beats(self, other):  
        if isinstance(other, Rock):  
            return None # No winner  
        elif isinstance(other, Paper):  
            return other  
        elif isinstance(other, Scissors):  
            return self  
        else:  
            raise TypeError("Unknown second thing")  
  
class DuckPaper(Paper):  
    def beats(self, other):  
        if isinstance(other, Rock):  
            return self  
        elif isinstance(other, Paper):  
            return None # No winner  
        elif isinstance(other, Scissors):  
            return other  
        else:  
            raise TypeError("Unknown second thing")  
  
class DuckScissors(Scissors):  
    def beats(self, other):  
        if isinstance(other, Rock):  
            return other  
        elif isinstance(other, Paper):  
            return self  
        elif isinstance(other, Scissors):  
            return None # No winner  
        else:  
            raise TypeError("Unknown second thing")  
  
def beats2(x, y):  
    if hasattr(x, 'beats'):  
        return x.beats(y)  
    else:  
        raise TypeError("Unknown first thing")  

Then you can test it this way:

>>> rock, paper, scissors = DuckRock(), DuckPaper(), DuckScissors()
>>> beats2(rock, paper)
<__main__.DuckPaper object at 0x...>
>>> beats2(3, rock)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/game.py", line 43, in beats2
    raise TypeError("Unknown first thing")
TypeError: Unknown first thing

We haven’t actually reduced the amount of code, but this version somewhat reduces the complexity within each individual callable, and reduces the level of nested conditionals by one. Most of the logic is pushed into separate classes rather than deep branching. In object-oriented programming we can “delegate dispatch to the object” (but only to the one controlling object).

Pattern Matching
As a final try, we can express all the logic more directly using multiple dispatch. This should be more readable, albeit there are still a number of cases to define:

from multipledispatch import dispatch  
  
@dispatch(Rock, Rock)  
def beats3(x, y): return None  
  
@dispatch(Rock, Paper)  
def beats3(x, y): return y  
  
@dispatch(Rock, Scissors)  
def beats3(x, y): return x  
  
@dispatch(Paper, Rock)  
def beats3(x, y): return x  
  
@dispatch(Paper, Paper)  
def beats3(x, y): return None  
  
@dispatch(Paper, Scissors)
def beats3(x, y): return y  # scissors beat paper, matching DuckPaper.beats
  
@dispatch(Scissors, Rock)  
def beats3(x, y): return y  
  
@dispatch(Scissors, Paper)  
def beats3(x, y): return x  
  
@dispatch(Scissors, Scissors)  
def beats3(x, y): return None  
  
@dispatch(object, object)  
def beats3(x, y):  
    if not isinstance(x, (Rock, Paper, Scissors)):  
        raise TypeError("Unknown first thing")  
    else:  
        raise TypeError("Unknown second thing")  
  
# >>> beats3(rock, paper)
# <__main__.DuckPaper at 0x103b894a8>
# >>> beats3(rock, 3)
# TypeError: Unknown second thing

Predicate-Based Dispatch
A really exotic approach to expressing conditionals as dispatch decisions is to include predicates directly within the function signatures (or perhaps within decorators on them, as with multipledispatch). I do not know of any well-maintained Python library that does this, but let us simply stipulate a hypothetical library briefly to illustrate the concept. This imaginary library might be aptly named predicative_dispatch:

from predicative_dispatch import predicate  
  
@predicate(lambda x: x < 0, lambda y: True)  
def sign(x, y):  
  print("x is negative; y is", y)  
  
@predicate(lambda x: x == 0, lambda y: True)  
def sign(x, y):  
  print("x is zero; y is", y)  
  
@predicate(lambda x: x > 0, lambda y: True)  
def sign(x, y):  
  print("x is positive; y is", y)  

While this small example is obviously not a full specification, the reader can see how we might move much or all of the conditional branching into the function call signatures themselves, and this might result in smaller, more easily understood and debugged functions.
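To show that the idea is at least implementable, here is one possible toy version of such a decorator. Everything in it is hypothetical: predicate is defined locally rather than imported from any real library, the first matching case wins, and the sign functions return strings instead of printing so the result is easy to inspect.

```python
# Hypothetical sketch; "predicate" here is not a real library API.
_cases_by_name = {}  # function name -> list of (predicates, implementation)


def predicate(*tests):
    """Register func under its name, guarded by one predicate per argument."""
    def decorator(func):
        cases = _cases_by_name.setdefault(func.__name__, [])
        cases.append((tests, func))

        def dispatcher(*args):
            # Try the registered cases in definition order; first match wins.
            for preds, impl in cases:
                if len(preds) == len(args) and all(
                    p(a) for p, a in zip(preds, args)
                ):
                    return impl(*args)
            raise TypeError("No matching predicates for {!r}".format(args))

        dispatcher.__name__ = func.__name__
        return dispatcher
    return decorator


@predicate(lambda x: x < 0, lambda y: True)
def sign(x, y):
    return "x is negative; y is {}".format(y)


@predicate(lambda x: x == 0, lambda y: True)
def sign(x, y):
    return "x is zero; y is {}".format(y)


@predicate(lambda x: x > 0, lambda y: True)
def sign(x, y):
    return "x is positive; y is {}".format(y)
```

Each decoration rebinds the name sign to a new dispatcher, but all dispatchers share one list of cases, so the final binding sees every registered variant.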


Friday, May 25, 2018
