A powerful feature of Python is its iterator protocol (which we will get to shortly). This capability is only loosely connected to functional programming per se, since Python does not quite offer lazy data structures in the sense of a language like Haskell. However, use of the iterator protocol—and of Python’s many built-in and standard library iterables—accomplishes much the same effect as an actual lazy data structure. Let us explain the contrast here in slightly more detail. In a language like Haskell, which is inherently lazily evaluated, we might define a list of all the prime numbers in a manner like the following:
    -- Define a list of ALL the prime numbers
    primes = sieve [2 ..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `rem` p > 0]
Mind you, one can replicate this in Python too; it just isn’t in the inherent syntax of the language and takes more manual construction. Given a get_primes() generator function such as the one below, we might write our own container to simulate the same thing, for example:
    def get_primes():
        "Simple lazy Sieve of Eratosthenes"
        candidate = 2
        found = []
        while True:
            if all(candidate % prime != 0 for prime in found):
                yield candidate
                found.append(candidate)
            candidate += 1
    from collections.abc import Sequence

    class ExpandingSequence(Sequence):
        def __init__(self, it):
            self.it = it
            self._cache = []

        def __getitem__(self, index):
            while len(self._cache) <= index:
                self._cache.append(next(self.it))
            return self._cache[index]

        def __len__(self):
            return len(self._cache)
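This wrapper realizes only as much of the underlying iterator as indexing demands. A quick check of that behavior (a sketch; the printed values are simply the first primes, so they are easy to verify):

    primes = ExpandingSequence(get_primes())
    print(primes[5])      # 13 -- pulls six values from the generator
    print(primes[0])      # 2  -- now served from the cache
    print(len(primes))    # 6  -- counts only what has been realized so far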
Of course, there are other custom capabilities we might want to engineer in, since lazy data structures are not inherent to Python. Maybe we’d like to be able to slice this special sequence. Maybe we’d like a prettier representation of the object when printed. Maybe we should report the length as inf if we somehow signaled it was meant to be infinite. All of this is possible, but it takes a little bit of code to add each behavior rather than simply being the default assumption of Python data structures.
The Iterator Protocol
The easiest way to create an iterator—that is to say, a lazy sequence—in Python is to define a generator function, as was discussed in the chapter entitled “Callables.” Simply use the yield statement within the body of a function to define the places (usually in a loop) where values are produced. Or, technically, the easiest way is to use one of the many iterable objects already produced by built-ins or the standard library rather than programming a custom one at all. Generator functions are syntactic sugar for defining a function that returns an iterator.
Many objects have a method named .__iter__(), which will return an iterator when it is called, generally via the iter() built-in function, or even more often simply by looping over the object (e.g., for item in collection: ...). An iterator is the object returned by a call to iter(something); it has a method named .__iter__() that simply returns the object itself, and another method named .__next__(). The reason the iterator itself still has an .__iter__() method is to make iter() idempotent. That is, this identity should always hold (or the call should raise TypeError("object is not iterable")):
    iter_seq = iter(sequence)
    iter(iter_seq) == iter_seq
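A minimal demonstration with a concrete iterable (a plain list, chosen just for illustration):

    nums = [1, 2, 3]
    it = iter(nums)           # the list's .__iter__() returns a list_iterator
    assert iter(it) is it     # idempotent: an iterator's .__iter__() is itself
    print(next(it))           # 1
    print(next(it))           # 2
    print(next(it))           # 3; a further next(it) raises StopIteration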
In a functional programming style—or even just generally for readability—writing custom iterators as generator functions is most natural. However, we can also create custom classes that obey the protocol; often these will have other behaviors (i.e., methods) as well, but most such behaviors necessarily rely on statefulness and side effects to be meaningful. For example:
    from collections.abc import Iterable

    class Fibonacci(Iterable):
        def __init__(self):
            self.a, self.b = 0, 1
            self.total = 0

        def __iter__(self):
            return self

        def __next__(self):
            self.a, self.b = self.b, self.a + self.b
            self.total += self.a
            return self.a

        def running_sum(self):
            return self.total
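Used interactively, it might behave like this (a sketch; islice() simply truncates the infinite iterator):

    from itertools import islice

    fib = Fibonacci()
    print(list(islice(fib, 8)))   # [1, 1, 2, 3, 5, 8, 13, 21]
    print(fib.running_sum())      # 54, the sum of the values produced so far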
This example is trivial, of course, but it shows a class that both implements the iterator protocol and also provides an additional method to return something stateful about its instance. Since statefulness is for object-oriented programmers, in a functional programming style we will generally avoid classes like this.
Module: itertools
The module itertools is a collection of very powerful—and carefully designed—functions for performing iterator algebra. That is, these allow you to combine iterators in sophisticated ways without having to concretely instantiate anything more than is currently required. Besides the basic functions in the module itself, the module documentation provides a number of short recipes, easy to get subtly wrong when written from scratch, for additional functions that each utilize two or three of the basic functions in combination. The third-party module more_itertools mentioned in the Preface provides additional functions that are likewise designed to avoid common pitfalls and edge cases.
The basic goal of using the building blocks inside itertools is to avoid performing computations before they are required, to avoid the memory requirements of a large instantiated collection, to avoid potentially slow I/O until it is strictly required, and so on. Iterators are lazy sequences rather than realized collections, and when combined with functions or recipes in itertools they retain this property.
Here is a quick example of combining a few things. Rather than the stateful Fibonacci class to let us keep a running sum, we might simply create a single lazy iterator to generate both the current number and this sum.
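One way to build this is to split a plain generator with itertools.tee() and pair it with itertools.accumulate(); this is a sketch, and the fibonacci() generator below is our own stand-in rather than anything from the standard library:

    from itertools import accumulate, islice, tee

    def fibonacci():
        # A plain generator version of the sequence; no class needed
        a, b = 1, 1
        while True:
            yield a
            a, b = b, a + b

    # tee() splits one lazy iterator into two independent lazy copies;
    # accumulate() lazily produces running sums of the second copy.
    fibs, totals = tee(fibonacci())
    pairs = zip(fibs, accumulate(totals))
    print(list(islice(pairs, 6)))
    # [(1, 1), (1, 2), (2, 4), (3, 7), (5, 12), (8, 20)]

Everything here remains lazy: each pair of the current Fibonacci number and its running sum is computed only when requested.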
Figuring out exactly how to use functions in itertools correctly and optimally often requires careful thought, but once combined, they provide remarkable power for dealing with large, or even infinite, iterators in ways that concrete collections could not support. The documentation for the itertools module contains details on its combinatorial functions as well as a number of short recipes for combining them. This paper does not have space to repeat those descriptions, so exhibiting a few of them above will suffice.
Note that for practical purposes, zip(), map(), filter(), and range() (which is, in a sense, just a terminating itertools.count()) could well live in itertools if they were not built-ins. That is, all of those functions lazily generate sequential items (mostly based on existing iterables) without creating a concrete sequence. Built-ins like all(), any(), sum(), min(), max(), and functools.reduce() also act on iterables, but all of them, in the general case, need to exhaust the iterator rather than remain lazy. The function itertools.product() might be out of place in its module since it also creates concrete cached sequences, and cannot operate on infinite iterators.
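To make the distinction concrete, here is a small sketch (the variable names are invented for illustration):

    from itertools import count, islice

    evens = filter(lambda n: n % 2 == 0, count())  # lazy and infinite
    squares = map(lambda n: n * n, evens)          # still lazy; nothing computed yet
    print(list(islice(squares, 5)))                # [0, 4, 16, 36, 64]
    # By contrast, sum(squares) would never return, since reducers
    # like sum(), min(), and max() must exhaust their input.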
Chaining Iterables
The itertools.chain() and itertools.chain.from_iterable() functions combine multiple iterables. Built-in zip() and itertools.zip_longest() also combine multiple iterables, of course, but in a manner that advances through them incrementally and in parallel rather than consuming each in turn. A consequence of chaining's sequential consumption is that, while chaining infinite iterables is valid syntactically and semantically, no actual program will ever exhaust an earlier infinite iterable to reach the ones after it. For example:
    from itertools import chain, count

    thrice_to_inf = chain(count(), count(), count())

    def from_logs(fnames):
        yield from (open(file) for file in fnames)

    lines = chain.from_iterable(from_logs(['huge.log', 'gigantic.log']))
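As a quick sanity check (a sketch; islice() truncates the otherwise infinite chain):

    from itertools import islice
    # Only the first count() is ever consumed; the other two are never reached.
    print(list(islice(thrice_to_inf, 5)))   # [0, 1, 2, 3, 4]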
Besides the chaining with itertools, we should mention collections.ChainMap() in the same breath. Dictionaries (or generally any collections.abc.Mapping) are iterable (over their keys). Just as we might want to chain multiple sequence-like iterables, we sometimes want to chain together multiple mappings without needing to create a single larger concrete one. ChainMap() is handy, and does not alter the underlying mappings used to construct it.
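A minimal sketch of the lookup behavior (these little dictionaries are invented purely for illustration):

    from collections import ChainMap

    defaults = {'user': 'guest', 'color': 'red'}
    overrides = {'user': 'alice'}
    settings = ChainMap(overrides, defaults)
    print(settings['user'])    # 'alice': the first mapping that has the key wins
    print(settings['color'])   # 'red': falls through to the next mapping
    print(defaults)            # unchanged; ChainMap does not copy or alter its maps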