程式扎記: [Quick Python] 20. Advanced object-oriented features

Monday, March 19, 2012

Preface :
This chapter will focus on some more advanced object-oriented features of Python. Python is distinguished by the ability to modify its objects in ways that can fundamentally change their behavior. For C++ users, this is somewhat similar to operator overloading, but in Python it’s more comprehensive and easier to use.

In addition to modifying the behavior of objects, you can also control the behavior of classes and the creation of their instances. Obviously, you’ll need to be fairly familiar with OO programming to use this feature.

This chapter covers :
* Using special method attributes
* Making an object behave like a list
* Subclassing built-in types
* Understanding metaclasses
* Creating abstract base classes

What is a special method attribute?
A special method attribute is an attribute of a Python class with a special meaning to Python. It’s defined as a method but isn’t intended to be used directly as such. Special methods aren’t usually directly invoked; instead, they’re called automatically by Python in response to a demand made on an object of that class.

Perhaps the simplest example of this is the __str__ special method attribute. If it’s defined in a class, then anytime an instance of that class is used where Python requires a user-readable string representation of that instance, the __str__ method attribute will be invoked, and the value it returns will be used as the required string. To see this, let’s define a class representing RGB colors as a triplet of numbers, one each for red, green, and blue intensities. As well as defining the standard __init__ method to initialize instances of the class, we’ll also define a __str__ method to return strings representing instances in a reasonably human-friendly format. Our definition looks something like Exam20_1.py :
- Exam20_1.py :
class Color:
    def __init__(self, red, green, blue):
        self._red = red
        self._green = green
        self._blue = blue
    def __str__(self):
        # Called by print(), str() and format() to get a readable string
        return "Color: R={0:d}, G={1:d}, B={2:d}".format(self._red,
                                                         self._green,
                                                         self._blue)

We can load it and use it in the normal manner :
>>> from Exam20_1 import Color
>>> c = Color(15, 35, 3)

The presence of the __str__ special method attribute can be seen if we now use print() to print out c :
>>> print(c)
Color: R=15, G=35, B=3

Even though our __str__ special method attribute has not been explicitly invoked by any of our code, it has nonetheless been used by Python, which knows that the __str__ attribute (if present) defines a method to convert objects into user-readable strings. This is the defining characteristic of special method attributes: they allow you to define functionality that hooks into Python in special ways.

Special method attributes can be used to define classes whose objects behave in a fashion that’s syntactically and semantically equivalent to lists or dictionaries. You could, for example, use this ability to define objects that are used in exactly the same manner as Python lists but that use balanced trees rather than arrays to store data. To a programmer, they would appear to be lists, but with faster inserts, slower iterations, and certain other performance differences that would presumably be advantageous in the problem at hand.

The rest of this chapter covers longer examples using special method attributes. It doesn’t discuss all of Python’s available special method attributes, but it does expose you to the concept in enough detail that you can then easily make use of the other special attribute methods.

Making an object behave like a list :
This sample problem involves a large text file containing records of people; each record consists of a single line containing the person’s name, age, and place of residence, with a double colon (::) between the fields. A few lines from such a file might look like this :
John Smith::37::Springfield, Massachusetts
Ellen Nelle::25::Springfield, Connecticut
Dale McGladdery::29::Springfield, Hawaii
...

Suppose we need to collect information as to the distribution of ages of people in the file. There are many ways the lines in this file could be processed :
fileobject = open(filename, 'r')
lines = fileobject.readlines()  # reads every line into a list at once
fileobject.close()
for line in lines:
    pass  # ... do whatever ...
That would work in theory, but it reads the entire file into memory at once. If the file were too large to be held in memory (and these files potentially are that large), the program wouldn’t work. Another way to attack the problem is this :
fileobject = open(filename, 'r')
for line in fileobject:  # reads one line per iteration
    pass  # ... do whatever ...
fileobject.close()
This would get around the problem of too little memory by reading in only one line at a time. It would work fine, but suppose we wanted to make opening the file even simpler and that we wanted to get only the first two fields (name and age) of the lines in the file? We’d need something that could, at least for the purposes of a for loop, treat a text file as a list of lines, but without reading the entire text file in at once.

- The __getitem__ special method attribute
A solution is to use the __getitem__ special method attribute, which you can define in any user-defined class, to enable instances of that class to respond to list access syntax and semantics. If AClass is a Python class that defines __getitem__, and obj is an instance of that class, then things like x = obj[n] and for x in obj: are meaningful; obj may be used in much the same way as a list.

Here’s the resulting code; explanations follow :
class LineReader:
    def __init__(self, filename):
        self.fileobject = open(filename, 'r')  # Open file for reading
    def __getitem__(self, index):
        line = self.fileobject.readline()
        if line == "":  # If no more data
            self.fileobject.close()  # Close fileobject
            raise IndexError  # And raise IndexError
        else:
            return line.split("::")[:2]  # Otherwise, split line, return first two fields


for name, age in LineReader("testfile.txt"):
    pass  # do whatever...
At first glance, this may look worse than the previous solution because there’s more code and it’s difficult to understand. But most of that code is in a class, which can be put into its own module, say the myutils module. Then the program becomes :
import myutils
for name, age in myutils.LineReader("filename"):
    pass  # ... do whatever ...
The LineReader class handles all the details of opening the file, reading in lines one at a time, and closing the file. At the cost of somewhat more initial development time, it provides a tool that makes working with one-record-per-line large text files easier and less error prone. Note that there are several powerful ways to read files already in Python, but this example has the advantage that it’s fairly easy to understand. When you get the idea, you can apply the same principle in many different situations.

- How it works
LineReader is a class, and the __init__ method opens the named file for reading and stores the opened fileobject for later access. To understand the use of the __getitem__ method, you need to know the following three points :
* Any object that defines __getitem__ as an instance method can return elements as if it were a list: all accesses of the form object[i] are transformed by Python into a method invocation of the form object.__getitem__(i), which is then handled as a normal method invocation. It’s ultimately executed as __getitem__(object, i), using the version of __getitem__ defined in the class. The first argument of each call of __getitem__ is the object from which data is being extracted, and the second argument is the index of that data.
* Because for loops access each piece of data in a sequence, one at a time, a for arg in sequence: loop over an object that defines __getitem__ (and no __iter__ method, which Python would otherwise prefer) works by calling __getitem__ over and over again, with sequentially increasing indexes. The for loop will first set arg to sequence.__getitem__(0), then to sequence.__getitem__(1), and so on.
* Such a for loop catches the IndexError exception and handles it by exiting the loop. This is how for loops are terminated when they iterate through __getitem__.
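
The three points above can be checked with a small experiment. This is only a sketch: Squares is a hypothetical class, not part of the original example, that records every index Python passes to __getitem__ during a for loop.

```python
class Squares:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __init__(self, count):
        self.count = count
        self.requested = []  # every index Python asks for, in order

    def __getitem__(self, index):
        self.requested.append(index)
        if index >= self.count:
            raise IndexError(index)  # signals the for loop to stop
        return index * index


squares = Squares(3)
values = []
for value in squares:  # no __iter__, so Python falls back to __getitem__
    values.append(value)

print(values)             # [0, 1, 4]
print(squares.requested)  # [0, 1, 2, 3]
```

The last requested index, 3, is the call that raised IndexError and ended the loop.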

The LineReader class is intended for use only with and inside a for loop, and the for loop will always generate calls with a uniformly increasing index: __getitem__(self, 0), __getitem__(self, 1), __getitem__(self, 2), and so on. The previous code takes advantage of this knowledge and returns lines one after the other, ignoring the index argument.

With this knowledge, understanding how a LineReader object emulates a sequence in a for loop is easy. Each iteration of the loop causes the special Python attribute method __getitem__ to be invoked on the object; as a result, the object reads in the next line from its stored fileobject and examines that line. If the line is non-empty, it’s returned. An empty line means the end of the file has been reached, and the object closes the fileobject and raises the IndexError exception. IndexError is caught by the enclosing for loop, which then terminates.

Remember that this example is here for illustrative purposes only. Usually, iterating over the lines of a file using the for line in fileobject: type of loop is sufficient, but this example does show how easy it is in Python to create objects that behave like lists or other types.
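
For comparison, here is a minimal sketch of the same loop written with a generator function instead of __getitem__. This is a different technique from the one the example teaches, and the names iter_name_age and the temporary sample file are assumptions made only so the snippet can run on its own.

```python
import os
import tempfile


def iter_name_age(filename):
    """Yield [name, age] for each record, reading one line at a time."""
    with open(filename, "r") as fileobject:  # closed automatically
        for line in fileobject:
            yield line.rstrip("\n").split("::")[:2]


# Small stand-in for the large data file described above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("John Smith::37::Springfield, Massachusetts\n")
    tmp.write("Ellen Nelle::25::Springfield, Connecticut\n")

records = list(iter_name_age(tmp.name))
os.remove(tmp.name)
print(records)  # [['John Smith', '37'], ['Ellen Nelle', '25']]
```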

- Implementing full list functionality
In the previous example, an object of the LineReader class behaves like a list object only to the extent that it correctly responds to sequential accesses of the lines in the file it’s reading from. You may wonder how this functionality can be expanded to make LineReader (or other) objects behave more like a list.

More generally, Python provides a number of special method attributes relating to list behavior. __setitem__ provides a way of defining what should be done when an object is used in the syntactic context of a list assignment, obj[n] = val. Some other special method attributes provide less-obvious list functionality, such as the __add__ attribute, which enables objects to respond to the + operator and hence to perform their version of list concatenation. Several other special methods also need to be defined before a class fully emulates a list, but you can achieve this complete list emulation by defining the appropriate Python special method attributes. The next section gives an example that goes further toward implementing a full-list emulation class.

Giving an object full list capability :
__getitem__ is one of many Python special function attributes that may be defined in a class, to permit instances of that class to display special behavior. To see how this can be carried further, effectively integrating new abilities into Python in a seamless manner, we’ll look at another, more comprehensive example.

When lists are used, it’s common that any particular list will contain elements of only one type such as a list of strings or a list of numbers. Some languages, such as C++, have the ability to enforce this. In large programs, this ability to declare a list as containing a certain type of element can help you track down errors. An attempt to add an element of the wrong type to a typed list will result in an error message, potentially identifying a problem at an earlier stage of program development than would otherwise be the case.

Python doesn’t have typed lists built in, and most Python coders don’t miss them; but if you’re concerned about enforcing the homogeneity of a list, special method attributes make it easy to create a class that behaves like a typed list. Here’s the beginning of such a class (which makes extensive use of the Python built-in type() and isinstance() functions, to check the type of objects):
- Exam20_3.py :
class TypedList:
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element) # (1)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must be a list")
        for element in initial_list:
            if not isinstance(element, self.type):
                raise TypeError("Attempted to add an element of "
                                "incorrect type to a typed list.")
        self.elements = initial_list[:]  # store a copy, not the caller's list

The example_element argument defines the type this list can contain by providing an example of the type of element (1). The TypedList class, as defined here, gives us the ability to make a call of the form :
x = TypedList('Hello', ["List", "of", "strings"])

The first argument, 'Hello', isn’t incorporated into the resulting data structure at all. It’s used as an example of the type of element the list must contain (strings, in this case). The second argument is an optional list that can be used to give an initial list of values. The __init__ function for the TypedList class checks that any list elements passed in when a TypedList instance is created are of the same type as the example value given. If there are any type mismatches, an exception is raised.

This version of the TypedList class can’t be used as a list, because it doesn’t respond to the standard methods for setting or accessing list elements. To fix this, we need to define the __setitem__ and __getitem__ special method attributes. The __setitem__ method will be called automatically by Python anytime a statement of the form TypedListInstance[i] = value is executed, and the __getitem__ method will be called anytime the expression TypedListInstance[i] is evaluated to return the value in the i'th slot of TypedListInstance. Here is the next version of the TypedList class. Because we’ll be type-checking a lot of new elements, we’ve abstracted this function out into the new private method __check :
class TypedList:
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must be a list")
        for element in initial_list:
            if not isinstance(element, self.type):
                raise TypeError("Attempted to add an element of "
                                "incorrect type to a typed list.")
        self.elements = initial_list[:]
    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")
    def __setitem__(self, i, element):
        self.__check(element)  # reject elements of the wrong type
        self.elements[i] = element

    def __getitem__(self, i):
        return self.elements[i]
Now, instances of the TypedList class look more like lists. For example, the following code is valid :
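
A short example along these lines, assuming the TypedList class just defined (repeated here so the snippet runs on its own). The specific values are illustrative only.

```python
class TypedList:
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must be a list")
        for element in initial_list:
            if not isinstance(element, self.type):
                raise TypeError("Attempted to add an element of "
                                "incorrect type to a typed list.")
        self.elements = initial_list[:]

    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")

    def __setitem__(self, i, element):
        self.__check(element)
        self.elements[i] = element

    def __getitem__(self, i):
        return self.elements[i]


x = TypedList("", 5 * [""])  # a typed list of five empty strings
x[2] = "Hello"               # handled by __setitem__
x[3] = "There"
print(x[2] + x[3])           # HelloThere (two __getitem__ calls)
a, b, c, d, e = x            # unpacking also goes through __getitem__
```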


The accesses of elements of x in the print statement are handled by __getitem__, which passes them down to the list instance stored in the TypedList object. The assignments to x[2] and x[3] are handled by __setitem__, which checks that the element being assigned into the list is of the appropriate type and then performs the assignment on the list contained in self.elements. The last line uses __getitem__ to unpack the first five items in x and then pack them into the variables a, b, c, d, and e, respectively. The calls to __getitem__ and __setitem__ are made automatically by Python.

Completion of the TypedList class, so that TypedList objects behave in all respects like list objects, requires more code. The special method attributes __setitem__ and __getitem__ should be defined so that TypedList instances can handle slice notation as well as single item access. __add__ should be defined so that list addition (concatenation) can be performed, and __mul__ should be defined so that list multiplication can be performed. __len__ should be defined so that calls of len(TypedListInstance) are evaluated correctly. __delitem__ should be defined so that the TypedList class can handle del statements correctly. Also, an append method should be defined so that elements can be appended to TypedList instances using the standard list-style append, and similarly for an insert method.
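
A partial sketch of a few of those additions (__len__, __delitem__, append, and insert) might look like the following. It is a simplified stand-in rather than the finished class: it uses a single-underscore _check helper and a None default for initial_list, both of which are assumptions of this sketch, and it leaves out slicing, __add__, and __mul__.

```python
class TypedList:
    """Sketch only: TypedList plus a few more list operations."""

    def __init__(self, example_element, initial_list=None):
        self.type = type(example_element)
        self.elements = []
        for element in (initial_list or []):
            self.append(element)  # reuse the type check in append

    def _check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")

    def __getitem__(self, i):
        return self.elements[i]

    def __setitem__(self, i, element):
        self._check(element)
        self.elements[i] = element

    def __delitem__(self, i):  # supports: del x[i]
        del self.elements[i]

    def __len__(self):  # supports: len(x)
        return len(self.elements)

    def append(self, element):
        self._check(element)
        self.elements.append(element)

    def insert(self, i, element):
        self._check(element)
        self.elements.insert(i, element)


x = TypedList("", ["a", "b"])
x.append("c")
x.insert(0, "z")
del x[1]
print(len(x), x.elements)  # 3 ['z', 'b', 'c']
```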

Subclassing from built-in types :
The previous example makes for a good exercise in understanding how to implement a list-like class from scratch, but it’s also a lot of work. In practice, if you were planning to implement your own list-like structure along the lines demonstrated here, you might instead consider subclassing the list type or the UserList type.

- Subclassing list
Instead of creating a class for a typed list from scratch, as we did in the previous examples, you can also subclass the list type and override all the methods that need to be aware of the allowed type. One big advantage of this approach is that your class has default versions of all list operations, because it’s a list already. The main thing to keep in mind is that every type in Python is a class, and if you need a variation on the behavior of a built-in type, you may want to consider subclassing that type :
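
A minimal sketch of such a subclass, assuming the same constructor arguments as the earlier TypedList. The class name TypedListList is chosen here for illustration.

```python
class TypedListList(list):
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must "
                            "be a list.")
        for element in initial_list:
            self.__check(element)
        super().__init__(initial_list)  # let list store the elements

    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")

    def __setitem__(self, i, element):
        self.__check(element)            # our check first...
        super().__setitem__(i, element)  # ...then list's own __setitem__


x = TypedListList("", 5 * [""])
x[2] = "Hello"
x[3] = "There"
print(x[2] + x[3])  # HelloThere
del x[2]            # inherited from list
x.sort()            # inherited from list
print(x)            # ['', '', '', 'There']
```

In this sketch, append, insert, and extend are inherited unchanged, so they would still accept elements of any type.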


Note that all that we need to do in this case is implement a method to check the type of items being added and then tweak __setitem__ to make that check before calling list’s regular __setitem__ method. Other operations, like the sort method and the del statement, work without any further coding. (A complete version would also need to override methods such as append, insert, and extend, which otherwise accept elements of any type.) Subclassing a built-in type can save a fair amount of time if you need only a few variations in its behavior, because the bulk of the class can be used unchanged.

- Subclassing UserList
If you need a variation on a list (as in the previous examples), there’s a third alternative. You can subclass the UserList class, a list wrapper class found in the collections module. UserList was created for earlier versions of Python when subclassing the list type wasn’t possible; but it’s still useful, particularly in our current situation, because the underlying list is available as the data attribute :
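
A minimal sketch of the UserList variant, again assuming the same constructor arguments. The class name TypedUserList is chosen here for illustration.

```python
from collections import UserList


class TypedUserList(UserList):
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must "
                            "be a list.")
        for element in initial_list:
            self.__check(element)
        super().__init__(initial_list)  # copies the items into self.data

    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")

    def __setitem__(self, i, element):
        self.__check(element)
        self.data[i] = element  # the wrapped list is a plain attribute

    def __getitem__(self, i):
        return self.data[i]


x = TypedUserList("", 5 * [""])
x[2] = "Hello"
x[3] = "There"
print(x[2] + x[3])  # HelloThere
a, b, c, d, e = x
print(x.data)       # ['', '', 'Hello', 'There', '']
```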


This is much the same as subclassing list, except that in the implementation of the class, the list of items is available internally as the data member. In some situations, having direct access to the underlying data structure can be useful; and in addition to UserList, there are also UserDict and UserString wrapper classes.

When to use special method attributes :
As a rule, it’s a good idea to be somewhat cautious with the use of special method attributes. Other programmers who need to work with your code may wonder why one sequence-type object responds correctly to standard indexing notation, whereas another doesn’t.

My general guidelines are to use special method attributes in either of two situations. First, if I have a frequently used class in my own code that behaves in some respects like a Python built-in type, I’ll define such special method attributes as useful. This occurs most often with objects that behave like sequences in one way or another. Second, if I have a class that behaves identically or almost identically to a built-in class, I may choose to define all of the appropriate special function attributes or subclass the built-in Python type and distribute the class. An example of the latter might be lists implemented as balanced trees so that access is slower but insertion is faster than with standard lists.

These aren’t hard-and-fast rules. For example, it’s often a good idea to define the __str__ special method attribute for a class, so that you can say print(instance) in debugging code and get an informative and nice-looking representation of your object printed to the screen.

Metaclasses :
Everything in Python is an object, including classes. An object has to be created from something, and in the case of a class it’s created from a metaclass. In Python, classes are objects that are created at runtime as instances of the metaclass type. Let’s look at the standard definition of a class :
class Spam:
    def __init__(self, x):
        self.x = x
    def show(self):
        print(self.x)
This is the ordinary way of creating a class. We can create instances of it and exercise them and so on :
>>> from Exam20_6 import Spam
>>> my_spam = Spam("test")
>>> type(my_spam)
<class 'Exam20_6.Spam'>
>>> type(Spam)
<class 'type'>
>>> my_spam.show()
test

Note that the type of the class Spam is 'type'.

Although this example is the common way to create a class, it’s really a shortcut for creating it explicitly using a metaclass. To do so, you need to call the metaclass (type(), by default) with the name of the class, a tuple of its base classes, and a dictionary of its attributes :
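
A minimal sketch of that explicit form, using the three-argument call to type(). The function names init and show are arbitrary; only the dictionary keys determine the attribute names on the class.

```python
def init(self, x):
    self.x = x


def show(self):
    print(self.x)


# type(name, bases, attributes) builds the class object directly.
Spam = type("Spam", (), {"__init__": init, "show": show})

my_spam = Spam("test")
my_spam.show()     # test
print(type(Spam))  # <class 'type'>
```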


It looks a bit strange, because the methods are defined as first-class functions; but the result is the same, and the type of Spam is still 'type'.

The point of this exercise is that the type metaclass can be subclassed and its behavior can be changed. That means that the way classes themselves are created can be modified, allowing you to create classes that register or verify their instances when they’re created, for example, or classes that automatically have a class attribute. In the following somewhat simple-minded example, type is subclassed to NewType, which announces when it creates a class object and adds a class attribute new_attr to that class :
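
A sketch of that idea in the long form of class creation, calling the metaclass directly. The parameter name namespace and the helper functions init and show are choices made for this sketch.

```python
class NewType(type):
    def __init__(cls, name, bases, namespace):
        print("Creating from NewType")  # announce each class creation
        cls.new_attr = "test"           # add a class attribute
        super().__init__(name, bases, namespace)


def init(self, x):
    self.x = x


def show(self):
    print(self.x)


# Calling NewType instead of type prints the announcement.
Spam = NewType("Spam", (), {"__init__": init, "show": show})

my_spam = Spam("test")
print(type(Spam) is NewType)  # True
print(Spam.new_attr)          # test
print(my_spam.new_attr)       # test (instances see the class attribute)
```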


It’s not necessary to use the long form of class creation in order to use a custom metaclass, however. You can accomplish the same thing by using the metaclass keyword with a simple class definition :
- Exam20_7.py :
class NewType(type):
    def __init__(cls, name, bases, dict):
        print("Creating from NewType")
        cls.new_attr = "test"
        type.__init__(cls, name, bases, dict)

class Spam(metaclass=NewType):
    def __init__(self, x):
        self.x = x
    def show(self):
        print(self.x)

Then we can use it that way :
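
A sketch of such a use, with the definitions from Exam20_7.py repeated so the snippet runs on its own. In an interactive session you would instead import Spam from Exam20_7, and the announcement would be printed at import time.

```python
class NewType(type):
    def __init__(cls, name, bases, namespace):
        print("Creating from NewType")
        cls.new_attr = "test"
        super().__init__(name, bases, namespace)


class Spam(metaclass=NewType):  # "Creating from NewType" is printed here
    def __init__(self, x):
        self.x = x

    def show(self):
        print(self.x)


my_spam = Spam("test")
my_spam.show()                # test
print(type(Spam) is NewType)  # True
print(type(my_spam) is Spam)  # True
print(my_spam.new_attr)       # test
```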


The previous examples have been kept deliberately simple, with the aim of making the basic mechanics of using metaclasses clear. Metaclass programming is enormously powerful, but it’s also a complex topic, and its details and use cases are well beyond both this book and most coding needs. In the words of master Pythonista Tim Peters, "Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who need them know with certainty that they need them, and don’t need an explanation about why)."

Abstract base classes :
As you’ve seen, Python’s strategy for interacting with an object is to invoke its methods and judge its type from what it does. This use of duck typing is also referred to as EAFP, which is short for "easier to ask forgiveness than permission." The common approach in Python is to invoke an object’s method and either succeed or deal with the exception. If an object behaves like a sequence, for example, you can iterate over it with a for loop. If not, you catch and deal with the TypeError exception. Although this approach works well most of the time, there are occasions where it’s better for your code to know exactly what it’s getting into. This is sometimes called the LBYL, or "look before you leap," approach.

For example, suppose we need to know whether an object is a mutable sequence, something that works like a list. We can use isinstance() to see if the object has list as a base class. Or we can see if it has a __getitem__ method defined, by accessing the object with [], for example :
try:
    x = my_object[0]
    # do stuff
except TypeError:
    pass
The problem is that the first approach will miss perfectly good mutable sequences, like the TypedList class from earlier in this chapter, because they aren’t subclasses of list. The second approach, on the other hand, will go in the other direction and accept both tuples (which aren’t mutable) and dictionaries (which aren’t sequences), among others.

Although it’s not in harmony with the overall philosophy of Python, in some scenarios it’s useful to be able to know for sure that an object is a sequence, a mutable sequence, a mapping, and so on. Python’s answer is an abstract base class (ABC), which is a class that can be put into an object’s inheritance tree to indicate to an external inspector that the object has a certain set of features. You can then test objects using isinstance for the presence of that abstract base class. The collections library (in its collections.abc module since Python 3.3) contains several abstract collection types, including Container, Iterable, Sized, Sequence, MutableSequence, Mapping, MutableMapping, and Set.

For more, please refer to Collections Abstract Base Classes.
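
As a small illustration, the built-in types can be tested against a few of these abstract base classes. The sample objects are arbitrary.

```python
from collections.abc import Mapping, MutableSequence, Sequence

samples = [[1, 2], (1, 2), "ab", {"a": 1}]
results = {}
for obj in samples:
    name = type(obj).__name__
    # (is a Sequence?, is a MutableSequence?, is a Mapping?)
    results[name] = (isinstance(obj, Sequence),
                     isinstance(obj, MutableSequence),
                     isinstance(obj, Mapping))
    print(name, results[name])

# list  (True, True, False)
# tuple (True, False, False)
# str   (True, False, False)
# dict  (False, False, True)
```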

- Using abstract base classes for type checking
To return to our example, let’s see how we can use an abstract base class to make sure the custom TypedList class we created previously is identified as a mutable sequence. The first things we need to do are to import the MutableSequence base class from the collections.abc module (it lived directly in collections before Python 3.3, and that alias was removed in Python 3.10) and then register our class as a MutableSequence :
- Exam20_8.py :
from collections.abc import MutableSequence

class TypedList:
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must "
                            "be a list.")
        for element in initial_list:
            self.__check(element)
        self.elements = initial_list[:]  # set once, after all checks pass
    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")
    def __setitem__(self, i, element):
        self.__check(element)
        self.elements[i] = element
    def __getitem__(self, i):
        return self.elements[i]

MutableSequence.register(TypedList)  # declare TypedList a MutableSequence

And use it :
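
A sketch of such a check. The TypedList here is a trimmed-down stand-in for the class in Exam20_8.py, without the type checks, so that the snippet is short and runs on its own.

```python
from collections.abc import MutableSequence


class TypedList:
    """Trimmed-down stand-in for the TypedList class in Exam20_8.py."""

    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        self.elements = initial_list[:]

    def __getitem__(self, i):
        return self.elements[i]


MutableSequence.register(TypedList)

x = TypedList(1, [1, 2, 3])
print(isinstance(x, MutableSequence))          # True
print(issubclass(TypedList, MutableSequence))  # True
print(isinstance(x, list))                     # False
print(hasattr(x, "append"))                    # False
```

The last line shows that registration is a declaration only: Python does not verify that the registered class actually provides the methods of a mutable sequence.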


When TypedList is registered with MutableSequence as one of its own, any instance of it will also be an instance of MutableSequence.

- Creating abstract base classes
You can also create your own abstract base classes by setting their metaclass to be ABCMeta from the abc module. For example, if we want to make sure that every instance of list is also identified as an instance of MyABC, we do the following :
>>> from abc import ABCMeta
>>> class MyABC(metaclass=ABCMeta):
...     pass
...
>>> MyABC.register(list)
>>> isinstance([1, 2, 3], MyABC)
True

Being able to use abstract base classes in Python gives you a choice: you can look before you leap, or you can ask for forgiveness.

- Using the @abstractmethod and @abstractproperty decorators
In Java, for example, an abstract class by definition can’t be instantiated under any circumstances. As with many features, in Python the "abstractness" of abstract base classes isn’t so much an enforced rule as it is a gentleman’s agreement. Python will allow you to create instances of generic abstract base classes without complaint as long as the base class doesn’t contain an abstract method :
>>> from abc import ABCMeta
>>> class MyABC(metaclass=ABCMeta):
...     pass
...
>>> my_myabc = MyABC()
>>> print(type(my_myabc))
<class '__main__.MyABC'>

But if the class has an abstract method, then that class can’t be instantiated, nor can any subclass be instantiated unless it has overridden the abstract method. To create an abstract method, you use the @abstractmethod decorator from the abc module :
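
A minimal sketch of this behavior. The method name overrideme and the subclass IncompleteABC are hypothetical names chosen for illustration, and the exact wording of the TypeError message varies between Python versions.

```python
from abc import ABCMeta, abstractmethod


class MyABC(metaclass=ABCMeta):
    @abstractmethod
    def overrideme(self):
        print("in MyABC.overrideme")


class IncompleteABC(MyABC):  # hypothetical subclass that overrides nothing
    pass


refused = []
for cls in (MyABC, IncompleteABC):
    try:
        cls()  # instantiation is refused in both cases
    except TypeError as error:
        refused.append(cls.__name__)
        print(cls.__name__, "->", error)

print(refused)  # ['MyABC', 'IncompleteABC']
```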


As the exceptions indicate, it’s not possible to instantiate a class with an abstract method, nor is it possible to instantiate a subclass, unless the abstract method has been overridden. In the following example, the abstract method has been overridden for the class SecondABC so the class can now be instantiated :
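
A sketch of the overriding case, assuming an abstract method named overrideme (a name chosen for illustration). The override also calls the implementation in MyABC through super().

```python
from abc import ABCMeta, abstractmethod


class MyABC(metaclass=ABCMeta):
    @abstractmethod
    def overrideme(self):
        return "MyABC.overrideme"


class SecondABC(MyABC):
    def overrideme(self):
        # The abstract method's implementation is still reachable.
        return "SecondABC.overrideme -> " + super().overrideme()


my_secondabc = SecondABC()  # allowed: the abstract method is overridden
print(my_secondabc.overrideme())
# SecondABC.overrideme -> MyABC.overrideme
```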

The abstractmethod function sets a function attribute __isabstractmethod__, which is checked by the __new__ method of ABCMeta. This means you can create abstract methods only for classes that are created with ABCMeta as their metaclass, and they must be defined in the class definition, not dynamically added.

Abstract methods in Python, unlike in Java, can have an implementation. The overriding method in a subclass can call that implementation, as happens in the previous example. There is also an @abstractproperty decorator, which adds abstract properties to a class (it has been deprecated since Python 3.3 in favor of stacking @property on top of @abstractmethod, but it still illustrates the idea). These work the same way as normal properties, except that a class containing an abstract property can’t be instantiated, nor can its subclasses, unless they override the abstract property with a concrete one :
from abc import ABCMeta, abstractproperty

class MyABC(metaclass=ABCMeta):
    @abstractproperty
    def readx(self):  # read-only abstract property
        return self.__x
    def getx(self):
        return self.__x
    def setx(self, x):
        self.__x = x
    x = abstractproperty(getx, setx)  # read-write abstract property, defined at class level
Even though the property x created here has an implementation that would work, this class can’t be instantiated. Instead, it must be subclassed, and the abstract property x must be overridden. But the subclass can access the abstract property in MyABC.
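
For current Python versions, the same idea can be sketched by stacking @property on @abstractmethod. The subclass name Concrete and the attribute _x are assumptions of this sketch, and only a read-only property is shown.

```python
from abc import ABCMeta, abstractmethod


class MyABC(metaclass=ABCMeta):
    @property
    @abstractmethod
    def x(self):
        return self._x  # an implementation that subclasses may reuse


class Concrete(MyABC):
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return super().x  # access the abstract property in MyABC


try:
    MyABC()
except TypeError as error:
    print("cannot instantiate MyABC:", error)

print(Concrete(5).x)  # 5
```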

Supplement :
[Python 學習筆記] 進階議題 : Meta class (定義 meta class)
