程式扎記: [Quick Python] 18. Packages

標籤

2012年3月12日 星期一

[Quick Python] 18. Packages

Preface : 
Modules make reusing small chunks of code easy. The problem comes when the project grows and the code you want to reload outgrows, either physically or logically, what would fit into a single file. If having one giant module file is an unsatisfactory solution, having a host of little unconnected modules isn’t much better. The answer to this problem is to combine related modules into a package. This chapter covers : 
* Defining a package
* Creating a simple package
* Exploring a concrete example
* Using the __all__ attribute
* Using packages properly

What is a package? 
module is a file containing code. A module defines a group of usually related Python functions or other objects. The name of the module is derived from the name of the file. When you understand modules, packages are easy, because a package is a directory containing code and possibly further subdirectories. A package contains a group of usually related code files (modules). The name of the package is derived from the name of the main package directory. 

Packages are a natural extension of the module concept and are designed to handle very large projects. Just as modules group related functions, classes, and variables, packages group related modules. 

A first example : 
To see how this might work in practice, let’s sketch a design layout for a type of project that by nature is very large—a generalized mathematics package, along the lines of Mathematica, Maple, or MATLAB. Maple, for example, consists of thousands of files, and some sort of hierarchical structure is vital to keeping such a project ordered. We’ll call our project as a whole mathproj

We can organize such a project in many ways, but a reasonable design splits the project into two parts: ui, consisting of the user interface elements, and comp, the computational elements. Within comp, it may make sense to further segment the computational aspect into symbolic (real and complex symbolic computation, such as high school algebra) and numeric (real and complex numerical computation, such as numerical integration). It may then make sense to have a constants.py file in both the symbolic and numeric parts of the project. 

The constants.py file in the numeric part of the project defines pi as : 
pi = 3.141592

whereas the constants.py file in the symbolic part of the project defines pi as : 
  1. class PiClass:  
  2.     def __str__(self):  
  3.         return "PI"  
  4. pi = PiClass()  
This means that a name like pi can be used in (and imported from) two different files named constants.py, as shown in figure 18.1 : 
 

The symbolic constants.py file defines pi as an abstract Python object, the sole instance of the PiClass class. As the system is developed, various operations can be implemented in this class, which return symbolic rather than numeric results. 

There is a natural mapping from this design structure to a directory structure. The top-level directory of the project, called mathproj, contains subdirectories ui and comp; comp in turn contains subdirectories symbolic and numeric; and each of symbolic and numeric contains its own constants.py file. 

Given this directory structure, and assuming that the root mathproj directory is installed somewhere in the Python search path, Python code both inside and outside themathproj package can access the two variants of pi as mathproj.symbolic.constants.pi and mathproj.numeric.constants.pi. In other words, the Python name for an item in the package is a reflection of the directory pathname to the file containing that item

That’s what packages are all about. They’re ways of organizing very large collections of Python code into coherent wholes, by allowing the code to be split among different files and directories and imposing a module/submodule naming scheme based on the directory structure of the package files. Unfortunately, all isn’t this simple in practice because details intrude to make their use more complex than their theory. The practical aspects of packages are the basis for the remainder of this chapter. 

A concrete example : 
The rest of this chapter will use a running example to illustrate the inner workings of the package mechanism (see figure 18.2). Filenames and paths are shown in plain text, to avoid confusion as to whether we’re talking about a file/directory or the module/package defined by that file/directory. The files we’ll be using in our example package are shown in listings 18.1 through 18.6 : 
- Listing 18.1 File mathproj/__init__.py
  1. print("Hello from mathproj init")  
  2. __all__ = ['comp']  
  3. version = 1.03  

- Listing 18.2 File mathproj/comp/__init__.py
  1. __all__ = ['c1']  
  2. print("Hello from mathproj.comp init")  

- Listing 18.3 File mathproj/comp/c1.py

- Listing 18.4 File mathproj/comp/numeric/__init__.py
  1. print("Hello from numeric init")  

- Listing 18.5 File mathproj/comp/numeric/n1.py
  1. from mathproj import version  
  2. from mathproj.comp import c1  
  3. from mathproj.comp.numeric.n2 import h  
  4. def g():  
  5.     print("version is", version)  
  6.     print(h())  

- Listing 18.6 File mathproj/comp/numeric/n2.py
  1. def h():  
  2.     return "Called function h in module n2"  

 

For the purposes of the examples in this chapter, we’ll assume that you’ve created these files in a mathproj directory that’s on the Python search path. (It’s sufficient to ensure that the current working directory for Python is the directory containing mathproj when executing these examples.

- Basic use of the mathproj package 
Before getting into the details of packages, let’s look at accessing items contained in the mathproj package. Start up a new Python shell, and do the following : 
>>> import mathproj
Hello from mathproj init

If all goes well, you should get another input prompt and no error messages. As well, the message "Hello from mathproj init" should be printed to the screen, by code in the mathproj/__init__.py file. We’ll talk more about __init__.py files in a bit; for now, all you need to know is that they’re run automatically whenever a package is first loaded. 

The mathproj/__init__.py file assigns 1.03 to the variable version. version is in the scope of the mathproj package namespace, and after it’s created, you can see it viamathproj, even from outside the mathproj/__init__.py file : 
>>> mathproj.version
1.03

In use, packages can look a lot like modules; they can provide access to objects defined within them via attributes. This isn’t surprising, because packages are a generalization of modules. 

- Loading subpackages and submodules 
Now, let’s start looking at how the various files defined in the mathproj package interact with one another. We’ll do this by invoking the function g defined in the filemathproj/comp/numeric/n1.py. The first obvious question is, has this module been loaded? We have already loaded mathproj, but what about its subpackage? Let’s see if it’s known to Python : 
>>> mathproj.comp.numeric.n1
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'comp'

In other words, loading the top-level module of a package isn’t enough to load all the submodules. This is in keeping with Python’s philosophy that it shouldn’t do things behind your back. Clarity is more important than conciseness. This is simple enough to overcome. We import the module of interest and then execute the function g in that module : 
>>> import mathproj.comp.numeric.n1
Hello from mathproj.comp init
Hello from numeric init
>>> mathproj.comp.numeric.n1.g()
version is 1.03
Called function h in module n2

Notice, however, that the lines beginning with Hello are printed out as a side effect of loading mathproj.comp.numeric.n1. These two lines are printed out by printstatements in the __init__.py files in mathproj/comp and mathproj/comp/numeric. In other words, before Python can import mathproj.comp.numeric.n1, it first has to import mathproj.comp and then mathproj.comp.numeric. Whenever a package is first imported, its associated __init__.py file is executed, resulting in the Hello lines. To confirm that both mathproj.comp and mathproj.comp.numeric are imported as part of the process of importing mathproj.comp.numeric.n1, we can check to see that mathproj.comp and mathproj.comp.numeric are now known to the Python session : 
>>> mathproj.comp

>>> mathproj.comp.numeric

- import statements within packages 
Files within a package don’t automatically have access to objects defined in other files in the same package. As in outside modules, you must use import statements to explicitly access objects from other package files. To see how this works in practice, look back at the n1 subpackage. The code contained in n1.py is : 
  1. from mathproj import version  
  2. from mathproj.comp import c1  
  3. from mathproj.comp.numeric.n2 import h  
  4. def g():  
  5.     print "version is", version  
  6.     print h()  
g makes use of both versions from the top-level mathproj package and the function h from the n2 module; hence, the module containing g must import both version and hto make them accessible. We import version as we would in an import statement from outside the mathproj package, by saying from mathproj import version. In this example, we explicitly import h into the code by saying from mathproj.comp.numeric.n2 import h, and this will work in any file—explicit imports of package files are always allowed. But because n2.py is in the same directory as n1.py, we can also use a relative import by prepending a single dot to the submodule name. In other words, we can say : 
from .n2 import h

as the third line in n1.py, and it works fine. 

You can add more dots to move up more levels in the package hierarchy, and you can add module names. Instead of : 
from mathproj import version
from mathproj.comp import c1
from mathproj.comp.numeric.n2 import h

we could also have written the imports of n1.py as : 
from ... import version
from .. import c1
from . n2 import h

Relative imports can be handy and quick to type, but be aware that they’re relative to the module’s __name__ property. Therefore any module being executed as the main module, and thus having an __name__ of __main__, can’t use relative imports

- __init__.py files in packages 
You’ll have noticed that all the directories in our package—mathproj, mathproj/ comp, and mathproj/numeric—contain a file called __init__.py. An __init__.py file serves two purposes : 
* It’s automatically executed by Python the first time a package or subpackage is loaded. This permits whatever package initialization you may desire. Python requires that a directory contain an __init__.py file before it can be recognized as a package. This prevents directories containing miscellaneous Python code from being accidentally imported as if they defined a package.
* The second point is probably the more important. For many packages, you won’t need to put anything in the package’s __init__.py file—just make sure an empty __init__.py file is present.

The __all__ attribute : 
If you look back at the various __init__.py files defined in mathproj, you’ll notice that some of them define an attribute called __all__. This has to do with execution of statements of the form from ... import *, and it requires explanation. 

Generally speaking, we would hope that if outside code executed the statement from mathproj import *, it would import all nonprivate names from mathproj. In practice, life is more difficult. The primary problem is that some operating systems have an ambiguous definition of case when it comes to filenames. Microsoft Windows 95/98 is particularly bad in this regard, but it isn’t the only villain. Because objects in packages can be defined by files or directories, this leads to ambiguity as to exactly under what name a subpackage might be imported. If we say from mathproj import *, will comp be imported as compComp, or COMP? If we were to rely only on the name as reported by the operating system, the results might be unpredictable. 

There’s no good solution to this. It’s an inherent problem caused by poor OS design. As the best possible fix, the __all__ attribute was introduced. If present in an __init__.py file, __all__ should give a list of strings, defining those names that are to be imported when a from ... import * is executed on that particular package. If__all__ isn’t present, then from ... import * on the given package does nothing. Because case in a text file is always meaningful, the names under which objects are imported isn’t ambiguous, and if the OS thinks that comp is the same as COMP, that’s its problem. 

To see this in action, fire up Python again, and try the following : 
>>> from mathproj import *
Hello from mathproj init
Hello from mathproj.comp init

The __all__ attribute in mathproj/__init__.py contains a single entry, comp, and the import statement imports only comp. It’s easy enough to check that comp is now known to the Python session : 
>>> comp

But note that there’s no recursive importing of names with a from ... import * statement. The __all__ attribute for the comp package contains c1, but c1 isn’t magically loaded by our from mathproj import * statement : 
>>> c1
Traceback (most recent call last):
File "", line 1, in
NameError: name 'c1' is not defined

To insert names from mathproj.comp we must, again, do an explicit import : 
>>> from mathproj.comp import c1
>>> c1

Proper use of packages : 
Most of your packages shouldn’t be as structurally complex as these examples imply. The package mechanism allows wide latitude in the complexity and nesting of your package design. It’s obvious that very complex packages can be built, but it isn’t obvious that they should be built. 

Here are a couple of suggestions that are appropriate in most circumstances : 
* Packages shouldn’t use deeply nested directory structures. Except for absolutely huge collections of code, there should be no need to do so. For most packages, a single top-level directory is all that’s needed. A two-level hierarchy should be able to effectively handle all but a few of the rest. As written in the Zen of Python, by Tim Peters, "Flat is better than nested."
* Although you can use the __all__ attribute to hide names from from ... import * by not listing those names, this is probably not a good idea, because it’s inconsistent. If you want to hide names, make them private by prefacing them with an underscore.

Supplement : 
[Python 學習筆記] 函式、類別與模組 : 模組 (匯入套件)

沒有留言:

張貼留言

網誌存檔

關於我自己

我的相片
Where there is a will, there is a way!