程式扎記: [Quick Python] 10. Modules and scoping rules


Saturday, February 11, 2012

Preface : 
Modules are used to organize larger Python projects. The Python standard library is split into modules to make it more manageable. You don’t need to organize your own code into modules, but if you’re writing any programs that are more than a few pages long, or any code that you want to reuse, you should probably do so. This chapter covers : 
* Defining a module
* Writing a first module
* Using the import statement
* Modifying the module search path
* Making names private in modules
* Importing standard library and third-party modules
* Understanding Python scoping rules and namespaces

What is a module? 
A module is a file containing code. A module defines a group of Python functions or other objects, and the name of the module is derived from the name of the file. Modules most often contain Python source code, but they can also be compiled C or C++ object files. Compiled modules and Python source modules are used the same way. 

As well as grouping related Python objects, modules help avoid name-clash problems. For example, you might write a module for your program called mymodule, which defines a function called reverse. In the same program, you might also wish to use somebody else’s module called othermodule, which also defines a function called reverse, but which does something different from your reverse function. In a language without modules, it would be impossible to use two different functions named reverse. In Python, it’s trivial: you refer to them in your main program as mymodule.reverse and othermodule.reverse.

This is because Python uses namespaces. A namespace is essentially a dictionary of the identifiers available to a block, function, class, module, and so on. We’ll discuss namespaces a bit more at the end of this chapter, but be aware that each module has its own namespace, and this helps avoid naming conflicts. 
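The name-clash point can be tried without creating any files. The following is only a sketch: the two modules are built in memory with types.ModuleType, and the function bodies are hypothetical stand-ins for "two different functions named reverse".

```python
import types

# Two hypothetical modules built in memory, so the example needs no files on disk.
mymodule = types.ModuleType("mymodule")
othermodule = types.ModuleType("othermodule")


def _reverse_text(s):
    """Return the string s reversed."""
    return s[::-1]


def _reverse_sorted(items):
    """Return the items sorted in descending order."""
    return sorted(items, reverse=True)


# Each module gets its own attribute named reverse.
mymodule.reverse = _reverse_text
othermodule.reverse = _reverse_sorted

# The module name qualifies the function, so the two never collide.
print(mymodule.reverse("abc"))         # cba
print(othermodule.reverse([1, 3, 2]))  # [3, 2, 1]
```

Each module object carries its own namespace, so the same identifier can be bound to different objects in each.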

Modules are also used to make Python itself more manageable. Most standard Python functions aren’t built into the core of the language but instead are provided via specific modules, which you can load as needed. 

A first module : 
The best way to learn about modules is probably to make one, so let’s get started. Firstly, create a text file called mymath.py : 
- mymath.py :
"""mymath - our example math module"""
pi = 3.14159

def area(r):
    """area(r): Return the area of a circle with radius r."""
    global pi  # optional here: pi is only read, not rebound
    return (pi * r * r)

Save this for now in the directory where your Python executable is. This code merely assigns pi a value and defines a function. The .py filename suffix is strongly suggested for all Python code files. It identifies that file to the Python interpreter as consisting of Python source code. As with functions, you have the option of putting in a document string as the first line of your module. 

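Now, start up the Python Shell and try evaluating pi and then calling area(2). Both attempts fail with a NameError.

In other words, Python doesn’t have the constant pi or the function area() built in. Now, type import mymath, try pi again (it still raises a NameError), and then evaluate mymath.pi and mymath.area(2), which both work.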

We’ve brought in the definitions for pi and area() from the mymath.py file, using the import statement (which automatically adds on the .py suffix when it searches for the file defining the module named mymath). But the new definitions aren’t directly accessible; typing pi by itself gave an error. Instead, we access pi and area by prepending them with the name of the module that contains them. This guarantees name safety. There may be another module out there that also defines pi (maybe the author of that module thinks that pi is 3.14 or 3.14159265), but that is of no concern. Even if that other module is imported, its version of pi will be accessed by othermodulename.pi, which is different from mymath.pi. This form of access is often referred to as qualification (that is, the variable pi is being qualified by the module mymath). We may also refer to pi as an attribute of mymath.
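The same steps can be run as a script. This is a minimal sketch, assuming a scratch directory created with tempfile stands in for wherever you saved mymath.py; that directory is put on sys.path so the import can find the file.

```python
import os
import sys
import tempfile

# Write the mymath module to a scratch directory (a stand-in for your own location).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "mymath.py"), "w") as fh:
    fh.write('"""mymath - our example math module"""\n'
             'pi = 3.14159\n'
             'def area(r):\n'
             '    """area(r): Return the area of a circle with radius r."""\n'
             '    return pi * r * r\n')

sys.path.insert(0, module_dir)  # make the directory visible to import
import mymath

print(mymath.pi)        # qualified access to the module attribute
print(mymath.area(2))   # roughly 12.56636
print(mymath.__doc__)   # the module's document string

try:
    pi                  # the bare name was never bound in this namespace
    bare_name_found = True
except NameError as exc:
    bare_name_found = False
    print("NameError:", exc)
```

Only the name mymath is added to the importing namespace; everything else stays behind the module name.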

If you want, you can also specifically ask for names from a module to be imported in such a manner that you don’t have to prepend it with the module name : 
>>> from mymath import pi
>>> pi
3.14159
>>> area(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'area' is not defined

The name pi is now directly accessible because we specifically requested it using from module import name. The function area() still needs to be called as mymath.area, though, because it wasn’t explicitly imported. 

You may want to use the basic interactive mode or IDLE’s Python shell to incrementally test a module as you’re creating it. But if you change your module on disk, retyping the import command won’t cause it to load again. You need to use the reload() function for this. In the Python 3.2 used here it lives in the imp module, which provides an interface to the mechanisms behind importing modules; from Python 3.4 on, use importlib.reload() instead, because imp was deprecated and then removed in Python 3.12 : 
>>> import mymath, imp
>>> imp.reload(mymath)

When a module is reloaded (or imported for the first time), all of its code is parsed. A SyntaxError exception is raised if an error is found. On the other hand, if everything is okay, a .pyc file containing Python byte code is created (since Python 3.2 it is stored in a __pycache__ subdirectory, for example __pycache__/mymath.cpython-32.pyc). 

Reloading a module doesn’t put you back into exactly the same situation as when you start a new session and import it for the first time. But the differences won’t normally cause you any problems. If you’re interested, you can look up reload in the section on the imp module in the Python Library Reference to find the details. 
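On current Python versions the reload step can be sketched with importlib.reload(). The module name reload_demo and its contents are hypothetical, and the file is written to a scratch directory only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module used only for this demonstration.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "reload_demo.py")
with open(module_path, "w") as fh:
    fh.write("value = 1\n")

sys.path.insert(0, module_dir)
import reload_demo
first = reload_demo.value

# Edit the module on disk. The new source has a different length, so the
# cached byte code is recognised as stale even if the timestamp is unchanged.
with open(module_path, "w") as fh:
    fh.write("value = 22\n")

import reload_demo               # no effect: the module is already in sys.modules
after_reimport = reload_demo.value

importlib.reload(reload_demo)    # re-executes the source in the existing module object
after_reload = reload_demo.value

print(first, after_reimport, after_reload)  # 1 1 22
```

The repeated import statement leaves the old value in place; only the explicit reload picks up the edit.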

Of course, modules don’t need to be used from the interactive Python shell. You can also import them into scripts, or other modules for that matter; enter suitable import statements at the beginning of your program file. Internally to Python, the interactive session and a script are considered modules as well. 

To summarize : 
* A module is a file defining Python objects.
* If the name of the module file is modulename.py, then the Python name of the module is modulename.
* You can bring a module named modulename into use with the import modulename statement. After this statement is executed, objects defined in the module can be accessed as modulename.objectname.
* Specific names from a module can be brought directly into your program using the from modulename import objectname statement. This makes objectname accessible to your program without needing to prepend it with modulename, and it’s useful for bringing in names that are often used.

The import statement : 
The import statement takes three different forms. The most basic : 
import modulename

searches for a Python module of the given name, parses its contents, and makes it available. The importing code can use the contents of the module, but any references by that code to names within the module must still be prepended with the module name. If the named module isn’t found, an error will be generated. Exactly where Python looks for modules will be discussed shortly. 

The second form permits specific names from a module to be explicitly imported into the code : 
from modulename import name1, name2, name3, . . .

Each of name1, name2, and so forth from within modulename is made available to the importing code; code after the import statement can use any of name1, name2, name3, and so on without prepending the module name. 

Finally, there’s a general form of the from . . . import . . . statement : 
from modulename import *

The * stands for all the exported names in modulename. This imports all public names from modulename (that is, those that don’t begin with an underscore) and makes them available to the importing code without the necessity of prepending the module name. But if a list of names called __all__ exists in the module (or the package’s __init__.py), then the names in that list are the ones imported, whether they begin with an underscore or not. 

You should take care when using this particular form of importing. If two modules both define a name, and you import both modules using this form of importing, you’ll end up with a name clash, and the name from the second module will replace the name from the first. It also makes it more difficult for readers of your code to determine where names you’re using originate. When you use either of the two previous forms of the import statement, you give your reader explicit information about where they’re from. 

But some modules (such as tkinter, which will be covered later) name their functions to make it obvious where they originate and to make it unlikely that name clashes will occur. It’s also common to use the general import to save keystrokes when using an interactive shell. 
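The effect of __all__ on the general import form can be checked with a small sketch. The module all_demo and its names are hypothetical. It is built in memory and registered in sys.modules so the import statement can find it, and the import runs inside a fresh dictionary so the imported names are easy to list.

```python
import sys
import types

# Hypothetical in-memory module that defines __all__.
source = "\n".join([
    "__all__ = ['shown', '_also_shown']",
    "shown = 1",
    "_also_shown = 2",
    "not_listed = 3",
])
demo = types.ModuleType("all_demo")
exec(source, demo.__dict__)
sys.modules["all_demo"] = demo   # lets the import statement find it

# Run the general import form inside a fresh namespace dictionary.
namespace = {}
exec("from all_demo import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)   # ['_also_shown', 'shown']
```

With __all__ present, the underscore-prefixed name is imported because it is listed, and not_listed is skipped even though it looks public.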

The module search path : 
Exactly where Python looks for modules is defined in a variable called path, which you can access through a module called sys. Enter the following : 
>>> import sys
>>> sys.path
_list of directories in the search path_

The value shown in place of _list of directories in the search path_ will depend on the configuration of your system. Regardless of the details, the string indicates a list of directories that Python searches (in order) when attempting to execute an import statement. The first module found that satisfies the import request is used. If there’s no satisfactory module in the module search path, an ImportError exception is raised. 
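A failed search can be observed directly. In this sketch the module name no_such_module_for_demo is hypothetical and assumed not to exist anywhere on your search path.

```python
import sys

print(type(sys.path))   # sys.path is an ordinary list that can be inspected or modified

try:
    import no_such_module_for_demo   # hypothetical name, assumed not installed
    import_failed = False
except ImportError as exc:
    # Since Python 3.6 the concrete class is ModuleNotFoundError, a subclass of ImportError.
    import_failed = True
    print("import failed:", exc)
```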

If you’re using IDLE, you can graphically look at the search path and the modules on it using the Path Browser window, which you can start from the File menu of the Python Shell window.

The sys.path variable is initialized from the value of the environment (operating system) variable PYTHONPATH, if it exists, together with a default value that’s dependent on your installation. In addition, whenever you run a Python script, the sys.path variable for that script has the directory containing the script inserted as its first element, which provides a convenient way of determining where the executing Python program is located. In an interactive session such as the previous one, the first element of sys.path is set to the empty string, which Python takes as meaning that it should first look for modules in the current directory. 

- Where to place your own modules 
In the example that started this chapter, the mymath module was accessible to Python because (1) when you execute Python interactively, the first element of sys.path is "", telling Python to look for modules in the current directory; and (2) you were executing Python in the directory that contained the mymath.py file. In a production environment, neither of these conditions will typically be true. You won’t be running Python interactively, and Python code files won’t be located in your current directory. In order to ensure that your programs can use modules you coded, you need to do one of the following : 
* Place your modules into one of the directories that Python normally searches for modules.
* Place all the modules used by a Python program into the same directory as the program.
* Create a directory (or directories) that will hold your modules, and modify the sys.path variable so that it includes this new directory.

Of these three options, the first is apparently the easiest and is also an option that you should never choose unless your version of Python includes local code directories in its default module search path. Such directories are specifically intended for site-specific code and aren’t in danger of being overwritten by a new Python install because they’re not part of the Python installation. If your sys.path refers to such directories, you can put your modules there. 

The second option is a good choice for modules that are associated with a particular program. Just keep them with the program. 

The third option is the right choice for site-specific modules that will be used in more than one program at that site. You can modify sys.path in various ways. You can assign to it in your code, which is easy, but doing so hard-codes directory locations into your program code; you can set the PYTHONPATH environment variable, which is relatively easy, but it may not apply to all users at your site; or you can add to the default search path using a .pth file. 

The directory or directories you set in PYTHONPATH are prepended to the sys.path variable. If you use it, be careful that you don’t define a module with the same name as one of the existing library modules that you’re using or that is being used for you. Your module will be found before the library module. In some cases, this may be what you want, but probably not often. 

You can avoid this issue using the .pth method. In this case, the directory or directories you added will be appended to sys.path. The last of these mechanisms is best illustrated by a quick example. On Windows, you can place the .pth file in the directory pointed to by sys.prefix. Assume your sys.prefix is : 
>>> import sys
>>> sys.prefix
'C:\\Software\\Python3.2.2'

Put the file myModules.pth under C:\Software\Python3.2.2 : 
- myModules.pth :
mymodules
C:\Software\Python3.2.2\modules

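The next time a Python interpreter is started, sys.path will include C:\Software\Python3.2.2\mymodules and C:\Software\Python3.2.2\modules, provided those directories exist.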

You can now place your modules in these directories. Note that the mymodules directory still runs the danger of being overwritten with a new installation. The modules directory is safer. You also may have to move or create a myModules.pth file when you upgrade Python. See the description of the site module in the Python Library Reference if you want more details on using .pth files. 
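The way a .pth file is processed can be tried without touching your installation. This sketch assumes a scratch directory playing the role of the sys.prefix directory, and it calls site.addsitedir() by hand, which applies the same .pth handling that normally happens at interpreter startup. The entry missing_dir is a hypothetical name for a directory that does not exist.

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()                   # stand-in for the sys.prefix directory
os.mkdir(os.path.join(site_dir, "mymodules"))   # exists, so it will be added

with open(os.path.join(site_dir, "myModules.pth"), "w") as fh:
    fh.write("mymodules\n")     # relative to the directory holding the .pth file
    fh.write("missing_dir\n")   # does not exist, so it is skipped

site.addsitedir(site_dir)       # process the .pth files found in site_dir

added = os.path.abspath(os.path.join(site_dir, "mymodules"))
skipped = os.path.abspath(os.path.join(site_dir, "missing_dir"))
print(added in sys.path)     # True
print(skipped in sys.path)   # False
```

Relative entries are resolved against the directory that contains the .pth file, and entries are appended to sys.path rather than prepended.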

Private names in modules : 
We mentioned that you can enter from module import * to import almost all names from a module. The exception is that names in the module beginning with an underscore can’t be imported in this manner. This lets people write modules that are intended for importation with from module import *: by starting all internal names (that is, names that shouldn’t be accessed outside the module) with an underscore, you can ensure that from module import * brings in only those names that the user will want to access. 

To see this in action, let’s assume we have a file called modtest.py, containing the code below : 
- modtest.py :
"""modtest: our test module"""

def f(x):
    return x

def _g(x):  # leading underscore: treated as private
    return x

a = 4
_b = 2      # leading underscore: treated as private

Now, start up an interactive session, enter from modtest import *, and then try f, _g, a, and _b in turn.

As you can see, the names f and a are imported, but the names _g and _b remain hidden outside of modtest. Note that this behavior occurs only with from ... import *. We can do the following to access _g or _b : 
>>> import modtest
>>> modtest._b
2
>>> from modtest import _g
>>> _g(5)
5

The convention of leading underscores to indicate private names is used throughout Python and not just in modules. You’ll encounter it in classes and packages too. 
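The behaviour described for modtest can be reproduced in a script. In this sketch the module is rebuilt in memory under the hypothetical name modtest_demo (so no file is needed), and the imports run inside a fresh dictionary to make the imported names visible.

```python
import sys
import types

# In-memory copy of the modtest example, registered under a hypothetical name.
source = "\n".join([
    "def f(x):",
    "    return x",
    "def _g(x):",
    "    return x",
    "a = 4",
    "_b = 2",
])
modtest_demo = types.ModuleType("modtest_demo")
exec(source, modtest_demo.__dict__)
sys.modules["modtest_demo"] = modtest_demo

namespace = {}
exec("from modtest_demo import *", namespace)
public_names = sorted(name for name in namespace if name != "__builtins__")
print(public_names)             # ['a', 'f']

# Underscore names are still reachable when requested explicitly.
exec("from modtest_demo import _g", namespace)
print(namespace["_g"](5))       # 5
print(modtest_demo._b)          # 2
```

The underscore is a convention enforced only by the general import form; it does not prevent explicit access.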

Library and third-party modules : 
At the beginning of this chapter, I mentioned that the standard Python distribution is split into modules to make it more manageable. After you’ve installed Python, all the functionality in these library modules is available to you. All that’s needed is to import the appropriate modules, functions, classes, and so forth explicitly, before you use them. 

Many of the most common and useful standard modules are discussed throughout this book. But the standard Python distribution includes far more than what this book describes. At the very least, you should browse through the table of contents of the Python Library Reference.

In IDLE, you can easily browse to and look at those written in Python using the Path Browser window. You can also search for example code that uses them with the Find in Files dialog box, which you can open from the Edit menu of the Python Shell window. You can search through your own modules as well in this way. 

Available third-party modules, and links to them, are identified on the Python home page. You need to download these and place them in a directory in your module search path in order to make them available for import into your programs. 

Python scoping rules and namespaces : 
Python’s scoping rules and namespaces will become more interesting as your experience as a Python programmer grows. If you’re new to Python, you probably don’t need to do anything more than quickly read through the text to get the basic ideas. For more details, look up "namespaces" in the Python Language Reference.

The core concept here is that of a namespace. A namespace in Python is a mapping from identifiers to objects and is usually represented as a dictionary. When a block of code is executed in Python, it has three namespaces: local, global, and built-in (see figure 10.2). 
 
Figure 10.2 The order in which namespaces are checked to locate identifiers 

When an identifier is encountered during execution, Python first looks in the local namespace for it. If it isn’t found, the global namespace is looked in next. If it still hasn’t been found, the built-in namespace is checked. If it doesn’t exist there, this is considered an error and a NameError exception occurs. 
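The lookup order can be seen in a few lines. This is a small sketch; the names value and undefined_name_for_demo are hypothetical and exist only for the illustration.

```python
import builtins

value = "global"


def show_local():
    value = "local"      # binding in the function's local namespace
    return value         # the local binding is found first


def show_global():
    return value         # no local binding, so the enclosing binding is found


print(show_local())             # local
print(show_global())            # global
print(len is builtins.len)      # True: len is found in the built-in namespace

try:
    undefined_name_for_demo     # bound in none of the namespaces
    lookup_failed = False
except NameError as exc:
    lookup_failed = True
    print("NameError:", exc)
```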

For a module, a command executed in an interactive session, or a script running from a file, the global and local namespaces are the same. Creating any variable or function or importing anything from another module results in a new entry, or binding, being made in this namespace. 

But when a function call is made, a local namespace is created, and a binding is entered in it for each parameter of the call. A new binding is then entered into this local namespace whenever a variable is created within the function. The global namespace of a function is the global namespace of the containing block of the function (that of the module, script file, or interactive session). It’s independent of the dynamic context from which it’s called. 

In all of these situations, the built-in namespace is that of the __builtins__ module. This module contains, among other things, all the built-in functions you’ve encountered (such as len, min, max, int, float, list, tuple, range, str, and repr) and the other built-in classes in Python, such as the exceptions (like NameError). 

One thing that sometimes catches new Python programmers is the fact that you can override items in the built-in module. If, for example, you create a list in your program and put it in a variable called list, you can’t subsequently use the built-in list() function. The entry for your list is found first. There’s no differentiation between names for functions and modules and other objects. The most recent occurrence of a binding for a given identifier is used. 
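A short sketch of the pitfall: inside the function below a variable called list hides the built-in, so calling it fails. The function name shadowing_demo is hypothetical. The builtins module is used to show that the original is still reachable.

```python
import builtins


def shadowing_demo():
    list = [1, 2, 3]      # this binding hides the built-in inside the function
    try:
        list("abc")       # the name now refers to a list object, which is not callable
    except TypeError:
        shadowed = True
    else:
        shadowed = False
    # The built-in is still reachable through the builtins module.
    return shadowed, builtins.list("abc")


print(shadowing_demo())   # (True, ['a', 'b', 'c'])
print(list("abc"))        # outside the function the built-in is untouched
```

Choosing a different variable name (items, values, and so on) avoids the problem entirely.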

Enough talk; it’s time to explore this with some examples. We use two built-in functions, locals() and globals(). They return dictionaries containing the bindings in the local and global namespaces, respectively. Start a new interactive session and call both.

The local and global namespaces for this new interactive session are the same. They have three initial key/value pairs that are for internal use: (1) an empty documentation string __doc__, (2) the main module name __name__ (which for interactive sessions and scripts run from files is always __main__), and (3) the module used for the built-in namespace __builtins__ (the module __builtins__). 

Now, if we continue by creating a variable (z), importing the math module, and importing cos from the cmath module, we’ll see a number of bindings created when we call globals() and locals() again.

As expected, the local and global namespaces continue to be equivalent. Entries have been added for z as a number, math as a module, and cos from the cmath module as a function. You can use the del statement to remove these new bindings from the namespace (including the module bindings created with the import statements), for example with del z, math, cos.

The result isn’t drastic, because we’re able to import the math module and use it again. Using del in this manner can be handy when you’re in the interactive mode. 
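The binding and unbinding steps can be replayed in a script. This sketch assumes an explicit dictionary, here called session, standing in for the interactive session's global namespace; the helper user_names is hypothetical and only filters out the internal double-underscore entries.

```python
# An explicit dictionary stands in for the session's global namespace.
session = {}
exec("z = 2\nimport math\nfrom cmath import cos", session)


def user_names(namespace):
    """Return the names a user bound, skipping internal double-underscore entries."""
    return sorted(name for name in namespace if not name.startswith("__"))


print(user_names(session))          # ['cos', 'math', 'z']

exec("del z, math, cos", session)   # remove the bindings, including the module binding
print(user_names(session))          # []

exec("import math", session)        # the module can simply be imported again
print(session["math"].sqrt(16.0))   # 4.0
```

del removes only the name from the namespace; the module object itself stays cached, which is why importing it again is cheap.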

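Now, let’s look at a function f(x) created in an interactive session, one that prints its global and local namespaces on entry, assigns y = x, and prints the local namespace again on exit.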

If we dissect this apparent mess, we see that, as expected, upon entry the parameter x is the original entry in f’s local namespace, but y is added later. The global namespace is the same as that of our interactive session, because this is where f was defined. Note that it contains z, which was defined after f.

In a production environment, you normally call functions that are defined in modules. Their global namespace is that of the module they’re defined in. Assume that we’ve created the file : 
- scopetest.py :
"""scopetest: our scope test module"""
v = 6

def f(x):
    """f: scope test function"""
    print("global: ", list(globals().keys()))     # names in the module's namespace
    print("entry local: ", locals())              # only the parameter so far
    y = x
    w = v                                         # v is found in the module's globals
    print("exit local: ", list(locals().keys()))

Note that we’ll be printing only the keys (identifiers) of the dictionary returned by globals(). This reduces the clutter in the results. It is necessary here because, in modules, the whole __builtins__ dictionary is stored as the value of the __builtins__ key as an optimization:
>>> import scopetest
>>> z = 2
>>> scopetest.f(z)  # no 'z' below: the module's global namespace does not define z
global: ['f', '__builtins__', '__file__', '__package__', 'v', '__cached__', '__name__', '__doc__']
entry local: {'x': 2}
exit local: ['y', 'x', 'w']

The global namespace is now that of the scopetest module and includes the function f and integer v (but not z from our interactive session). Thus, when creating a module, you have complete control over the namespaces of its functions. 

We’ve now covered local and global namespaces. Next, let’s move on to the built-in namespace. We’ll introduce another built-in function, dir(), which, given a module, returns a list of the names defined in it : 
>>> dir(scopetest)
['__builtins__', '__cached__', '__doc__', '__file__', '__name__', '__package__', 'f', 'v']
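The same dir() call works on the module behind the built-in namespace. In this sketch the standard builtins module is imported explicitly, and the list is filtered so the output stays short.

```python
import builtins

names = dir(builtins)   # every name in the built-in namespace

print("max" in names)         # True: a built-in function
print("NameError" in names)   # True: the exceptions live here too

# A small slice of the list, chosen only to keep the output short.
print([name for name in names if name.startswith("is")])   # ['isinstance', 'issubclass']
```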

You can also easily obtain the documentation string for any function at any time, either by using the help() function or by printing the docstring directly : 
>>> print(max.__doc__)
max(iterable[, key=func]) -> value
max(a, b, c, ...[, key=func]) -> value

With a single iterable argument, return its largest item.
With two or more arguments, return the largest argument.

The locals() and globals() functions can be useful as simple debugging tools. The dir() function doesn’t show the current values; but if you call it without parameters, it returns a sorted list of the identifiers in the local namespace. This helps catch the mistyped-variable error that compilers usually catch for you in languages that require declarations : 
>>> x1 = 6
>>> xl = x1 - 2  # typo: xl (lowercase L) was typed instead of x1 (digit one)
>>> x1
6
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'scopetest', 'x1', 'xl', 'z']

Supplement : 
[Python Study Notes] Functions, Classes, and Modules : Modules (Importing Modules) 
[Python Study Notes] Functions, Classes, and Modules : Modules (import, import as, from import)
