程式扎記: [Quick Python] 7. Dictionaries

Preface :
This chapter discusses dictionaries, Python’s name for associative arrays, which it implements using hash tables. Dictionaries are amazingly useful, even in simple
programs. Because dictionaries are less familiar to many programmers than other basic data structures such as lists and strings, some of the examples illustrating dictionary use are slightly more complex than the corresponding examples for other built-in data structures. It may be necessary to read parts of the next chapter, “Control flow,” to fully understand some of the examples in this chapter. This chapter covers :

* Defining a dictionary
* Using dictionary operations
* Determining what can be used as a key
* Creating sparse matrices
* Using dictionaries as caches
* Trusting the efficiency of dictionaries

What is a dictionary ?
If you’ve never used associative arrays or hash tables in other languages, then a good way to start understanding the use of dictionaries is to compare them with lists :

* Values in lists are accessed by means of integers called indices, which indicate where in the list a given value is found.
* Dictionaries access values by means of integers, strings, or other Python objects called keys, which indicate where in the dictionary a given value is found. In other words, both lists and dictionaries provide indexed access to arbitrary values, but the set of items that can be used as dictionary indices is much larger than, and contains, the set of items that can be used as list indices. Also, the mechanism that dictionaries use to provide indexed access is quite different than that used by lists.
* Both lists and dictionaries can store objects of any type.
* Values stored in a list are implicitly ordered by their position in the list, because the indices that access these values are consecutive integers; you may or may not care about this ordering, but you can use it if desired. Values stored in a dictionary aren’t implicitly ordered relative to one another because dictionary keys aren’t just numbers. Note that if you’re using a dictionary, you can define an ordering on the items in a dictionary by using another data structure (often a list) to store such an ordering explicitly; this doesn’t change the fact that dictionaries have no implicit (built-in) ordering.

In spite of the differences between them, use of dictionaries and lists often appears alike. As a start, an empty dictionary is created much like an empty list, but with curly braces instead of square brackets :

>>> x = []
>>> y = {}

After you create a dictionary, values may be stored in it as if it were a list :

>>> y[0] = 'Hello'
>>> y[1] = 'Goodbye'

Even in these assignments, there is already a significant operational difference between the dictionary and list usage. Trying to do the same thing with a list would result in an error, because in Python it’s illegal to assign to a position in a list that doesn’t already exist. This isn’t a problem with dictionaries; new positions in dictionaries are created as necessary.

Having stored some values in the dictionary, we can now access and use them :

>>> print(y[0])
Hello
>>> y[1] + ", Friend."
'Goodbye, Friend.'

All in all, this makes a dictionary look pretty much like a list. Now for the big difference; let’s store (and use) some values under keys that aren’t integers :

>>> y["two"] = 2
>>> y["pi"] = 3.14
>>> y["two"] * y["pi"]
6.2800000000000002

This is definitely something that can’t be done with lists! Whereas list indices must be integers, dictionary keys are much less restricted—they may be numbers, strings, or one of a wide range of other Python objects. This makes dictionaries a natural for jobs that lists can’t do.

- Why dictionaries are called dictionaries
A dictionary is a way of mapping from one set of arbitrary objects to an associated but equally arbitrary set of objects. Actual dictionaries, thesauri, or translation books are a good analogy in the real world. To see how natural this correspondence is, here is the start of an English-to-French color translator :

>>> english_to_french = {} # Creates empty dictionary
>>> english_to_french['red'] = 'rouge' # Stores three words in it
>>> english_to_french['blue'] = 'bleu'
>>> english_to_french['green'] = 'vert'
>>> print("red is ", english_to_french['red']) # Obtains value for 'red'
red is rouge

Other dictionary operations :
Besides basic element assignment and access, dictionaries support a number of other operations. You can define a dictionary explicitly as a series of key/value pairs separated by commas :

>>> english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}
>>> len(english_to_french) # len returns the number of entries in a dictionary
3

You can obtain all the keys in the dictionary with the keys() method. This is often used to iterate over the contents of a dictionary using Python’s for loop, described in chapter 8 :

>>> list(english_to_french.keys())
['blue', 'green', 'red']

The order of the keys in a list returned by keys() has no meaning—they aren’t necessarily sorted, and they don’t necessarily occur in the order they were created. Your Python may print out the keys in a different order than my Python did. If you need keys sorted, you can store them in a list variable and then sort that list.

It’s also possible to obtain all the values stored in a dictionary, using values() :

>>> list(english_to_french.values())
['bleu', 'vert', 'rouge']

You can use the items() method to return all keys and their associated values as a sequence of tuples :

>>> list(english_to_french.items())
[('blue', 'bleu'), ('green', 'vert'), ('red', 'rouge')]

Like keys(), this is often used in conjunction with a for loop to iterate over the contents of a dictionary. The del statement can be used to remove an entry (key/value pair) from a dictionary :

>>> list(english_to_french.items())
[('blue', 'bleu'), ('green', 'vert'), ('red', 'rouge')]
>>> del english_to_french['green']
>>> list(english_to_french.items())
[('blue', 'bleu'), ('red', 'rouge')]

Attempting to access a key that isn’t in a dictionary is an error in Python. To handle this, you can test the dictionary for the presence of a key with the in keyword, which returns True if a dictionary has a value stored under the given key and False otherwise :

>>> 'red' in english_to_french
True
>>> 'orange' in english_to_french
False

Alternatively, you can use the get() function. It returns the value associated with a key, if the dictionary contains that key, but returns its second argument if the dictionary doesn’t contain the key :

>>> print(english_to_french.get('blue', 'No translation'))
bleu
>>> print(english_to_french.get('orange', 'No translation')) # Not exist key as 'orange'
No translation

(The second argument is optional. If it isn’t included, get() returns None if the dictionary doesn’t contain the key.)

Similarly, if you want to safely get a key’s value and make sure it’s set to a default in the dictionary, you can use the setdefault() method :

>>> list(english_to_french.keys())
['blue', 'red']
>>> english_to_french.setdefault('orange', 'No translation') # If the key 'orange' not exist, it will then be added to dict['orange']='No translation'
'No translation'
>>> list(english_to_french.keys())
['blue', 'orange', 'red']
>>> english_to_french.setdefault('blue', 'No translation')
'bleu'

The difference between get and setdefault is that after the setdefault call, there is a key in the dictionary 'chartreuse' with the value 'No translation'.

You can obtain a copy of a dictionary using the copy() method :

>>> x = {0: 'zero', 1: 'one'}
>>> y = x.copy()
>>> y
{0: 'zero', 1: 'one'}

This makes a shallow copy of the dictionary. This will likely be all you need in most situations. For dictionaries that contain any modifiable objects as values (that is, lists or other dictionaries), you may want to make a deep copy using the copy.deepcopy function. See "Nested lists and deep copies" (section 5.6) of "Lists, tuples, and sets" (chapter 5) for an introduction to the concept of shallow and deep copies.

The update() method updates a first dictionary with all the key/value pairs of a second dictionary. For keys that are common to both, the values from the second dictionary override those of the first :

>>> z = {1: 'One', 2: 'Two'}
>>> x = {0: 'zero', 1: 'one'}
>>> x.update(z) # x is updated by z
>>> x
{0: 'zero', 1: 'One', 2: 'Two'}
>>> z
{1: 'One', 2: 'Two'} # z doesn't change
>>> y = {1: 'one', 2: 'two', 3: 'three'}
>>> x.update(y)
>>> x # The key '3' from y is added to x
{0: 'zero', 1: 'one', 2: 'two', 3: 'three'}

Dictionary methods give you a full set of tools to manipulate and use dictionaries. For quick reference, refer to table 7.1 :

(This isn’t a complete list of all dictionary operations. For a complete list, refer to the official Python documentation.)

Word counting :
Assume that we have a file that contains a list of words, one word per line. We want to know how many times each word occurs in the file. Dictionaries can be used to do this easily :

>>> sample_string = "To be or not to be"
>>> occurrences = {}
>>> for word in sample_string.split():
... occurrences[word] = occurrences.get(word, 0) + 1
...
>>> for word in occurrences:
... print("The word(", word, ") occurs ", occurrences[word], \
... " times in the string")
...
The word( not ) occurs 1 times in the string
The word( To ) occurs 1 times in the string
The word( or ) occurs 1 times in the string
The word( to ) occurs 1 times in the string
The word( be ) occurs 2 times in the string

We increment the occurrences count for each word. This is a good example of the power of dictionaries. The code is not only simple, but because dictionary operations are highly optimized in Python, it’s also quite fast.

What can be used as a key ?
The previous examples use strings as keys, but Python permits more than just strings to be used in this manner. Any Python object that is immutable and hashable can be used as a key to a dictionary.

In Python, as discussed earlier, any object that can be modified is called mutable. Lists are mutable, because list elements can be added, changed, or removed. Dictionaries are also mutable, for the same reason. Numbers are immutable. If a variable x is holding the number 3, and you assign 4 to x, you’ve changed the value in x, but you haven’t changed the number 3; 3 is still 3. Strings are also immutable. list[n] returns the nth element of list, string[n] returns the nth character of string, and list[n] = value changes the nth element of list, but string[n] = character is illegal in Python and causes an error.

Unfortunately, the requirement that keys be immutable and hashable means that lists can’t be used as dictionary keys. But there are many instances when it would be convenient to have a listlike key. For example, it’s convenient to store information about a person under a key consisting of both their first and last names, which could be easily done if we could use a two-element list as a key.

Python solves this difficulty by providing tuples, which are basically immutable lists—they’re created and used similarly to lists, except that when you have them, you can’t modify them. But there’s one further restriction: keys must also be hashable, which takes things a step further than just immutable. To be hashable, a value must have a hash value (provided by a __hash__() method) that never changes throughout the life of the value. That means that tuples containing mutable values, although they themselves are immutable, aren’t hashable. Only tuples that don’t contain any mutable objects nested within them are hashable and valid to use as keys for dictionaries. Table 7.2 illustrates which of Python’s built-in types are immutable, hashable, and eligible to be dictionary keys :

The next sections give examples illustrating how tuples and dictionaries can work together.

Sparse matrices :
In mathematical terms, a matrix is a two-dimensional grid of numbers, usually written in textbooks as a grid with square brackets on each side, as shown at below :

A fairly standard way to represent such a matrix is by means of a list of lists. In Python, it’s presented like this :

matrix = [[3, 0, -2, 11], [0, 9, 0, 0], [0, 7, 0, 0], [0, 0, 0, -5]]

Elements in the matrix can then be accessed by row and column number :

element = matrix[rownum][colnum]

But in some applications, such as weather forecasting, it’s common for matrices to be very large—thousands of elements to a side, meaning millions of elements in total. It’s also common for such matrices to contain many zero elements. In some applications, all but a small percentage of the matrix elements may be set to zero. In order to conserve memory, it’s common for such matrices to be stored in a form where only the nonzero elements are actually stored. Such representations are called sparse matrices. It’s simple to implement sparse matrices using dictionaries with tuple indices. For example, the previous sparse matrix can be represented as follows :

matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11, (1, 1): 9, (2, 1): 7, (3, 3): -5}

Now, you can access an individual matrix element at a given row and column number by the following bit of code :

view plaincopy to clipboardprint?
if (rownum, colnum) in matrix:  
    element = matrix[(rownum, colnum)]  
else:  
    element = 0  

A slightly less clear (but more efficient) way of doing this is to use the dictionary get() method, which you can tell to return 0 if it can’t find a key in the dictionary and otherwise return the value associated with that key. This avoids one of the dictionary lookups :

element = matrix.get((rownum, colnum), 0)

If you’re considering doing extensive work with matrices, you may want to look into NumPy, the numeric computation package.

Dictionaries as caches :
The following is an example of how dictionaries can be used as caches, data structures that store results to avoid recalculating those results over and over. A short while ago, I wrote a function called sole(), which took three integers as arguments and returned a result. It looked something like this :

view plaincopy to clipboardprint?
def sole(m, n, t):  
    # . . . do some time-consuming calculations . . .  
    return(result)  

The problem with this function was that it really was time consuming, and because I was calling sole() tens of thousands of times, the program ran too slowly.

But sole() was called with only about 200 different combinations of arguments during any program run. That is, I might call sole(12, 20, 6) some 50 or more times during the execution of my program and similarly for many other combinations of arguments. By eliminating the recalculation of sole() on identical arguments, I’d save a huge amount of time. I used a dictionary with tuples as keys, like so :

view plaincopy to clipboardprint?
sole_cache = {}  
def sole(m, n, t):  
    if (m, n, t) in sole_cache:  
        return sole_cache[(m, n, t)]  
    else:  
        # . . . do some time-consuming calculations . . .  
        sole_cache[(m, n, t)] = result  
        return result  

The rewritten sole() function uses a global variable to store previous results. The global variable is a dictionary, and the keys of the dictionary are tuples corresponding to argument combinations that have been given to sole() in the past. Then, any time sole() passes an argument combination for which a result has already been calculated, it returns that stored result, rather than recalculating it.

Efficiency of dictionaries :
If you come from a traditional compiled-language background, you may hesitate to use dictionaries, worrying that they’re less efficient than lists (arrays). The truth is that the Python dictionary implementation is quite fast. Many of the internal language features rely on dictionaries, and a lot of work has gone into making them efficient. Because all of Python’s data structures are heavily optimized, you shouldn’t spend much time worrying about which is faster or more efficient. If the problem can be solved more easily and cleanly by using a dictionary than by using a list, do it that way, and consider alternatives only if it’s clear that dictionaries are causing an unacceptable slowdown.

程式扎記

標籤

2012年2月5日星期日

[Quick Python] 7. Dictionaries

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2012年2月5日 星期日