This chapter discusses dictionaries, Python’s name for associative arrays, which it implements using hash tables. Dictionaries are amazingly useful, even in simple programs. Because dictionaries are less familiar to many programmers than other basic data structures such as lists and strings, some of the examples illustrating dictionary use are slightly more complex than the corresponding examples for other built-in data structures. It may be necessary to read parts of the next chapter, “Control flow,” to fully understand some of the examples in this chapter.
What is a dictionary?
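If you’ve never used associative arrays or hash tables in other languages, then a good way to start understanding the use of dictionaries is to compare them with lists. Values in a list are accessed by integer position, whereas values in a dictionary are accessed by keys, which may be integers, strings, or a wide range of other Python objects.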
In spite of the differences between them, use of dictionaries and lists often appears alike. As a start, an empty dictionary is created much like an empty list, but with curly braces instead of square brackets:
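A minimal sketch of the two literals side by side; the variable names x and y are placeholders chosen for illustration:

```python
x = []   # an empty list: square brackets
y = {}   # an empty dictionary: curly braces

print(type(x).__name__)   # list
print(type(y).__name__)   # dict
```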
After you create a dictionary, values may be stored in it as if it were a list:
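For instance, assuming an empty dictionary named y (a placeholder name), the assignments might look like this:

```python
y = {}
y[0] = 'Hello'      # creates the key 0 and stores a value under it
y[1] = 'Goodbye'    # creates the key 1 and stores a value under it

print(y)            # {0: 'Hello', 1: 'Goodbye'}
```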
Even in these assignments, there is already a significant operational difference between the dictionary and list usage. Trying to do the same thing with a list would result in an error, because in Python it’s illegal to assign to a position in a list that doesn’t already exist. This isn’t a problem with dictionaries; new positions in dictionaries are created as necessary.
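The list side of that comparison can be sketched as follows. The names x and message are illustrative only; the point is that the assignment raises IndexError rather than growing the list:

```python
x = []
try:
    x[0] = 'Hello'            # position 0 doesn't exist in an empty list
except IndexError as err:
    message = str(err)        # describes the out-of-range assignment

print(message)
print(x)                      # the list is still empty: []
```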
Having stored some values in the dictionary, we can now access and use them:
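A short sketch of reading values back out, again using the placeholder dictionary y:

```python
y = {0: 'Hello', 1: 'Goodbye'}

print(y[0])                       # Hello
greeting = y[1] + ", Friend."     # a stored value can be used like any other string
print(greeting)                   # Goodbye, Friend.
```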
All in all, this makes a dictionary look pretty much like a list. Now for the big difference: let’s store (and use) some values under keys that aren’t integers:
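One way this might look, with string keys chosen purely for illustration:

```python
y = {}
y["two"] = 2        # a string key holding an integer
y["pi"] = 3.14      # a string key holding a float

product = y["two"] * y["pi"]   # values are looked up by their string keys
print(product)
```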
This is definitely something that can’t be done with lists! Whereas list indices must be integers, dictionary keys are much less restricted: they may be numbers, strings, or one of a wide range of other Python objects. This makes dictionaries a natural fit for jobs that lists can’t do.
Why dictionaries are called dictionaries
A dictionary is a way of mapping from one set of arbitrary objects to an associated but equally arbitrary set of objects. Actual dictionaries, thesauri, or translation books are a good analogy in the real world. To see how natural this correspondence is, here is the start of an English-to-French color translator:
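A minimal sketch of such a translator; the dictionary name english_to_french and the three colors are assumptions made for the example:

```python
english_to_french = {}                  # start with an empty dictionary
english_to_french['red'] = 'rouge'      # each English word is a key...
english_to_french['blue'] = 'bleu'      # ...mapped to its French translation
english_to_french['green'] = 'vert'

print("red is", english_to_french['red'])   # red is rouge
```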
Other dictionary operations
Besides basic element assignment and access, dictionaries support a number of other operations. You can define a dictionary explicitly as a series of key/value pairs separated by commas:
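As a sketch, the color translator (an illustrative name and data set) could be written in one literal, with a colon between each key and its value:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

print(english_to_french['blue'])   # bleu
```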
You can obtain all the keys in the dictionary with the keys() method. This is often used to iterate over the contents of a dictionary using Python’s for loop, described in chapter 8:
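A brief sketch, assuming the illustrative english_to_french dictionary. In Python 3, keys() returns a view object, so list() is used here to get an ordinary list:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

keys = list(english_to_french.keys())     # copy the key view into a list
print(keys)

for color in english_to_french.keys():    # loop over each key in turn
    print(color, "->", english_to_french[color])
```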
The keys returned by keys() aren’t sorted. In Python 3.7 and later they come back in the order in which they were inserted; earlier versions made no guarantee about order, so an older Python may print the keys in a different order than a newer one. If you need keys sorted, you can store them in a list variable and then sort that list.
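Sorting the keys might look like this; the variable name sorted_keys and the sample dictionary are illustrative:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

sorted_keys = list(english_to_french.keys())   # store the keys in a list
sorted_keys.sort()                             # sort that list in place
print(sorted_keys)                             # ['blue', 'green', 'red']
```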
It’s also possible to obtain all the values stored in a dictionary, using values():
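A sketch with the same illustrative dictionary; like keys(), values() returns a view, which list() turns into a list:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

values = list(english_to_french.values())   # every stored value, without its key
print(values)
```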
You can use the items() method to return all keys and their associated values as a sequence of tuples:
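For example, assuming the illustrative english_to_french dictionary, each item is a (key, value) tuple that a for loop can unpack:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

pairs = list(english_to_french.items())     # a list of (key, value) tuples
print(pairs)

for english, french in english_to_french.items():   # unpack each tuple
    print(english, "is", french)
```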
Like keys(), this is often used in conjunction with a for loop to iterate over the contents of a dictionary. The del statement can be used to remove an entry (key/value pair) from a dictionary:
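A minimal sketch of del on the illustrative color dictionary:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

del english_to_french['green']    # removes the key and its value together
print(english_to_french)          # {'red': 'rouge', 'blue': 'bleu'}
```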
Attempting to access a key that isn’t in a dictionary is an error in Python. To handle this, you can test the dictionary for the presence of a key with the in keyword, which returns True if a dictionary has a value stored under the given key and False otherwise:
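The test might be sketched like this; the variable names has_red and has_orange are illustrative:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

has_red = 'red' in english_to_french         # True: a value is stored under 'red'
has_orange = 'orange' in english_to_french   # False: no such key

print(has_red, has_orange)                   # True False
```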
Alternatively, you can use the get() method. It returns the value associated with a key, if the dictionary contains that key, but returns its second argument if the dictionary doesn’t contain the key:
(The second argument is optional. If it isn’t included, get() returns None if the dictionary doesn’t contain the key.)
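Both forms of get() can be sketched as follows, using the illustrative color dictionary and a key, 'chartreuse', that it doesn’t contain:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

print(english_to_french.get('blue', 'No translation'))         # bleu
print(english_to_french.get('chartreuse', 'No translation'))   # No translation
print(english_to_french.get('chartreuse'))                     # None

# get() never adds the missing key to the dictionary
print('chartreuse' in english_to_french)                       # False
```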
Similarly, if you want to safely get a key’s value and make sure it’s set to a default in the dictionary, you can use the setdefault() method:
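A sketch of setdefault() on the same illustrative dictionary; result is a placeholder name for the returned value:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

# The key is missing, so the default is stored and then returned
result = english_to_french.setdefault('chartreuse', 'No translation')

print(result)                             # No translation
print(english_to_french['chartreuse'])    # No translation
```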
The difference between get and setdefault is that after the setdefault call, the dictionary contains the key 'chartreuse' with the value 'No translation'.
You can obtain a copy of a dictionary using the copy() method:
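A minimal sketch, with placeholder dictionaries x and y; adding a key to the copy leaves the original untouched:

```python
x = {0: 'zero', 1: 'one'}
y = x.copy()        # y is a new dictionary with the same keys and values
y[2] = 'two'        # changing y's keys doesn't affect x

print(x)            # {0: 'zero', 1: 'one'}
print(y)            # {0: 'zero', 1: 'one', 2: 'two'}
```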
This makes a shallow copy of the dictionary. This will likely be all you need in most situations. For dictionaries that contain any modifiable objects as values (such as lists or other dictionaries), you may want to make a deep copy using the copy.deepcopy function. See “Nested lists and deep copies” (section 5.6) of “Lists, tuples, and sets” (chapter 5) for an introduction to the concept of shallow and deep copies.
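The difference shows up only when a value is itself modifiable. In this sketch (the names original, shallow, and deep are illustrative), the shallow copy shares the inner list with the original, whereas the deep copy gets its own:

```python
import copy

original = {'primes': [2, 3, 5]}
shallow = original.copy()          # new dictionary, same inner list object
deep = copy.deepcopy(original)     # new dictionary and a new inner list

original['primes'].append(7)       # modify the list through the original

print(shallow['primes'])           # [2, 3, 5, 7]  (shared list changed)
print(deep['primes'])              # [2, 3, 5]     (independent copy)
```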
The update() method updates a first dictionary with all the key/value pairs of a second dictionary. For keys that are common to both, the values from the second dictionary override those of the first:
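A short sketch with two placeholder dictionaries, x and z, that share the key 1:

```python
z = {1: 'One', 2: 'Two'}
x = {0: 'zero', 1: 'one'}

x.update(z)     # key 1 exists in both, so z's value replaces x's

print(x)        # {0: 'zero', 1: 'One', 2: 'Two'}
```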
Dictionary methods give you a full set of tools to manipulate and use dictionaries. For quick reference, refer to table 7.1.
(This isn’t a complete list of all dictionary operations. For a complete list, refer to the official Python documentation.)
Word counting
Assume that we have a file that contains a list of words, one word per line. We want to know how many times each word occurs in the file. Dictionaries can be used to do this easily:
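A minimal sketch of the idea. To keep the example self-contained, an in-memory io.StringIO object stands in for the file, and the sample words and the names sample_file and occurrences are assumptions; with a real file you would iterate over an opened file object instead:

```python
import io

# Stand-in for a file with one word per line (sample data is illustrative)
sample_file = io.StringIO("apple\npear\napple\nfig\npear\napple\n")

occurrences = {}
for line in sample_file:
    word = line.strip()                                 # drop the trailing newline
    occurrences[word] = occurrences.get(word, 0) + 1    # start at 0 if unseen

for word in sorted(occurrences):
    print("The word", word, "occurs", occurrences[word], "times")
```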
We increment the occurrences count for each word. This is a good example of the power of dictionaries. The code is not only simple, but because dictionary operations are highly optimized in Python, it’s also quite fast.
What can be used as a key?
The previous examples use strings as keys, but Python permits more than just strings to be used in this manner. Any Python object that is immutable and hashable can be used as a key to a dictionary.
In Python, as discussed earlier, any object that can be modified is called mutable. Lists are mutable, because list elements can be added, changed, or removed. Dictionaries are also mutable, for the same reason. Numbers are immutable. If a variable x is holding the number 3, and you assign 4 to x, you’ve changed the value in x, but you haven’t changed the number 3; 3 is still 3. Strings are also immutable. list[n] returns the nth element of list, string[n] returns the nth character of string, and list[n] = value changes the nth element of list, but string[n] = character is illegal in Python and causes an error.
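The contrast between lists and strings can be sketched as follows; the names numbers, text, and string_assignment_failed are illustrative:

```python
numbers = [1, 2, 3]
numbers[0] = 99                 # fine: lists are mutable
print(numbers)                  # [99, 2, 3]

text = "abc"
print(text[0])                  # a: reading a character is allowed
try:
    text[0] = "z"               # not allowed: strings are immutable
    string_assignment_failed = False
except TypeError:
    string_assignment_failed = True

print(string_assignment_failed) # True
```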
Unfortunately, the requirement that keys be immutable and hashable means that lists can’t be used as dictionary keys. But there are many instances when it would be convenient to have a listlike key. For example, it’s convenient to store information about a person under a key consisting of both their first and last names, which could be easily done if we could use a two-element list as a key.
Python solves this difficulty by providing tuples, which are basically immutable lists: they’re created and used similarly to lists, except that once they’re created, you can’t modify them. But there’s one further restriction: keys must also be hashable, which takes things a step further than just immutable. To be hashable, a value must have a hash value (provided by a __hash__() method) that never changes throughout the life of the value. That means that tuples containing mutable values, although they themselves are immutable, aren’t hashable. Only tuples that don’t contain any mutable objects nested within them are hashable and valid to use as keys for dictionaries. Table 7.2 illustrates which of Python’s built-in types are immutable, hashable, and eligible to be dictionary keys.
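The rules for tuple keys can be sketched like this. The person record and all names here are hypothetical; the point is that a tuple of strings works as a key, whereas a list, or a tuple containing a list, raises TypeError:

```python
people = {}
people[("Jane", "Doe")] = "hypothetical record"   # tuple of strings: hashable

rejected = []
for candidate in (["Jane", "Doe"], (["Jane"], "Doe")):
    try:
        people[candidate] = "never stored"        # needs hash(candidate)
    except TypeError:
        rejected.append(type(candidate).__name__) # unhashable key was refused

print(people)      # only the tuple-of-strings key is present
print(rejected)    # ['list', 'tuple']
```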
The next sections give examples illustrating how tuples and dictionaries can work together.
Sparse matrices
In mathematical terms, a matrix is a two-dimensional grid of numbers, usually written in textbooks as a grid with square brackets on each side, as shown below:
A fairly standard way to represent such a matrix is by means of a list of lists. In Python, it’s represented like this:
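As a sketch, here is a 4-by-4 matrix written as a list of rows. The specific numbers are illustrative values chosen for this example, with most entries zero:

```python
# Each inner list is one row of the matrix
matrix = [[3, 0, -2, 11],
          [0, 9, 0, 0],
          [0, 7, 0, 0],
          [0, 0, 0, -5]]

print(len(matrix), "rows,", len(matrix[0]), "columns")   # 4 rows, 4 columns
```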
Elements in the matrix can then be accessed by row and column number:
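For example, with an illustrative 4-by-4 matrix and placeholder names rownum and colnum, the first index selects the row and the second selects the column:

```python
matrix = [[3, 0, -2, 11],
          [0, 9, 0, 0],
          [0, 7, 0, 0],
          [0, 0, 0, -5]]

rownum, colnum = 2, 1
element = matrix[rownum][colnum]   # row 2, column 1 (both counted from 0)
print(element)                     # 7
```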
But in some applications, such as weather forecasting, it’s common for matrices to be very large, with thousands of elements to a side, meaning millions of elements in total. It’s also common for such matrices to contain many zero elements. In some applications, all but a small percentage of the matrix elements may be set to zero. In order to conserve memory, it’s common for such matrices to be stored in a form where only the nonzero elements are actually stored. Such representations are called sparse matrices. It’s simple to implement sparse matrices using dictionaries with tuple indices. For example, the previous sparse matrix can be represented as follows:
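A sketch of the dictionary form, assuming an illustrative 4-by-4 matrix whose only nonzero entries are the six listed here. Each key is a (row, column) tuple:

```python
# Only the nonzero elements are stored; every other position is implicitly 0
matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11,
          (1, 1): 9, (2, 1): 7, (3, 3): -5}

print(len(matrix), "entries stored instead of 16")   # 6 entries stored instead of 16
```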
Now, you can access an individual matrix element at a given row and column number by the following bit of code:
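One way to write the lookup, using an illustrative sparse matrix and the placeholder names rownum and colnum. A missing key means the element is zero, which the get() method can express in one line:

```python
matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11,
          (1, 1): 9, (2, 1): 7, (3, 3): -5}

rownum, colnum = 1, 2
if (rownum, colnum) in matrix:
    element = matrix[(rownum, colnum)]
else:
    element = 0                     # not stored, so the element is zero

# Equivalent, more compact form using get() with a default of 0
element_via_get = matrix.get((rownum, colnum), 0)

print(element, element_via_get)     # 0 0
```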
If you’re considering doing extensive work with matrices, you may want to look into NumPy, the numeric computation package.
Dictionaries as caches
The following is an example of how dictionaries can be used as caches, data structures that store results to avoid recalculating those results over and over. A short while ago, I wrote a function called sole(), which took three integers as arguments and returned a result. It looked something like this:
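The shape of such a function can be sketched as follows. The parameter names m, n, and t and the arithmetic in the body are hypothetical stand-ins, because the real calculation isn’t described:

```python
def sole(m, n, t):
    # Stand-in for the time-consuming calculations (hypothetical arithmetic)
    result = sum((m * i + n) % (t + 1) for i in range(1000))
    return result

print(sole(12, 20, 6))
```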
But sole() was called with only about 200 different combinations of arguments during any program run. That is, I might call sole(12, 20, 6) some 50 or more times during the execution of my program and similarly for many other combinations of arguments. By eliminating the recalculation of sole() on identical arguments, I’d save a huge amount of time. I used a dictionary with tuples as keys, like so:
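A minimal sketch of the caching pattern. The dictionary name sole_cache, the stats counter, and the body of the calculation are assumptions added for illustration; the counter exists only to show that repeated calls skip the calculation:

```python
sole_cache = {}                 # maps (m, n, t) tuples to computed results
stats = {'calculations': 0}     # illustrative counter of real calculations

def sole(m, n, t):
    if (m, n, t) in sole_cache:
        return sole_cache[(m, n, t)]     # seen before: reuse the stored result
    # Stand-in for the time-consuming calculations (hypothetical arithmetic)
    stats['calculations'] += 1
    result = sum((m * i + n) % (t + 1) for i in range(1000))
    sole_cache[(m, n, t)] = result       # remember the result for next time
    return result

first = sole(12, 20, 6)
second = sole(12, 20, 6)        # answered from the cache
print(stats['calculations'])    # 1
```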
Efficiency of dictionaries
If you come from a traditional compiled-language background, you may hesitate to use dictionaries, worrying that they’re less efficient than lists (arrays). The truth is that the Python dictionary implementation is quite fast. Many of the internal language features rely on dictionaries, and a lot of work has gone into making them efficient. Because all of Python’s data structures are heavily optimized, you shouldn’t spend much time worrying about which is faster or more efficient. If the problem can be solved more easily and cleanly by using a dictionary than by using a list, do it that way, and consider alternatives only if it’s clear that dictionaries are causing an unacceptable slowdown.