程式扎記: [Quick Python] 6. Strings

Converting from objects to strings :
In Python, almost anything can be converted to some sort of a string representation, using the built-in repr() function. Lists are the only complex Python data types you’re familiar with so far, so let’s turn some lists into their representations :

The example uses repr to convert the list x into a string representation, which is then concatenated with the other string to form the final string. Without the use of repr(), this wouldn’t work. In an expression like "string" + [1, 2] + 3, are you trying to add strings, or add lists, or just add numbers? Python doesn’t know what you want in such a circumstance, and it will do the safe thing (raise an error) rather than make any assumptions. In the previous example, all the elements had to be converted to string representations before the string concatenation would work.
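A minimal sketch of the conversion described above. The list x and its values are assumptions chosen for illustration, not the original example:

```python
# Hypothetical list, used only for illustration.
x = [1, 2, "three"]

# repr() turns the list into its string representation,
# so it can be concatenated with another string.
message = "The list x is " + repr(x)
print(message)  # The list x is [1, 2, 'three']

# Without repr(), mixing a str and a list raises TypeError
# instead of guessing what was meant.
error = None
try:
    "The list x is " + x
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```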

Lists are the only complex Python objects that have been described to this point, but repr() can be used to obtain some sort of string representation for almost any Python object. To see this, try repr() around a built-in complex object—an actual Python function :

>>> repr(len)
'<built-in function len>'

Python hasn’t produced a string containing the code that implements the len function, but it has at least returned a string, '<built-in function len>', that describes what that function is. If you keep the repr() function in mind and try it on each Python data type (dictionaries, tuples, classes, and the like) as we get to them in the book, you’ll see that no matter what type of Python object you have, you can get a string saying something about that object. This is great for debugging programs. If you’re in doubt as to what’s held in a variable at a certain point in your program, use repr() and print out the contents of that variable.

We’ve covered how Python can convert any object into a string that describes that object. The truth is, Python can do this in either of two different ways. The repr() function always returns what might be loosely called the formal string representation of a Python object. More specifically, repr() aims to return a string representation from which the original object can be rebuilt, where that is possible. For large, complex objects, this may not be the sort of thing you wish to see in debugging output or status reports.

Python also provides the built-in str() function. In contrast to repr(), str() is intended to produce printable string representations, and it can be applied to any Python object. str() returns what might be called the informal string representation of the object. A string returned by str() need not define an object fully and is intended to be read by humans, not by Python code.

You may not notice much difference between repr() and str() when you first start using them, because for most built-in objects, such as numbers and lists, str() produces the same result as repr(). Strings are a notable exception: str('spam') returns the string unchanged, whereas repr('spam') adds the surrounding quotes. The difference becomes really important when you start defining your own classes, which can supply separate formal and informal representations. This will be discussed in chapter 15.
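The difference between the two representations can be sketched with a small class. Point is a hypothetical class invented for this illustration, and classes themselves are not covered until later in the book:

```python
class Point:
    """Hypothetical class, used only to show the two hooks."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Formal representation: looks like code that rebuilds the object.
        return "Point(" + repr(self.x) + ", " + repr(self.y) + ")"

    def __str__(self):
        # Informal representation: meant for human readers.
        return "(" + str(self.x) + ", " + str(self.y) + ")"


p = Point(1, 2)
formal = repr(p)      # 'Point(1, 2)'
informal = str(p)     # '(1, 2)'
print(formal, informal)

# For a built-in list, str() and repr() give the same text.
numbers = [1, 2, 3]
same_for_list = str(numbers) == repr(numbers)   # True

# For a string, repr() adds quotes and str() does not.
text = "spam"
print(str(text))      # spam
print(repr(text))     # 'spam'
```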

Using the format method :
You can format strings in Python 3 in two ways. The newer way to format strings in Python is to use the string class’s format method. The format method combines a format string containing replacement fields marked with { } with replacement values taken from the parameters given to the format method. If you need to include a literal { or } in the string, you double it to {{ or }}. The format method supports a powerful string-formatting mini-language and offers almost endless possibilities for manipulating string formatting. On the other hand, it’s fairly simple to use for the most common use cases, so we’ll look at a few basic patterns. Then, if you need to use the more advanced options, you can refer to the string-formatting section of the standard library documentation.

- The format method and positional parameters
The simplest use of the string format() method uses numbered replacement fields that correspond to the parameters passed to the format method :

>>> "{0} is the {1} of {2}".format("Ambrosia", "food", "the gods") # (1)
'Ambrosia is the food of the gods'
>>> "{{Ambrosia}} is the {0} of {1}".format("food", "the gods") # (2)
'{Ambrosia} is the food of the gods'

Note that the format method is applied to the format string, which can also be a string variable (1). Doubling the { } characters escapes them so that they don’t mark a replacement field (2). No matter where in the format string we place {0}, it will always be replaced by the first parameter, and so on.

- The format method and named parameters
The format() method also recognizes named parameters and replacement fields :

>>> "{food} is the food of {user}".format(food="Ambrosia", user="the gods")
'Ambrosia is the food of the gods'

In this case, the replacement parameter is chosen by matching the name of the replacement field to the name of the parameter given to the format method. You can also use both positional and named parameters, and you can even access attributes and elements within those parameters :

>>> "{0} is the food of {user[1]}. {1}!".format("Ambrosia", user=["men", "the gods", "others"], "Bye")
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg # Positional arguments must always come before keyword arguments!
>>> "{0} is the food of {user[1]}. {1}!".format("Ambrosia", "Bye", user=["men", "the gods", "others"])
'Ambrosia is the food of the gods. Bye!'

In this case, the first parameter is positional, and the second, user[1], refers to the second element of the named parameter user.
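Attribute access works the same way as element access. The sketch below uses a complex number, whose real and imag attributes are built in, and a small dictionary. The variable names and values are assumptions chosen for illustration:

```python
# Complex numbers expose the attributes .real and .imag.
value = 3 + 4j

# {0.real} reads an attribute of the first positional parameter.
by_position = "real part: {0.real}, imaginary part: {0.imag}".format(value)
print(by_position)   # real part: 3.0, imaginary part: 4.0

# {c.real} reads an attribute of the named parameter c.
by_name = "real part: {c.real}".format(c=value)
print(by_name)       # real part: 3.0

# Dictionary keys are written without quotes inside the field.
by_key = "{user[name]} eat {0}".format("Ambrosia", user={"name": "the gods"})
print(by_key)        # the gods eat Ambrosia
```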

- Format specifiers
Format specifiers let you specify the result of the formatting with even more power and control than the formatting sequences of the older style of string formatting. The format specifier lets you control the fill character, alignment, sign, width, precision, and type of the data when it’s substituted for the replacement field. The following examples give you an idea of its usefulness :

:10 is a format specifier that makes the field 10 characters wide and pads with spaces. :{1} takes the width from the second parameter. :>10 forces right justification of the field and pads with spaces. :&>10 forces right justification and pads with & instead of spaces.
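The specifiers described above can be tried out as follows. This is a sketch under the assumption of a sample sentence and the value "Ambrosia", which is 8 characters long, so a width of 10 adds two padding characters:

```python
# :10 pads the field to 10 characters; strings are left-aligned by default.
padded = "{0:10} is the food of gods".format("Ambrosia")
print(repr(padded))   # 'Ambrosia   is the food of gods'

# :{1} takes the width from the second parameter.
nested = "{0:{1}} is the food of gods".format("Ambrosia", 10)
print(repr(nested))   # 'Ambrosia   is the food of gods'

# :>10 right-justifies the value and pads with spaces on the left.
right = "{0:>10} is the food of gods".format("Ambrosia")
print(repr(right))    # '  Ambrosia is the food of gods'

# :&>10 right-justifies and uses & as the fill character.
filled = "{0:&>10} is the food of gods".format("Ambrosia")
print(repr(filled))   # '&&Ambrosia is the food of gods'
```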

Formatting strings with % :
This section covers formatting strings with the string modulus (%) operator. It’s used to combine Python values into formatted strings for printing or other use. C users will notice a strange similarity to the printf family of functions. The use of % for string formatting is the old style of string formatting, and I cover it here because it was the standard in earlier versions of Python and you’re likely to see it in code that’s been ported from earlier versions of Python or was written by coders familiar with those versions.

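The book advises against using this style of formatting in new code, on the expectation that it would be deprecated and eventually removed from the language. That has not happened, and the % operator is still supported, but the format method is generally the more flexible choice :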

>>> "%s is the %s of %s" % ("Ambrosia", "food", "the gods")
'Ambrosia is the food of the gods'

The string modulus operator (the % that occurs between the string and the tuple, not the three instances of %s that come before it in the example) takes two parts: the left side, which is a string; and the right side, which is a tuple. The string modulus operator scans the left string for special formatting sequences and produces a new string by substituting the values on the right side for those formatting sequences, in order. In this example, the only formatting sequences on the left side are the three instances of %s, which stand for "stick a string in here."

The members of the tuple on the right will have str applied to them automatically by %s, so they don’t have to already be strings :

>>> x = [1, 2, "three"]
>>> "The %s contains %s" % ("list", x)
"The list contains [1, 2, 'three']"

- Using formatting sequences
All formatting sequences are substrings contained in the string on the left side of the central %. Each formatting sequence begins with a percent sign and is followed by one or more characters that specify what is to be substituted for the formatting sequence and how the substitution is accomplished. The %s formatting sequence used previously is the simplest formatting sequence, and it indicates that the corresponding string from the tuple on the right side of the central % should be substituted in place of the %s.

Other formatting sequences can be more complex. This one specifies the field width (total number of characters) of a printed number to be six, specifies the number of characters after the decimal point to be two, and left-justifies the number in its field. I’ve put in angle brackets so you can see where extra spaces are inserted into the formatted string :

>>> "Pi is <%-6.2f>" % 3.14159 # Use of the formatting sequence: %-6.2f
'Pi is <3.14  >'
>>> "Pi is <%6.2f>" % 3.14159 # Use of the formatting sequence: %6.2f
'Pi is <  3.14>'

- Named parameters and formatting sequences
Finally, one additional feature is available with the % operator that can be useful in certain circumstances. Unfortunately, to describe it we’re going to have to employ a Python feature we haven’t used yet: dictionaries, commonly called hashtables or associative arrays in other languages. You can skip ahead to the next chapter, “Dictionaries,” to learn about dictionaries, skip this section for now and come back to it later, or read straight through, trusting to the examples to make things clear.

Formatting sequences can specify what should be substituted for them by name rather than by position. When you do this, each formatting sequence has a name in parentheses, immediately following the initial % of the formatting sequence, like so :

"%(pi).2f" # Note name in parentheses

In addition, the argument to the right of the % operator is no longer given as a single value or tuple of values to be printed but rather as a dictionary of values to be printed, with each named formatting sequence having a correspondingly named key in the dictionary. Using the previous formatting sequence with the string modulus operator, we might produce code like this :

>>> num_dict = {'e':2.718, 'pi':3.14159}
>>> print("%(pi).2f - %(pi).4f - %(e).2f" % num_dict)
3.14 - 3.1416 - 2.72

This is particularly useful when you’re using format strings that perform a large number of substitutions, because you no longer have to keep track of the positional correspondences of the right-side tuple of elements with the formatting sequences in the format string. The order in which elements are defined in the dict argument is irrelevant, and the template string may use values from dict more than once (as it does with the 'pi' entry).

Using the print() function’s options gives you enough control for simple text output, but more complex situations are best served by using the format method.
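For comparison, the same substitution can be written with the format method by unpacking the dictionary into named parameters. This is a sketch that reuses the num_dict values from the example above. The ** unpacking syntax has not been introduced in this chapter and is used here as an assumption about how one might bridge the two styles:

```python
num_dict = {'e': 2.718, 'pi': 3.14159}

# Old style: named formatting sequences looked up in the dictionary.
old_style = "%(pi).2f - %(pi).4f - %(e).2f" % num_dict

# New style: ** passes each key as a named parameter to format().
new_style = "{pi:.2f} - {pi:.4f} - {e:.2f}".format(**num_dict)

print(old_style)   # 3.14 - 3.1416 - 2.72
print(new_style)   # 3.14 - 3.1416 - 2.72
```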

Bytes :
A bytes object is similar to a string object but with an important difference. A string is an immutable sequence of Unicode characters, whereas a bytes object is an immutable sequence of integers with values from 0 to 255. Bytes can be necessary when you’re dealing with binary data, for example, reading from a binary data file. The key thing to remember is that bytes objects may look like strings, but they can’t be used exactly like a string and they can’t be combined with strings :

>>> unicode_a_with_acute = '\N{LATIN SMALL LETTER A WITH ACUTE}'
>>> unicode_a_with_acute
'\xe1'
>>> xb = unicode_a_with_acute.encode() # (1)
>>> xb
b'\xc3\xa1' # (2)
>>> xb += 'A' # (3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>> xb.decode() # (4)
'\xe1'

The first thing you can see is that to convert from a regular (Unicode) string to bytes, you need to call the string’s encode method (1). After it’s encoded to a bytes object, the character is now 2 bytes and no longer prints the same way the string did (2). Further, if you attempt to add a bytes object and a string object together, you get a type error, because the two are incompatible types (3). Finally, to convert a bytes object back to a string, you need to call that object’s decode method (4).
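The same round trip can be sketched as a script. The explicit "utf-8" argument and the variable names are assumptions made for clarity; the example above relies on the default encoding:

```python
text = "\N{LATIN SMALL LETTER A WITH ACUTE}"

# Encoding turns one Unicode character into two UTF-8 bytes.
data = text.encode("utf-8")
print(data)                   # b'\xc3\xa1'
print(len(text), len(data))   # 1 2

# A bytes object is a sequence of integers in the range 0 to 255.
values = list(data)
print(values)                 # [195, 161]

# To combine bytes with text, encode the text first.
combined = data + "A".encode("utf-8")
print(combined)               # b'\xc3\xa1A'

# Decoding restores the original string.
round_trip = data.decode("utf-8")
print(round_trip == text)     # True
```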

Most of the time, you shouldn’t need to think about Unicode or bytes at all. But when you need to deal with international character sets, an increasingly common issue, you must understand the difference between regular strings and bytes.

Supplement :
* [Quick Python] 6. Strings - Part 1
* [Python Study Notes] Getting Started : Built-in Types and Operations (String Formatting)

Thursday, February 2, 2012

[Quick Python] 6. Strings - Part 2