Iterators
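We use the for statement to loop over a list:
- >>> for i in [1,2,3,4]:
- ...     print i,
- ...
- 1 2 3 4
If we use it with a string, it loops over its characters:
- >>> for c in 'python':
- ...     print c
- ...
- p
- y
- t
- h
- o
- n
If we use it with a dictionary, it loops over its keys:
- >>> for k in {"x": 1, "y": 2}:
- ...     print k
- ...
- y
- x
If we use it with a file, it loops over the lines of the file:
- >>> for line in open("a.txt"):
- ...     print line,
- ...
- first line
- second line
So there are many types of objects that can be used with a for loop. These are called iterable objects.
There are many functions that consume these iterables: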
- >>> ",".join(["a", "b", "c"])
- 'a,b,c'
- >>> ",".join({"x": 1, "y": 2})
- 'y,x'
- >>> list("python")
- ['p', 'y', 't', 'h', 'o', 'n']
- >>> list({"x": 1, "y": 2})
- ['y', 'x']
The Iteration Protocol
The built-in function iter takes an iterable object and returns an iterator.
- >>> x = iter([1, 2, 3])
- >>> x
- <listiterator object at 0x1004ca850>
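To see what the returned iterator does, we can ask it for values one at a time. The sketch below assumes Python 3, where the built-in `next(it)` replaces the Python 2 method call `it.next()`; the helper name `drain` is made up for this illustration.

```python
def drain(iterable):
    """Collect values by calling next() by hand until StopIteration is raised."""
    it = iter(iterable)              # ask the iterable for a fresh iterator
    values = []
    while True:
        try:
            values.append(next(it))  # Python 3 spelling of it.next()
        except StopIteration:        # raised once the iterator is exhausted
            break
    return values

print(drain([1, 2, 3]))
print(drain("py"))
```

This is roughly what a for loop does behind the scenes: call iter once, then call next until StopIteration appears.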
Iterators are implemented as classes. Here is an iterator that works like the built-in xrange function.
- class yrange:
-     def __init__(self, n):
-         self.i = 0
-         self.n = n
-     def __iter__(self):
-         return self
-     def next(self):
-         if self.i < self.n:
-             i = self.i
-             self.i += 1
-             return i
-         else:
-             raise StopIteration()
- >>> y = yrange(3)
- >>> y.next()
- 0
- >>> y.next()
- 1
- >>> y.next()
- 2
- >>> y.next()
- Traceback (most recent call last):
-   File "<stdin>", line 1, in <module>
-   File "<stdin>", line 14, in next
- StopIteration
- >>> list(yrange(5))
- [0, 1, 2, 3, 4]
- >>> sum(yrange(5))
- 10
In yrange, the iterable and the iterator are the same object. They do not have to be: zrange below is an iterable whose __iter__ method returns a separate iterator object.
- class zrange:
-     def __init__(self, n):
-         self.n = n
-     def __iter__(self):
-         return zrange_iter(self.n)
- class zrange_iter:
-     def __init__(self, n):
-         self.i = 0
-         self.n = n
-     def __iter__(self):
-         # Iterators are iterables too.
-         # Adding this method makes them so.
-         return self
-     def next(self):
-         if self.i < self.n:
-             i = self.i
-             self.i += 1
-             return i
-         else:
-             raise StopIteration()
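The practical difference between the two designs shows up when the same object is looped over twice. The sketch below restates both classes for Python 3 (the method is named `__next__` there instead of `next`); the class names `OneShot` and `Reusable` are placeholders standing in for yrange and zrange.

```python
class OneShot:
    """Iterator and iterable in one object, like yrange."""
    def __init__(self, n):
        self.i = 0
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):              # Python 3 name for next()
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        raise StopIteration()

class Reusable:
    """Iterable only, like zrange: each __iter__ call builds a new iterator."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return OneShot(self.n)

y = OneShot(3)
first_pass, second_pass = list(y), list(y)   # second pass finds y exhausted
z = Reusable(3)
z_first, z_second = list(z), list(z)         # each pass gets a fresh iterator
print(first_pass, second_pass)
print(z_first, z_second)
```

When the iterable and the iterator are the same object, it is consumed by a single pass; a separate iterator class lets the iterable be traversed again.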
Problem 1: Let's practice what we have learned so far. Write an iterator class reverse_iter that takes a list and iterates over it in reverse order.
- >>> it = reverse_iter([1, 2, 3, 4])
- >>> it.next()
- 4
- >>> it.next()
- 3
- >>> it.next()
- 2
- >>> it.next()
- 1
- >>> it.next()
- Traceback (most recent call last):
-   File "<stdin>", line 1, in <module>
- StopIteration
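Answer 1:
- class reverse_iter:
-     """Iterable wrapper; each for loop gets a fresh rIter."""
-     def __init__(self, data):
-         self.data = data
-     def __iter__(self):
-         return rIter(self.data)
- class rIter:
-     def __init__(self, data):
-         self.data = data
-         self.i = len(data)
-     def __iter__(self):
-         # Iterators should be iterable too.
-         return self
-     def next(self):
-         self.i = self.i - 1
-         if self.i >= 0:
-             return self.data[self.i]
-         else:
-             raise StopIteration()
- # reverse_iter has no next method of its own, so the session in
- # Problem 1 needs it = iter(reverse_iter([1, 2, 3, 4])) before it.next().
- ri = reverse_iter([1,2,3,4,5])
- for i in ri:
-     print i,
- print ""
- ri = reverse_iter("Hello")
- for i in ri:
-     print i,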
Generators
Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value. For example:
- def yrange(n):
-     i = 0
-     while i < n:
-         yield i
-         i += 1
- >>> y = yrange(3)
- >>> y
- <generator object yrange at 0x401f30>
The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I’ll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.
Can you think about how it works internally?
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing and runs until it reaches a yield statement. The yielded value is returned by the next call.
The following example demonstrates the interplay between yield and calls to the next method on the generator object:
- >>> def foo():
- ...     print "begin"
- ...     for i in range(3):
- ...         print "before yield", i
- ...         yield i
- ...         print "after yield", i
- ...     print "end"
- ...
- >>> f = foo()
- >>> f.next()
- begin
- before yield 0
- 0
- >>> f.next()
- after yield 0
- before yield 1
- 1
- >>> f.next()
- after yield 1
- before yield 2
- 2
- >>> f.next()
- after yield 2
- end
- Traceback (most recent call last):
-   File "<stdin>", line 1, in <module>
- StopIteration
A generator can also produce an infinite sequence, and one generator can be built on top of another:
- def integers():
-     """Infinite sequence of integers."""
-     i = 1
-     while True:
-         yield i
-         i = i + 1
- def squares():
-     for i in integers():
-         yield i * i
- def take(n, seq):
-     """Returns first n values from the given sequence."""
-     seq = iter(seq)
-     result = []
-     try:
-         for i in range(n):
-             result.append(seq.next())
-     except StopIteration:
-         pass
-     return result
- print take(5, squares()) # prints [1, 4, 9, 16, 25]
Generator expressions are the generator version of list comprehensions. They look like list comprehensions, but return a generator instead of a list.
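As a small illustration, the following sketch (Python 3 syntax; the variable names are made up for the example) builds a generator expression and then consumes it.

```python
# Round brackets instead of square ones: nothing is computed yet.
squares = (x * x for x in range(10))
kind = type(squares).__name__   # "generator"

total = sum(squares)            # consuming it runs the loop: 0 + 1 + 4 + ... + 81
leftover = list(squares)        # the generator is now exhausted

print(kind, total, leftover)
```

Like any generator, the expression can be consumed only once, which is why the second pass yields nothing.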
When there is only one argument to the calling function, the parentheses around the generator expression can be omitted.
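A short sketch of that rule, written for Python 3 (the word list is invented for the example): the first two calls are equivalent, while the third needs the inner parentheses because the call has a second argument.

```python
with_parens = sum((x * x for x in range(10)))
without_parens = sum(x * x for x in range(10))   # same result, less noise

# With a second argument, the generator expression must keep its own parentheses.
words = ["iter", "generator", "yield"]
longest = max((len(w) for w in words), default=0)

print(with_parens, without_parens, longest)
```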
Another fun example: let's say we want to find the first 10 (or any n) Pythagorean triplets. A triplet (x, y, z) is called a Pythagorean triplet if x*x + y*y == z*z.
It is easy to solve this problem if we know up to what value of z to test. But we want to find the first n Pythagorean triplets.
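One possible way to do it is a generator expression over an unbounded range of z values, from which we take only as many results as we need. The sketch below is written for Python 3 and uses `itertools.count` and `itertools.islice` as stand-ins for the integers and take helpers defined earlier, so that it runs on its own.

```python
from itertools import count, islice

# z grows without bound; y and x are limited by z, so each step is finite.
triplets = ((x, y, z)
            for z in count(1)
            for y in range(1, z)
            for x in range(1, y)
            if x * x + y * y == z * z)

first_ten = list(islice(triplets, 10))   # stop after ten matches
print(first_ten)
```

Because the expression is lazy, no upper bound on z has to be chosen in advance; the search stops as soon as the tenth triplet has been produced.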
Example: Reading multiple files
Let's say we want to write a program that takes a list of filenames as arguments and prints the contents of all those files, like the cat command in Unix. The traditional way to implement it is:
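- def cat(filenames):
-     for f in filenames:
-         for line in open(f):
-             print line,
Now, let's say we want to print only the lines that contain a particular substring, like the grep command in Unix:
- def grep(pattern, filenames):
-     for f in filenames:
-         for line in open(f):
-             if pattern in line:
-                 print line,
Both programs have a lot of code in common. With generators, the shared parts can be factored out:
- def readfiles(filenames):
-     for f in filenames:
-         for line in open(f):
-             yield line
- def grep(pattern, lines):
-     # Test each individual line, not the whole iterable.
-     return (line for line in lines if pattern in line)
- def printlines(lines):
-     for line in lines:
-         print line,
- def main(pattern, filenames):
-     lines = readfiles(filenames)
-     lines = grep(pattern, lines)
-     printlines(lines)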
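To try the generator pipeline end to end, here is a self-contained sketch in Python 3. The file names and contents are invented sample data written to a temporary directory, and main returns the matching lines as a list instead of printing them so the result is easy to inspect.

```python
import os
import tempfile

def readfiles(filenames):
    for name in filenames:
        with open(name) as f:        # close each file once it has been read
            for line in f:
                yield line

def grep(pattern, lines):
    return (line for line in lines if pattern in line)

def main(pattern, filenames):
    return list(grep(pattern, readfiles(filenames)))

# Hypothetical sample files, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
samples = [("a.txt", "def f():\n    pass\n"), ("b.txt", "x = 1\ndef g():\n")]
paths = []
for name, text in samples:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

matches = main("def", paths)
print(matches)
```

Each stage pulls one line at a time from the stage before it, so no file is ever loaded into memory as a whole.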
Problem 2: Write a program that takes one or more filenames as arguments and prints all the lines which are longer than 40 characters.
Answer 2:
- import random, string
- def readfiles(filenames):
-     for f in filenames:
-         for line in f:
-             yield line
- def printlines(lines):
-     for line in lines:
-         print("{0} ({1})".format(line, len(line)))
- # Generate random test data; each "file" here is a list of strings, not a file on disk
- filenames = []
- for i in range(10): # Generate 10 sample files
-     f=[]
-     for j in range(random.randint(10,20)):
-         f.append(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randint(30,50))))
-     filenames.append(f)
- lines = readfiles(filenames)
- over40=(line for line in lines if len(line)>40)
- printlines(over40)
Answer 3:
- import os
- from os.path import abspath, isdir, isfile, join
- def traverse_file(path):
-     if isfile(path):
-         yield abspath(path)
-     elif isdir(path):
-         yield "{0} (dir)".format(path)
-         for f in os.listdir(path):
-             # join picks the right path separator for the platform
-             for sf in traverse_file(join(path, f)):
-                 yield sf
-     else:
-         yield abspath(path)
- def num_of_pyfile(path):
-     tf = traverse_file(path)
-     pylist = [f for f in tf if f.endswith(".py")]
-     for py in pylist: print py
-     return len(pylist)
- print "Total {0} .py files!".format(num_of_pyfile("C:\\John\\EclipseBase\\PyLab"))
Problem 5: Write a function to compute the total number of lines of code, ignoring empty lines and comment lines, in all Python files in the specified directory, recursively.
Problem 6: Write a program split.py, that takes an integer n and a filename as command line arguments and splits the file into multiple small files with each having n lines.
Itertools
The itertools module in the standard library provides a lot of interesting tools for working with iterators. Let's look at some of the interesting functions.
chain – chains multiple iterators together.
- >>> import itertools
- >>> it1 = iter([1, 2, 3])
- >>> it2 = iter([4, 5, 6])
- >>> list(itertools.chain(it1, it2))
- [1, 2, 3, 4, 5, 6]
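Note that chain itself returns a lazy iterator, not a list; values are pulled from the underlying iterators only on demand. A brief sketch in Python 3 (variable names are illustrative):

```python
import itertools

it1 = iter([1, 2, 3])
it2 = iter([4, 5, 6])
chained = itertools.chain(it1, it2)   # an iterator; nothing consumed yet

first = next(chained)                 # pulls one value from it1
rest = list(chained)                  # drains what is left of it1, then it2
it1_left = list(it1)                  # the source iterator was consumed through chain

print(first, rest, it1_left)
```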
izip – iterable version of zip.
- >>> for x, y in itertools.izip(["a", "b", "c"], [1, 2, 3]):
- ...     print x, y
- ...
- a 1
- b 2
- c 3
- >>> it = iter(range(5))
- >>> x, it1 = peep(it)
- >>> print x, list(it1)
- 0 [0, 1, 2, 3, 4]
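The definition of peep is not shown in this post. Judging from the session above, it returns the first element together with an iterator equivalent to the original one. A minimal sketch under that assumption, written for Python 3, could look like this:

```python
import itertools

def peep(iterator):
    """Return (first element, iterator that still yields every element).

    Assumed behavior, inferred from the example session; raises
    StopIteration if the iterator is empty.
    """
    first = next(iterator)
    # Put the consumed element back in front of the remaining ones.
    return first, itertools.chain([first], iterator)

it = iter(range(5))
x, it1 = peep(it)
result = list(it1)
print(x, result)
```

The original iterator is advanced by one step, so callers should continue with the returned iterator rather than the one they passed in.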
- >>> list(enumerate(["a", "b", "c"]))
- [(0, 'a'), (1, 'b'), (2, 'c')]
- >>> for i, c in enumerate(["a", "b", "c"]):
- ...     print i, c
- ...
- 0 a
- 1 b
- 2 c
Problem 10: Implement a function izip that works like itertools.izip.
Supplement
* Python Gossip - yield generators
* Python: List Comprehensions