Preface
Python is one of the most powerful programming languages of the modern era. Writing Python is almost as easy as writing pseudocode, and the result reads much like English. On top of that, Python offers a large repository of modules for just about any problem a programmer wants to tinker with, and a large community that supports issue resolution, discussion, best practices and compatibility. No wonder it is one of the top programming languages in the world today and a language of choice in domains such as web development, data science and machine learning, helped by powerful frameworks like Django, Flask, Tornado, Dash etc.
Yet, after everything is said, once you actually start programming in Python, you discover that this deceptively simple language hides much more powerful features that handle tricky situations and common pitfalls with ease. For example, Python allows the programmer to use the ‘else’ keyword with loops, just as it is conventionally used with the ‘if’ keyword. Imagine a problem where you check each element of a list to see whether the list contains an even number. Using a for-else combination, you can write an implementation as simple as the one given below, with no need for a Boolean variable to mark whether an even number was found in the list:
- #!/usr/bin/env python3
- # Sample function to process the even number
- def processEven(number):
-     print("Even number found: {}".format(number))
- # The for-else loop: the else branch belongs to the inner for, not to the if
- for listOfNumbers in [[1,2,3,4,5], [1,3,5]]:
-     for number in listOfNumbers:
-         if number % 2 == 0:
-             processEven(number)
-             break
-     else:
-         print("An even number was not found in the list={} !".format(listOfNumbers))
This kind of implementation is extremely useful for supplying a default outcome when the loop finishes without encountering a break statement. But this is only the tip of the iceberg. Here are 5 advanced concepts to know in Python.
CONTEXT MANAGERS
Ever opened a file to perform a read/write operation? Of course you have. And did you remember to close the file and release the file handle after the operation was done? Mostly! Enter context managers. A context manager lets the programmer open and close a file or connection object using the ‘with’ and ‘as’ keywords. It automatically takes care of the object after execution has finished, making sure that the connection or file object is safely released or closed afterwards. The object is safely closed even when an error occurs while executing the processing logic within the block. So, mostly, it replaces this code:
- someFile = open('some_file', 'w')
- try:
-     someFile.write('Hello World !')
- finally:
-     someFile.close()
with this:
- with open('some_file', 'w') as someFile:
-     someFile.write('Hello World !')
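The same guarantee applies to any resource, not only files. As a minimal sketch, assuming a hypothetical `managed_resource` helper built on the standard library's `contextlib.contextmanager` (the helper name and the `events` log are illustrative and not part of the original example), the cleanup step runs even when the body of the `with` block raises an error:

```python
from contextlib import contextmanager

events = []  # illustrative log of what happened, in order


@contextmanager
def managed_resource(name):
    # Hypothetical helper: code before `yield` is the setup,
    # code in `finally` is the cleanup.
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")


try:
    with managed_resource("demo") as res:
        events.append(f"use {res}")
        raise ValueError("something went wrong inside the block")
except ValueError:
    events.append("error handled")

print(events)
# ['open demo', 'use demo', 'close demo', 'error handled']
```

The "close" entry appears before "error handled", which shows that the resource is released before the exception reaches the surrounding code.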
IMPLICIT TUPLE UNPACKING
Python supports implicit tuple unpacking, which allows multiple assignments and multiple return values in a single statement and makes the programmer's life easier. Multiple assignment means that all of the following are possible:
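- # Assign several names at once
- x, y = 10, 20
- # Swap two values without a temporary variable
- x, y = y, x
- # Any iterable of matching length can be unpacked: x is 'O', y is 'K'
- x, y = 'OK'
- # Multiple return values are packed into a tuple and unpacked by the caller
- def function():
-     # Some processing (placeholder values for illustration)
-     name, salary, employeeID = 'Alice', 10000, 'EMP001'
-     return name, salary, employeeID
- x, y, z = function()
- # Unpacking also works in a for loop ('people' avoids shadowing the built-in list)
- people = [('Alice',25), ('Bob',30), ('Jake',27), ('Barbara',40)]
- for name, age in people:
-     print(name, 'is aged', str(age))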
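Unpacking goes a little further than the cases above. The following is a small sketch of starred and nested unpacking, plus what happens when the number of names does not match the number of values; all variable names here are illustrative:

```python
# Starred unpacking: one name collects whatever is left over.
first, *rest = [1, 2, 3, 4]
print(first, rest)          # 1 [2, 3, 4]

head, *middle, tail = "abcde"
print(head, middle, tail)   # a ['b', 'c', 'd'] e

# Nested unpacking mirrors the shape of the data.
record = ("Alice", (25, "EMP001"))
name, (age, emp_id) = record
print(name, age, emp_id)    # Alice 25 EMP001

# A length mismatch raises ValueError instead of silently dropping values.
try:
    x, y = (1, 2, 3)
except ValueError as exc:
    mismatch_message = str(exc)
    print("ValueError:", mismatch_message)
```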
MAGIC METHODS
Magic methods are functions that are invoked implicitly when certain operations are performed on an object of a class. Their names are surrounded by double underscores, and each can be defined in your own class to easily give it certain behavior. Consider the following code sample:
- #!/usr/bin/env python3
- class Employee:
-     def __init__(self, name, ID, salary):
-         self.empName = name
-         self.empID = ID
-         self.empSalary = salary
-     def __str__(self):
-         return f"Employee({self.empName}, {self.empID}, {self.empSalary})"
-     def __repr__(self):
-         return f"[ {self.empID} ] - {self.empName}"
-     def __add__(self, secondObject):
-         return (self.empSalary + secondObject.empSalary)
- if __name__ == "__main__":
-     objAlice = Employee("Alice", "EMP001", 10000)
-     print(objAlice)
-     print(repr(objAlice))
-     print("")
-     objBob = Employee("Bob", "EMP002", 5000)
-     print(objBob)
-     print(repr(objBob))
-     print("\nSum: {}".format(objAlice+objBob))
As is visible, the __str__() method is implicitly invoked when print() is called on the object. The __repr__() method defines a representation of the object, typically an unambiguous one aimed at developers, and is what repr() returns. The __add__() method lets you define what happens when the ‘+’ operator is used with objects of the class. The __init__() method initializes a newly created object, much like a constructor.
In short, magic methods let the programmer define what happens when common operators and functions are used on an object. Without the __add__() method defined in the above example, the interpreter would not know how to add two objects of the class and would raise a TypeError. By defining magic methods within the class, the programmer controls how its objects behave with common operators.
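To make the point about missing magic methods concrete, here is a small sketch with a trimmed-down stand-in for the Employee class (illustrative only, deliberately written without __add__). It also shows that containers such as lists display their items using __repr__ rather than __str__:

```python
class Employee:
    # Trimmed-down stand-in for the class above (illustrative only)
    def __init__(self, name, salary):
        self.empName = name
        self.empSalary = salary

    def __str__(self):
        return f"Employee({self.empName})"

    def __repr__(self):
        return f"<Employee {self.empName!r}>"


alice = Employee("Alice", 10000)
bob = Employee("Bob", 5000)

print(str(alice))     # Employee(Alice), via __str__
print([alice, bob])   # [<Employee 'Alice'>, <Employee 'Bob'>], via __repr__

# No __add__ is defined here, so '+' fails with TypeError
add_failed = False
try:
    alice + bob
except TypeError as exc:
    add_failed = True
    print("TypeError:", exc)
```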
GENERATORS
Generators are lazy iterators that produce each element only when it is requested. Consider a function that processes a very large list. Normally, the whole list must be loaded into memory (in a container, a variable) before another function can process it. This means that a very large data corpus poses a space-complexity problem for the program. Imagine, instead, a technique that loads each item only when it is its turn to be processed and not before. In Python, that technique is the generator. Consider the following code:
- #!/usr/bin/env python3
- import sys
- import time
- # Normal function
- def getList(limit):
-     listVal = list()
-     for i in range(limit):
-         listVal.append(i)
-     return listVal
- # Generator function
- def genList(limit):
-     for i in range(limit):
-         yield i
- if __name__ == "__main__":
-     numLimit = 10000000
-     print("\nWithout Generators:")
-     startTime = time.time()
-     numList = getList(numLimit)
-     usedTime = time.time() - startTime
-     usedMem = sys.getsizeof(numList)
-     print(f" - Time: {usedTime} seconds")
-     print(f" - Size: {usedMem}")
-     print("\nWith Generators:")
-     startTime = time.time()
-     numGenerator = genList(numLimit)
-     usedTime = time.time() - startTime
-     usedMem = sys.getsizeof(numGenerator)
-     print(f" - Time: {usedTime} seconds")
-     print(f" - Size: {usedMem}")
Notice that the normal function builds the entire list and returns it to the ‘numList’ variable, which occupies a lot of memory until processing is complete. This becomes a serious problem when processing, say, a very large corpus of files. The second function, though, does not build the list right away: calling it only creates a generator object, which is why both the measured time and the measured size are tiny. (sys.getsizeof() reports the size of the list object itself, not of the integers it refers to, so the real memory gap is larger than the printed numbers suggest.) The elements are ‘generated’ one by one as they are needed, so the memory footprint of the program stays low even when processing Big Data, while the work of producing the values is spread across the iteration.
Another big advantage of generators is that at the ‘yield’ statement, control passes back from the function to the calling program, and the state of the local variables is remembered for the next iteration. This means that if you need to look for, say, prime numbers in the generated stream and stop processing once 3 consecutive non-prime numbers are detected, you don’t need to load the whole list first, whether the file holds 1000 numbers or many millions. Using generators, you can load the elements one by one, process them until 3 consecutive non-primes appear, and then terminate the program.
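Laziness can be observed directly. In this small sketch, a variant of genList() records each value in an illustrative `produced` list at the moment it is generated (the logging is an addition for demonstration purposes):

```python
produced = []  # records each value at the moment it is generated


def genList(limit):
    for i in range(limit):
        produced.append(i)
        yield i


numGenerator = genList(10_000_000)

# Creating the generator has not run any of the function body yet
print(produced)             # []

print(next(numGenerator))   # 0
print(next(numGenerator))   # 1
print(produced)             # [0, 1]
```

Only the values actually requested are ever computed, so timing the call to genList() by itself measures the creation of the generator object, not the cost of iterating over ten million numbers.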
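The prime-number scenario can be sketched as follows. This is one possible implementation under stated assumptions: `read_numbers` stands in for reading a large file line by line, and `is_prime` and `primes_until_gap` are hypothetical helper names chosen for this illustration:

```python
def is_prime(n):
    # Simple trial division, good enough for a demonstration
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def read_numbers(source):
    # Stand-in for reading a large file: yields one number at a time
    for item in source:
        yield int(item)


def primes_until_gap(numbers, max_gap=3):
    # Yield primes until max_gap consecutive non-primes have been seen
    consecutive_non_primes = 0
    for n in numbers:
        if is_prime(n):
            consecutive_non_primes = 0
            yield n
        else:
            consecutive_non_primes += 1
            if consecutive_non_primes == max_gap:
                return


stream = read_numbers(str(i) for i in range(2, 1000))
found = list(primes_until_gap(stream))
print(found)          # [2, 3, 5, 7]
print(next(stream))   # 11
```

Processing stops at 8, 9, 10, the first run of three consecutive non-primes, and the next value still waiting in the stream is 11. The remaining numbers up to 999 were never read.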
DECORATORS
In Python, functions are objects, meaning that they can be passed as arguments to, and returned from, other functions. Decorators take advantage of this feature and provide a way to wrap one function inside another to add functionality without changing the original function's code. Let me explain this with a use case. Imagine you have written a program that performs a lot of time-consuming operations such as loading a large file, making an API call, generating a summary report etc. After you have written everything, you wish to measure the time each of these operations takes. The most common method to do that is to use the time module as shown below:
- import time
- if __name__ == '__main__':
-     startTime = time.time()
-     loadLargeFile("abc.txt")
-     usedTime = time.time() - startTime
-     print(f"Time: {usedTime} seconds")
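Repeating this around every call quickly becomes tedious. With a decorator, the timing logic is written once and applied to every function that needs it:
- #!/usr/bin/env python3
- import functools
- import time
- def timeIt(func):
-     @functools.wraps(func)  # keep the wrapped function's name and docstring
-     def wrapper(*args, **kwargs):
-         startTime = time.time()
-         result = func(*args, **kwargs)
-         usedTime = time.time() - startTime
-         print(f"Time: {usedTime} seconds")
-         return result  # hand the original return value back to the caller
-     return wrapper
- @timeIt
- def loadLargeFile(filename):
-     print(f"Loading file: {filename}")
-     time.sleep(2)
- @timeIt
- def makeAPICall():
-     print("Making an API call and waiting for the response...")
-     time.sleep(1.5)
- @timeIt
- def generateSummaryReport():
-     print("Generating summary report...")
-     time.sleep(5)
- if __name__ == "__main__":
-     loadLargeFile("abc.txt")
-     makeAPICall()
-     generateSummaryReport()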
As you can see, this defines some functions (I am mocking long operations with a ‘sleep’ call) and a decorator, timeIt(), whose inner wrapper() function measures the time taken by the function object passed to it. Just by adding ‘@timeIt’ above a function definition, every call to that function goes through the wrapper. This is equivalent to rebinding each name right after its definition and then calling the functions as usual:
- loadLargeFile = timeIt(loadLargeFile)
- makeAPICall = timeIt(makeAPICall)
- generateSummaryReport = timeIt(generateSummaryReport)
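The relationship between the ‘@’ syntax and plain function calls can be checked with a small, fast sketch. Here `logCalls` is a hypothetical decorator invented for this illustration (it records calls instead of timing them, so the example runs instantly), and `functools.wraps` from the standard library keeps the original function name on the wrapper:

```python
import functools

calls = []  # illustrative log of which functions were called


def logCalls(func):
    @functools.wraps(func)   # keeps func.__name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)   # pass the result back to the caller
    return wrapper


@logCalls
def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


# Manual equivalent of writing @logCalls above the definition
multiply = logCalls(multiply)

print(add(2, 3))        # 5
print(multiply(2, 3))   # 6
print(calls)            # ['add', 'multiply']
print(add.__name__)     # add
```

Note that the decorator receives the function object itself, not the result of calling it, and that the wrapper returns whatever the original function returns.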
I have covered some of the hidden features of the Python language in this post and tried to explain them with the help of sample code. But Python has many other strengths apart from the ones mentioned above. I strongly suggest reading the official documentation to uncover them. Using advanced concepts like generators and decorators can mean shipping cleaner, more maintainable code that is also less prone to errors.
Supplement
* How to format strings in Python