Saturday, April 20, 2019

[ Python Article Collection ] Python's asyncio Module (1): The Benefits of Asynchronous Execution


Source From Here 
Preface 
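Recently I have been building web crawlers at work and often use Python's asyncio module, a standard-library asynchronous framework introduced in Python 3.4. It is extremely useful for IO-bound tasks such as crawling, but it is also somewhat complex, so I am writing down some basic usage and principles to organize what I have learned recently. For concurrency, Python is constrained by its GIL (Global Interpreter Lock), but the asyncio module supports asynchronous execution. It still delivers only single-core CPU performance and cannot achieve true parallelism, but for tasks that perform IO frequently it keeps Python from blocking on IO, so the program can make full use of a single CPU core.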


Background Example 
Before explaining any concepts, let us use Python's requests module to send repeated requests to one URL and see what performance limits a non-asynchronous program runs into:
asyncio_e1.py 

#!/usr/bin/env python3
import requests
import time

url = 'https://www.google.com.tw/'

class Timer:
    """Minimal stopwatch: start() records a start time, end() returns elapsed seconds."""
    def __init__(self, do_start=True):
        if do_start:
            self.start()

    def start(self):
        self.st = time.time()

    def end(self):
        self.et = time.time()
        return self.et - self.st


def send_req(url):
    _st = time.time()
    print("Send a request to {}...".format(url))
    resp = requests.get(url)  # blocks until the response arrives
    _diff = time.time() - _st
    print("Receive a response for {:.02f} second(s).".format(_diff))


def other_tasks():
    # Simulates other work that takes about 0.5 seconds
    _st = time.time()
    time.sleep(0.5)
    _diff = time.time() - _st
    print("Other task took {:.01f} second(s).".format(_diff))

num_of_req = 10
timer = Timer()
for i in range(num_of_req):
    send_req(url)
    other_tasks()

print("After {} request(s), {:.02f} second(s) passed!".format(num_of_req, timer.end()))
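The code above sends ten requests to Google's home page, runs other_tasks() (a function that simulates other work) after each one, and records the time between sending each request and receiving its response. Running the script gives the following result: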
# ./asyncio_e1.py
Send a request to https://www.google.com.tw/...
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
Send a request to https://www.google.com.tw/...
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
...
After 10 request(s), 6.48 second(s) passed!

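The time from sending a request to receiving the response is relatively long, roughly 0.08 to 0.1 seconds. Waiting for the server to return a response is the IO part of the work, and leaving the CPU idle for that whole time is a waste. This is why asynchronous programming was introduced: while an IO operation is in progress, the program does not sit waiting but carries on with the next instruction.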
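The idea of not waiting on IO can be sketched without any network access. The following is a minimal illustration, not the script used in this post: fake_request is a hypothetical coroutine, and asyncio.sleep() stands in for the time spent waiting on a server. It assumes Python 3.7 or later for asyncio.run().

```python
import asyncio
import time


async def fake_request(name, delay, events):
    """Hypothetical stand-in for a network call; it only sleeps."""
    events.append(("start", name))
    # While this coroutine waits, the event loop runs the other coroutines.
    await asyncio.sleep(delay)
    events.append(("done", name))


async def main():
    events = []
    started = time.perf_counter()
    # Schedule three simulated requests and wait for all of them.
    await asyncio.gather(*(fake_request(i, 0.2, events) for i in range(3)))
    return events, time.perf_counter() - started


events, elapsed = asyncio.run(main())
print(events)
print("elapsed: {:.2f} second(s)".format(elapsed))
```

All three "start" events are recorded before any "done" event, and the total time is close to one delay rather than the sum of three, because the waits overlap.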

Adopting Asyncio 
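Now let us use the asyncio module to do the same work as the previous program asynchronously. Ignore the details of the code for now and judge from the output whether the program gets faster: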
asyncio_e2.py 
#!/usr/bin/env python3
import requests
import time
import asyncio
import threading

# Create the loop explicitly; asyncio.get_event_loop() is deprecated
# in recent Python versions when no loop is running.
loop = asyncio.new_event_loop()
url = 'https://www.google.com.tw/'

class Timer:
    def __init__(self, do_start=True):
        if do_start:
            self.start()

    def start(self):
        self.st = time.time()

    def end(self):
        self.et = time.time()
        return self.et - self.st


async def send_req(url):
    _st = time.time()
    print("Send a request to {}...".format(url))
    # requests.get() is blocking, so run it in the loop's default thread pool
    res = await loop.run_in_executor(None, requests.get, url)
    _diff = time.time() - _st
    print("Receive a response for {:.02f} second(s).".format(_diff))


def other_tasks():
    _st = time.time()
    time.sleep(0.5)
    _diff = time.time() - _st
    print("Other task took {:.01f} second(s).".format(_diff))


num_of_req = 10
timer = Timer()

# 1) Let high IO tasks be handled in asyncio way
tasks = []
for i in range(num_of_req):
    task = loop.create_task(send_req(url))
    tasks.append(task)

# Run the event loop in a separate thread so the main thread stays free
future_thd = threading.Thread(target=loop.run_until_complete, args=(asyncio.wait(tasks),))
future_thd.start()

# 2) Executing normal tasks while waiting the asyncio tasks
for i in range(num_of_req):
    other_tasks()


# 3) Wait for asyncio tasks
future_thd.join()


loop.close()
print("After {} request(s), {:.02f} second(s) passed!".format(num_of_req, timer.end()))
The execution looks like this:
# ./asyncio_e2.py
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.08 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
Receive a response for 0.64 second(s).
Receive a response for 0.65 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
After 10 request(s), 5.01 second(s) passed!

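With asynchronous execution the total time drops to 5.01 seconds. Also note that the first ten lines of the output are all "Send a request", which means the program was not left waiting on IO; it went on to send the remaining requests. That is the effect asynchronous execution achieves. This is only a rough illustration of what asynchronous execution brings to IO-bound tasks; many details, along with the usage of asyncio itself, are left for later posts.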
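The script above hands a blocking call to a thread pool through run_in_executor(). Here is a minimal sketch of that technique that runs without network access. blocking_io is a hypothetical stand-in for requests.get(), and the explicit ThreadPoolExecutor with one worker per call is an assumption made here so the timing does not depend on the default pool size. It assumes Python 3.7 or later.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(delay):
    """Hypothetical stand-in for a blocking call such as requests.get()."""
    time.sleep(delay)
    return delay


async def run_all(num_calls, delay):
    loop = asyncio.get_running_loop()
    # One worker per call (an assumption for this sketch) so every call can wait at once.
    with ThreadPoolExecutor(max_workers=num_calls) as pool:
        futures = [
            loop.run_in_executor(pool, blocking_io, delay)
            for _ in range(num_calls)
        ]
        # Each future completes when its worker thread returns.
        return await asyncio.gather(*futures)


started = time.perf_counter()
results = asyncio.run(run_all(10, 0.1))
elapsed = time.perf_counter() - started
print("{} calls finished in {:.2f} second(s)".format(len(results), elapsed))
```

Run one after another, ten 0.1-second calls would take about one second; here they overlap in worker threads, so the total stays close to a single call.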

Supplement 
Python's asyncio Module (2): Basic Concepts of Asynchronous Programming
Python's asyncio Module (3): Creating an Event Loop and Defining Coroutines
Hands-on Python 3 Concurrency With the asyncio Module 
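What on Earth Is Python's GIL, and How Well Do Threads Really Perform?
When I first started with Python I kept hearing the term GIL, and found that it was often equated with Python being unable to do multithreading efficiently. In the spirit of knowing not only what but also why, the author gathered material from many sources, spent a few spare hours over a week digging into the GIL, and summarized the findings in this article, hoping readers can gain a better and more objective understanding of the GIL through it...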

Python Asyncio Tutorial 
Real Python - An Intro to Threading in Python

Thursday, April 18, 2019

[ Python Article Collection ] SQLAlchemy ORM - Updating Objects

Source From Here 
Updating Objects 
In this chapter, we will see how to modify or update the table with desired values. To modify the data of an attribute of an object, we assign a new value to it and commit the change to make it persistent. Let us fetch the object whose primary key identifier is 2 from our Customers table. We can use the get() method of the Query object as follows: 
x = session.query(Customers).get(2)
We can display the contents of the selected object with the code below: 
print ("Name: ", x.name, "Address:", x.address, "Email:", x.email)
From our customers table, the following output should be displayed: 
Name: Komal Pande Address: Koti, Hyderabad Email: komal@gmail.com

Now we need to update the Address field by assigning a new value as given below: 
>>> x.address = 'Banjara Hills Secunderabad'
>>> session.commit()
2019-04-18 14:56:28,346 INFO sqlalchemy.engine.base.Engine UPDATE customers SET address=%(address)s WHERE customers.id = %(customers_id)s
2019-04-18 14:56:28,346 INFO sqlalchemy.engine.base.Engine {'address': 'Banjara Hills Secunderabad', 'customers_id': 2}
2019-04-18 14:56:28,349 INFO sqlalchemy.engine.base.Engine COMMIT
The change is now persisted in the database. Next we fetch the same row again (ID = 2), this time by using filter() together with the first() method as follows: 
>>> x = session.query(Customers).filter(Customers.id == 2).first()
2019-04-18 14:59:19,520 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-04-18 14:59:19,521 INFO sqlalchemy.engine.base.Engine SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email
FROM customers
WHERE customers.id = %(id_1)s
LIMIT %(param_1)s
2019-04-18 14:59:19,521 INFO sqlalchemy.engine.base.Engine {'id_1': 2, 'param_1': 1}

>>> print ("Name: ", x.name, "Address:", x.address, "Email:", x.email)
Now, the output for the above code displaying the updated row is as follows: 
Name: Komal Pande Address: Banjara Hills Secunderabad Email: komal@gmail.com

Now change the name attribute and display the contents using the code below: 
>>> x.name = 'John Lee'
>>> print ("Name: ", x.name, "Address:", x.address, "Email:", x.email)
Name: John Lee Address: Banjara Hills Secunderabad Email: komal@gmail.com

Even though the change is displayed, it is not committed. You can restore the last persisted state by using the rollback() method with the code below: 
>>> session.rollback()
2019-04-18 15:04:18,402 INFO sqlalchemy.engine.base.Engine ROLLBACK

>>> print ("Name: ", x.name, "Address:", x.address, "Email:", x.email)
2019-04-18 15:04:20,483 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-04-18 15:04:20,484 INFO sqlalchemy.engine.base.Engine SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email
FROM customers
WHERE customers.id = %(param_1)s
2019-04-18 15:04:20,484 INFO sqlalchemy.engine.base.Engine {'param_1': 2}
Name: Komal Pande Address: Banjara Hills Secunderabad Email: komal@gmail.com
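The same commit and rollback behaviour can be observed one level below the ORM. The sketch below is an illustration only: it swaps SQLAlchemy for Python's built-in sqlite3 module and an in-memory database so that it runs with no setup, and its reduced customers table (id and name only) is an assumption, not the schema used in this post.

```python
import sqlite3

# In-memory database with a reduced, hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (2, 'Komal Pande')")
conn.commit()  # the inserted row is now persistent


def current_name():
    """Read the name of customer 2 through the same connection."""
    return conn.execute("SELECT name FROM customers WHERE id = 2").fetchone()[0]


# Change the name without committing; the change is visible inside the transaction.
conn.execute("UPDATE customers SET name = 'John Lee' WHERE id = 2")
pending = current_name()

# Roll back: the uncommitted change is discarded.
conn.rollback()
restored = current_name()

print(pending)   # John Lee
print(restored)  # Komal Pande
```

As in the session example, the uncommitted name is visible until rollback() is called, after which the last committed value comes back.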

For bulk updates, we use the update() method of the Query object. Let us add the prefix 'Mr.' to the name in every row except ID = 2. The corresponding update() statement is as follows: 
>>> session.query(Customers).filter(Customers.id != 2).update({Customers.name: "Mr." + Customers.name}, synchronize_session=False)
2019-04-18 15:07:38,472 INFO sqlalchemy.engine.base.Engine UPDATE customers SET name=(%(name_1)s || customers.name) WHERE customers.id != %(id_1)s
2019-04-18 15:07:38,473 INFO sqlalchemy.engine.base.Engine {'name_1': 'Mr.', 'id_1': 2}
3

>>> for row in session.query(Customers).all():
...     print("ID={}; Name={}".format(row.id, row.name))
...
2019-04-18 15:09:05,153 INFO sqlalchemy.engine.base.Engine SELECT customers.id AS customers_id, customers.name AS customers_name, customers.address AS customers_address, customers.email AS customers_email
FROM customers
2019-04-18 15:09:05,154 INFO sqlalchemy.engine.base.Engine {}

ID=2; Name=Komal Pande
ID=1; Name=Mr.Ravi Kumar
ID=3; Name=Mr.Rajender Nath
ID=4; Name=Mr.S.M.Krishna

The update() method takes two parameters as follows: 
* A dictionary of key-value pairs, with each key being the attribute to update and each value being its new content.
* synchronize_session, which names the strategy for updating attributes of objects already in the session. Valid values are False, which does not synchronize the session; 'fetch', which performs a select query before the update to find the objects matched by the update query; and 'evaluate', which evaluates the criteria on the objects in the session.

Three out of 4 rows in the table will have name prefixed with ‘Mr.’ However, the changes are not committed and hence will not be reflected in the table view. It will be refreshed only when we commit the session.
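The UPDATE statement shown in the log above can be tried directly to see which rows it touches. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of SQLAlchemy, a reduced customers table (id and name only), and the four names from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ravi Kumar"), (2, "Komal Pande"),
     (3, "Rajender Nath"), (4, "S.M.Krishna")],
)
conn.commit()

# Same shape as the logged statement: prefix every name except the row with id 2.
cursor = conn.execute(
    "UPDATE customers SET name = ? || name WHERE id != ?", ("Mr.", 2)
)
updated = cursor.rowcount  # number of rows matched by the UPDATE

names = dict(conn.execute("SELECT id, name FROM customers"))
print(updated)
print(names)

conn.commit()  # without this, the change is lost when the connection closes
```

The row count of 3 corresponds to the value that update() returned in the session above.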
