程式扎記: [ Python Article Collection ] Python's asyncio Module (1): The Benefits of Asynchronous Execution

Source From Here
Preface
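I have been writing web crawlers at work lately and often use Python's asyncio module, an asynchronous framework that joined the standard library in Python 3.4. It is very handy for I/O-bound tasks such as crawling, but it is also somewhat complex, so I am writing down some basic usage and principles to summarize what I have learned recently. For concurrent tasks, Python is constrained by its GIL (Global Interpreter Lock), so asyncio still delivers only single-core CPU performance and cannot achieve true parallelism. Even so, the asynchronous execution it supports keeps a program that performs frequent I/O from blocking on each I/O operation, so the program can make full use of that single core.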

Source: What is asyncio?
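
As a minimal sketch of the idea above, the following runs three simulated I/O waits on a single thread. The coroutine name fake_io and its delays are made up for illustration, asyncio.sleep stands in for real I/O, and asyncio.run assumes Python 3.7 or later. The waits overlap, so the elapsed time should be close to the longest delay rather than the sum of all three.

```python
import asyncio
import time


async def fake_io(name, delay):
    """Simulate an I/O wait; awaiting asyncio.sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # gather() schedules all three coroutines, so their waits overlap on one thread.
    results = await asyncio.gather(
        fake_io("a", 0.3),
        fake_io("b", 0.2),
        fake_io("c", 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, "in {:.2f} second(s)".format(elapsed))
```

gather() returns the results in the order the coroutines were passed in, regardless of which one finishes first.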

Background Example
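Before going over any concepts, let's use Python's requests module to send repeated requests to a single URL and see what performance limits a non-asynchronous program runs into: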
- asyncio_e1.py

#!/usr/bin/env python3  
import requests  
import time  
  
url = 'https://www.google.com.tw/'  
  
class Timer:  
    def __init__(self, do_start=True):  
        if do_start:  
            self.start()  
  
    def start(self):  
        self.st = time.time()  
  
    def end(self):  
        self.et = time.time()  
        return self.et - self.st  
  
  
def send_req(url):  
    _st = time.time()  
    print("Send a request to {}...".format(url))  
    resp = requests.get(url)  
    _diff = time.time() - _st  
    print("Receive a response for {:.02f} second(s).".format(_diff))  
  
  
def other_tasks():  
    _st = time.time()  
    time.sleep(0.5)  
    _diff = time.time() - _st  
    print("Other task took {:.01f} second(s).".format(_diff))  
  
num_of_req = 10  
timer = Timer()  
for i in range(num_of_req):  
    send_req(url)  
    other_tasks()  
  
print("After {} request(s), {:.02f} second(s) passed!".format(num_of_req, timer.end()))  

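The code above sends ten requests to Google's home page, calling other_tasks(), a function that simulates other work, after each one, and records the time between sending each request and receiving its response. Here is the output of the script: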

# ./asyncio_e1.py
Send a request to https://www.google.com.tw/...
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
Send a request to https://www.google.com.tw/...
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
...
After 10 request(s), 6.48 second(s) passed!

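Sending a request and receiving its response takes roughly 0.08 to 0.1 seconds, far longer than the CPU needs to execute ordinary instructions. That wait for the server to return a response is the I/O part of the program, and leaving the CPU idle for its duration wastes time. This is the motivation for asynchronous programming: while I/O is in progress, the program does not sit waiting but carries on with the next instruction.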
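
To see where the time goes without touching the network, here is a rough offline sketch of the sequential pattern. It is not the script above: fake_request, FAKE_LATENCY and OTHER_WORK are assumed stand-ins, with time.sleep playing the role of the blocking requests.get call. It measures how much of the total run is spent doing nothing but waiting on the fake I/O.

```python
import time

FAKE_LATENCY = 0.05  # assumed stand-in for one network round trip
OTHER_WORK = 0.1     # assumed stand-in for the work done in other_tasks()


def fake_request():
    """Block the calling thread, the way a synchronous HTTP call does while waiting."""
    time.sleep(FAKE_LATENCY)


def other_tasks():
    time.sleep(OTHER_WORK)


def run_sequential(num_of_req):
    """Run request and other work back to back; return (total time, time blocked on I/O)."""
    start = time.perf_counter()
    waiting = 0.0
    for _ in range(num_of_req):
        t0 = time.perf_counter()
        fake_request()
        waiting += time.perf_counter() - t0
        other_tasks()
    total = time.perf_counter() - start
    return total, waiting


total, waiting = run_sequential(5)
print("total {:.2f}s, of which {:.2f}s was spent blocked on fake I/O".format(total, waiting))
```

In the sequential version every wait adds directly to the total, so the run takes at least num_of_req × (FAKE_LATENCY + OTHER_WORK).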

Adopting Asyncio
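Now let's use the asyncio module to do the same work as the previous program asynchronously. Ignore the details of the code for now and judge from the output whether it makes the program faster: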
- asyncio_e2.py

#!/usr/bin/env python3  
import requests  
import time  
import asyncio  
import threading  
  
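# Create the loop explicitly; get_event_loop() without a running loop is deprecated in newer Python versions
loop = asyncio.new_event_loop()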
url = 'https://www.google.com.tw/'  
  
class Timer:  
    def __init__(self, do_start=True):  
        if do_start:  
            self.start()  
  
    def start(self):  
        self.st = time.time()  
  
    def end(self):  
        self.et = time.time()  
        return self.et - self.st  
  
  
async def send_req(url):  
    _st = time.time()  
    print("Send a request to {}...".format(url))  
    res = await loop.run_in_executor(None,requests.get,url)  
    _diff = time.time() - _st  
    print("Receive a response for {:.02f} second(s).".format(_diff))  
  
  
def other_tasks():  
    _st = time.time()  
    time.sleep(0.5)  
    _diff = time.time() - _st  
    print("Other task took {:.01f} second(s).".format(_diff))  
  
  
num_of_req = 10  
timer = Timer()  
  
# 1) Hand the I/O-heavy tasks to asyncio
tasks = []  
for i in range(num_of_req):  
    task = loop.create_task(send_req(url))  
    tasks.append(task)  
  
future_thd = threading.Thread(target=loop.run_until_complete, args=(asyncio.wait(tasks),))  
future_thd.start()  
  
# 2) Run ordinary tasks while the asyncio tasks are in flight
for i in range(num_of_req):  
    other_tasks()  
  
  
# 3) Wait for the asyncio tasks to finish
future_thd.join()  
  
  
loop.close()  
print("After {} request(s), {:.02f} second(s) passed!".format(num_of_req, timer.end()))  

The output is as follows:

# ./asyncio_e2.py
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Send a request to https://www.google.com.tw/...
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.08 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.07 second(s).
Receive a response for 0.08 second(s).
Other task took 0.5 second(s).
Receive a response for 0.64 second(s).
Receive a response for 0.65 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
Other task took 0.5 second(s).
After 10 request(s), 5.01 second(s) passed!

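With asynchronous execution the total time drops to 5.01 seconds, which is essentially the 10 × 0.5 seconds spent in other_tasks() alone. Also note that the first ten lines of output are all "Send a request": the program did not stall on I/O but went on to send the remaining requests, which is exactly what asynchronous execution achieves. This is only a rough illustration of how asynchronous execution benefits I/O-bound tasks; many details, along with more of the asyncio API, are left for later posts.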
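
The overlap of blocking calls can also be reproduced offline. The sketch below is an illustration under stated assumptions, not a rewrite of asyncio_e2.py: blocking_fetch is a hypothetical function that sleeps instead of calling requests.get, the thread pool is sized explicitly so that every call gets its own worker, and asyncio.run and asyncio.get_running_loop assume Python 3.7 or later. It relies on the same loop.run_in_executor call that the script above uses.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

FAKE_LATENCY = 0.2  # assumed stand-in for one blocking HTTP call
NUM_OF_REQ = 10


def blocking_fetch(i):
    """Hypothetical blocking call; sleeps instead of hitting the network."""
    time.sleep(FAKE_LATENCY)
    return i


async def main():
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # One worker per request, so no fake request has to queue behind another.
    with ThreadPoolExecutor(max_workers=NUM_OF_REQ) as pool:
        futures = [
            loop.run_in_executor(pool, blocking_fetch, i)
            for i in range(NUM_OF_REQ)
        ]
        results = await asyncio.gather(*futures)
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print("{} fake requests finished in {:.2f} second(s)".format(len(results), elapsed))
```

Because the ten blocking calls wait at the same time in separate worker threads, the elapsed time should be close to one FAKE_LATENCY rather than ten of them.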

Supplement
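* Python's asyncio Module (2): Basic Concepts of Asynchronous Programming
* Python's asyncio Module (3): Creating an Event Loop and Defining Coroutines
* Hands-on Python 3 Concurrency With the asyncio Module
* What on Earth Is Python's GIL, and How Does Multithreading Actually Perform

When I first started with Python I kept hearing the term GIL, and found it was often equated with Python being unable to do multithreading efficiently. Wanting to understand not just the what but also the why, the blogger gathered material from various sources, spent a few spare hours over the course of a week digging into the GIL, and summarized the findings in this article, hoping readers can come away with a better and more objective understanding of the GIL...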

* Python Asyncio Tutorial
* Real Python - An Intro to Threading in Python

Saturday, April 20, 2019