Friday, July 27, 2018

[ Python FAQ ] requests - How to download a large file in Python?

Source From Here 
Question 
Requests is a really nice library. I'd like to use it to download big files (>1GB). The problem is that the whole file can't be kept in memory, so I need to read it in chunks. The following code does not achieve that:
    import requests

    def DownloadFile(url):
        local_filename = url.split('/')[-1]
        r = requests.get(url)
        f = open(local_filename, 'wb')
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
        f.close()
        return
For some reason it doesn't work this way: it still loads the whole response into memory before saving it to a file.

How-To 
It's much easier if you use Response.raw and shutil.copyfileobj():
    import requests
    import shutil

    def download_file(url):
        local_filename = url.split('/')[-1]
        r = requests.get(url, stream=True)
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

        return local_filename
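One caveat worth noting for the Response.raw approach: raw exposes the body bytes as they arrive on the wire, so a response sent with Content-Encoding: gzip or deflate would be written to disk still compressed. A minimal sketch of one common workaround, assuming you want the decoded body, is to set decode_content on the raw stream before copying. The function name download_file_raw and the optional local_filename parameter are illustrative and not part of the original answer.

```python
import shutil

import requests


def download_file_raw(url, local_filename=None):
    """Stream `url` to disk via Response.raw, decoding gzip/deflate bodies."""
    # Hypothetical convenience: fall back to the last URL path segment.
    local_filename = local_filename or url.split('/')[-1]
    # Using the response as a context manager releases the connection on exit.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Ask the underlying urllib3 stream to undo Content-Encoding on read().
        r.raw.decode_content = True
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```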
This streams the file to disk without using excessive memory, and the code is simple. Alternatively, for a large file you can write the content piece by piece with iter_content() to avoid running out of memory:
    import requests

    def download_file(url):
        local_filename = url.split('/')[-1]
        # NOTE the stream=True parameter
        r = requests.get(url, stream=True)
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    # f.flush()  # commented out on J.F. Sebastian's recommendation
        return local_filename
See http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow for further reference.
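Putting the pieces above together, here is a hedged sketch of the chunked variant that also releases the connection when done and checks the HTTP status before writing anything. The dest_dir, chunk_size, and timeout parameters and their default values are assumptions chosen for illustration, not values from the original answer.

```python
import os

import requests


def download_file(url, dest_dir=".", chunk_size=1024 * 1024, timeout=30):
    """Stream `url` into `dest_dir` chunk by chunk and return the local path."""
    # Same naming rule as the snippets above: last path segment of the URL.
    local_filename = os.path.join(dest_dir, url.split('/')[-1])
    # stream=True defers fetching the body until iter_content() is consumed.
    # timeout bounds the connect and each socket read, not the whole download.
    with requests.get(url, stream=True, timeout=timeout) as r:
        # Raise on 4xx/5xx instead of saving an error page to disk.
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return local_filename
```

With this shape, only about one chunk is held in memory at a time, so a larger chunk_size trades a little memory for fewer loop iterations.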
