Tuesday, December 25, 2018

[ Python FAQ ] Is there a simple process-based parallel map for python?

Source From Here 
Question 
I'm looking for a simple process-based parallel map for Python. With the native map function and a plain list comprehension, the performance is:
In [1]: data = range(10000000)

In [2]: time alist = list(map(lambda e: (e*5+1)/2, data))
CPU times: user 1.48 s, sys: 47.6 ms, total: 1.53 s
Wall time: 1.53 s

In [3]: time olist = [(e*5+1)/2 for e in data]
CPU times: user 862 ms, sys: 54 ms, total: 916 ms
Wall time: 917 ms
How-To 
It seems like what you need is the map method of multiprocessing.Pool():
map(func, iterable[, chunksize])

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready. This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
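For illustration, here is a minimal sketch of passing chunksize explicitly (the file name, the double function, and the chunk size of 100 are arbitrary choices for this example, not from the original question):
- chunksize_demo.py
#!/usr/bin/env python3
import multiprocessing

def double(x):
    return x * 2

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # chunksize=100 groups the 1000 inputs into 10 tasks of 100
        # items each, cutting down inter-process communication overhead.
        result = pool.map(double, range(1000), chunksize=100)
    print(result[:5])  # [0, 2, 4, 6, 8]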

Below is a fuller sample that shows the usage and timing:
- test.py 
#!/usr/bin/env python3
import multiprocessing
from datetime import datetime

# The worker function must live at module level so it can be pickled
# and shipped to the worker processes (a lambda would not work here).
def f(e):
    return (e*5+1)/2

if __name__ == "__main__":
    data = range(10000000)
    pool = multiprocessing.Pool()  # defaults to os.cpu_count() workers
    st = datetime.now()
    print("Start at {}".format(st))
    mlist = pool.map(f, data)  # blocks until every chunk is done
    diff = datetime.now() - st
    # total_seconds() covers the full duration; the original
    # diff.microseconds only reported the sub-second component.
    print("Done with {:.3f} ms".format(diff.total_seconds() * 1000))
    pool.close()
    pool.join()
Execution result: 
$ ./test.py
Start at 2018-12-25 22:11:46.570080
Done with 245.617 ms
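
As a side note, since Python 3.3 multiprocessing.Pool also works as a context manager, which terminates the pool automatically on exit. Here is a minimal sketch of the same computation in that style (timing omitted; the print of the first few results is just for illustration):
- pool_ctx.py
#!/usr/bin/env python3
import multiprocessing

def f(e):
    return (e*5+1)/2

if __name__ == "__main__":
    # Leaving the with-block terminates the worker processes, so no
    # explicit close()/join() bookkeeping is needed.
    with multiprocessing.Pool() as pool:
        mlist = pool.map(f, range(10000000))
    print(mlist[:3])  # [0.5, 3.0, 5.5]

If the full result list does not need to be materialized at once, Pool.imap() returns the results lazily instead.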

Supplement 
Python Article Collection - Introduction to the multiprocessing module
