Saturday, January 30, 2021

[ Python Article Collection ] Python Multiprocessing For Beginners

 Source From Here

Preface
This article shows how to apply Python's multiprocessing module to your long-running functions.

What is multiprocessing?
Multiprocessing means running two or more tasks in parallel. In Python, the built-in multiprocessing module does this. Imagine a function that takes ten seconds to run, and you need to run it ten times. Run sequentially, that takes a hundred seconds. This is where multiprocessing comes in: you can hand the ten calls to ten sub-processes and, given enough CPU cores, finish them all in roughly ten seconds.
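
The arithmetic above can be sketched with a stand-in task. This is a minimal illustration, not code from the original article: slow_task is a hypothetical function that only sleeps, and mp.Pool is used for brevity (the article itself uses mp.Process further down). Because the tasks only sleep, the speed-up here does not depend on how many cores you have; with real CPU-bound work it would.

```python
import multiprocessing as mp
import time


def slow_task(task_id, seconds=0.1):
    # Hypothetical stand-in for a long-running function: sleeps, then
    # returns the square of its input so results can be compared.
    time.sleep(seconds)
    return task_id * task_id


def run_sequentially(task_ids):
    # One call after another in the current process.
    return [slow_task(t) for t in task_ids]


def run_in_parallel(task_ids, workers):
    # Pool.map spreads the calls over worker processes and returns
    # the results in input order.
    with mp.Pool(processes=workers) as pool:
        return pool.map(slow_task, task_ids)


if __name__ == "__main__":
    ids = list(range(10))

    t0 = time.time()
    seq = run_sequentially(ids)
    t1 = time.time()
    par = run_in_parallel(ids, workers=10)
    t2 = time.time()

    assert seq == par  # same answers either way
    print(f"sequential: {t1 - t0:.2f}s, parallel: {t2 - t1:.2f}s")
```

The parallel figure includes the cost of starting the worker processes, so for very short tasks the gain can shrink or disappear.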

Difference between multiprocessing and multithreading
Why use multiprocessing instead of multithreading? Multithreading works well when the tasks spend most of their time waiting (for example on I/O), but if your function needs a lot of processing power, multiprocessing is the better fit. Each sub-process runs its own Python interpreter with its own memory space, so the operating system can schedule it on a separate CPU core. Threads in CPython, by contrast, share one interpreter and are limited by the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time.


Multiprocessing in action
Imagine this is your long-running function:

    def factorize(number):
        for i in range(1, number + 1):
            if number % i == 0:
                yield i

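
One detail worth checking before timing anything: factorize is a generator, so calling it does no work until something iterates over it. That is why the snippets in this article wrap it in list(...). A quick check on a small number (12 is just an illustrative input):

```python
def factorize(number):
    # Yield every divisor of `number` by trial division.
    for i in range(1, number + 1):
        if number % i == 0:
            yield i


gen = factorize(12)    # nothing computed yet, only a generator object
divisors = list(gen)   # iterating is what actually runs the loop
print(divisors)        # [1, 2, 3, 4, 6, 12]
print(list(gen))       # [] because the generator is now exhausted
```
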
If you want to run this function on ten numbers without multiprocessing or multithreading, it looks something like this:

    from time import time

    numbers = [8402868, 2295738, 5938342, 7925426, 98761244,
               87129945, 14789235, 66543218, 53218950, 33218765]
    start = time()
    for number in numbers:
        list(factorize(number))
    end = time()
    print('Took %.3f seconds' % (end - start))

Output:
Took 17.259 seconds

Let's see how to apply multiprocessing to this simple example. First, import Python's multiprocessing module (aliased as mp, which the snippets that follow rely on):

    import multiprocessing as mp

Then create a Process object, passing it the target function and its arguments, if any. For example:

    def print_factorize(num, q):
        # Runs in the child process: send (number, divisors) back through the queue.
        q.put((num, list(factorize(num))))

    q = mp.Queue()
    process = mp.Process(target=print_factorize, args=(8402868, q))

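Now call its start method to begin executing print_factorize in the child process. Read the result from the queue before joining: a child that still has queued data to flush may not exit, so joining first can block.

    process.start()
    print(q.get())    # blocks until the child has put its result
    process.join()
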
Output:
(8402868, [1, 2, 3, 4, 6, 9, 12, 18, 36, 41, 82, 123, 164, 246, 369, 492, 738, 1476, 5693, 11386, 17079, 22772, 34158, 51237, 68316, 102474, 204948, 233413, 466826, 700239, 933652, 1400478, 2100717, 2800956, 4201434, 8402868])

Putting it all together, the loop over ten numbers looks like this:

    import multiprocessing as mp
    from time import time

    def factorize(number):
        # Yield every divisor of number by trial division.
        for i in range(1, number + 1):
            if number % i == 0:
                yield i

    def print_factorize(num, q):
        # Time one factorization and send (number, divisors, elapsed) to the parent.
        start = time()
        ans = list(factorize(num))
        end = time()
        q.put((num, ans, end - start))

    if __name__ == '__main__':  # needed where child processes are spawned (Windows, macOS)
        start = time()

        numbers = [8402868, 2295738, 5938342, 7925426, 98761244,
                   87129945, 14789235, 66543218, 53218950, 33218765]
        plist = []
        q = mp.Queue()
        for n in numbers:
            process = mp.Process(target=print_factorize, args=(n, q))
            plist.append(process)
            process.start()

        # Collect one result per process before joining; results arrive
        # in the order the processes finish.
        for _ in plist:
            num, flist, et = q.get()
            print(f"{num} took {et} seconds!")

        for p in plist:
            p.join()

        end = time()
        print('Total took %.3f seconds' % (end - start))

Execution result:
2295738 took 0.3090219497680664 seconds!
5938342 took 0.5050191879272461 seconds!
7925426 took 0.776221513748169 seconds!
8402868 took 0.9802758693695068 seconds!
14789235 took 1.1664249897003174 seconds!
33218765 took 2.0890510082244873 seconds!
53218950 took 3.24381947517395 seconds!
66543218 took 3.6708388328552246 seconds!
87129945 took 4.638180255889893 seconds!
98761244 took 4.760260343551636 seconds!
Total took 4.795 seconds
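Running the same calculations one after another would take roughly the sum of the individual times: 0.309 + 0.505 + ... + 4.638 + 4.760 ≈ 22.1 seconds, far more than the 4.795 seconds the parallel run needed. (The individual times are probably somewhat inflated because the ten processes compete for CPU cores; the sequential run earlier measured 17.259 seconds.)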
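
As a possible variation, the same work can be handed to a pool of worker processes instead of one Process per number. This is a sketch under a few assumptions, not the original author's code: timed_factorize is a hypothetical helper that returns its result instead of writing to a queue, the list is shortened to four of the numbers to keep the run brief, and mp.Pool() is left at its default size of one worker per available CPU.

```python
import multiprocessing as mp
from time import time


def factorize(number):
    # Yield every divisor of `number` by trial division.
    for i in range(1, number + 1):
        if number % i == 0:
            yield i


def timed_factorize(num):
    # Hypothetical helper: returns (number, divisors, elapsed seconds).
    start = time()
    divisors = list(factorize(num))
    return num, divisors, time() - start


if __name__ == "__main__":
    # Shortened list (an assumption made only to keep the demo quick).
    numbers = [8402868, 2295738, 5938342, 7925426]

    start = time()
    with mp.Pool() as pool:
        # imap_unordered yields each result as soon as its worker finishes.
        for num, divisors, elapsed in pool.imap_unordered(timed_factorize, numbers):
            print(f"{num} took {elapsed:.3f} seconds ({len(divisors)} divisors)")
    print('Total took %.3f seconds' % (time() - start))
```

A pool avoids starting more processes than there are cores to run them, and it takes care of passing results back, so no explicit Queue is needed.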
