Source From Here
Preface
Impact of Python’s GIL on multithreaded programs explained!
Threads are a common programming construct. A thread is a separate flow of execution. This means that our program can have two (or more) things happening at once. By default, a Python program runs as a single process with a single thread of execution, which uses just a single CPU core.
It’s tempting to think of threading as having two (or more) different processors working on our program, each one doing an independent task at the same time. That’s almost right. In CPython, the threads may run on different processors, but only one of them runs at a time.
Python’s Global Interpreter Lock (GIL)
CPython (the standard Python implementation) has a Global Interpreter Lock (GIL), which prevents two threads from executing Python bytecode simultaneously in the same process.
The GIL limits parallel programming in Python out of the box. Because it allows only one thread to execute at a time, even on a machine with more than one CPU core, the GIL has gained a reputation as an “infamous” feature of Python.
In this article, we’ll learn how the GIL affects the performance of our multithreaded Python programs.
Because of the way the CPython implementation works, threading may not speed up all tasks. Again, this is due to the GIL, which essentially limits execution to one Python thread at a time. Problems that require heavy CPU computation might not run any faster. This means that when we reach for threads to do parallel computation and speed up our Python programs, we will be sorely disappointed.
Let’s use a naive number factorization algorithm as a computation-intensive task:
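For contrast, here is a minimal sketch of a task where threads do help. It relies on the fact that blocking calls such as time.sleep release the GIL while they wait, so several waiting threads can overlap. The names blocking_task and run_in_threads are made up for this illustration.

```python
import time
from threading import Thread


def blocking_task(seconds):
    # time.sleep releases the GIL while the thread is waiting
    time.sleep(seconds)


def run_in_threads(task_count, seconds):
    # Start task_count threads that each block for `seconds`,
    # then return the total wall-clock time taken
    threads = [Thread(target=blocking_task, args=(seconds,))
               for _ in range(task_count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
print('Took %.3f seconds' % elapsed)
```

Four waits of 0.2 seconds each finish in roughly the time of one wait, not the sum of all four, because the threads spend their time blocked and not executing Python code.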
    def factorize(number):
        for i in range(1, number + 1):
            if number % i == 0:
                yield i
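As a quick sanity check, the sketch below repeats the same function and runs it on a small input. The value 12 is chosen only for illustration.

```python
def factorize(number):
    # Try every candidate from 1 to number and yield the ones that divide it
    for i in range(1, number + 1):
        if number % i == 0:
            yield i


# factorize() returns a generator, so wrap it in list() to see the values
print(list(factorize(12)))  # [1, 2, 3, 4, 6, 12]
```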
The function above is a generator that yields each factor of the number in turn. Factorizing a set of numbers in serial takes a long time:
    from time import time

    numbers = [8402868, 2295738, 5938342, 7925426]

    start = time()
    for number in numbers:
        # list() forces the generator to run to completion
        list(factorize(number))
    end = time()
    print('Took %.3f seconds' % (end - start))
Executing serially took ~1.6 seconds. In other languages, using multiple threads for this computation would make sense because they can take advantage of all the CPU cores. Let’s try the same in Python and log the time again:
    from threading import Thread

    class FactorizeThread(Thread):
        def __init__(self, number):
            super().__init__()
            self.number = number

        def run(self):
            # Executed in the new thread once start() is called
            self.factors = list(factorize(self.number))

The code above defines a thread that factorizes a single number. Let’s start one thread per number and log the time of the computation:

    start = time()
    threads = []
    for number in numbers:
        thread = FactorizeThread(number)
        thread.start()
        threads.append(thread)
    # Wait for all threads to finish
    for thread in threads:
        thread.join()
    end = time()
    print('Took %.3f seconds' % (end - start))
Surprisingly, this took even longer than running factorize in serial. This demonstrates the effect of the GIL on programs running in the standard CPython interpreter. For that reason, multithreading is not recommended for CPU-intensive tasks in Python. (multiprocessing is the usual alternative)
Having said that, we should not treat the GIL as some looming evil. It is a design choice. The GIL is simple to implement and was easily added to Python. It improves the performance of single-threaded programs because only one lock needs to be managed. Removing the GIL would complicate the interpreter’s code and make the system much harder to maintain across every platform.
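One possible way to follow the multiprocessing suggestion is sketched below, assuming the standard-library concurrent.futures.ProcessPoolExecutor. The helper names factor_list and factorize_in_processes are made up for this illustration and are not part of the original code. Each worker process has its own interpreter and its own GIL, so the factorizations can run on separate cores. No timing is claimed here, since the result depends on the machine.

```python
from concurrent.futures import ProcessPoolExecutor
from time import time


def factorize(number):
    # Same naive generator as in the article
    for i in range(1, number + 1):
        if number % i == 0:
            yield i


def factor_list(number):
    # Hypothetical helper: generators cannot be pickled, so return a list
    # that can be sent back from the worker process
    return list(factorize(number))


def factorize_in_processes(numbers):
    # Hypothetical helper: one task per number, spread over worker processes
    with ProcessPoolExecutor() as pool:
        return list(pool.map(factor_list, numbers))


if __name__ == '__main__':
    numbers = [8402868, 2295738, 5938342, 7925426]
    start = time()
    results = factorize_in_processes(numbers)
    end = time()
    print('Took %.3f seconds' % (end - start))
```

The `if __name__ == '__main__':` guard matters on platforms that start worker processes by re-importing the main module. Without it, each worker would try to launch its own pool.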
Key Highlights: