2020年6月29日 星期一

[ Python 文章收集 ] Memory Management in Python

Source From Here
Preface
Understanding memory management is important for a software developer. With Python being used widely across software development, writing efficient Python code often means writing memory-efficient code. With the increasing use of big data, the importance of memory management cannot be overlooked. Ineffective memory management leads to slowness in applications and server-side components. Memory leaks often lead to an inordinate amount of time spent on testing and debugging; they can also wreak havoc on data processing and cause concurrent-processing issues.

Even though most of Python’s memory management is done by the Python Memory Manager, an understanding of best coding practices and how Python’s Memory Manager works can lead to more efficient and maintainable code. The most important part of memory management for a software developer is memory allocation. Understanding the process that assigns an empty block of space in the computer’s physical or virtual memory is crucial. There are two types of memory allocation.

Memory Allocation
Static Memory Allocation — The program is allocated memory at compile time. An example of this in C/C++ is declaring a static array with a fixed size; the memory is allocated at the time of compilation. The stack is used to implement static allocation. In this case, memory cannot be reused.
  static int a = 10;
Dynamic Memory Allocation — The program is allocated memory at runtime. An example of this in C/C++ is allocating with the new operator; the memory is allocated at runtime. The heap is used to implement dynamic allocation. In this case, memory can be freed and reused when it is no longer required.
  int *p;
  p = new int;
The good thing about Python is that everything in Python is an object. This means that Dynamic Memory Allocation underlies Python Memory Management. When objects are no longer needed, the Python Memory Manager will automatically reclaim memory from them.
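As a quick, CPython-specific sketch of this (sys.getrefcount is a CPython implementation detail, and the exact counts depend on context), you can watch an object's reference count change as names are bound and unbound:

```python
import sys

x = [1, 2, 3]            # a list object allocated on the private heap
r1 = sys.getrefcount(x)  # the call itself holds one temporary reference

y = x                    # binding another name adds a reference; no copy is made
r2 = sys.getrefcount(x)

del y                    # dropping the name decrements the count again
r3 = sys.getrefcount(x)

print(r1, r2, r3)        # r2 is exactly r1 + 1; r3 is back to r1
# when the count drops to zero, the memory manager reclaims the object
```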

Python is a high-level programming language that’s implemented in the C programming language. The Python memory manager manages Python’s memory allocations. There’s a private heap that contains all Python objects and data structures. The Python memory manager manages the Python heap on demand. The Python memory manager has object-specific allocators to allocate memory distinctly for specific objects such as int, string, etc… Below that, the raw memory allocator interacts with the memory manager of the operating system to ensure that there’s space on the private heap.

The Python memory manager manages chunks of memory called “Blocks”. A collection of blocks of the same size makes up a “Pool”. Pools are created inside Arenas, 256 kB chunks of memory allocated on the heap, each holding 64 pools. When an object is destroyed, the memory manager fills its space with a new object of the same size.

Methods and variables are created in stack memory. A stack frame is created whenever a method is called, and it is destroyed automatically when the method returns. Objects and instance variables are created in heap memory. Once functions return and variables go out of scope, dead objects are garbage collected.

It is important to note that the Python memory manager doesn’t necessarily release memory back to the operating system; instead, memory is returned to the Python interpreter. Python has a small-object allocator that keeps freed memory on hand for further use. In long-running processes, this can show up as a growing reserve of unused memory.
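A rough way to observe this in CPython (sys.getallocatedblocks is CPython-specific, available since Python 3.4; the exact numbers vary by version and platform):

```python
import sys

# Number of memory blocks currently held by the interpreter's
# small-object allocator (pymalloc).
before = sys.getallocatedblocks()

data = [object() for _ in range(100_000)]
during = sys.getallocatedblocks()

del data
after = sys.getallocatedblocks()

print(before, during, after)
# 'after' drops back near 'before', but the freed blocks are kept by
# the interpreter for reuse rather than handed straight back to the
# operating system.
```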

Best Practices for Efficient Python Code

Use joins for adding items onto lists
Instead of adding line1, line2 to mymsg individually, use list and join. Don’t do this:
  mymsg = 'line1\n'
  mymsg += 'line2\n'
Better choice:
  mymsg = ['line1', 'line2']
  mymsg = '\n'.join(mymsg)
Avoid using the + operator for strings
Don’t use the + operator for concatenation if you can avoid it. Because strings are immutable, every time you concatenate, Python creates a new string object at a new address. This means new memory needs to be allocated each time the string is altered.
Don’t do this:
  msg = 'hello ' + mymsg + ' world'
Better choice:
  msg = "hello %s world" % mymsg
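To see the cost difference rather than take it on faith, here is a small, illustrative timeit comparison (the function names concat and join are made up for this sketch, and absolute timings will vary by machine):

```python
import timeit

def concat(parts):
    out = ""
    for p in parts:
        out = out + p   # may allocate a new string object on each step
    return out

def join(parts):
    return "".join(parts)  # single pass, one final allocation

parts = ["x"] * 10_000
print(timeit.timeit(lambda: concat(parts), number=20))
print(timeit.timeit(lambda: join(parts), number=20))
# join is typically the faster of the two on large inputs
```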
Use Generators
Generators allow you to create a function that returns one item at a time rather than all the items at once. This means that if you have a large dataset, you don’t have to wait for the entire dataset to be accessible.
  def __iter__(self):
      return self._generator()

  def _generator(self):
      for itm in self.items():
          yield itm
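The snippet above is a fragment of a larger class. A self-contained version might look like this (the class name ItemStore and its internals are illustrative, not from the original article):

```python
class ItemStore:
    """Iterable container whose items are produced lazily."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return self._generator()

    def _generator(self):
        for itm in self._items:
            yield itm  # hand out one item at a time

store = ItemStore([1, 2, 3])
print(list(store))  # [1, 2, 3]
```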
Put evaluations outside the loop
If you are iterating through data, compile the regex once outside the loop and reuse the cached version:
  match_regex = re.compile("foo|bar")
  for i in big_it:
      m = match_regex.search(i)
      ...
Assign a function to a local variable
Python accesses local variables much more efficiently than global variables. Assign functions to local variables then use them.
  myLocalFunc = myObj.func
  for i in range(n):
      myLocalFunc(i)
Use built-in functions and libraries
Use built-in functions and libraries whenever you can. Built-in functions are often implemented using the best memory usage practices.
Don’t do this:
  mylist = []
  for myword in oldlist:
      mylist.append(myword.upper())
Better choice:
  mylist = list(map(str.upper, oldlist))  # map returns an iterator in Python 3
A better choice for creating a dataset is keyword arguments rather than loops:
  from collections import Counter

  mycounter = Counter(a=1, b=2, c=3, d=5, e=6, f=7, g=8)
  for i in mycounter.elements():
      ...
Getting rid of unwanted loops by using itertools
itertools saves you a lot of time on loops and also reduces the complexity of the code. For example:
  def test_data(n=900000):
      for i in range(n):
          yield i

  def myfunc(shape, v):
      if shape:
          return [pow(v, 2)]
      else:
          return [v * 2]
Don’t do this:
  %%time
  mylist = []
  for shape in [True, False]:
      for v in test_data():
          mylist += myfunc(shape, v)
  # Wall time: 863 ms
Better choice:
  %time
  from itertools import product, chain

  mylist = list(chain.from_iterable(myfunc(shape, v) for shape, v in product([True, False], test_data())))
  # Wall time: 0 ns
Overriding __new__ and exploiting metaclasses for safety and memory management — by @maxwell flitton
Overriding __new__ and exploiting metaclasses can also be useful and safe for memory management when enforcing the Singleton and Flyweight patterns. For instance, here is a dict object that reads a Yaml file. Because its metaclass implements the Singleton pattern, once it is defined it can be imported anywhere in the system and "defined" again, and the interpreter will simply point back to the initial object. This reduces the memory footprint and ensures safety: no matter how junior another developer on the team is, they cannot create duplicate objects, which prevents altering the dict in one part of the system while referencing a different dict in another part:
  import sys

  import yaml  # third-party package: PyYAML

  class ConfigDictError(Exception):
      # minimal stand-in; the original article defines this elsewhere
      def __init__(self, message):
          super().__init__(message)

  class Singleton(type):
      _instances = {}

      def __call__(cls, *args, **kwargs):
          if cls not in cls._instances:
              cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
          return cls._instances[cls]  # always return the single cached instance

  class ConfigDict(dict, metaclass=Singleton):
      def __init__(self):
          super().__init__(self.read_config_file())

      @staticmethod
      def read_config_file():
          """
          Reads the config file based on the path passed when running the app.
          :return: (dict) loaded data from the yml file
          """
          config_file_path = sys.argv[-1]
          if not config_file_path.endswith(".yml"):
              raise ConfigDictError(message="yml file not passed into flask app but {} instead".format(config_file_path))
          return yaml.load(open(str(config_file_path)), Loader=yaml.FullLoader)
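Note that the return must sit outside the if block, otherwise every call after the first returns None. The singleton behaviour is then easy to verify with a plain class (Config below is a made-up stand-in for the Yaml-reading dict):

```python
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]  # outside the if: always the same object

class Config(dict, metaclass=Singleton):
    def __init__(self):
        super().__init__({"env": "dev"})

a = Config()
b = Config()
print(a is b)  # True: both names point to the one cached instance
```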
How to check for performance in Python code
You can use the profiling modules such as cProfile and Profile for performance checks:
# python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)
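The same profiler can also be driven from inside a script. A minimal sketch using the standard cProfile and pstats modules (the busy function is just a made-up workload):

```python
import cProfile
import io
import pstats

def busy():
    # made-up workload so the profile has something to show
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Sort by cumulative time and print the top five entries.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```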

Check out this article, which runs through the entire process of benchmarking to find the best way to reverse a string. To read more about Python memory management, check the resources below:
* Fluent Python: Clear, Concise, and Effective Programming
* Python Cookbook: Recipes for Mastering Python 3
* Real Python: Memory Management in Python
* Python.org Memory Management
* Artem Golubin: Memory Management in Python
