2020年6月29日 星期一

[ Python 文章收集 ] Memory Management in Python

Source From Here
Preface
Understanding memory management is important for a software developer. With Python being used widely across software development, writing efficient Python code often means writing memory-efficient code. With the increasing use of big data, the importance of memory management cannot be overlooked. Ineffective memory management leads to slowness in applications and server-side components. Memory leaks often lead to an inordinate amount of time spent on testing and debugging; they can also wreak havoc on data processing and cause concurrent-processing issues.

Even though most of Python’s memory management is done by the Python Memory Manager, an understanding of best coding practices and how Python’s Memory Manager works can lead to more efficient and maintainable code. The most important part of memory management for a software developer is memory allocation. Understanding the process that assigns an empty block of space in the computer’s physical or virtual memory is crucial. There are two types of memory allocation.

Memory Allocation
Static Memory Allocation — The program is allocated memory at compile time. An example of this in C/C++ is declaring a static array with a fixed size; the memory is allocated at the time of compilation. The stack is used to implement static allocation. In this case, memory cannot be reused.
  static int a = 10;
Dynamic Memory Allocation — The program is allocated memory at runtime. An example of this in C/C++ is allocating with the new operator; the memory is allocated at runtime. The heap is used to implement dynamic allocation. In this case, memory can be freed and reused when it is no longer required.
  int *p;
  p = new int;
The good thing about Python is that everything in Python is an object. This means that Dynamic Memory Allocation underlies Python Memory Management. When objects are no longer needed, the Python Memory Manager will automatically reclaim memory from them.
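As a quick, CPython-specific sketch of this (sys.getrefcount is a CPython implementation detail, and the exact counts depend on context), you can watch an object's reference count change as names are bound and unbound:

```python
import sys

x = [1, 2, 3]            # a list object allocated on the private heap
r1 = sys.getrefcount(x)  # the call itself holds one temporary reference

y = x                    # binding another name adds a reference; no copy is made
r2 = sys.getrefcount(x)

del y                    # dropping the name decrements the count again
r3 = sys.getrefcount(x)

print(r1, r2, r3)        # r2 is exactly r1 + 1; r3 is back to r1
# when the count drops to zero, the memory manager reclaims the object
```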

Python is a high-level programming language that’s implemented in the C programming language. The Python memory manager manages Python’s memory allocations. There’s a private heap that contains all Python objects and data structures. The Python memory manager manages the Python heap on demand. The Python memory manager has object-specific allocators to allocate memory distinctly for specific objects such as int, string, etc… Below that, the raw memory allocator interacts with the memory manager of the operating system to ensure that there’s space on the private heap.

The Python memory manager manages chunks of memory called “Blocks”. A collection of blocks of the same size makes up a “Pool”. Pools are created inside Arenas, 256 kB chunks of memory allocated on the heap, each holding 64 pools. When an object is destroyed, the memory manager fills its space with a new object of the same size.

Methods and variables are created in stack memory. A stack frame is created whenever a method is called, and it is destroyed automatically when the method returns. Objects and instance variables are created in heap memory. Once functions return and variables go out of scope, dead objects are garbage collected.

It is important to note that the Python memory manager doesn’t necessarily release memory back to the operating system; instead, memory is returned to the Python interpreter. Python has a small-object allocator that keeps freed memory on hand for further use. In long-running processes, this can show up as a growing reserve of unused memory.
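A rough way to observe this in CPython (sys.getallocatedblocks is CPython-specific, available since Python 3.4; the exact numbers vary by version and platform):

```python
import sys

# Number of memory blocks currently held by the interpreter's
# small-object allocator (pymalloc).
before = sys.getallocatedblocks()

data = [object() for _ in range(100_000)]
during = sys.getallocatedblocks()

del data
after = sys.getallocatedblocks()

print(before, during, after)
# 'after' drops back near 'before', but the freed blocks are kept by
# the interpreter for reuse rather than handed straight back to the
# operating system.
```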

Best Practices for Efficient Python Code

Use joins for adding items onto lists
Instead of adding line1, line2 to mymsg individually, use list and join. Don’t do this:
  mymsg = 'line1\n'
  mymsg += 'line2\n'
Better choice:
  mymsg = ['line1', 'line2']
  mymsg = '\n'.join(mymsg)
Avoid using the + operator for strings
Don’t use the + operator for concatenation if you can avoid it. Because strings are immutable, every time you concatenate, Python creates a new string object at a new address. This means new memory needs to be allocated each time the string is altered.
Don’t do this:
  msg = 'hello ' + mymsg + ' world'
Better choice:
  msg = "hello %s world" % mymsg
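To see the cost difference rather than take it on faith, here is a small, illustrative timeit comparison (the function names concat and join are made up for this sketch, and absolute timings will vary by machine):

```python
import timeit

def concat(parts):
    out = ""
    for p in parts:
        out = out + p   # may allocate a new string object on each step
    return out

def join(parts):
    return "".join(parts)  # single pass, one final allocation

parts = ["x"] * 10_000
print(timeit.timeit(lambda: concat(parts), number=20))
print(timeit.timeit(lambda: join(parts), number=20))
# join is typically the faster of the two on large inputs
```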
Use Generators
Generators allow you to create a function that returns one item at a time rather than all the items at once. This means that if you have a large dataset, you don’t have to wait for the entire dataset to be accessible.
  def __iter__(self):
      return self._generator()

  def _generator(self):
      for itm in self.items():
          yield itm
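The snippet above is a fragment of a larger class. A self-contained version might look like this (the class name ItemStore and its internals are illustrative, not from the original article):

```python
class ItemStore:
    """Iterable container whose items are produced lazily."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return self._generator()

    def _generator(self):
        for itm in self._items:
            yield itm  # hand out one item at a time

store = ItemStore([1, 2, 3])
print(list(store))  # [1, 2, 3]
```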
Put evaluations outside the loop
If you are iterating through data, compile the regex once outside the loop and reuse the cached version:
  match_regex = re.compile("foo|bar")
  for i in big_it:
      m = match_regex.search(i)
      ...
Assign a function to a local variable
Python accesses local variables much more efficiently than global variables. Assign functions to local variables then use them.
  myLocalFunc = myObj.func
  for i in range(n):
      myLocalFunc(i)
Use built-in functions and libraries
Use built-in functions and libraries whenever you can. Built-in functions are often implemented using the best memory usage practices.
Don’t do this:
  mylist = []
  for myword in oldlist:
      mylist.append(myword.upper())
Better choice:
  mylist = list(map(str.upper, oldlist))  # map returns an iterator in Python 3
A better choice for creating a dataset is keyword arguments rather than loops:
  from collections import Counter

  mycounter = Counter(a=1, b=2, c=3, d=5, e=6, f=7, g=8)
  for i in mycounter.elements():
      ...
Getting rid of unwanted loops by using itertools
itertools saves you a lot of time on loops and also reduces the complexity of the code. For example:
  def test_data(n=900000):
      for i in range(n):
          yield i

  def myfunc(shape, v):
      if shape:
          return [pow(v, 2)]
      else:
          return [v * 2]
Don’t do this:
  %%time
  mylist = []
  for shape in [True, False]:
      for v in test_data():
          mylist += myfunc(shape, v)
  # Wall time: 863 ms
Better choice:
  %time
  from itertools import product, chain

  mylist = list(chain.from_iterable(myfunc(shape, v) for shape, v in product([True, False], test_data())))
  # Wall time: 0 ns
Overriding __new__ and exploiting metaclasses for safety and memory management — by @maxwell flitton
Overriding __new__ and exploiting metaclasses can also be useful and safe for memory management when enforcing the Singleton and Flyweight patterns. For instance, here is a dict object that reads a Yaml file. Because its metaclass implements the Singleton pattern, once it is defined it can be imported anywhere in the system and "defined" again, and the interpreter will simply point back to the initial object. This reduces the memory footprint and ensures safety: no matter how junior another developer on the team is, they cannot create duplicate objects, which prevents altering the dict in one part of the system while referencing a different dict in another part:
  import sys

  import yaml  # third-party package: PyYAML

  class ConfigDictError(Exception):
      # minimal stand-in; the original article defines this elsewhere
      def __init__(self, message):
          super().__init__(message)

  class Singleton(type):
      _instances = {}

      def __call__(cls, *args, **kwargs):
          if cls not in cls._instances:
              cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
          return cls._instances[cls]  # always return the single cached instance

  class ConfigDict(dict, metaclass=Singleton):
      def __init__(self):
          super().__init__(self.read_config_file())

      @staticmethod
      def read_config_file():
          """
          Reads the config file based on the path passed when running the app.
          :return: (dict) loaded data from the yml file
          """
          config_file_path = sys.argv[-1]
          if not config_file_path.endswith(".yml"):
              raise ConfigDictError(message="yml file not passed into flask app but {} instead".format(config_file_path))
          return yaml.load(open(str(config_file_path)), Loader=yaml.FullLoader)
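Note that the return must sit outside the if block, otherwise every call after the first returns None. The singleton behaviour is then easy to verify with a plain class (Config below is a made-up stand-in for the Yaml-reading dict):

```python
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]  # outside the if: always the same object

class Config(dict, metaclass=Singleton):
    def __init__(self):
        super().__init__({"env": "dev"})

a = Config()
b = Config()
print(a is b)  # True: both names point to the one cached instance
```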
How to check for performance in Python code
You can use the profiling modules such as cProfile and Profile for performance checks:
# python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)
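The same profiler can also be driven from inside a script. A minimal sketch using the standard cProfile and pstats modules (the busy function is just a made-up workload):

```python
import cProfile
import io
import pstats

def busy():
    # made-up workload so the profile has something to show
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Sort by cumulative time and print the top five entries.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```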

Check out this article, which runs through the entire process of benchmarking to find the best way to reverse a string. To read more about Python memory management, check the resources below:
* Fluent Python: Clear, Concise, and Effective Programming
* Python Cookbook: Recipes for Mastering Python 3
* Real Python: Memory Management in Python
* Python.org Memory Management
* Artem Golubin: Memory Management in Python
