Preface
Understanding memory management is important for any software developer. With Python used so widely across software development, writing efficient Python code often means writing memory-efficient code. With the increasing use of big data, the importance of memory management cannot be overlooked. Ineffective memory management leads to slowness in applications and server-side components. Memory leaks often lead to an inordinate amount of time spent on testing and debugging; they can also wreak havoc on data processing and cause problems with concurrent processing.
Even though most of Python's memory management is done by the Python memory manager, an understanding of best coding practices and of how the memory manager works can lead to more efficient and maintainable code. The most important part of memory management for a software developer is memory allocation: the process that assigns an empty block of space in the computer's physical or virtual memory. There are two types of memory allocation: static and dynamic.
Memory Allocation
Static memory allocation: memory is allocated at compile time. In C/C++, for example, a static variable or an array declared with a fixed size is laid out when the program is compiled; the stack is used to implement this kind of allocation, and the memory cannot be resized or reused for anything else.

    static int a = 10;

Dynamic memory allocation: memory is allocated at run time, as the program asks for it, from the heap. In C++ this is what the new operator does.

    int *p;
    p = new int;
Python is a high-level programming language implemented in C. The Python memory manager manages Python's memory allocations: all Python objects and data structures live in a private heap, which the memory manager manages on demand. The memory manager has object-specific allocators that allocate memory distinctly for particular object types such as integers and strings. Below that, the raw memory allocator interacts with the operating system's memory manager to ensure that there is enough space in the private heap.
The Python memory manager manages chunks of memory called "blocks". A collection of blocks of the same size makes up a "pool". Pools are carved out of arenas, 256 KB chunks of memory allocated on the heap; each arena therefore holds 64 pools of 4 KB each. When an object is destroyed, the memory manager can fill its space with a new object of the same size.
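If you want to peek at these blocks, pools and arenas yourself, CPython provides the implementation-specific helper sys._debugmallocstats(), which dumps the small-object allocator's statistics; a minimal, CPython-only sketch:

    import sys

    # CPython-only debugging aid: prints per-size-class block usage together
    # with pool and arena statistics of the small-object allocator (pymalloc).
    sys._debugmallocstats()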
Method calls and local variables live in stack memory: a stack frame is created whenever a method is called, and the frame is destroyed automatically when the method returns. Objects and instance variables are created in heap memory; once the variables and functions that referenced them have returned, the dead objects are garbage collected.
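As a tiny sketch of the distinction (the function and names here are just for illustration):

    def make_point():
        x, y = 2, 3                 # local names live in this call's stack frame
        return {'x': x, 'y': y}     # the dict object itself lives on the heap

    p = make_point()                # the frame is gone, but the dict survives
                                    # because p still references it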
It is important to note that the Python memory manager doesn't necessarily release memory back to the operating system; instead, freed memory is returned to the Python interpreter. Python's small-object allocator keeps this memory around for further use, so in long-running processes you may build up a growing reserve of unused memory.
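As a rough, CPython-specific illustration, sys.getallocatedblocks() reports how many memory blocks the interpreter currently has allocated; the count falls after a large structure is deleted, but the arenas behind it are typically kept by the allocator for reuse rather than handed straight back to the operating system:

    import sys

    print("blocks before:", sys.getallocatedblocks())

    data = [i for i in range(1_000_000)]          # allocate a lot of small objects
    print("blocks with list:", sys.getallocatedblocks())

    del data                                      # the objects are freed...
    print("blocks after del:", sys.getallocatedblocks())
    # ...but pymalloc may keep the underlying arenas for future allocations.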
Best Practices for Efficient Python Code
Use join for combining a list of strings
Instead of appending line1 and line2 to mymsg one at a time, collect the pieces in a list and join them.
Don't do this:

    mymsg = 'line1\n'
    mymsg += 'line2\n'

Do this:

    mymsg = ['line1', 'line2']
    '\n'.join(mymsg)
Avoid the + operator for string concatenation
Don't use the + operator for concatenation if you can avoid it. Because strings are immutable, every time you concatenate onto a string Python creates a new string object at a new address, which means new memory has to be allocated each time the string is altered.
Don't do this:

    msg = 'hello' + mymsg + 'world'

Do this:

    msg = "hello %s world" % mymsg
Use generators
Generators let you write a function that returns one item at a time rather than all the items at once, so with a large dataset you don't have to hold everything in memory or wait for the whole dataset to be available before you start processing. A class can expose its contents lazily by having __iter__ delegate to a generator method, as in this fragment:

    def __iter__(self):
        return self._generator()

    def _generator(self):
        for itm in self.items():
            yield itm
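As a standalone sketch (the file name here is hypothetical), a generator can stream a large file one line at a time instead of reading it all into a list:

    def read_records(path):
        # Yield one line at a time; only the current line is held in memory.
        with open(path) as f:
            for line in f:
                yield line.rstrip('\n')

    for record in read_records('big_log.txt'):   # hypothetical file name
        print(record)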
Compile regular expressions once
If you apply the same pattern repeatedly while iterating through data, compile the regular expression once and reuse the compiled object inside the loop:

    import re

    match_regex = re.compile("foo|bar")
    for i in big_it:
        m = match_regex.search(i)
        ...
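A self-contained version of the same idea (the sample lines are made up for illustration):

    import re

    # Compile once, outside the loop, instead of passing the pattern string
    # to re.search() on every iteration.
    match_regex = re.compile(r"foo|bar")

    lines = ["a foo here", "nothing to see", "a bar there"]
    matches = [line for line in lines if match_regex.search(line)]
    print(matches)   # ['a foo here', 'a bar there']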
Assign functions to local variables
Python accesses local variables much more efficiently than global variables. If you call the same function or method many times, assign it to a local variable first and then use that:

    myLocalFunc = myObj.func
    for i in range(n):
        myLocalFunc(i)
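A runnable sketch of the same pattern, using math.sqrt as a stand-in for whatever function the loop calls:

    import math

    def roots(values):
        local_sqrt = math.sqrt          # bind the lookup once, outside the loop
        return [local_sqrt(v) for v in values]

    print(roots([1, 4, 9, 16]))         # [1.0, 2.0, 3.0, 4.0]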
Use built-in functions and libraries
Use built-in functions and libraries whenever you can. They are generally implemented with efficient memory usage in mind.
Don't do this:

    mylist = []
    for myword in oldlist:
        mylist.append(myword.upper())

Do this:

    mylist = list(map(str.upper, oldlist))   # map() is lazy; list() materialises it

Similarly, prefer standard-library data structures such as collections.Counter over hand-rolled counting code:

    from collections import Counter

    mycounter = Counter(a=1, b=2, c=3, d=5, e=6, f=7, g=8)
    for i in mycounter.elements():
        ...
Use itertools
The itertools module saves you a lot of time on loops, and it also reduces the complexity of the code. For example:

    def test_data(n=900000):
        for i in range(n):
            yield i

    def myfunc(shape, v):
        if shape:
            return [pow(v, 2)]
        else:
            return [v * 2]

    %%time
    mylist = []
    for shape in [True, False]:
        for v in test_data():
            mylist += myfunc(shape, v)
    # Wall time: 863 ms

    %%time
    from itertools import product, chain
    mylist = list(chain.from_iterable(
        myfunc(shape, v) for shape, v in product([True, False], test_data())
    ))
    # Wall time: 0 ns
Use metaclasses to enforce the Singleton pattern
Overriding __new__ and using metaclasses can also be useful and safe for memory management when it comes to enforcing the Singleton and Flyweight patterns. For instance, here is a dict subclass that reads a YAML file. Because its metaclass implements the Singleton pattern, once the object has been created it can be imported and "defined" again anywhere in the system and the interpreter will simply point back to the initial object. This reduces the memory footprint and keeps things safe: no matter how junior another developer on the team is, they cannot create duplicate objects, so they cannot end up altering the dict in one part of the system while referencing a different dict in another part:
    import sys
    import yaml

    class ConfigDictError(Exception):
        # Not shown in the original snippet; assumed to be defined elsewhere in
        # the application and stubbed here so the example is self-contained.
        def __init__(self, message):
            super().__init__(message)

    class Singleton(type):
        _instances = {}

        def __call__(cls, *args, **kwargs):
            # Create the instance on the first call only; afterwards always
            # return the same object.
            if cls not in cls._instances:
                cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
            return cls._instances[cls]

    class ConfigDict(dict, metaclass=Singleton):
        def __init__(self):
            super().__init__(self.read_config_file())

        @staticmethod
        def read_config_file():
            """
            Reads config file based on path passed when running app.
            :return: (dict) loaded data from yml file
            """
            config_file_path = sys.argv[-1]
            if not config_file_path.endswith(".yml"):
                raise ConfigDictError(message="yml file not passed into flask app but {} instead".format(config_file_path))
            return yaml.load(open(str(config_file_path)), Loader=yaml.FullLoader)
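Assuming the application is launched with a path to a .yml file as its last command-line argument, every attempt to construct the config returns the same shared object:

    # Both "constructions" return the very same instance.
    config_a = ConfigDict()
    config_b = ConfigDict()
    assert config_a is config_b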
Profile your code
You can use profiling modules such as cProfile and profile for performance checks:
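A minimal sketch using the standard cProfile and pstats modules; reverse_string here is just a made-up workload to profile:

    import cProfile
    import pstats

    def reverse_string(s):
        return ''.join(reversed(s))

    # Profile the call, save the stats to a file, then print the ten entries
    # with the highest cumulative time.
    cProfile.run("reverse_string('hello world' * 10_000)", 'profile_stats')
    stats = pstats.Stats('profile_stats')
    stats.sort_stats('cumulative').print_stats(10)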
Check out this article running through the entire process of benchmarking to find the best way to reverse a string. To read more about Python memory management, check the resources below:
* Fluent Python: Clear, Concise, and Effective Programming
* Python Cookbook: Recipes for Mastering Python 3
* Real Python: Memory Management in Python
* Python.org: Memory Management
* Artem Golubin: Memory Management in Python