PYTHON:
THANKS FOR THE
MEMORIES
Danil Ineev
Python Dublin Meetup
September 2019
Agenda
■ Managed vs Unmanaged memory
■ Memory Allocation in CPython
■ Garbage Collection in CPython
■ How other Python implementations handle memory
■ Tips & Tricks
■ Q&A
About the author
■ Cursed by a witch in 2008 and since then can only use programming languages with
managed memory
■ ~7 years of professional experience with Python
■ Still constantly learning something new
■ Currently developing software here at Tenable (we’re hiring)
■ Quite passionate about photography and filmmaking, please visit my website
https://notreally.media/ (even if you don’t know Russian, you can always take a look at
pretty pictures)
MANAGED VS
UNMANAGED
Unmanaged Memory
■ The developer needs to manually allocate and release memory (see the sketch below)
Code from https://www.codingunit.com/c-tutorial-the-functions-malloc-and-free
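The C snippet from the original slide isn’t reproduced here; as a stand-in, here is a rough sketch of the same manual allocate/release dance driven from Python through ctypes (the library lookup is an assumption and varies by platform):

```python
import ctypes
import ctypes.util

# Illustration only: calling the C allocator by hand via ctypes.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

buf = libc.malloc(64)        # ask the C heap for 64 bytes
if not buf:
    raise MemoryError("malloc failed")
ctypes.memset(buf, 0, 64)    # use the memory
libc.free(buf)               # forget this line and the memory leaks
```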
Managed memory
■ The underlying VM or runtime takes care of all memory operations
Behold the power of Python
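For contrast, a minimal sketch of the managed side: allocation and release are entirely the runtime’s problem.

```python
# No malloc, no free: CPython allocates storage for the list and its elements,
# and reclaims it once the object becomes unreachable.
squares = [x * x for x in range(1_000_000)]
del squares   # drop the last reference; the interpreter frees the memory for us
```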
DEEP C
Memory allocation in CPython
Deep C
■ Inside the actual VM, someone still needs to implement memory allocation
■ CPython has a few special allocators for both objects and non-objects
■ Which allocator is used also depends on the type of build
From https://docs.python.org/3/c-api/memory.html#default-memory-allocators
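One CPython-specific way to peek at those allocators (sys._debugmallocstats is an internal helper, so its exact output isn’t guaranteed):

```python
import sys

# Dump pymalloc statistics (arenas, pools, blocks per size class) to stderr.
# Running the interpreter with the env var PYTHONMALLOC=malloc bypasses
# pymalloc in favour of the raw C allocator, which changes what you see here.
sys._debugmallocstats()
```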
Memory Allocation Layers
■ +3: object-specific allocators for [ int ] [ dict ] [ list ] [ string ] (object-specific memory; non-object memory is used by the Python core)
■ +2: Python’s object allocator (object memory and internal buffers)
■ +1: Python’s raw memory allocator (PyMem API) managing Python memory
■ 0: general-purpose allocator (e.g. malloc) over the virtual memory allocated for the Python process
■ -1: OS-specific virtual memory manager: kernel dynamic storage allocation & management (page-based)
■ -2: physical memory and swap
Memory Layout
■ Arenas contain pools, and pools contain blocks, where objects actually live
■ (Diagram: Arena 1 with Pool 1–3 and Arena 2 with Pool 1–2; one pool shown holding Block 1–5)
”Simple segregated storage based on array of free lists”
But that’s not all
■ Here you may start to think, “this is completely irrelevant to what I do in my day-to-day job”
■ It is
■ That’s also why I won’t go into the difference between untouched and free memory blocks, for example
GARBAGE
COLLECTION
GIL
■ The Global Interpreter Lock (GIL) is a lock (!) in the interpreter (!!) that allows only a single thread to execute at a time
■ It’s quite a controversial topic
■ Because of the GIL, CPython can’t do real multithreading
■ But why do we even need the GIL if everyone hates it?
■ Also, why are we talking about the GIL today?
Garbage Collection
■ A garbage collector is a mechanism that automatically deletes unused objects
■ The Python language specification doesn’t force anyone to implement a particular type of GC
■ In CPython, the primary garbage collection mechanism is reference counting
Reference counting
■ What is a variable?
■ A variable is just a label and a reference to some object in memory
■ An object in memory can also be referenced without any label: think of elements of lists, tuples, etc.
■ If there are no references to an object – it can be deleted
Reference counting
■ When does the number of references increase?
■ Storing the object in a new variable
■ Adding the object to a collection
■ Passing the object to a function (see the example below)
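A quick way to watch the count grow is sys.getrefcount; note that the call itself adds one temporary reference:

```python
import sys

obj = object()
print(sys.getrefcount(obj))   # typically 2: the name `obj` plus the call's temporary reference

alias = obj                   # stored in a new variable
container = [obj]             # added to a collection
print(sys.getrefcount(obj))   # two higher than before
```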
Reference counting
■ When does the number of references decrease?
■ Reassigning the variable to another value
■ Execution leaves a scope
■ An explicit `del var_name`
■ Removing the object from a collection
■ The refcount of global objects never drops to 0
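And the same counter going back down:

```python
import sys

obj = object()
alias = obj
container = [obj]
print(sys.getrefcount(obj))   # e.g. 4

del alias                     # explicit `del`
container.clear()             # removed from the collection
print(sys.getrefcount(obj))   # back to 2; at zero the object would be freed immediately
```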
Reference counting
■ Reference counting caveats
■ Can’t handle reference cycles (see the example below)
■ Not really thread-safe (that’s why CPython has the GIL)
■ String constants and small integers can be cached
■ Can’t be turned off
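A minimal reference cycle that reference counting alone can never free; it’s the cyclic collector from the next slides that picks it up:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # the two nodes now reference each other

del a, b                  # refcounts never reach zero: the cycle keeps itself alive
print(gc.collect())       # the generational GC finds and frees the unreachable pair
```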
Generational GC
■ The generational GC is a built-in module (gc) that cleans up everything reference counting can’t handle
■ Based on the principle that most objects die young
■ Objects are tracked in special lists called generations
Generational GC
■ There are only 3 generations
■ New objects are placed in generation 0
■ During a collection, the GC determines whether each object in the generation is reachable from the root set of objects
■ If an object is unreachable, it is deleted
■ If an object survives a collection, it is promoted to the next generation
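The gc module exposes the generations directly:

```python
import gc

print(gc.get_count())   # current collection counters per generation, e.g. (679, 8, 0)
gc.collect(0)           # collect only the youngest generation
gc.collect()            # or everything; survivors get promoted
```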
Generational GC
■ Each generation has a threshold
■ If the number of objects in a generation exceeds its threshold, garbage collection is triggered automatically
■ Unlike reference counting, the generational GC can be configured
■ Users can set the threshold levels
■ Users can trigger a collection manually (see below)
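Reading the thresholds and triggering a collection by hand:

```python
import gc

print(gc.get_threshold())    # (700, 10, 10) on the CPython versions current at the time of this talk
unreachable = gc.collect()   # manual full collection
print(unreachable)           # number of unreachable objects that were found
```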
OTHER PYTHON
IMPLEMENTATIONS
Other Python implementations
■ Jython and IronPython use whatever GC the underlying VM provides
■ PyPy has a pluggable GC architecture and a list of ready-made garbage collectors
■ The default GC is called Minimark, a super smart generational GC
■ There is also an ongoing experiment to implement STM
Software Transactional Memory
■ Instead of locking at a global level, it uses transactions for more granular memory operations
■ This means that with STM we can have real multithreading in Python
■ Huge performance gain if the program uses a lot of simultaneous threads
■ Significant performance loss for single-threaded programs
■ Still not ready for production after many years of development
TIPS & TRICKS
How to improve memory usage in your project?
How to improve memory usage in your project?
■ General rule: create benchmarks before you start optimizing
■ Useful tools for profiling: memory-profiler, pympler, objgraph (sketch below)
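A rough sketch with memory-profiler (the function and the workload are made up for illustration):

```python
# pip install memory-profiler
from memory_profiler import profile

@profile                 # prints a line-by-line memory report when the function runs
def build_payload():
    chunks = [bytes(1024) for _ in range(100_000)]   # roughly 100 MB of small buffers
    return b"".join(chunks)

if __name__ == "__main__":
    build_payload()
```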
array
■ Memory-efficient dynamic arrays (the array module in the standard library)
■ Can only store simple data types
■ Homogeneous
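For example, a million integers stored as a list versus as a typed array (note that sys.getsizeof on the list doesn’t even count the int objects it points to):

```python
import array
import sys

as_list = list(range(1_000_000))
as_array = array.array("i", range(1_000_000))   # "i" = signed 4-byte integers

print(sys.getsizeof(as_list))    # ~8 MB of pointers, plus a separate int object per element
print(sys.getsizeof(as_array))   # ~4 MB total: the values are stored inline
```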
External modules
■ Lots of external Python modules are written in C in a very efficient way
■ The most commonly used is NumPy
■ You can write your own C/C++/Rust extension using FFI
■ Cython is also a good choice for memory- and/or CPU-heavy modules
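A similarly rough comparison with NumPy (sizes are approximate and platform-dependent):

```python
import sys
import numpy as np

py_list = list(range(1_000_000))
np_array = np.arange(1_000_000, dtype=np.int64)

print(sys.getsizeof(py_list))   # pointer array only; each Python int costs extra on top
print(np_array.nbytes)          # 8_000_000 bytes of contiguous C-level storage
```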
Generators
■ Very often you don’t really need to store the whole list/dict/etc.
■ Passing a generator to a function looks cleaner and is usually more memory-efficient
foo([x.bar for x in arr])
vs
foo(x.bar for x in arr)
■ Make generators the first-choice option for every new method that returns a collection
■ Side effect: it will be easier to transform your code to be asynchronous
Generators
■ The original slide compared a function that returns a list with one that returns a generator (see the sketch below)
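Since the screenshot isn’t reproduced, here is a hypothetical pair along the same lines:

```python
def line_lengths_list(path):
    with open(path) as f:
        return [len(line) for line in f]      # whole result built up in memory

def line_lengths_gen(path):
    with open(path) as f:
        for line in f:
            yield len(line)                   # one value at a time, file read lazily

total = sum(line_lengths_gen("access.log"))   # hypothetical file name
```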
Tail call optimization
■ Recursion in Python is not only depth-limited, it also consumes a lot of memory
■ Certain recursive methods can be rewritten to be tail-recursive
■ That means the recursive call is the last thing the function does, so nothing from the current frame is needed afterwards
■ By default, Python doesn’t optimize tail recursion
■ It can be worked around with a simple decorator
Tail call optimization
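The decorator from the original slide isn’t reproduced here; below is a minimal trampoline-style sketch of the same idea, with illustrative function names:

```python
def trampoline(func, *args):
    """Keep calling the thunks a tail-recursive function returns instead of recursing."""
    result = func(*args)
    while callable(result):
        result = result()
    return result

def factorial(n, acc=1):
    if n <= 1:
        return acc
    return lambda: factorial(n - 1, acc * n)   # the tail call, packaged as a thunk

big = trampoline(factorial, 10_000)   # plain recursion would hit RecursionError long before this
print(len(str(big)))                  # tens of thousands of digits, computed at constant stack depth
```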
Migrate to Python 3.7+
■ The latest versions have many memory optimizations
■ Less need to think about __slots__ (compact, key-sharing instance dicts)
■ Better handling of finalizers
■ Python 2 sunsets on 1 January 2020
GC tuning
■ The generational GC can be tuned
■ This should only be done if you really have no other options
■ Threshold values can be optimized for your use cases
■ The generational GC can be turned off completely (see below)
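What those knobs look like in the gc module (the values are illustrative, not recommendations):

```python
import gc

gc.set_threshold(50_000, 20, 20)   # make generation-0 collections much rarer
gc.disable()                       # switch the cyclic GC off entirely; refcounting still runs
gc.freeze()                        # Python 3.7+: move existing objects out of GC tracking,
                                   # handy before fork() (see the Instagram articles below)
```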
What to read next?
■ CPython source code. Its documentation is really great:
– cpython/Objects/obmalloc.c
– cpython/Modules/gcmodule.c
■ PyPy documentation on GC and STM:
– https://doc.pypy.org/en/latest/gc_info.html
– https://doc.pypy.org/en/latest/stm.html
■ Various talks over the last few years at PyCon
■ Instagram Engineering blog, especially these two articles:
– https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
– https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf
Wrap up
■ If you don’t know anything about Python’s internal memory management – that’s fine
■ Memory usage can be reduced, but the optimization techniques are limited
■ If you need fine control over memory – don’t use Python
ANY
QUESTIONS?
THANK YOU FOR
LISTENING!

Editor's Notes

  • #2 A quick guide for Python developers who don’t want to know anything about memory management
  • #3 By the end of this talk I hope that you will be glad that you don’t need to think about memory management in Python
  • #6 Basically, the term "unmanaged memory" was only made up to reflect the existence of "managed memory". Before the ’90s it was just called "memory". In programming languages like C and C++, the developer needs to manually allocate and deallocate memory on the heap for most non-trivial use cases. Here you can see a dramatization of this approach.
  • #7 By comparison, in languages like Python you don’t need to think about memory allocation at all. The underlying virtual machine handles all this boring stuff for you. As you can see in this example, Python is clearly superior to C; we don’t need any more evidence.
  • #8 However, someone still needs to write the memory allocation for Python. It’s well hidden from the common developer, and for good reason. It’s time to dive into the Deep C.
  • #9 CPython memory allocation is a rather complicated process. It involves multiple different allocators and a number of heuristics to optimize memory usage. The allocator may vary depending on the size of the requested memory, the type of object, or even the type of build.
  • #10 Let’s talk about the general process of memory allocation in CPython. First of all, the interpreter gets a huge chunk of memory from the host operating system. Some of it will be used for internal needs, some for the program’s needs.
  • #11 Now, to allocate something for an object, the interpreter needs to think hard. It can’t just grab the leftmost empty chunk of memory. User-space memory is organized as follows: the biggest areas are called arenas. They are aligned with virtual memory pages, but are usually larger than a single page. Inside each arena there are several pools, each the size of a virtual memory page (4 KB). Memory for objects lives in these pools, in subdivisions called blocks. The size of those blocks depends on the so-called "size class" of the object: usually its size in bytes rounded up to the nearest multiple of 8. As the documentation puts it, this strategy is a variant of "simple segregated storage based on array of free lists". Emphasis on simple. To me, it looks more like a Russian doll.
  • #13 By this point you should be thoroughly disappointed in this talk. If you think that all this memory-layout stuff has nothing to do with actual Python development work, you’re right. I don’t even want to go deeper into this topic, because then I’d have to explain the difference between untouched and free memory blocks, or how the allocator chooses which pool to use. If you’re not a CPython core developer, you don’t need to know that. I just wanted to show you that the deeper you go into memory allocation, the more you love the fact that you don’t need to write it yourself.
  • #14 Let’s talk about something that really matters in the context of memory management and is actually useful: garbage collection.
  • #15 And we’ll start with the Global Interpreter Lock. If you don’t know what it is, first of all – shame on you. But I’ll give a brief definition. The Global Interpreter Lock is an interpreter-level lock that allows only a single thread to execute at a time. Whenever people talk about the GIL, there are usually pretty heated debates. Basically, because of the GIL CPython doesn’t have real multithreading, and developers in other languages laugh about it. So, why do we even need the GIL in the first place, and why are we talking about it today? The main reason for the GIL’s existence is hidden in CPython’s garbage collector implementation.
  • #16 Let’s start with the basics. What’s a garbage collector? To cut a long story short, it’s a mechanism that automatically deletes unused objects. If you go to the Python documentation, you won’t find any specifics on which garbage collection algorithm to use. Basically, memory should be freed at some point in time. For example, it could be freed by the host operating system when the interpreter process terminates. Nobody would actually do it that way, but the possibility exists.
  • #17 Before we start talking about reference counting, let’s talk about variables. In short, a variable in Python is just a label and a reference to a certain object. Each object stores the number of such references to it. However, some references don’t require any label variables at all. For example, you can create a list without ever creating a variable. It’s quite self-evident that if there are no references to an object, it can be safely deleted.
  • #20 There are a few caveats with reference counting. First of all, it can’t handle reference cycles. If you look at this simple example, you’ll notice that even after deleting the variable l from the namespace, each node in the linked list still has at least one reference to it. Second, to be safe and fast, reference counting has to be single-threaded. That means we need some kind of global lock that allows only one thread per process to execute. Hmmm… Also, reference counting can be tricky when the interpreter has special mechanisms like caching of small strings and integers. And you can’t turn it off at all.
  • #21 To solve the problem of cyclic references, modern CPython has a special mechanism called the generational garbage collector. It lives in a built-in module called gc. It’s based on the principle that most objects die young, or as they say, "objects have high infant mortality". Most objects are tracked by the GC using special lists called generations. By the way, to continue the previous analogy, the first generation is usually called the nursery.
  • #24 Now let’s briefly talk about other Python implementations.
  • #25 Jython and IronPython use whatever GC the underlying VM provides. I don’t really remember any details, but one thing is important for sure: there is no Global Interpreter Lock in those implementations.