Global Interpreter Lock in Python and different kinds of workloads (I/O, CPU) in SRE.
How you can enhance your Python code to deal with it, and the different ways to solve this – Multithreading, Multiprocessing, AsyncIO, and AsyncIO combined with Multithreading/Multiprocessing.
9. Multiprocessing
• Multiple child processes
• Message Passing
• Might require more memory compared to multi-threading
Advantages
• No GIL
• Good for CPU bound
11. Caveats of multithreading & multiprocessing
• Context Switches
• The number of threads/processes drives the number of context switches
• Deciding Optimal number of Threads or Processes
• Varying I/O wait-times
15. Summary
• Multithreading – Blocking I/O
• Multiprocessing – CPU Bound
• AsyncIO – Non-blocking I/O
• AsyncIO with Multithreading – Blocking and Non-blocking I/O
• AsyncIO with Multiprocessing – CPU Bound with Non Blocking I/O
16. Reach out to Us
• Nitin Bhojwani – nitinbhojwani@outlook.com
• Arabinda Das – darabinda@gmail.com
Global Interpreter Lock
Hint: It’s what the name suggests.
Let’s look at the official definition
It’s a lock at the Python interpreter level that doesn’t allow multiple threads to execute Python bytecode simultaneously, even on different processor cores.
This means that even if you write a multi-threaded program, its threads keep acquiring and waiting on the lock that protects the interpreter’s data structures, which increases execution time.
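To make this concrete, here is a minimal sketch (not from the original deck; the iteration count is illustrative) that times a CPU-bound function run sequentially and then in two threads. Because of the GIL, the threaded run typically takes about as long as the sequential one.

import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop; it holds the GIL while running.
    while n > 0:
        n -= 1

N = 50_000_000  # illustrative workload size

start = time.perf_counter()
count_down(N)
count_down(N)
print(f"sequential : {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
# The GIL lets only one thread execute bytecode at a time,
# so this is usually no faster than the sequential run.
print(f"two threads: {time.perf_counter() - start:.2f}s")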
However, some extension modules, either standard or third-party, are designed to release the GIL when doing computationally-intensive tasks such as compression or hashing.
Also, the GIL is always released when doing I/O.
Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful: performance suffered in the common single-threaded, single-processor case, and overcoming this performance issue made the implementation much more complicated and therefore costlier to maintain.
Why keep the GIL? It has its benefits:
Our code is implicitly thread-safe (as multiple threads can’t run simultaneously on the same data)
Increased speed of single-threaded programs
Easy integration of C libraries that usually are not thread-safe.
Simplified Garbage Collection - based on reference counts.
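As a quick illustration of that last point (a sketch, not from the deck): CPython’s sys.getrefcount reports an object’s current reference count, and the object is freed as soon as that count drops to zero.

import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # includes the temporary reference created by the call itself
y = x                      # binding another name increments the count
print(sys.getrefcount(x))
del y                      # dropping a reference decrements it; at zero, CPython frees the object
print(sys.getrefcount(x))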
75% I/O intensive in general:
Call APIs
Run commands over SSH
Database calls.
15–20% Memory intensive:
Define variables
Store arrays and other data structures in memory
5–10% CPU intensive:
Parse requests and results
If/else and other compute logic
Think of it at the scale of 100,000 nodes.
We'll look into different approaches to solve the problem while efficiently utilizing our resources.
Spawn multiple threads and use shared memory space to communicate and sync.
Lower memory consumption, as memory is shared among child threads.
During I/O, the GIL is released and other threads can continue executing.
Python’s threading module provides all the basic APIs to deal with threads – start, run, join, locks, semaphores, etc. Queuing and threading are often used together, so concurrent.futures abstracts away queuing tasks and distributing them among threads. It provides ThreadPoolExecutor.
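A minimal ThreadPoolExecutor sketch for blocking I/O (the URLs and worker count below are illustrative, not from the deck):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ["https://example.com", "https://example.org"]  # illustrative endpoints

def fetch(url):
    # Blocking I/O: the GIL is released while waiting on the network,
    # so other threads can make progress in the meantime.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, resp.status

with ThreadPoolExecutor(max_workers=4) as pool:
    for url, status in pool.map(fetch, URLS):
        print(url, status)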
Spawning multiple child processes to achieve true parallelism
Use message passing for inter-process communication
The GIL is not even in the picture, as these are separate processes with separate memory spaces altogether.
Might require more memory compared to multi-threading
multiprocessing is a package that supports spawning child processes using an API similar to the threading module.
Again, concurrent.futures provides an abstraction over queuing tasks and distributing them among child processes. It provides ProcessPoolExecutor.
An additional feature for processes: if we don’t set max_workers, ProcessPoolExecutor by default creates as many worker processes as the machine has cores.
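A minimal ProcessPoolExecutor sketch for CPU-bound work (the workload is illustrative; max_workers is left unset so it defaults to the number of cores):

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # CPU-bound work runs in a separate process, so it is not serialized
    # by the parent interpreter's GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # guard required when child processes are spawned
    with ProcessPoolExecutor() as pool:  # max_workers defaults to the core count
        results = list(pool.map(cpu_heavy, [5_000_000] * 4))
    print(results)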
Multiprocessing is a good solution as the GIL wait is no longer present.
But what about context switches? The Python interpreter switches between threads to allow concurrency. We want maximum utilization of system resources, but increasing the number of threads or processes also increases context switching.
We need to choose the right number of threads/processes so that system resources are maximally utilized. Not all I/O is alike, so wait times differ for each I/O call – which makes choosing that number even more difficult.
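One crude but practical way to pick a pool size, sketched below with hypothetical wait times (not from the deck): run the same I/O-like workload at a few sizes and compare wall-clock times.

import time
from concurrent.futures import ThreadPoolExecutor

def io_task(seconds):
    time.sleep(seconds)  # stand-in for an I/O wait of varying length

wait_times = [0.1, 0.5, 0.2, 1.0] * 5  # hypothetical mix of I/O wait times

for workers in (2, 4, 8, 16):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(io_task, wait_times))
    print(f"{workers:>2} workers: {time.perf_counter() - start:.2f}s")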
The event loop is at the core
It’s a single-threaded, single-process design
It’s a programming construct
A coroutine is a specialized version of a Python generator function. In other words, it’s a function that can suspend its execution before reaching return, and it can indirectly pass control to another coroutine for some time.
AsyncIO - Library to write concurrent code using the async/await syntax
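A minimal asyncio sketch (coroutine names and delays are illustrative): each await suspends the coroutine at the I/O point, and the event loop runs the others in the meantime.

import asyncio

async def fake_io(name, seconds):
    # await suspends this coroutine; the event loop runs other coroutines meanwhile.
    await asyncio.sleep(seconds)
    return f"{name} done after {seconds}s"

async def main():
    # Both coroutines run concurrently on a single thread.
    results = await asyncio.gather(
        fake_io("api-call", 1),
        fake_io("db-query", 2),
    )
    print(results)

asyncio.run(main())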
File operations (such as logging) can block the event loop: run them in a thread pool.
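For example (a sketch; asyncio.to_thread needs Python 3.9+, and the file path is illustrative), a blocking write can be handed to the default thread pool so the loop stays responsive:

import asyncio

def write_log(path, line):
    # Ordinary blocking file I/O; calling it directly inside a coroutine
    # would stall the event loop for the duration of the write.
    with open(path, "a") as f:
        f.write(line + "\n")

async def handler():
    # Run the blocking call in the default ThreadPoolExecutor.
    await asyncio.to_thread(write_log, "app.log", "request handled")

asyncio.run(handler())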
CPU-bound operations will block the event loop: in general it is preferable to run them in a process pool.
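Similarly, a sketch of offloading CPU-bound work via loop.run_in_executor with a ProcessPoolExecutor (the hashing workload is illustrative):

import asyncio
import hashlib
from concurrent.futures import ProcessPoolExecutor

def digest(data: bytes) -> str:
    # CPU-bound hashing runs in a worker process, so the event loop keeps
    # serving other coroutines while it computes.
    return hashlib.sha256(data * 1_000_000).hexdigest()

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, digest, b"payload")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())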