Typically, Python software engineers don't need to care about how the language handles memory. Sometimes, however, it's very useful to understand what's going on under the hood. In this talk, I'll give you a brief overview of how Python manages memory, along with some useful tips and tricks that you may not already know.
Pythran is a tool that can be used to accelerate SciPy kernels by transpiling pure Python and NumPy code into efficient C++. SciPy developers have started using Pythran for some computationally intensive kernels, finding it easier than alternatives like Cython or Numba for writing fast code. Initial integration into the SciPy build process has gone smoothly. Ongoing work includes porting more kernels to Pythran and exploring combining it with CuPy for fast CPU and GPU code generation.
Delivered at Python Toronto on 18th Sep 2018. Asyncio is the new frontier in Python, and these slides share my experiences with it while working at SwissBorg.
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Samuel Lampa
A very quick overview of how Vagrant, Ansible and Docker fit nicely together as a very productive and flexible solution for creating automated development environments.
SciPipe - A light-weight workflow library inspired by flow-based programmingSamuel Lampa
A presentation of the SciPipe workflow library, written in Go (Golang), inspired by Flow-based programming, at an internal workshop at Uppsala University, Department of Pharmaceutical Biosciences.
The document provides 14 tips for optimizing Python code for speed. It recommends profiling code with %timeit and %prun in IPython, using iterators and generators when possible, using list comprehensions over for loops, xrange over range, map, filter and reduce, minimizing function calls and global variables, using threads for I/O-bound processes, and using the C versions of Python libraries when performance is critical. It also provides links to additional resources on Python optimization.
This document discusses image processing using FPGAs. It begins with an overview of FPGAs and their components. It then discusses using high-level synthesis to convert C++ code to hardware designs for FPGAs. An example of implementing Sobel edge detection on an FPGA is provided. The implementation was optimized from 40 cycles per pixel to 1 cycle per pixel through pipelining, parallelism, and using block RAM for intermediate storage. Challenges discussed include limited debugging tools and steep learning curves for FPGA development.
How my visualization tools use little memory: A tale of incrementalization an...Eugene Kirpichov
1) The document describes optimization techniques used to improve memory usage and performance of visualization tools splot and tplot.
2) Key techniques included reading input lazily and incrementally building output in a single pass to avoid loading all data into memory at once.
3) Debugging laziness issues required tools like Debug.HTrace to understand where unevaluated thunks were accumulating excess memory.
This document discusses combining Rust and Python to create a new "hip" programming language. It proposes two approaches: 1) Building Rust extensions for Python to improve performance of Python code. Rust could replace C and provide memory safety and better performance for Python extensions. 2) Building a Python interpreter using Rust (RustPython), which provides benefits like memory safety and borrowing rules from Rust. However, a Rust-based Python interpreter still has a long way to go before matching the performance and capabilities of CPython. In the end, the document acknowledges both Rust and Python have limitations and neither can fully "replace" the other.
Python is a versatile programming language used almost in all application development. OO Features coupled with excellent library support makes Python a preferred language for Web Development, IoT Application Development, Data Science and Machine Learning. In this module, we will focus on getting hands-on development and debugging experience of Python, starting with basics. Further will get a deeper understanding of advanced aspects like OO programming, File handling, Package creation, Exception handling. Eventually, you will be able to develop a multitasking application in IoT Gateways.
Numba is a just-in-time compiler for Python that can optimize numerical code to achieve speeds comparable to C/C++ without requiring the user to write C/C++ code. It works by compiling Python functions to optimized machine code using type information. Numba supports NumPy arrays and common mathematical functions. It can automatically optimize loops and compile functions for CPU or GPU execution. Numba allows users to write high-performance numerical code in Python without sacrificing readability or development speed.
This document provides an overview of deep learning applications and techniques for implementing deep neural networks. It discusses:
- Various use cases for deep learning in industries like computer vision, natural language processing, and speech recognition.
- Frameworks for implementing deep networks like Caffe and Keras, which allow finetuning pre-trained models or building custom networks.
- Examples of applying deep learning to tasks like classifying bee photos and search queries, using techniques like convolutional neural networks, word embeddings, and XGBoost classifiers.
- Tips for optimizing neural network architecture like starting simple and gradually increasing complexity.
The document shares results from several classification problems to demonstrate deep learning methods.
This document discusses using Pybind11 to create C extensions for Python. Pybind11 helps extend Python with C/C++ and provides functionality similar to CFFI. Examples are given of integrating NumPy and the Eigen linear algebra library. Testing C code with Google Test (GTEST) is recommended. GTEST example code is provided. Memory leaks should be checked with Valgrind. A case study of using Azure DevOps for continuous integration is also presented. In summary, Pybind11 allows mixing C/C++ with Python for efficiency reasons while leveraging Python libraries, and testing C code is important.
This document summarizes a presentation about monitoring in the cloud using Puppet. It discusses challenges with traditional monitoring like Nagios being difficult to scale in cloud environments. It promotes using Puppet to configure monitoring tools like Collectd, Logstash, Graphite and Icinga in a dynamic and reproducible way. Specific examples are given around using Puppet to export monitoring configurations and collect metrics that are then graphed in Graphite.
This document provides an overview of the Python programming language and its uses. It introduces Python, how to set it up, and popular packages and tools used for software engineering, AI engineering, and data science. It discusses object-oriented programming, functional programming, APIs, web apps, databases, machine learning, data analysis, and visualization in Python. Popular integrated development environments and libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn are also introduced. The presenter's credentials and experience working with Python are provided at the end.
This presentation demonstrates how to efficiently manage GPU buffers using today's APIs. It describes why buffer management is so important, and how inefficient buffer management can cut frame rates in half. Finally, it demonstrates a couple of new techniques; the first being discard-free circular buffers and the second transient buffers.
Shared memory lets processes share blocks of memory, enabling fast inter-process communication and data sharing. It provides benefits over alternatives like storing to disk or duplicating data in memory. However, there are challenges: processes may attach segments at different addresses, requiring relative rather than absolute pointers, and data must be manually serialized to work across processes instead of duplicating Ruby objects. Overall, shared memory can be useful for applications that need to share large read-only data or provide fast interprocess locks and messaging.
Presentation by Dr. Cliff Click, Jr. Mention Java performance to a C hacker, or vice versa, and a flame war will surely ensue. The Web is full of broken benchmarks and crazy claims about Java and C performance. This session will aim to give a fair(er) comparison between the languages, striving to give a balanced view of each language's various strengths and weaknesses. It will also point out what's broken about many of the Java-versus-C Websites, so when you come across one, you can see the flaws and know that the Website isn't telling you what it (generally) claims to be telling you. (It's surely telling you "something," but almost just as surely is "not realistically" telling you why X is better than Y).
Performance optimization techniques for Java code (Attila Balazs)
The presentation covers the basics of performance optimization for real-world Java code. It starts with a theoretical overview of the concepts, followed by several live demos showing how performance bottlenecks can be diagnosed and eliminated. The demos include some non-trivial multi-threaded examples inspired by real-world applications.
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
Customize and Secure the Runtime and Dependencies of Your Procedural Languages Using PL/Container
Greenplum Summit at PostgresConf US 2018
Hubert Zhang and Jack Wu
With each of the past 3 Ruby releases, YJIT has delivered higher and higher performance. However, we are seeing diminishing returns, because as JIT-compiled code becomes faster, it makes up less and less of the total execution time, which is now becoming dominated by C function calls. As such, it may appear like there is a fundamental limit to Ruby’s performance.
In the first half of the 20th century, some early airplane designers thought that the speed of sound was a fundamental limit on the speed reachable by airplanes, thus coining the term “sound barrier”. This limit was eventually overcome, as it became understood that airflow behaves differently at supersonic speeds.
In order to break the Ruby performance barrier, it will be necessary to reduce the dependency on C extensions, and start writing more gems in pure Ruby code. In this talk, I want to look at this problem more in depth, and explore how YJIT can help enable writing pure-Ruby software that delivers high performance levels.
This document provides 10 tips for analyzing Wikipedia public data:
1. Be aware of special page types like disambiguation pages and redirects that need filtering.
2. Plan hardware carefully, prioritizing memory over disk and considering database engine configuration.
3. Fine tune database engine parameters to your hardware and exploit memory.
4. Use source control, publish code publicly, document code, and include testing.
5. Consider tools like Python and Perl that are well-suited to Wikipedia's text and link data formats.
6. Leverage existing solutions rather than reinventing functionality.
7. Automate processes to handle large datasets and enable reproducibility.
8. Expect
The document discusses lessons learned from using MongoDB at the New York Times over 6 months. It covers initial setup without backups or monitoring, improving to replication and monitoring, optimizing storage, backups, restores, querying, indexing and administration. Key lessons include using replication and backups, monitoring all aspects of MongoDB and storage, optimizing data and indexes for queries, and understanding data and access patterns.
The core idea of PyPy is to produce a flexible and fast implementation of the Python programming language. The talk will cover the interpreter, translator and jit parts of the code and their relationships and the fundamental ways in which PyPy differs from other virtual machine implementations.
Black, Flake8, isort, and Mypy are useful Python linters but it’s challenging to use them effectively at scale in the case of multiple codebases, in a large codebase, or with many developers. Manually managing consistent linter versions and configurations across codebases requires endless effort. Linter analysis on large codebases is slow. Linters may slow down developers by asking them to fix trivial issues. Running linters in distributed CI jobs makes it hard to understand the overall developer experience.
To handle these scale challenges, we developed a reusable linter framework that releases new linter updates automatically, reuses consistent configurations, runs linters on only updated code to speedup runtime, collects logs and metrics to provide observability, and builds auto fixes for common linter issues. Our linter runs are fast and scalable. Every week, they run 10k times on multiple millions of lines of code in over 25 codebases, generating 25k suggestions for more than 200 developers. Its autofixes also save 20 hours of developer time every week.
In this talk, we’ll walk you through popular Python linters and configuration recommendations, and we will discuss common issues and solutions when scaling them out. Using linters more effectively will make it much easier for you to apply best practices and more quickly write better code.
The document provides tips for managing memory and time when working with large datasets. It discusses choosing minimal data types to reduce memory usage, using tools to monitor memory usage, and manually collecting garbage after removing variables. It also recommends storing intermediate results in efficient file formats like HDF5 and Parquet to speed up iterations, and balancing the tradeoff between developing better models that take more time and doing faster iterations for debugging. The document uses examples from a Kaggle competition on ad tracking data to illustrate these strategies.
2. Agenda
■ Managed vs Unmanaged memory
■ Memory Allocation in CPython
■ Garbage Collection in CPython
■ How other Python implementations handle memory
■ Tips & Tricks
■ Q&A
3. About author
■ Cursed by a witch in 2008 and since then can only use programming languages with
managed memory
■ ~7 years of professional experience with Python
■ Still constantly learning something new
■ Currently developing software here at Tenable (we’re hiring)
■ Quite passionate about photography and filmmaking, please visit my website
https://notreally.media/ (even if you don’t know Russian, you can always take a look at
pretty pictures)
5. Unmanaged Memory
■ Developer needs to manually
allocate and release memory
Code from https://www.codingunit.com/c-tutorial-the-functions-malloc-and-free
8. Deep C
■ In the actual VM, someone still needs to implement memory allocation
■ CPython has a few special allocators, for both objects and non-objects
■ Which allocator is used also depends on the type of build
From https://docs.python.org/3/c-api/memory.html#default-memory-allocators
12. But that’s not all
■ Here you may start to think “this is completely irrelevant to what I do in my day-to-day job”
■ It is
■ That’s why I really don’t want to tell you the difference between untouched and free memory blocks, for example
14. GIL
■ The Global Interpreter Lock (GIL) is a lock (!) in the interpreter (!!) that allows only a single thread to execute at a time
■ It’s quite a controversial topic
■ Because of the GIL, CPython can’t do real multithreading
■ But why do we even need the GIL, if everyone hates it?
■ Also, why are we talking about the GIL today?
15. Garbage Collection
■ A garbage collector is a mechanism that automatically deletes unused objects
■ The Python standard doesn’t force anyone to implement a certain type of GC
■ In CPython, the primary garbage collection mechanism is reference counting
16. Reference counting
■ What is a variable?
■ A variable is just a label and a reference to some object in memory
■ An object in memory can be referenced without any label: see lists, tuples, etc.
■ If there are no references to an object, it can be deleted
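The label idea is easy to see in the interpreter itself; a minimal sketch:

```python
a = [1, 2, 3]             # 'a' is just a label pointing at a list object
b = a                     # a second label for the same object, not a copy
assert a is b             # both labels reference one object in memory

b.append(4)
assert a == [1, 2, 3, 4]  # mutating through one label is visible through the other

# an object can also exist without any label of its own:
nested = [["only referenced by the outer list"]]
```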
17. Reference counting
■ When does the number of references increase?
■ Storing the object in a new variable
■ Adding the object to a collection
■ Passing the object to a function
18. Reference counting
■ When does the number of references decrease?
■ Reassigning the variable
■ Execution leaving the scope
■ An explicit `del var_name`
■ Removing the object from a collection
■ A global object’s refcount can never reach 0
19. Reference counting
■ Reference counting caveats
■ Can’t handle reference cycles
■ Not really thread-safe (that’s why CPython has the GIL)
■ String constants and small integers can be cached
■ Can’t be turned off
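The cycle caveat is easy to reproduce; a minimal sketch (the `Node` class is made up for illustration):

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a   # two objects referencing each other: a cycle
del a, b              # refcounts stay above zero, so refcounting alone leaks them

found = gc.collect()  # the cycle detector reclaims them
assert found >= 2     # at least the two Node objects were unreachable
```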
20. Generational GC
■ The generational GC is a built-in module that cleans up everything that can’t be reclaimed by reference counting
■ Based on the principle that most objects die young
■ Objects are tracked using special lists called generations
21. Generational GC
■ There are only 3 generations
■ New objects are stored in Generation 0
■ During a cleanup, the GC tries to determine whether every object in the generation is reachable from a root set of objects
■ If an object is unreachable, it will be deleted
■ If an object survives a cleanup, it is promoted to the next generation
(diagram: root objects referencing Generation X)
22. Generational GC
■ Each generation has a threshold
■ If the number of objects in a generation exceeds its threshold, garbage collection is triggered automatically
■ Unlike reference counting, the generational GC can be configured
■ Users can set the threshold levels
■ Users can trigger a collection manually
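The `gc` module exposes all of this; a quick sketch (threshold defaults may differ between CPython versions):

```python
import gc

print(gc.get_count())      # allocations tracked per generation, e.g. (430, 5, 2)
print(gc.get_threshold())  # trigger levels; historically (700, 10, 10)

freed = gc.collect()       # manually trigger a full collection of all generations
print(freed)               # number of unreachable objects that were found
```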
24. Other Python implementations
■ Jython and IronPython use whatever GC is in the underlying VM
■ PyPy has a pluggable GC architecture and a list of ready-made garbage collectors
■ The default GC is called Minimark, a super smart generational GC
■ There is also an ongoing experiment to implement STM
25. Software Transactional Memory
■ Instead of locking on a global level, uses transactions for more granular memory operations
■ It means that with STM we could have real multithreading in Python
■ Huge performance gain if the program uses a lot of simultaneous threads
■ Significant performance loss for single-threaded programs
■ Still not ready for production after many years of development
27. How to improve memory usage in your project?
■ General rule: create benchmarks before starting the optimization
■ Useful tools for profiling: memory-profiler, pympler, objgraph
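Besides those third-party tools, the standard library’s `tracemalloc` is handy for a quick baseline; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()
data = [bytes(1000) for _ in range(1000)]   # allocate roughly 1 MB
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
tracemalloc.stop()
```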
29. External modules
■ Lots of external Python modules are written in C in a very efficient way
■ The most commonly used is NumPy
■ You can write your own C/C++/Rust extension using FFI
■ Cython is also a good choice for memory- and/or CPU-heavy modules
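For a tiny taste of FFI, the standard library’s `ctypes` can call straight into the C library; a sketch (library lookup is platform-dependent, shown here for POSIX):

```python
import ctypes
import ctypes.util

# find_library may return None on some systems; on POSIX, CDLL(None)
# falls back to the current process, which already links libc
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

assert libc.abs(-42) == 42
```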
30. Generators
■ Very often you don’t really need to store the whole list/dict/etc.
■ Passing a generator to a function looks cleaner and is usually more efficient
foo([x.bar for x in arr])
vs
foo(x.bar for x in arr)
■ Use generators as the first-choice option for every new method that returns a collection
■ Side effect: it will be easier to make your code asynchronous later
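The size difference is easy to demonstrate; a minimal sketch:

```python
import sys

as_list = [x * 2 for x in range(1_000_000)]   # materializes a million elements
as_gen = (x * 2 for x in range(1_000_000))    # holds only the iteration state

print(sys.getsizeof(as_list))  # several megabytes
print(sys.getsizeof(as_gen))   # a couple of hundred bytes, regardless of length

# both feed a consumer the same way:
assert sum(x * 2 for x in range(1_000_000)) == sum(as_list)
```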
32. Tail call optimization
■ Recursion in Python is not only depth-limited, it also consumes a lot of memory
■ Certain recursive methods can be rewritten to be tail-recursive
■ That means the recursive call is the last operation in every return statement, so no frame state needs to be kept
■ By default, Python doesn’t optimize tail recursion
■ It can be easily emulated using simple decorators
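One common emulation is a trampoline decorator: the function returns a zero-argument thunk instead of recursing, and the decorator runs the thunks in a loop so the stack never grows. A sketch (all names here are made up for illustration):

```python
def trampoline(func):
    """Run a tail-recursive function iteratively."""
    def wrapper(*args):
        result = func(*args)
        while callable(result):   # keep bouncing until a real value comes back
            result = result()
        return result
    return wrapper

def _factorial(n, acc=1):
    if n <= 1:
        return acc
    # the tail call is wrapped in a thunk instead of being made directly
    return lambda: _factorial(n - 1, acc * n)

factorial = trampoline(_factorial)

assert factorial(10) == 3628800
factorial(5000)   # far beyond the default recursion limit, no RecursionError
```

Note the caveat: this simple version assumes the function never legitimately returns a callable value.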
34. Migrate to Python 3.7+
■ The latest versions have many memory optimizations
■ You rarely need to think about __slots__ anymore
■ Better finalizer handling
■ Python 2 is sunsetting on 1 January 2020
35. GC tuning
■ The generational GC can be tuned
■ This should only be done if you really have no other options
■ Threshold values can be optimized for your use cases
■ The generational GC may be completely turned off
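A sketch of both knobs (the values here are arbitrary examples, not recommendations):

```python
import gc

# raise the generation-0 threshold so collections run less often
gc.set_threshold(50_000, 20, 20)
assert gc.get_threshold() == (50_000, 20, 20)

# or switch the cyclic collector off entirely;
# reference counting keeps working regardless
gc.disable()
assert not gc.isenabled()
gc.enable()
```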
36. What to read next?
■ CPython source code. Its documentation is really great:
– cpython/Objects/obmalloc.c
– cpython/Modules/gcmodule.c
■ PyPy documentation on GC and STM:
– https://doc.pypy.org/en/latest/gc_info.html
– https://doc.pypy.org/en/latest/stm.html
■ Various talks over the last few years at PyCon
■ Instagram Engineering blogs, especially these two articles:
– https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
– https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf
37. Wrap up
■ If you don’t know anything about internal Python memory management, it’s fine
■ Memory usage can be reduced, but the optimization techniques are limited
■ If you want fine control over memory, don’t use Python
A quick guide for Python developers who don’t want to know anything about memory management
By the end of this talk I hope that you will be glad that you don’t need to think about memory management in Python
Basically, the term "unmanaged memory" was only made up to reflect the existence of "managed memory". Before the ’90s it was just called "memory". In programming languages like C and C++, the developer needs to manually allocate and deallocate heap memory for most non-trivial use cases. Here you can see a dramatization of this approach.
In comparison, in languages like Python, you don't need to think about memory allocation at all. The underlying virtual machine will handle all this boring stuff for you. As you can see in this example, Python is clearly superior to C; we don't need any more evidence.
However, someone still needs to write the memory allocation for Python. It's well hidden from the common developer, and for good reasons. It's time to dive into the Deep C.
CPython memory allocation is a rather complicated process. It involves multiple different allocators and a number of heuristics to optimize memory usage. The allocator used may vary depending on the size of the requested memory, the type of object, or even the type of build.
Let’s talk about the general process of memory allocation in CPython. First of all, the interpreter gets a huge chunk of memory from the host operating system. Some of it will be used for internal needs, some for the program's needs.
Now, to allocate something for an object, the interpreter needs to think hard. It can't just grab the leftmost empty chunk of memory. User-space memory is organized as follows: the biggest areas are called arenas. They are aligned with virtual memory pages, but are usually larger than a single page. Inside the arenas there are pools, each the size of a virtual memory page (4 KB). Memory for objects resides in these pools, in subdivisions called blocks. The size of a block depends on the so-called "size class" of the object: usually its size in bytes, rounded up to the nearest multiple of 8. As the documentation says, this strategy is a variant of "simple segregated storage based on an array of free lists". Emphasis on simple. To me, it looks more like a Russian doll.
By this point you should be thoroughly disappointed in this talk. If you think that all this memory layout stuff has nothing to do with actual Python developer work, you're right. I don't even want to go deeper into this topic, because then I'd have to explain the difference between untouched and free memory blocks, or how the allocator chooses which pool to use. If you're not a core Python developer, you don't need to know that. I just wanted to show you that the deeper you go into memory allocation, the more you love the fact that you don't need to write it yourself.
Let’s talk about something that really matters in the context of memory management, and that is actually useful: garbage collection.
And we’ll start with the Global Interpreter Lock. If you don't know what it is, first of all, shame on you. But I'll give a brief definition. The Global Interpreter Lock is an interpreter-level lock that allows only a single thread to execute at a time.
Whenever people talk about the GIL, there are usually pretty heated debates. Basically, because of the GIL, CPython doesn't have real multithreading, and developers in other languages laugh at it. So, why do we even need the GIL in the first place, and why do we talk about it today? The main reason for the GIL's existence is hidden in CPython's garbage collector implementation.
Let’s start with the basics. What's a garbage collector? To cut a long story short, it's a mechanism that automatically deletes unused objects. If you go to the Python documentation, you won't find any specifics on which garbage collection algorithm to use. Basically, memory should just be freed at some point in time. For example, it could be freed by the host operating system when the interpreter process terminates. No one would do it this way, but it is a possibility.
Before starting to talk about reference counting, let's talk about variables. In short, a variable in Python is just a label and a reference to a certain object. Each object stores the number of such references to it. However, some references don't even require any label-variables. For example, you can create a list without ever creating a variable. It's quite self-evident that if there are no references to an object, it can be safely deleted.
There are a few caveats with reference counting. First of all, it can't handle reference cycles. If you look at this simple example, you can see that even after deleting the variable l from the namespace, each node in the linked list still has at least one reference to it. Second, to be safe and fast, reference counting should be single-threaded. It means that we need some kind of global lock that allows execution of only one thread per process. Hmmm… Also, reference counting can be tricky when the interpreter has special mechanisms like caching of small strings and integers. And you can't turn it off at all.
To solve the problem of cyclic references, modern CPython has a special mechanism called the generational garbage collector. It lives in a built-in module called gc. It's based on the principle that most objects die young; as they say, "objects have high infant mortality". Most objects are tracked by the GC using special lists called generations. By the way, to continue the previous analogy, the first generation is usually called the nursery.
Now let’s briefly talk about other Python implementations.
Jython and IronPython use whatever GC the underlying VM provides. I don't really remember any details, but one thing is important for sure: there is no Global Interpreter Lock in those implementations.