This document proposes a two-level just-in-time compilation approach using one interpreter and one engine. It finds that by providing different interpreter definitions to the RPython meta-tracing compiler, different kinds of compilers and compilations can be derived, such as tracing, method, and threaded code compilers. The key idea is an adaptive RPython system that performs multitier compilation by generating different interpreters from a generic interpreter and driving the RPython engine accordingly. This challenges the assumption in the JIT community that a meta-tracing compiler can only perform tracing compilation.
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a... (Yusuke Izawa)
This document summarizes Yusuke Izawa's master's thesis defense on stack hybridization, a mechanism for bridging two compilation strategies - tracing and method-based - in a meta compiler framework. The proposal extends a meta-tracing just-in-time (JIT) compiler to apply different compilation strategies to different parts of a program based on call context. A proof-of-concept implementation in OCaml showed the hybrid approach was about 1.1x faster than a method-based only approach and over 100x faster than a tracing only approach.
GCC is a widely used open source compiler system developed by the GNU Project. It compiles C, C++, Java, Fortran and other languages. GCC has undergone major changes to its structure since 2005, including the addition of GENERIC and GIMPLE intermediate representations between the front end and back end. The front end parses source code into ASTs, then GIMPLE trees are optimized through many passes in the middle end before being converted to RTL for the back end code generation.
The document discusses the Cilk programming language and its runtime system for parallel programming. Cilk extends C with keywords like spawn and sync to express parallelism. It provides performance guarantees and automatically manages scheduling across processors. The runtime system uses work-stealing to map Cilk threads to processors with near-optimal efficiency. Cilk allows expressing parallelism while hiding low-level details like load balancing.
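Cilk itself extends C, but the spawn/sync structure described above is plain fork-join, which can be sketched in Python with concurrent.futures. This is an analogy only, with names of my own choosing: Python threads lack Cilk's work-stealing scheduler and performance guarantees.

```python
# A fork-join sketch loosely analogous to Cilk's spawn/sync. Illustration
# of the pattern only, not Cilk: Cilk schedules lightweight tasks with
# work-stealing, while this simply uses one extra OS thread.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # ordinary serial fib, used for the subproblems
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_forkjoin(pool, n):
    if n < 2:
        return n
    future = pool.submit(fib, n - 1)   # "spawn": run fib(n-1) concurrently
    y = fib(n - 2)                     # meanwhile compute fib(n-2) here
    return future.result() + y         # "sync": join before combining

with ThreadPoolExecutor(max_workers=2) as pool:
    print(fib_forkjoin(pool, 10))  # 55
```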
eBPF Debugging Infrastructure - Current Techniques (Netronome)
eBPF (extended Berkeley Packet Filter), in particular with its driver-level hook XDP (eXpress Data Path), has increased in importance over the past few years. As a result, the ability to rapidly debug and diagnose problems is becoming more relevant. This talk will cover common issues faced and techniques to diagnose them, including the use of bpftool for map and program introspection, the use of disassembly to inspect generated assembly code and other methods such as using debug prints and how to apply these techniques when eBPF programs are offloaded to the hardware.
The talk will also explore where the current gaps in debugging infrastructure are and suggest some of the next steps to improve this, for example, integrations with tools such as strace, valgrind or even the LLDB debugger.
RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability (Matsuo and Tsumura lab.)
RaVioli is a parallel video processing library that provides auto resolution adjustability. It hides resolutions from programmers and allows for pseudo real-time processing by adjusting computational loads. The library includes semi-automatic parallelization functions such as automatic block decomposition and a pipelining interface with an automatic load balancing mechanism. Evaluation results demonstrate the library's ability to adjust frame rate and resolution, perform parallelization through block decomposition, and balance loads between pipeline stages.
GCC is a widely used open source compiler. It consists of frontends for languages like C and C++ and backends that generate code for different CPU architectures. The GCC Extensibility Made Easy (GEM) framework allows dynamically loading modules to extend GCC functionality. Examples include adding new language features, improving security, and facilitating operating system development.
The LLVM project is a collection of compiler and toolchain technologies, including an optimizer, code generators, and front-ends like llvm-gcc and Clang. The project aims to provide modular, reusable compiler components to reduce the time and cost of building compilers. It also seeks to implement modern compiler techniques to generate fast, optimized code. LLVM has been used to build fast C/C++ compilers like LLVM-GCC that show improvements in compilation speed and generated code quality compared to GCC.
This document discusses Veriloggen, a Python framework for generating Verilog HDL code from Python. It allows designing hardware at the register-transfer level using Python by mapping Python constructs to Verilog modules, always blocks, wires, and other Verilog constructs. Veriloggen includes modules for RTL generation (Core), connecting Python threads to finite state machines (Thread), and defining streaming hardware (Stream). It aims to support a "Veriloggen for DSL X" approach to create domain-specific hardware description languages in Python.
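The core idea, generating Verilog source from Python objects, can be sketched in a few lines. The class and method names below are hypothetical, chosen for illustration; they are not Veriloggen's actual API.

```python
# A toy sketch of the idea behind Veriloggen: build Verilog HDL text from
# Python objects. Hypothetical mini-API, not Veriloggen itself.
class Module:
    def __init__(self, name):
        self.name = name
        self.ports = []   # (direction, width, name)
        self.body = []    # raw Verilog statements

    def input(self, name, width=1):
        self.ports.append(("input", width, name))
        return name

    def output(self, name, width=1):
        self.ports.append(("output reg", width, name))
        return name

    def always_posedge(self, clk, stmt):
        # emit a clocked always block around a single statement
        self.body.append(
            "  always @(posedge %s) begin\n    %s\n  end" % (clk, stmt))

    def to_verilog(self):
        ports = ",\n".join(
            "  %s %s%s" % (d, "[%d:0] " % (w - 1) if w > 1 else "", n)
            for d, w, n in self.ports)
        return "module %s (\n%s\n);\n%s\nendmodule\n" % (
            self.name, ports, "\n".join(self.body))

m = Module("counter")
clk = m.input("clk")
led = m.output("led", 8)
m.always_posedge(clk, "led <= led + 1;")
print(m.to_verilog())
```

A real framework layers much more on top (state machines, streams, simulation hooks), but the Python-objects-to-HDL-text pipeline is the same shape.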
The document discusses challenges in GPU compilers. It begins with introductions and abbreviations. It then outlines the topics to be covered: a brief history of GPUs, what makes GPUs special, how to program GPUs, writing a GPU compiler including front-end, middle-end, and back-end aspects, and a few words about graphics. Key points are that GPUs are massively data-parallel, execute instructions in lockstep, and require supporting new language features like OpenCL as well as optimizing for and mapping to the GPU hardware architecture.
The document discusses using the GNU Debugger (gdb) to debug applications. It covers when to use a debugger, invoking and configuring gdb, setting breakpoints, examining stack frames and data, disassembling code, and viewing registers. Gdb allows stepping through code, viewing variables and memory, and setting conditional breakpoints to debug programs.
This document discusses hybrid OpenMP and MPI programming. It provides an introduction to hybrid programming and outlines some of the benefits such as exploiting shared memory parallelism within a node using OpenMP while also scaling across nodes with MPI. It discusses different parallelization strategies and considerations for debugging and optimizing hybrid codes. It also provides two examples of hybrid codes: a multi-dimensional array transpose algorithm and the Community Atmosphere Model climate simulation code.
Directive-based approach to Heterogeneous Computing (Ruymán Reyes)
The document discusses a directive-based approach to heterogeneous computing. It describes how applications used in HPC centers commonly use MPI and OpenMP programming models. It also discusses how complexity arises from mixing different Fortran dialects and the need for faster ways to migrate code to new architectures like accelerators without rewriting the code. The document proposes using directives to enhance legacy code for heterogeneous systems in a portable way.
The document discusses stacks and procedures in assembly language programming. It covers stack implementation using registers and instructions, parameter passing methods using registers or stack, and establishing stack frames using ENTER and LEAVE instructions. Procedures can be called using CALL and control returned using RET. The stack is used for temporary data storage, parameter passing, and storing return addresses for procedures.
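The CALL/RET mechanics described above can be modeled with a tiny interpreter of my own devising: CALL pushes the return address and jumps, RET pops it and resumes.

```python
# A minimal simulation of how CALL and RET use the stack. The "program"
# is a list of (op, arg) pairs executed with a program counter (pc).
def run(program):
    stack, pc, out = [], 0, []
    while pc < len(program):
        op, arg = program[pc]
        if op == "call":          # push return address, jump to target
            stack.append(pc + 1)
            pc = arg
        elif op == "ret":         # pop return address, resume the caller
            pc = stack.pop()
        elif op == "print":
            out.append(arg)
            pc += 1
        elif op == "halt":
            break
    return out

prog = [
    ("call", 3),        # 0: call subroutine at address 3
    ("print", "back"),  # 1: executed after RET returns here
    ("halt", None),     # 2
    ("print", "sub"),   # 3: subroutine body
    ("ret", None),      # 4
]
print(run(prog))  # ['sub', 'back']
```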
We describe ocl, a Python library built on top of pyOpenCL and numpy. It allows programming GPU devices using Python. Python functions which are marked up using the provided decorator are converted into C99/OpenCL and compiled using the JIT at runtime. This approach lowers the barrier to entry to programming GPU devices, since it requires only Python syntax and no external compilation or linking steps. The resulting Python program runs even if a GPU is not available. As an example of application, we solve the problem of computing the covariance matrix for historical stock prices and determining the optimal portfolio according to Modern Portfolio Theory.
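The fallback behavior the abstract describes, run as plain Python when no GPU is available, can be sketched with a decorator. The decorator name and the gpu_available() check below are hypothetical; this is the pattern, not ocl's actual API.

```python
# Sketch of the decorator pattern described above: mark a function for GPU
# compilation, but keep the pure-Python version as a fallback. Hypothetical
# names throughout; not ocl's real interface.
def gpu_available():
    return False  # stand-in: a real library would probe for OpenCL devices

def kernel(fn):
    if gpu_available():
        # A real library would translate fn's source to C99/OpenCL here
        # and JIT-compile it for the device.
        raise NotImplementedError
    return fn  # fallback: just run the original Python function

@kernel
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0], [3.0, 4.0]))  # [5.0, 8.0]
```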
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resources (Shinya Takamaeda-Y)
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resources (ReConFig2014@Cancun, Mexico)
flipSyrup, a new framework for rapid prototyping, is proposed.
Everything You Need to Know About the Intel® MPI Library (Intel® Software)
The document discusses tuning the Intel MPI library. It begins with an introduction to factors that impact MPI performance like CPUs, memory, network speed and job size. It notes that MPI libraries must make choices that may not be optimal for all applications. The document then outlines its plan to cover basic tuning techniques like profiling, hostfiles and process placement, as well as intermediate topics like point-to-point optimization and collective tuning. The goal is to help reduce time and memory usage of MPI applications.
The document discusses parallel programming using the Message Passing Interface (MPI). It provides an overview of MPI, including what MPI is, common implementations like OpenMPI, the general MPI API, point-to-point and collective communication functions, and how to perform basic operations like send, receive, broadcast and reduce. It also covers MPI concepts like communicators, blocking vs non-blocking communication, and references additional resources for learning more about MPI programming.
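The collective operations named above (broadcast, scatter, gather, reduce) have simple data-movement semantics. The sketch below is my own: it models each rank's data as a list entry purely to show what moves where; a real MPI program (e.g. with mpi4py) runs one process per rank instead.

```python
# Pure-Python model of MPI collective semantics: element i of each list
# stands for the data held by rank i.
def broadcast(per_rank, root=0):
    return [per_rank[root]] * len(per_rank)  # every rank gets root's value

def scatter(per_rank, root=0):
    return list(per_rank[root])              # root's list is split, one item per rank

def gather(per_rank, root=0):
    return list(per_rank)                    # root collects one value from each rank

def reduce(per_rank, op=lambda a, b: a + b):
    acc = per_rank[0]
    for v in per_rank[1:]:
        acc = op(acc, v)
    return acc                               # combined result ends up at the root

ranks = [10, 20, 30, 40]                     # rank i holds ranks[i]
print(broadcast(ranks))                      # [10, 10, 10, 10]
print(scatter([[1, 2, 3, 4], None, None, None]))  # rank i receives the i-th item
print(reduce(ranks))                         # 100
```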
This document provides an overview of using the Rcpp package to integrate C++ with R code in order to improve performance. It discusses getting started with Rcpp, converting R functions to C++, attributes and classes in Rcpp, handling missing values, Rcpp Sugar for vectorization, using the Standard Template Library, and examples. The key points covered are how Rcpp allows embedding C++ code in R and compiling it to create faster R functions, as well as techniques like Rcpp Sugar and the STL that help write efficient C++ code for R.
The document describes an algorithm for parsing context-free grammars called the CYK algorithm. It works by examining all possible decompositions of the input string into prefix-suffix pairs based on the grammar productions. It runs in O(n³) time, where n is the length of the input string, making it faster than an exhaustive search approach. The algorithm is demonstrated on an example grammar and string to show how it builds up the possible derivations in a bottom-up dynamic programming manner.
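The bottom-up table-filling the summary describes can be written out directly. This is a standard CYK implementation for a grammar in Chomsky normal form; the grammar encoding (a dict of productions) is my own choice.

```python
# CYK parsing for a grammar in Chomsky normal form. table[i][l] holds the
# nonterminals deriving the length-l substring starting at i; the cubic
# loop tries every prefix/suffix split of every substring.
def cyk(grammar, start, s):
    # grammar: nonterminal -> list of productions, where a production is
    # either a terminal string or a (B, C) pair of nonterminals
    n = len(s)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):                 # substrings of length 1
        for a, prods in grammar.items():
            if ch in prods:
                table[i][1].add(a)
    for length in range(2, n + 1):             # build longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):     # prefix/suffix split point
                for a, prods in grammar.items():
                    for p in prods:
                        if (isinstance(p, tuple)
                                and p[0] in table[i][split]
                                and p[1] in table[i + split][length - split]):
                            table[i][length].add(a)
    return start in table[0][n]

# Toy grammar: S -> AB | BA, A -> 'a', B -> 'b'
g = {"S": [("A", "B"), ("B", "A")], "A": ["a"], "B": ["b"]}
print(cyk(g, "S", "ab"))   # True
print(cyk(g, "S", "aa"))   # False
```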
The document discusses compiler optimization techniques. It begins with an introduction to compiler optimizations and describes 16 specific optimization techniques including copy propagation, constant folding, dead code removal, and loop unrolling. It explains each technique in detail with examples. The key takeaway is that the more information a programmer provides to the compiler, the better job the compiler can do optimizing the code.
The document discusses various compiler optimizations including:
1. Procedure integration replaces procedure calls with the procedure body to eliminate function call overhead.
2. Common subexpression elimination replaces repeated computations of the same expression with a single variable to store the result.
3. Constant propagation replaces variables assigned a constant value with the constant throughout the code.
4. The document provides examples of these and other optimizations like copy propagation, code motion, induction variable elimination, and loop unrolling, which aim to improve performance by reducing instruction counts and improving pipeline utilization.
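One of the simplest optimizations in this family, constant folding, can be demonstrated with Python's standard ast module. The sketch below is my own and handles only addition of literal constants, but it shows the bottom-up rewrite that a real optimizer generalizes.

```python
# Constant folding over Python ASTs: replace a binary addition of two
# literal constants with the computed constant, folding bottom-up so that
# nested expressions like 1 + 2 + 3 collapse fully.
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # fold children first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, ast.Add)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = ast.parse("x = 1 + 2 + 3")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))   # x = 6
```

(ast.unparse requires Python 3.9 or later.)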
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
The document discusses run-time environments and activation records. It explains that activation records are used to manage information for each procedure call and are allocated on the stack. Activation records contain fields for return values, parameters, local variables, and more. When a procedure is called, its activation record is pushed onto the stack and popped off when it returns. Activation records allow recursive calls by creating a new record each time a procedure is activated.
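The push-on-call, pop-on-return behavior can be modeled explicitly. The frame layout below is a deliberate simplification of my own (just a parameter and a return-value slot), but it shows why recursion works: every activation gets its own record.

```python
# Factorial with an explicit stack of activation records instead of the
# language's call stack. Each dict is one activation record; a call pushes
# one and a return pops it.
def factorial(n):
    stack = [{"param": n, "retval": None}]   # the run-time stack
    result = None                            # last returned value
    while stack:
        frame = stack[-1]
        if frame["param"] <= 1:
            frame["retval"] = 1              # base case
        elif frame["retval"] is None and result is None:
            # "call": push a fresh activation record for factorial(n-1)
            stack.append({"param": frame["param"] - 1, "retval": None})
            continue
        else:
            # child has returned; combine its result with our parameter
            frame["retval"] = frame["param"] * result
        result = stack.pop()["retval"]       # "return": pop this record
    return result

print(factorial(5))  # 120
```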
Cray XT Porting, Scaling, and Optimization Best Practices (Jeff Larkin)
The document discusses optimization best practices for Cray XT systems. It covers choosing compilers and compiler flags, profiling and debugging codes at scale with hardware performance counters and CrayPAT tools, optimizing communication with MPI by using techniques like pre-posting receives and reducing collectives, and optimizing I/O. The document emphasizes testing optimizations on the number of nodes the application will actually run on.
This document discusses event tracing using VampirTrace and Vampir. It provides an overview of event tracing, including instrumentation, run-time measurement, and visualization. Event tracing involves instrumenting code to record events, running the instrumented code to generate trace files, and using tools like Vampir to analyze and visualize the trace files.
This document discusses porting, scaling, and optimizing applications on Cray XT systems. It covers topics such as choosing compilers, profiling and debugging applications at scale, understanding CPU affinity, and improvements in the Cray Message Passing Toolkit (MPT). The document provides guidance on leveraging different compilers, collecting performance data using hardware counters and CrayPAT, understanding MPI process binding, and enhancements in MPT 4.0 related to MPI standards support and communication optimizations.
The document discusses updates to the POSIX and C standards. Regarding POSIX, it summarizes the new features in POSIX:2008, including expanded API sets derived from Linux standards. For C, it outlines proposals and changes in C1X, the next revision of the C standard, such as new character types for UTF-16/32, bounds-checking interfaces, and dynamic memory allocation functions. It provides status updates on implementations in various operating systems.
Python is a general-purpose, high-level programming language and one of the simplest languages to pick up. Its syntax is simple, easy to remember, and quite expressive. When it comes to learning, the learning curve for Python has been found to be gentler than that of most other programming languages. Python is free, so you don't have to spend on licensing, and since it is open source its source code is freely available and can be redistributed and modified. Python was developed to bridge the gap between C and shell scripting, and it also incorporates the exception-handling feature from the ABC language. So we can say that Python was initially an interpreted language, but it was later made both compiled and interpreted.
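CPython makes the "compiled and interpreted" point visible directly: the compile() built-in turns source text into a bytecode object, and the dis module shows the instructions the interpreter loop then executes.

```python
# Source -> bytecode (compile step) -> interpreted execution. The dis
# module disassembles the bytecode so you can see the compiled form.
import dis

code = compile("print(1 + 2)", "<example>", "exec")
print(type(code).__name__)   # code
dis.dis(code)                # lists the bytecode instructions
```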
Learn more about Python programming with Learnbay.
Visit: www.learnbay.co
The core idea of PyPy is to produce a flexible and fast implementation of the Python programming language. The talk will cover the interpreter, translator and jit parts of the code and their relationships and the fundamental ways in which PyPy differs from other virtual machine implementations.
Talk at PyCon2022 over building binary packages for Python. Covers an overview and an in-depth look into pybind11 for binding, scikit-build for creating the build, and build & cibuildwheel for making the binaries that can be distributed on PyPI.
Network Traffic Monitoring Using Python and ntop (PyCon Italia)
Two-level Just-in-Time Compilation with One Interpreter and One Engine
1. Two-level Just-in-Time Compilation with One Interpreter and One Engine
Yusuke Izawa¹, Hidehiko Masuhara¹, Carl Friedrich Bolz-Tereick²
¹Tokyo Institute of Technology
²Heinrich-Heine-Universität Düsseldorf
PEPM 2022, January 18, 2022
Two-level JIT Compilation with .. PEPM 2022 1 / 15
2. Outline
1. Introduction: the folklore in the JIT community and our findings
2. Proposal: Adaptive RPython, which performs multitier compilation with “one interpreter” and “one engine”
3. Observation: confirming that Adaptive RPython “actually” works
3. Folklore: A Meta-JIT Compiler Performs a Fixed Kind of JIT Compilation
• Conventionally, one builds an interpreter from scratch to realize each different kind of JIT compiler:
− Interp_tracing → meta-JIT compiler (e.g. RPython) → tracing JIT (e.g. PyPy)
− Interp_method → meta-JIT compiler (e.g. Truffle/Graal) → method JIT (e.g. TruffleRuby)
− Interp_threaded → meta-JIT compiler → threaded code gen. [ICOOOLPS 2021]
(threaded code: code consisting of CALL insts to bytecode handlers, removing indirect branching)
6. Our Findings Will Affect the JIT Community’s Assumption
The JIT community assumes that:
• a meta-tracing JIT compiler can only do tracing compilation
RPython[interp, source] = p_tracing
But, with our findings:
• a meta-tracing JIT can do several kinds of compilation, such as
− method compilation
− threaded code compilation
− etc.
RPython[interp, source] = p_α, p_β, p_γ, · · ·
9. Our Findings
By providing different interp definitions to RPython, we can derive different kinds of outputs. E.g., when one passes:
• interp_tracing to RPython → tracing compilation
• interp_method to RPython → method compilation
• interp_threaded to RPython → threaded compilation
RPython[interp_tracing, source] = p_tracing
RPython[interp_method, source] = p_method
RPython[interp_threaded, source] = p_threaded
In other words, by changing the interpreter, we can get different kinds of compilers.
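The finding above can be sketched in plain Python. This is a toy illustration, not RPython itself, and every name in it is hypothetical: one generic engine, given different interpreter definitions and the same source program, derives outputs of different shapes — a linear trace of one hot path versus path-independent threaded code.

```python
# Toy sketch (all names hypothetical): one "engine", many interpreter
# definitions, different kinds of compiled output.

def engine(interp, bytecode):
    """Run `interp` over `bytecode` and return whatever it records."""
    return interp(bytecode)

def interp_tracing(bytecode):
    # Tracing-style: follow one concrete execution path and record it.
    trace, pc, acc = [], 0, 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        trace.append((op, arg))      # record the operation actually executed
        if op == "ADD":
            acc += arg
        elif op == "JUMP_IF_LT" and acc < arg:
            pc = 0                   # taken branch: keep tracing the loop
            continue
        pc += 1
    return trace

def interp_threaded(bytecode):
    # Threaded-code-style: traverse the whole method body once and emit a
    # call to the handler of every instruction, independent of the path taken.
    return [("CALL_HANDLER", op) for op, _ in bytecode]

prog = [("ADD", 1), ("JUMP_IF_LT", 3)]
p_tracing  = engine(interp_tracing, prog)   # linear trace of the hot loop
p_threaded = engine(interp_threaded, prog)  # one handler call per instruction
```

Swapping `interp_tracing` for `interp_threaded` changes the kind of output without touching `engine` — the same relationship the slides write as RPython[interp, source] = p.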
15. Proposal: Multitier Compilation on Adaptive RPython
Adaptive RPython performs multitier compilation with “one interpreter” and “one engine”.
Optimization levels: threaded code → baseline (tracing, method, or tracing + method) → level 2 → · · ·
With Adaptive RPython:
• one generic interp. → a common interp. + slightly different definitions
• all performed on one engine = RPython
17. Overview: Adaptive RPython Performs Multitier Compilation
(1) A developer writes a generic interp.
(2) Pass information to the pre-processor: which instructions will be transformed?
(3) The pre-processor generates interps from the generic interp: I_common, I_tracing, I_threaded, I_method.
(4) Pass the source program and info about static and dynamic inputs to the Adaptive RPython P.E. system.
(5) Tracing compilation: choose I_common and I_tracing → RPython[I_common + I_tracing, P, V] = P′_tracing
(6) Threaded code gen. [Izawa et al., 2021]: choose I_common and I_threaded → RPython[I_common + I_threaded, P, V] = P′_threaded
(7) Method compilation: choose I_common and I_method → RPython[I_common + I_method, P, V] = P′_method
24. Overview: How to Drive the RPython Engine [ICOOOLPS 2021]
Meta-tracing JIT
• Trace the execution of an interp.
[Figure: a bytecode method (blocks A–D ending in JUMP, which calls a function E, F, RET) and its linear RPython trace:]
  [p0]
  i1 = load(..)
  i1 = int_add(..)
  i2 = int_lt(..)
  guard_true(i2) [p0]
  ..
  jump(p0)
Threaded code generation (achieved by tweaking an interp)
• Traverse the entire method body
• Do not trace inside the handlers, but leave a CALL to each handler
• Cut/stitch the temporal trace
[Figure: the same method compiled as handler calls plus a bridge:]
  [p0]
  i7 = call_i(ConstClass(DUP, ..))
  i12 = call_i(ConstClass(CONST_I ..))
  i16 = call_i(ConstClass(LT, ..))
  guard_true(i16) [p0]
  ...
  jump(p0)
  bridge:
  [p0]
  ...
  i28 = call_i(ConstClass(CALL, ..))
  ...
  i32 = call_i(ConstClass(RET, ..2))
  leave_portal_frame(0)
  finish(i32)
29. Method-traversal Interpreter: How to Drive the RPython Engine [ICOOOLPS 2021]
Traverse the entire method body depth-first with traverse_stack:

  @dont_look_inside     # suppress inlining of the handlers
  def ADD():
      ..

  while True:
      if opcode == JUMP_IF:
          top = pop()
          target = bytecode[pc]
          pc += 1
          if top.is_true():
              traverse_stack.push(pc)       # save the other side to traverse later
              pc = target
          else:
              traverse_stack.push(target)   # save the other side to traverse later
      elif opcode == JUMP:
          target = bytecode[pc]
          pc += 1
          if not traverse_stack.is_empty():
              pc = traverse_stack.pop()     # jump to a saved side
          else:
              finish()
      elif opcode == RET:
          if not traverse_stack.is_empty():
              pc = traverse_stack.pop()     # jump to a saved side
          else:
              return pop()
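The traversal described above can be made concrete with a small runnable sketch. The bytecode encoding and helper names here are our own toy choices, not the paper's: at a conditional branch one side is followed immediately and the other is pushed on `traverse_stack`, so a single pass visits the entire method body depth-first.

```python
# Toy method-traversal sketch: returns the order in which instruction
# positions are visited, covering both branch sides in one pass.

def traverse(bytecode):
    order, stack, pc = [], [], 0
    while True:
        op = bytecode[pc]
        order.append(pc)
        if op == "JUMP_IF":
            target = bytecode[pc + 1]  # operand slot holds the branch target
            stack.append(pc + 2)       # save the fall-through side for later
            pc = target                # traverse the taken side first
        elif op == "JUMP" or op == "RET":
            if stack:
                pc = stack.pop()       # resume a saved side instead of looping
            else:
                return order           # whole method body covered
        else:
            pc += 1

# positions: 0 JUMP_IF, 1 operand (target=3), 2 RET, 3 SUB, 4 RET
layout = traverse(["JUMP_IF", 3, "RET", "SUB", "RET"])
# visits 0, then the taken side (3, 4), then the saved side (2)
```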
33. The Design of the Generic Interpreter
• From a generic interp, Adaptive RPython generates interps, including the MTI (method-traversal interpreter).
Embed tier-specific definitions in a meta-tracing-based interpreter:
1. Declare JitTierDriver.
2. Define can_enter_tier1_XX at JUMP_IF, JUMP, and RET for threaded code gen. and method comp.
3. Define the interp. for the tracing JIT inside we_are_in_tier2.
4. The pre-processor generates the method-traversal interp and the tracing interp from this.

  jittierdriver = JitTierDriver(pc='pc')

  def interp(self):
      ..
      if opcode == JUMP_IF:
          target = bytecode[pc]
          jittierdriver.can_enter_tier1_branch(
              true_path=target, false_path=pc+1,
              cond=self.is_true)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif opcode == JUMP:
          target = bytecode[pc]
          jittierdriver.can_enter_tier1_jump(target=target)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif opcode == RET:
          w_x = self.pop()
          jittierdriver.can_enter_tier1_ret(ret_value=w_x)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif ..
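Step 4 — deriving the method-traversal and tracing interps from the generic one — might conceptually look like the following toy pre-processor. The handler-table layout and all names here are hypothetical; the slides only say that JUMP_IF, JUMP, and RET carry tier-specific definitions while the other opcodes are shared.

```python
# Hypothetical pre-processor sketch: split a generic interpreter's handler
# table into one interpreter per tier, keeping the common handlers and
# overriding only the instructions marked for transformation.

GENERIC = {
    "ADD":     {"common": "do_add"},
    "JUMP_IF": {"common": "do_branch", "tracing": "trace_branch",
                "threaded": "emit_branch_stub"},
    "JUMP":    {"common": "do_jump", "tracing": "trace_jump",
                "threaded": "pop_traverse_stack"},
    "RET":     {"common": "do_ret", "tracing": "trace_ret",
                "threaded": "cut_and_stitch"},
}

def preprocess(generic, tier):
    """Derive one interpreter: common handlers plus this tier's overrides."""
    interp = {}
    for op, defs in generic.items():
        interp[op] = defs.get(tier, defs["common"])
    return interp

I_tracing  = preprocess(GENERIC, "tracing")   # for tracing compilation
I_threaded = preprocess(GENERIC, "threaded")  # the method-traversal interp
```

The point of the split is that ADD and other plain opcodes are written once and shared (I_common), matching the "one generic interpreter" claim.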
37. Observation: Can Adaptive RPython “Actually” Work? (1)
Setup
• Write a TLA lang. interpreter in Adaptive RPython
• Run the TLA interpreter on small examples
− loopabit: a nested loop
− callabit: two functions – one suitable for tracing, the other for threaded code gen. (method)
NOTE
• The current multitier is the combination of threaded code gen. and tracing (two-level)
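The slides do not show the TLA source of callabit; purely as an illustration of the intended shape, its two functions might look like this in Python (all code here is our guess, not the benchmark itself): g is a hot, predictable loop suited to tracing, while f is branchy and call-heavy, suited to threaded code generation.

```python
# Hypothetical shape of the callabit benchmark's two functions.

def g(n):
    # trace-friendly: one hot loop with a stable path
    acc = 0
    for i in range(n):
        acc += i
    return acc

def f(n):
    # threaded/method-friendly: unpredictable branches that would make
    # a tracing JIT record many diverging traces
    total = 0
    for i in range(n):
        if i % 3 == 0:
            total += g(i)   # f calls g, as in the slides' call/ret diagram
        elif i % 3 == 1:
            total -= i
        else:
            total += 2 * i
    return total
```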
38. Observation: Can Adaptive RPython “Actually” Work? (2)
Situation in callabit: increasing the optimization level
program                   | JIT applied to f | JIT applied to g
callabit_baseline_interp  | threaded         | (interpreted)
callabit_baseline_only    | threaded         | threaded
callabit_tracing_baseline | tracing          | baseline
callabit_baseline_tracing | threaded         | tracing
callabit_tracing_only     | tracing          | tracing
[Figure: function f (written for threaded code gen.) calls function g (written for tracing comp.) and returns; each configuration applies the listed strategy to each side of the call.]
43. Observation: Running Speeds and Trace Sizes
• It actually worked: the running speed approaches that of tracing compilation.
• Promising signs: multitier runs at the same speed but with a smaller code size than a single tier → it might get good performance in the future.
[Figure: “TLA w/ Adaptive RPython (stable speed)”: speed-up ratio (interp = 1, scale 0.0–3.0; higher is better) and number of traces (0–400; smaller is better) for callabit_baseline_interp, callabit_baseline_only, callabit_baseline_tracing, callabit_tracing_baseline, and callabit_tracing_only.]
45. Conclusion and Future Work
Conclusion
• Adaptive RPython actually worked on a small lang.
• One engine, one interpreter, multitier outputs:
RPython[I, P, V] = P′
RPython[I_common + I_tracing, P, V] = P′_tracing
RPython[I_common + I_threaded, P, V] = P′_threaded
RPython[I_common + I_method, P, V] = P′_method
(the common interp. and the tweaked defs. derive from the generic interp.)
Future Work
• Decide the multitier compilation strategy
− How to shift between levels?
− How to decide an appropriate level?
• Implement our ideas on PyPy
48. References
Izawa, Y., Masuhara, H., Bolz-Tereick, C. F., and Cong, Y. (2021). Threaded code generation with a meta-tracing JIT compiler. The Journal of Object Technology, Special Issue for ICOOOLPS 2021, pages 1–11.