This talk shows how languages can be implemented as self-optimizing interpreters, and how Truffle or RPython go about to just-in-time compile these interpreters to efficient native code.
Programming languages are never perfect, so people start building domain-specific languages to be able to solve their problems more easily. However, custom languages are often slow, or take enormous amounts of effort to be made fast by building custom compilers or virtual machines.
With the notion of self-optimizing interpreters, researchers proposed a way to implement languages easily and generate a JIT compiler from a simple interpreter. We explore the idea and experiment with it on top of RPython (of PyPy fame) with its meta-tracing JIT compiler, as well as Truffle, the JVM framework of Oracle Labs for self-optimizing interpreters.
In this talk, we show how a simple interpreter can reach the same order of magnitude of performance as the highly optimizing JVM for Java. We discuss the implementation on top of RPython as well as on top of Java with Truffle so that you can start right away, independent of whether you prefer the Python or JVM ecosystem.
While our own experiments focus on SOM, a little Smalltalk variant to keep things simple, other people have used this approach to improve peek performance of JRuby, or build languages such as JavaScript, R, and Python 3.
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Stefan Marr
Runtime metaprogramming enables many useful applications and is often a convenient solution to solve problems in a generic way, which makes it widely used in frameworks, middleware, and domain-specific languages. However, powerful metaobject protocols are rarely supported and even common concepts such as reflective method invocation or dynamic proxies are not optimized. Solutions proposed in literature either restrict the metaprogramming capabilities or require application or library developers to apply performance improving techniques.
For overhead-free runtime metaprogramming, we demonstrate that dispatch chains, a generalized form of polymorphic inline caches common to self-optimizing interpreters, are a simple optimization at the language-implementation level. Our evaluation with self-optimizing interpreters shows that unrestricted metaobject protocols can be realized for the first time without runtime overhead, and that this optimization is applicable for just-in-time compilation of interpreters based on meta-tracing as well as partial evaluation. In this context, we also demonstrate that optimizing common reflective operations can lead to significant performance improvements for existing applications.
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
Communicating Event-Loop Languages similar to E and AmbientTalk are recently gaining more traction as a subset of actor languages. With the rise of JavaScript, E’s notion of vats and non-blocking communication based on promises entered the mainstream. For implementations, the combination of dynamic typing, asynchronous message sending, and promise resolution pose new optimization challenges.
This paper discusses these challenges and presents initial experiments for a Newspeak implementation based on the Truffle framework. Our implementation is on average 1.65x slower than Java on a set of 14 benchmarks. Initial optimizations improve the performance of asynchronous messages and reduce the cost of encapsulation on microbenchmarks by about 2x. Parallel actor benchmarks further show that the system scales based on the workload characteristics. Thus, we conclude that Truffle is a promising platform also for communicating event-loop languages.
C++ How I learned to stop worrying and love metaprogrammingcppfrug
Cette présentation parcours quelques applications directes de la méta-programmation en C++(11/14) avec comme objectif de démontrer son utilité dans un cadre applicatif.
Writing concurrent code is becoming more and more important to leverage the parallelism of multicore architectures. The C++11 library introduced futures and promises as a first step towards task-based programming. However, the C++ support of concurrency is still very limited. Other languages, like C# and Python, provide some forms of resumable functions or coroutines and in C#, the async/await pattern enables to write functions that suspend their execution while waiting for a computation or I/O to complete.This talk will describe a proposal for the addition of resumable function and async/await in C++17. We will focus on the implementation of resumable function on Windows, and we'll play with a first prototype of their implementation in the Visual Studio 2015 Preview. Finally, we will see how resumable functions can also be used to implement (lazy) generators, similar to the one provided by "yield" statements in C#.
Gor Nishanov, C++ Coroutines – a negative overhead abstractionSergey Platonov
C++ coroutines are one of the few major features that may land in C++17. We will look at the current standardization status, available experimental implementations and develop a small coroutine adapter over raw C networking APIs that will beat hand-crafted state machine in performance.
Доклад рассказывает об устройстве и опыте применения инструментов динамического тестирования C/C++ программ — AddressSanitizer, ThreadSanitizer и MemorySanitizer. Инструменты находят такие ошибки, как использование памяти после освобождения, обращения за границы массивов и объектов, гонки в многопоточных программах и использования неинициализированной памяти.
Accelerating Habanero-Java Program with OpenCL GenerationAkihiro Hayashi
Accelerating Habanero-Java Program with OpenCL Generation. Akihiro Hayashi, Max Grossman, Jisheng Zhao, Jun Shirako, Vivek Sarkar. 10th International Conference on the Principles and Practice of Programming in Java (PPPJ), September 2013.
Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and w...Stefan Marr
Runtime metaprogramming enables many useful applications and is often a convenient solution to solve problems in a generic way, which makes it widely used in frameworks, middleware, and domain-specific languages. However, powerful metaobject protocols are rarely supported and even common concepts such as reflective method invocation or dynamic proxies are not optimized. Solutions proposed in literature either restrict the metaprogramming capabilities or require application or library developers to apply performance improving techniques.
For overhead-free runtime metaprogramming, we demonstrate that dispatch chains, a generalized form of polymorphic inline caches common to self-optimizing interpreters, are a simple optimization at the language-implementation level. Our evaluation with self-optimizing interpreters shows that unrestricted metaobject protocols can be realized for the first time without runtime overhead, and that this optimization is applicable for just-in-time compilation of interpreters based on meta-tracing as well as partial evaluation. In this context, we also demonstrate that optimizing common reflective operations can lead to significant performance improvements for existing applications.
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
Communicating Event-Loop Languages similar to E and AmbientTalk are recently gaining more traction as a subset of actor languages. With the rise of JavaScript, E’s notion of vats and non-blocking communication based on promises entered the mainstream. For implementations, the combination of dynamic typing, asynchronous message sending, and promise resolution pose new optimization challenges.
This paper discusses these challenges and presents initial experiments for a Newspeak implementation based on the Truffle framework. Our implementation is on average 1.65x slower than Java on a set of 14 benchmarks. Initial optimizations improve the performance of asynchronous messages and reduce the cost of encapsulation on microbenchmarks by about 2x. Parallel actor benchmarks further show that the system scales based on the workload characteristics. Thus, we conclude that Truffle is a promising platform also for communicating event-loop languages.
C++ How I learned to stop worrying and love metaprogrammingcppfrug
Cette présentation parcours quelques applications directes de la méta-programmation en C++(11/14) avec comme objectif de démontrer son utilité dans un cadre applicatif.
Writing concurrent code is becoming more and more important to leverage the parallelism of multicore architectures. The C++11 library introduced futures and promises as a first step towards task-based programming. However, the C++ support of concurrency is still very limited. Other languages, like C# and Python, provide some forms of resumable functions or coroutines and in C#, the async/await pattern enables to write functions that suspend their execution while waiting for a computation or I/O to complete.This talk will describe a proposal for the addition of resumable function and async/await in C++17. We will focus on the implementation of resumable function on Windows, and we'll play with a first prototype of their implementation in the Visual Studio 2015 Preview. Finally, we will see how resumable functions can also be used to implement (lazy) generators, similar to the one provided by "yield" statements in C#.
Gor Nishanov, C++ Coroutines – a negative overhead abstractionSergey Platonov
C++ coroutines are one of the few major features that may land in C++17. We will look at the current standardization status, available experimental implementations and develop a small coroutine adapter over raw C networking APIs that will beat hand-crafted state machine in performance.
Доклад рассказывает об устройстве и опыте применения инструментов динамического тестирования C/C++ программ — AddressSanitizer, ThreadSanitizer и MemorySanitizer. Инструменты находят такие ошибки, как использование памяти после освобождения, обращения за границы массивов и объектов, гонки в многопоточных программах и использования неинициализированной памяти.
Accelerating Habanero-Java Program with OpenCL GenerationAkihiro Hayashi
Accelerating Habanero-Java Program with OpenCL Generation. Akihiro Hayashi, Max Grossman, Jisheng Zhao, Jun Shirako, Vivek Sarkar. 10th International Conference on the Principles and Practice of Programming in Java (PPPJ), September 2013.
A three-part presentation on the Swift programming language:
• An introduction to Swift for Objective-C developers
• Changes in Swift 2
• What's coming in Swift 2.2 & 3.0
Arduino C maXbox web of things slide showMax Kleiner
RGB LED in Arduino
• Generating QR Code
• 3D Printing
• Web Video Cam
• Digi Clock with Real Time Clock
• Android SeekBar to Arduino LED Matrix
The second part is about new ideas, prototyping and new technologies that are in the lab. It’s
about research papers, and software philosophy, and about researchers worldwide.
Есть много причин заниматься конверсией управляемых языков в нативные: это прежде всего производительность, но также защита от реверс-инжиниринга, поддержка аппаратных технологий или каких-то специфичных платформ. В этом докладе мы посмотрим на пример построения конвертера из C# в C++ и те нюансы, которые встречаются при решении этой задачи
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Stefan Marr
Tracing and partial evaluation have been proposed as meta-compilation techniques for interpreters to make just-in-time compilation language-independent. They promise that programs executing on simple interpreters can reach performance of the same order of magnitude as if they would be executed on state-of-the-art virtual machines with highly optimizing just-in-time compilers built for a specific language. Tracing and partial evaluation approach this meta-compilation from two ends of a spectrum, resulting in different sets of tradeoffs.
This study investigates both approaches in the context of self-optimizing interpreters, a technique for building fast abstract-syntax-tree interpreters. Based on RPython for tracing and Truffle for partial evaluation, we assess the two approaches by comparing the impact of various optimizations on the performance of an interpreter for SOM, an object-oriented dynamically-typed language. The goal is to determine whether either approach yields clear performance or engineering benefits. We find that tracing and partial evaluation both reach roughly the same level of performance. SOM based on meta-tracing is on average 3x slower than Java, while SOM based on partial evaluation is on average 2.3x slower than Java. With respect to the engineering, tracing has however significant benefits, because it requires language implementers to apply fewer optimizations to reach the same level of performance.
A three-part presentation on the Swift programming language:
• An introduction to Swift for Objective-C developers
• Changes in Swift 2
• What's coming in Swift 2.2 & 3.0
Arduino C maXbox web of things slide showMax Kleiner
RGB LED in Arduino
• Generating QR Code
• 3D Printing
• Web Video Cam
• Digi Clock with Real Time Clock
• Android SeekBar to Arduino LED Matrix
The second part is about new ideas, prototyping and new technologies that are in the lab. It’s
about research papers, and software philosophy, and about researchers worldwide.
Есть много причин заниматься конверсией управляемых языков в нативные: это прежде всего производительность, но также защита от реверс-инжиниринга, поддержка аппаратных технологий или каких-то специфичных платформ. В этом докладе мы посмотрим на пример построения конвертера из C# в C++ и те нюансы, которые встречаются при решении этой задачи
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Stefan Marr
Tracing and partial evaluation have been proposed as meta-compilation techniques for interpreters to make just-in-time compilation language-independent. They promise that programs executing on simple interpreters can reach performance of the same order of magnitude as if they would be executed on state-of-the-art virtual machines with highly optimizing just-in-time compilers built for a specific language. Tracing and partial evaluation approach this meta-compilation from two ends of a spectrum, resulting in different sets of tradeoffs.
This study investigates both approaches in the context of self-optimizing interpreters, a technique for building fast abstract-syntax-tree interpreters. Based on RPython for tracing and Truffle for partial evaluation, we assess the two approaches by comparing the impact of various optimizations on the performance of an interpreter for SOM, an object-oriented dynamically-typed language. The goal is to determine whether either approach yields clear performance or engineering benefits. We find that tracing and partial evaluation both reach roughly the same level of performance. SOM based on meta-tracing is on average 3x slower than Java, while SOM based on partial evaluation is on average 2.3x slower than Java. With respect to the engineering, tracing has however significant benefits, because it requires language implementers to apply fewer optimizations to reach the same level of performance.
Buying Decision Making Process
Buying roles, Stages of the decision process – High and low effort decisions, Post purchase decisions, Models of consumer behaviour
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Databricks
Meltdown and Spectre are two security vulnerabilities disclosed in early 2018 that expose systems to cross-VM and cross-process attacks. They were the first of their kind and opened up a new class of exploits that allow one program to scan another program’s memory. The kernel and VM patches released to address these vulnerabilities have shown to degrade the performance of Apache Spark workloads in the cloud by 2-5%.
This talk will dive deep into the exploits and their patches in order to help explain the origin of this decline in performance.
How to write clean & testable code without losing your mindAndreas Czakaj
If you create software that is to be developed continuously over several years you'll need a sustainable approach to code quality.
In our early days of AEM development, however, we used to struggle with code that is rigid, hard to test and full of LOG.debug calls.
In this talk I will share some development best practices we have found that really work in actual AEM based software, e.g. to achieve 100% code coverage and provide high confidence in the code base.
Spoiler alert: no new libraries, frameworks or tools are required - once you know the ideas, plain old TDD and the S.O.L.I.D. principles of Clean Code will do the trick.
by Andreas Czakaj, mensemedia Gesellschaft für Neue Medien mbH
Presented at the adaptTo() 2017 conference in Berlin (https://adapt.to/2017/en/schedule/how-to-write-clean---testable-code-without-losing-your-mind.html).
Presentation video can be found on YouTube (https://www.youtube.com/watch?v=JbJw5oN_zL4)
Design Patterns - Compiler Case Study - Hands-on ExamplesGanesh Samarthyam
This presentation takes a case-study based approach to design patterns. A purposefully simplified example of expression trees is used to explain how different design patterns can be used in practice. Examples are in C#, but is relevant for anyone who is from object oriented background.
Python is a high level language focused on readability. The Python community developed the concept of "Pythonic Code", requiring not only semantic correctness, but also conformity to universally acknowledged stylistic criteria.
A pre-requisite to write pythonic code is to write idiomatic code. Using the right idioms is a matter of acquired taste and experience, however, some idioms are quite easy to learn.
This presentation focuses on some of these idioms and other stylistic criteria:
* for vs. while
* iterators, itertools
* code conventions (space invaders)
* avoid default values bugs
* first order functions
* internal/external iterators
* substituting the switch statement
* properties, attributes, read only objects
* named tuples
* duck typings
* bits of metaprogramming
* exception management: LBYL vs. EAFP
Presentation of NvFX: an effect layer that allows encapsulation of GLSL and/or D3D shading language.
The basic concept follows the footprints of NVIDIA CgFX
https://github.com/tlorach/nvFX
PVS-Studio for Linux Went on a Tour Around DisneyPVS-Studio
Recently we released a Linux version of PVS-Studio analyzer, which we had used before to check a number of open-source projects such as Chromium, GCC, LLVM (Clang), and others. Now this list includes several projects developed by Walt Disney Animation Studios for the community of virtual-reality developers. Let's see what bugs and defects the analyzer found in these projects.
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Data Con LA
This presentation will explore how Bloomberg uses Spark, with its formidable computational model for distributed, high-performance analytics, to take this process to the next level, and look into one of the innovative practices the team is currently developing to increase efficiency: the introduction of a logical signature for datasets.
Kotlin Backend Development 6 Yrs Recap. The Good, the Bad and the UglyHaim Yadid
NEXT Insurance is a US based insurtech startup, revolutionizing the small business insurance industry. NEXT was founded 6 years ago and ever since we have been building our microservices in Kotlin. During this period we grew from a small startup with one backend developer(myself) to a $4B company with 150 backend developers. We have written over 1.2M lines of code in Kotlin and aquired long mileage with this programming language. In this talk I am going to share our experiences, insights and pains.
This talk was given in JFokus 2022
JVM Mechanics: When Does the JVM JIT & Deoptimize?Doug Hawkins
HotSpot promises to do the "right" thing for us by identifying our hot code and compiling "just-in-time", but how does HotSpot make those decisions?
This presentation aims to detail how HotSpot makes those decisions and how it corrects its mistakes through a series of demos that you run yourself.
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Stefan Marr
Metaobject Protocols and Type Checks, do they have much in common?
Perhaps not from a language perspective. However, under the hood of a modern virtual machine, they turn out to show very similar behavior and can be optimized very similarly.
This talk will go back to the days of Terminator 2, The Naked Gun 2 1/2, and Star Trek VI. We will revisit the early days of just-in-time compilation, the basic insights that are still true, and see how to apply them to metaprogramming techniques of different shapes and forms.
Why Is Concurrent Programming Hard? And What Can We Do about It?Stefan Marr
In short, I think, it is hard because on the one hand there is not a single concurrency abstraction that fits all problems, and on the other hand the various different abstractions are rarely designed to be used in combination with each other.
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsStefan Marr
Traffic monitoring or crowd management systems produce large amounts of data in the form of events that need to be processed to detect relevant incidents.
Rule-based pattern recognition is a promising approach for these applications, however, increasing amounts of data as well as large and complex rule sets demand for more and more processing power and memory. In order to scale such applications, a rule-based pattern detection system needs to be distributable over multiple machines. Today’s approaches are however focused on static distribution of rules or do not support reasoning over the full set of events.
We propose Cloud PARTE, a complex event detection system that implements the Rete algorithm on top of mobile actors. These actors can migrate between machines to respond to changes in the work load distribution. Cloud PARTE is an extension of PARTE and offers the first rule engine specifically tailored for continuous complex event detection that is able to benefit from elastic systems as provided by cloud computing platforms. It supports fully automatic load balancing and supports online rules with access to the entire event pool.
Supporting Concurrency Abstractions in High-level Language Virtual MachinesStefan Marr
The slides of my PhD defense, more information are available here: http://soft.vub.ac.be/~smarr/2013/01/supporting-concurrency-abstractions-in-high-level-language-virtual-machines/
Identifying A Unifying Mechanism for the Implementation of Concurrency Abstra...Stefan Marr
Supporting all known abstractions for concurrent and parallel programming in a virtual machines (VM) is a futile undertaking, but it is required to give programmers appropriate tools and performance. Instead of supporting all abstractions directly, VMs need a unifying mechanism similar to INVOKEDYNAMIC for JVMs.
Our survey of parallel and concurrent programming concepts identifies concurrency abstractions as the ones benefiting most from support in a VM. Currently, their semantics is often weakened, reducing their engineering benefits. They require a mechanism to define flexible language guarantees.
Based on this survey, we define an ownership-based meta-object protocol as candidate for VM support. We demonstrate its expressiveness by implementing actor semantics, software transactional memory, agents, CSP, and active objects. While the performance of our prototype confirms the need for VM support, it also shows that the chosen mechanism is appropriate to express a wide range of concurrency abstractions in a unified way.
Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Mul...Stefan Marr
While parallel programming for very regular problems has been used in the scientific community by non-computer-scientists successfully for a few decades now, concurrent programming and solving irregular problems remains hard. Furthermore, we shift from few expert system programmers mastering concurrency for a constrained set of problems to mainstream application developers being required to master concurrency for a wide variety of problems.
Consequently, high-level language virtual machine (VM) research faces interesting questions. What are processor design changes that have an impact on the abstractions provided by VMs to provide platform independence? How can application programmers’ diverse needs be facilitated to solve concurrent programming problems?
We argue that VMs will need to be ready for a wide range of different concurrency models that allow solving concurrency problems with appropriate abstractions. Furthermore, they need to abstract from heterogeneous processor architectures, varying performance characteristics, need to account for memory access cost and inter-core communication mechanisms but should only expose the minimal useful set of notions like locality, explicit communication, and adaptable scheduling to maintain their abstracting nature.
Eventually, language designers need to be enabled to guarantee properties like encapsulation, scheduling guarantees, and immutability also when an interaction between different problem-specific concurrency abstractions is required.
Sly and the RoarVM: Exploring the Manycore Future of ProgrammingStefan Marr
The manycore future has several challenges ahead of us that suggest that fundamental assumptions of contemporary programming approaches do not apply anymore when scalability is required.
Sly is a language prototype designed to experiment with the inherently nondeterministic properties of parallel systems. It is designed to enable programmers to embrace nondeterminism instead of guiding them to fight it. Nature shows that complex system can be built from independent entities that achieve a common goal without global synchronization/communication. Sly is design to enable the prototyping of algorithms that show such emerging behavior. It will be introduced in the first part of the talk.
The second part of the talk will focus on the underlying problems of building virtual machines for the manycore future, which allow to harness the available computing power. The RoarVM was design to experiment on the Tilera TILE64 manycore processor architecture which provides 64 cores and characteristics that are distinctly different from today's commodity multicore processors. Memory bandwidth, caches and communication are the biggest challenges on such architectures and this talk will give a brief overview over the design choices of the RoarVM which tackle the characteristics of the TILE64 architecture.
Acknowledgement: Sly and the RoarVM were designed and implemented by David Ungar and Sam Adams at IBM Research.
This slide deck was use for a presentation for AFUP (France) in Paris. It discusses the motivation for traits, the problems of alternative approaches, and the language design of traits for php. A small case study, and a discussion of the implementation details in the Zend Engine are included, too.
The Price of the Free Lunch: Programming in the Multicore EraStefan Marr
The multicore era comes with new challenges for software engineers. The Software Languages Lab and its Parallel Programming Group does research on languages and language runtimes/virtual machines for the new era of multi- and manycore hardware.
This talk gives a brief introduction to the problem and names our approaches to tackle the challenges. However, since this is presented as a Pecha Kucha, the talk is very abstract and focuses on conveying the theme instead of technical details.
Locality and Encapsulation: A Foundation for Concurrency Support in Multi-Lan...Stefan Marr
We propose to search for common abstractions for different concurrency models to enable high-level language virtual machines to support a wide range of different concurrency models. This would enable domain-specific solutions for the concurrency problem. Furthermore, advanced knowledge about concurrency in the VM model will most likely lead to better implementation opportunities on top of the different upcoming many-core architectures. The idea is to investigate the concepts of encapsulation and locality to this end. Thus, we are going to experiment with different language abstractions for concurrency on top of a virtual machine, which supports encapsulation and locality, to see how language designers could benefit, and how virtual machines could optimize programs using these concepts.
Insertion Tree Phasers: Efficient and Scalable Barrier Synchronization for Fi...Stefan Marr
This paper presents an algorithm and a data structure for scalable dynamic synchronization in fine-grained parallelism. The algorithm supports the full generality of phasers with dynamic, two-phase, and point-to-point synchronization. It retains the scalability of classical tree barriers, but provides unbounded dynamicity by employing a tailor-made insertion tree data structure.
It is the first completely documented implementation strategy for a scalable phaser synchronization construct. Our evaluation shows that it can be used as a drop-in replacement for classic barriers without harming performance, despite its additional complexity and potential for performance optimizations. Furthermore, our approach overcomes performance and scalability limitations which have been present in other phaser proposals.
Encapsulation and Locality: A Foundation for Concurrency Support in Multi-Lan...Stefan Marr
The goal of my research is to investigate the notions of encapsulation and locality and evaluate how they could be used inside of a virtual machine to better support different concurrency models on top of it, and retain the potential for optimization on different many-core architectures.
Intermediate Language Design of High-level Language VMs: Towards Comprehensiv...Stefan Marr
Today, high-level language virtual machines (VMs) are becoming successful in providing an execution platform for a wide range of languages. With the transition from few-core to many-core processors, we argue that virtual machines will also have to abstract from concrete concurrency models at the hardware level, to be able to support a wide range of abstract concurrency models on a language level.
As a first step towards this goal, this paper presents a survey on intermediate language design of VMs. The different designs are analyzed and categorized according to used techniques. Special attention is given to the concurrency support at the intermediate language level and the state of the art is presented. The survey is complemented with a brief look at hardware instruction set architectures and their relevant features for concurrency support.
Furthermore, as an outlook, we are speculating on relevant features for comprehensive concurrency support as part of the inter- mediate languages.
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from...Stefan Marr
The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. We argue that today's virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. To overcome this shortcoming, we propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of tradeoffs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.
VMADL: An Architecture Definition Language for Variability and Composition ...Stefan Marr
High-level language virtual machines (VMs) can be used on a wide range of devices as a basic part of the deployed software
stack. As the available devices differ to a large degree in their applications and their available resources, distinct implementation
strategies have to be used for certain parts of a VM to meet the special requirements. This paper motivates the need for an architecture
definition language for complex software systems like VM implementations. The basic concepts and language constructs of this
language, which is called VMADL, are introduced. To motivate further discussions, the benefits of this approach are briefly discussed.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
2. Why should you care about how
Programming Languages work?
2
SMBC: http://www.smbc-comics.com/?id=2088
3. 3
SMBC: http://www.smbc-comics.com/?id=2088
Why should you care about how
Programming Languages work?
• Performance isn’t magic
• Domain-specific languages
• More concise
• More productive
• It’s easier than it looks
• Often open source
• Contributions welcome
4. What’s “High-Performance”?
4
Based on latest data from http://benchmarksgame.alioth.debian.org/
Geometric mean over available benchmarks.
Disclaimer: Not indicate for application performance!
Competitively Fast!
0
3
5
8
10
13
15
18
Java V8 C# Dart Python Lua PHP Ruby
5. Small and
Manageable
16
260
525
562
1 10 100 1000
What’s “Low Effort”?
5
KLOC: 1000 Lines of Code, without blank lines and comments
V8 JavaScript
HotSpot
Java Virtual Machine
Dart VM
Lua 5.3 interp.
8. VMs are Highly Complex
8
Interpreter
Input
Output
Compiler Optimizer
Garbage
Collector
CodeGen
Foreign
Function
Interface
Threads
and
Memory
Model
How to reuse most parts
for a new language?
Debugging
Profiling
…
Easily
500 KLOC
9. How to reuse most parts
for a new language?
9
Input
Output
Make Interpreters Replaceable Components!
Interpreter
Compiler Optimizer
Garbage
Collector
CodeGen
Foreign
Function
Interface
Threads
and
Memory
Model
Garbage
Collector
…
Interpreter
Interpreter
…
10. Interpreter-based Approaches
Truffle + Graal
with Partial Evaluation
Oracle Labs
RPython
with Meta-Tracing
[3] Würthinger et al., One VM to Rule Them All, Onward!
2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT
Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
11. SELF-OPTIMIZING TREES
A Simple Technique for Language Implementation and Optimization
[1] Würthinger, T.; Wöß, A.; Stadler, L.; Duboscq, G.; Simon, D. & Wimmer, C. (2012), Self-
Optimizing AST Interpreters, in 'Proc. of the 8th Dynamic Languages Symposium' , pp. 73-82.
13. A Simple
Abstract Syntax Tree Interpreter
13
root_node = parse(file)
root_node.execute(Frame())
if (condition) {
cnt := cnt + 1;
} else {
cnt := 0;
}
cnt
1
+
cnt:
=
if
cnt:
=
0
cond
root_node
14. Implementing AST Nodes
14
if (condition) {
cnt := cnt + 1;
} else {
cnt := 0;
}
class Literal(ASTNode):
final value
def execute(frame):
return value
class VarWrite(ASTNode):
child sub_expr
final idx
def execute(frame):
val := sub_expr.execute(frame)
frame.local_obj[idx]:= val
return val
class VarRead(ASTNode):
final idx
def execute(frame):
return frame.local_obj[idx]
cnt
1
+
cnt:
=
if
cnt:
=
0
cond
17. Some Possible Self-Optimizations
• Type profiling and specialization
• Value caching
• Inline caching
• Operation inlining
• Library Lowering
17
18. Library Lowering for Array class
createSomeArray() { return Array.new(1000, ‘fast fast fast’); }
18
class Array {
static new(size, lambda) {
return new(size).setAll(lambda);
}
setAll(lambda) {
forEach((i, v) -> { this[i] = lambda.eval(); });
}
}
class Object {
eval() { return this; }
}
19. Optimizing for Object Values
19
createSomeArray() { return Array.new(1000, ‘fast fast fast’); }
.new
Array
global lookup
method
invocation
1000
int literal
‘fast’
string literal
Object, but not a lambda
Optimization
potential
20. Specialized new(size, lambda)
def UninitArrNew.execute(frame):
size := size_expr.execute(frame)
val := val_expr.execute(frame)
return specialize(size, val).
execute_evaluated(frame, size, val)
20
createSomeArray() { return Array.new(1000, ‘fast fast fast’); }
def UninitArrNew.specialize(size, val):
if val instanceof Lambda:
return replace(StdMethodInvocation())
else:
return replace(ArrNewWithValue())
23. How to Get Fast Program Execution?
23
VarWrite.execute(frame)
IntVarWrite.execute(frame)
VarRead.execute(frame)
Literal.execute(frame)
ArrayNewWithValue.execute(frame)
..VW_execute() # bin
..IVW_execute() # bin
..VR_execute() # bin
..L_execute() # bin
..ANWV_execute() # bin
Standard Compilation: 1 node at a time
Minimal Optimization Potential
24. Problems with Node-by-Node Compilation
24
cnt
1
+
cnt:
=
Slow Polymorphic Dispatches
def IntVarWrite.execute(frame):
try:
val := sub_expr.execute_int(frame)
return execute_eval_int(frame, val)
except ResultExp, e:
return respecialize(e.result).
execute_evaluated(frame, e.result)
cnt:
=
Runtime checks in general
25. Compilation Unit based on User Program
Meta-Tracing Partial Evaluation
Guided By AST
25
cnt
1
+
cnt:
=
if
cnt:
=
0
cnt
1
+
cnt:
=if cnt:
=
0
[3] Würthinger et al., One VM to Rule Them
All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's
Tracing JIT Compiler, ICOOOLPS Workshop
2009, ACM, pp. 18-25.
28. Meta-Tracing of an Interpreter
28
cnt
1
+cnt:=
if
cnt:= 0
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop
2009, ACM, pp. 18-25.
29. Meta Tracers need to know the Loops
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
jit_merge_point(node=self)
cond = cond_expr.execute_bool(frame)
if not cond:
break
body_expr.execute(frame)
29
while (cnt < 100) {
cnt := cnt + 1;
}
Trace
guard(cond_expr == Const(IntLessThan))
30. Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
child left_expr
child right_expr
def execute_bool(frame):
try:
left = left_expr.execute_int()
except UnexpectedResult r:
...
try:
right = right_expr.execute_int()
expect UnexpectedResult r:
...
return left < right
30
while (cnt < 100) {
cnt := cnt + 1;
}
Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
31. Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
final idx
def execute_int(frame):
if frame.is_int(idx):
return frame.local_int[idx]
else:
new_node = respecialize()
raise UnexpectedResult(new_node.execute())
31
while (cnt < 100) {
cnt := cnt + 1;
}
Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
32. Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
final idx
def execute_int(frame):
if frame.is_int(idx):
return frame.local_int[idx]
else:
new_node = respecialize()
raise UnexpectedResult(new_node.execute())
32
while (cnt < 100) {
cnt := cnt + 1;
}
Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
44. Partial Evaluation Guided By AST
44
cnt
1
+cnt:=
if
cnt:= 0
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
45. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
cond = cond_expr.execute_bool(frame)
if not cond:
break
body_expr.execute(frame)
45
while (cnt < 100) {
cnt := cnt + 1;
}
46. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
cond = cond_expr.execute_bool(frame)
if not cond:
break
body_expr.execute(frame)
46
while (cnt < 100) {
cnt := cnt + 1;
}
class IntLessThan(ASTNode):
child left_expr
child right_expr
def execute_bool(frame):
try:
left = left_expr.execute_int()
except UnexpectedResult r:
...
try:
right = right_expr.execute_int()
expect UnexpectedResult r:
...
return left < right
47. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
try:
left = cond_expr.left_expr.execute_int()
except UnexpectedResult r:
...
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
47
while (cnt < 100) {
cnt := cnt + 1;
}
48. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
try:
left = cond_expr.left_expr.execute_int()
except UnexpectedResult r:
...
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
class IntVarRead(ASTNode):
final idx
def execute_int(frame):
if frame.is_int(idx):
return frame.local_int[idx]
else:
new_node = respecialize()
raise UnexpectedResult(new_node.ex
49. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
try:
if frame.is_int(1):
left = frame.local_int[1]
else:
new_node = respecialize()
raise UnexpectedResult(new_node.execute())
except UnexpectedResult r:
...
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
while (cnt < 100) {
cnt := cnt + 1;
}
50. Optimize Optimistically
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
try:
if frame.is_int(1):
left = frame.local_int[1]
else:
new_node = respecialize()
raise UnexpectedResult(new_node.execute())
except UnexpectedResult r:
...
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
while (cnt < 100) {
cnt := cnt + 1;
}
51. Optimize Optimistically
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
52. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
try:
right = cond_expr.right_expr.execute_int()
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
class IntLiteral(ASTNode):
final value
def execute_int(frame):
return value
53. Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
try:
right = 100
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
class IntLiteral(ASTNode):
final value
def execute_int(frame):
return value
54. Classic Optimizations:
Dead Code Elimination
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
try:
right = 100
expect UnexpectedResult r:
...
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
class IntLiteral(ASTNode):
final value
def execute_int(frame):
return value
55. Classic Optimizations:
Constant Propagation
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
right = 100
cond = left < right
if not cond:
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
class IntLiteral(ASTNode):
final value
def execute_int(frame):
return value
56. Classic Optimizations:
Loop Invariant Code Motion
class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
while True:
if frame.is_int(1):
left = frame.local_int[1]
else:
__deopt_return_to_interp()
if not (left < 100):
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
57. class WhileNode(ASTNode):
child cond_expr
child body_expr
def execute(frame):
if not frame.is_int(1):
__deopt_return_to_interp()
while True:
if not (frame.local_int[1] < 100):
break
body_expr.execute(frame)
while (cnt < 100) {
cnt := cnt + 1;
}
Classic Optimizations:
Loop Invariant Code Motion
58. Compilation Unit based on User Program
Meta-Tracing Partial Evaluation
Guided by AST
58
cnt
1
+
cnt:
=
if
cnt:
=
0
cnt
1
+
cnt:
=if cnt:
=
0
[3] Würthinger et al., One VM to Rule Them
All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's
Tracing JIT Compiler, ICOOOLPS Workshop
2009, ACM, pp. 18-25.
60. Designed for Teaching:
• Simple
• Conceptual Clarity
• An Interpreter family
– in C, C++, Java, JavaScript,
RPython, Smalltalk
Used in the past by:
http://som-st.github.io
60
65. Simple and Fast Interpreters are Possible!
• Self-optimizing AST interpreters
• RPython or Truffle for JIT Compilation
65
[1] Würthinger et al., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium, 2012, pp.
73-82.
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[4] Marr et al., Are We There Yet? Simple Language Implementation Techniques for the 21st Century. IEEE Software
31(5):60—67, 2014
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
Literature on the ideas:
66. RPython
• #pypy on irc.freenode.net
• rpython.readthedocs.org
• Kermit Example interpreter
https://bitbucket.org/pypy/example-interpreter
• A Tutorial
http://morepypy.blogspot.be/2011/04/tutorial-
writing-interpreter-with-pypy.html
• Language implementations
https://www.evernote.com/shard/s130/sh/4d42
a591-c540-4516-9911-
c5684334bd45/d391564875442656a514f7ece5
602210
Truffle
• http://mail.openjdk.java.net/
mailman/listinfo/graal-dev
• SimpleLanguage interpreter
https://github.com/OracleLabs/GraalVM/tree/mast
er/graal/com.oracle.truffle.sl/src/com/oracle/truffle
/sl
• A Tutorial
http://cesquivias.github.io/blog/2014/10/13/writin
g-a-language-in-truffle-part-1-a-simple-slow-
interpreter/
• Project
– http://www.ssw.uni-
linz.ac.at/Research/Projects/JVM/Truffle.html
– http://www.oracle.com/technetwork/oracle-
labs/program-languages/overview/index-
2301583.html 66
Big Thank You!
to both communities,
for help, answering questions, debugging support, etc…!!!
Self-opt interpreters good way to communicate to compiler
ASTs
Nodes specialize themselves at runtime
Based on observed types or values
Using speculation
And fallback handling
This communicates essential information to optimizer
It is about how to determine the compilation unit.
Remember, the interpreter is implemented in one language, and the compilation works on the meta-level.
The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation.
This avoids the need to write custom JIT compilers.
It is about how to determine the compilation unit.
Remember, the interpreter is implemented in one language, and the compilation works on the meta-level.
The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation.
This avoids the need to write custom JIT compilers.
- No control flow
- Just all instructions directly layed out
Ideal to identify data dependencies
Remove redundant operations
Flatten abstraction levels for frameworks, etc
It is about how to determine the compilation unit.
Remember, the interpreter is implemented in one language, and the compilation works on the meta-level.
The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation.
This avoids the need to write custom JIT compilers.
It is about how to determine the compilation unit.
Remember, the interpreter is implemented in one language, and the compilation works on the meta-level.
The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation.
This avoids the need to write custom JIT compilers.