1. Hybrid STM/HTM for Nested Transactions on OpenJDK
Keith Chapman (Purdue U)
Tony Hosking (ANU/Data61, Purdue U)
Eliot Moss (UMass)
2. Motivation
STM has been around for ages
• But STM is slow
Commodity hardware for transactions available now
• But HTM approaches are only best effort
Our goal: Accelerate STM with HTM when possible
3. Nested Transactions
Allow composition of transactions
Two flavors
• Closed Nested Transactions: HTM inside STM not useful; must do all the same work
• Open Nested Transactions: HTM inside STM avoids most TM work, so HTM is well-suited!
4. XJ: Transactional Java
XJ Language
• Supports flat and closed/open/boosted nested transactions
The implementation supports hybrid transactions
• HTM support via Intel TSX
• HTM and STM can co-exist
Uses an Optimistic-reads / Pessimistic-writes protocol
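A minimal sketch of what an optimistic-reads / pessimistic-writes protocol can look like on a single word. This is an illustration under stated assumptions, not XJ's actual metadata layout: the class name `VersionedCell`, the lock-bit-plus-version encoding, and the spin-retry policy are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not the XJ run-time's real data layout. */
final class VersionedCell {
    // Metadata word (assumed encoding): low bit = write lock, upper bits = version.
    private final AtomicLong meta = new AtomicLong(0);
    private volatile int value;

    VersionedCell(int initial) { value = initial; }

    /** Optimistic read: takes no lock, retries if a writer interfered. */
    int read() {
        while (true) {
            long before = meta.get();
            if ((before & 1L) != 0) {       // a writer holds the lock
                Thread.onSpinWait();
                continue;
            }
            int v = value;
            if (meta.get() == before) {     // metadata unchanged: read is consistent
                return v;
            }
        }
    }

    /** Pessimistic write: takes the lock bit first, bumps the version on release. */
    void write(int newValue) {
        while (true) {
            long m = meta.get();
            if ((m & 1L) == 0 && meta.compareAndSet(m, m | 1L)) {
                value = newValue;
                meta.set(m + 2);            // clears the lock bit, advances the version
                return;
            }
            Thread.onSpinWait();
        }
    }

    long version() { return meta.get() >>> 1; }

    public static void main(String[] args) throws InterruptedException {
        VersionedCell cell = new VersionedCell(0);
        Runnable writer = () -> { for (int i = 1; i <= 1000; i++) cell.write(i); };
        Thread a = new Thread(writer), b = new Thread(writer);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("value=" + cell.read() + " version=" + cell.version());
    }
}
```

Readers never write the metadata word, so they do not contend with each other; only writers pay for exclusive access.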
22. Hybrid Transaction Protocol
STM – HTM conflicts detected by lock word accesses
• Explicit XABORT if locked by another transaction
• HTM reads – Read the metadata word
• STM writes modify the metadata word, which causes HTM to abort
• HTM writes – Increment version number, which causes STM read invalidation / HTM abort
24. Abstract Locks for STM
Open / Boosted Atomic Method, if top level transaction
• Acquire abstract locks
• Body
• Release abstract locks
Open / Boosted Atomic Method, if nested transaction
• Acquire abstract locks
• Log abstract locks
• Body
• Log undo operations
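The two cases above can be sketched on a boosted set. This is a hedged illustration, not XJ code: the names `BoostedSet`, `Txn`, `addTopLevel`, and `addNested` are hypothetical, one abstract lock per key is an assumption, and a conflict simply throws where a real STM would abort and retry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of abstract locking for an open / boosted atomic method. */
final class BoostedSet {
    private final Set<Integer> items = ConcurrentHashMap.newKeySet();
    // Abstract lock table (assumed design): key -> owner, one lock per element.
    private final ConcurrentHashMap<Integer, Object> locks = new ConcurrentHashMap<>();

    /** Parent transaction state: logged abstract locks and undo operations. */
    static final class Txn {
        final List<Integer> lockLog = new ArrayList<>();
        final Deque<Runnable> undoLog = new ArrayDeque<>();
    }

    /** Top-level case: acquire abstract lock, run body, release abstract lock. */
    boolean addTopLevel(int key) {
        Object me = new Object();
        if (locks.putIfAbsent(key, me) != null) {
            throw new IllegalStateException("abstract lock conflict");
        }
        try {
            return items.add(key);                              // body
        } finally {
            locks.remove(key, me);                              // release
        }
    }

    /** Nested case: acquire, log the lock, run body, log the undo operation. */
    boolean addNested(int key, Txn parent) {
        Object prev = locks.putIfAbsent(key, parent);           // acquire
        if (prev != null && prev != parent) {
            throw new IllegalStateException("abstract lock conflict");
        }
        if (prev == null) parent.lockLog.add(key);              // log abstract lock
        boolean added = items.add(key);                         // body
        if (added) parent.undoLog.push(() -> items.remove(key)); // log undo
        return added;
    }

    /** Parent commit: effects stay, logged locks are released. */
    void commit(Txn t) { t.undoLog.clear(); releaseAll(t); }

    /** Parent abort: run undo operations newest first, then release locks. */
    void abort(Txn t) {
        while (!t.undoLog.isEmpty()) t.undoLog.pop().run();
        releaseAll(t);
    }

    private void releaseAll(Txn t) {
        for (int k : t.lockLog) locks.remove(k, t);
        t.lockLog.clear();
    }

    boolean contains(int key) { return items.contains(key); }

    public static void main(String[] args) {
        BoostedSet s = new BoostedSet();
        Txn parent = new Txn();
        s.addNested(1, parent);
        s.abort(parent);                       // undo removes key 1 again
        System.out.println(s.contains(1));
        s.addTopLevel(2);
        System.out.println(s.contains(2));
    }
}
```

The nested case holds its locks until the parent finishes, which is why they are logged rather than released at the end of the method.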
25. Abstract Locks for Hybrid TM
Open / Boosted Atomic Method, if top level is HTM
• Validate abstract locks
• Body
Open / Boosted Atomic Method, STM case
• Acquire abstract locks
• Log abstract locks
• Body
• Log undo operations
27. Why Validation Works
• HTM vs STM
• If abstract locks conflict they must touch some of the same physical words in the abstract locking data structure; otherwise they could not detect the conflict
• HTM vs HTM
• No conflict in the locking data structure because all accesses to it are reads
• Any real conflicts that exist will occur on the actual data structure
Open / Boosted Atomic Method: Validate abstract locks, then Body
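The difference between acquiring and validating can be sketched as follows. Java has no standard way to start a hardware transaction, so this only models the memory accesses involved: the class `AbstractLockTable`, its slot hashing, and the method names are hypothetical. On real HTM, the write performed by `acquire` would abort any hardware transaction that had read the same slot via `validate`; here the effect is shown by re-reading.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical model of an abstract locking data structure. */
final class AbstractLockTable {
    private final AtomicReferenceArray<Object> owners;

    AbstractLockTable(int slots) { owners = new AtomicReferenceArray<>(slots); }

    private int slot(int key) { return Math.floorMod(key, owners.length()); }

    /** STM path: acquiring WRITES the slot (CAS from free to owner). */
    boolean acquire(int key, Object owner) {
        return owners.compareAndSet(slot(key), null, owner);
    }

    void release(int key, Object owner) {
        owners.compareAndSet(slot(key), owner, null);
    }

    /** HTM path: validation only READS the slot and never writes it. */
    boolean validate(int key) {
        return owners.get(slot(key)) == null;
    }

    public static void main(String[] args) {
        AbstractLockTable table = new AbstractLockTable(16);

        // HTM vs HTM: both only read the slot, so neither disturbs the other.
        System.out.println(table.validate(5) && table.validate(5));

        // HTM vs STM: the STM writes the very word the HTM validated.
        Object stmTxn = new Object();
        table.acquire(5, stmTxn);
        System.out.println(table.validate(5));   // the validated word has changed

        table.release(5, stmTxn);
        System.out.println(table.validate(5));
    }
}
```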
29. STM and HTM Methods
STM needs logging; HTM doesn't
Different actions during read/write
Different actions for abstract locks
HTM should fall back to STM
Maintain separate HTM and STM versions of methods
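A minimal sketch of how a router might choose between the two method versions, assuming a fixed retry budget before falling back. Everything here is hypothetical: `TxnRouter`, `HtmAbort`, and `MAX_HTM_ATTEMPTS` are illustrative names, and a real hardware abort is signalled by the CPU rather than by a Java exception.

```java
import java.util.function.Supplier;

/** Hypothetical router: try the HTM version first, fall back to the STM version. */
final class TxnRouter {
    /** Stand-in for a hardware abort (assumption: modelled as an exception). */
    static final class HtmAbort extends RuntimeException {
        HtmAbort() { super(null, null, false, false); } // no message, no stack trace
    }

    static final int MAX_HTM_ATTEMPTS = 3; // assumed retry budget

    static <T> T run(Supplier<T> htmVersion, Supplier<T> stmVersion) {
        for (int attempt = 0; attempt < MAX_HTM_ATTEMPTS; attempt++) {
            try {
                return htmVersion.get();   // no logging on this path
            } catch (HtmAbort e) {
                // HTM is best effort: retry a few times before giving up.
            }
        }
        return stmVersion.get();           // logging path, expected to make progress
    }

    public static void main(String[] args) {
        int[] attempts = {0};

        // HTM version that always aborts: the router ends up on the STM version.
        String r1 = TxnRouter.<String>run(
                () -> { attempts[0]++; throw new HtmAbort(); },
                () -> "stm");
        System.out.println(r1 + " after " + attempts[0] + " HTM attempts");

        // HTM version that succeeds on its second attempt.
        attempts[0] = 0;
        String r2 = TxnRouter.<String>run(
                () -> { if (++attempts[0] < 2) throw new HtmAbort(); return "htm"; },
                () -> "stm");
        System.out.println(r2 + " after " + attempts[0] + " HTM attempts");
    }
}
```

Keeping the two versions as separate methods means the HTM path carries no logging code at all, at the cost of generating more variants per method.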
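60. Hybrid TM Method Variants
[Diagram: call graph of the method variants]
• Original (Non-txnal)
• Router
• Nested Router
• HTM variants: Top-level Txn HTM, HTM Transactionalized, New nested Txn HTM, Optimized New nested Txn HTM
• STM variants: Top-level Txn STM, STM Transactionalized, New nested Txn STM
Legend: an edge A B means A can call B; separate edge styles mark "Parent is closed" and "Parent is open"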
66. XJ System Architecture
[Diagram: compile, load, run pipeline]
• compile: XJ source code, XJ Compiler, standard Java bytecode
• load: XJ Rewriter, bytecode + run-time calls
• run: HTM-enabled JVM with the XJ run-time library
HTM 4-5 times faster than STM
67. VM Modifications
Kept to a minimum
Modifications done on OpenJDK:
• Native methods to begin, end, and abort a HW transaction
• Made them intrinsic to the HotSpot C1/C2 compilers
Had to jump through several hoops to get HTM to work with HotSpot’s optimizing compilers
69. Synchrobench
Micro-benchmarks to evaluate synchronization performance on various data structures
Added ability to run multiple operations within a single transaction (group size)
Included XJ versions of the benchmarks
• TransactionalFriendlyTreeSet
48-way Intel Xeon E5-2690 v3 machine with 2 sockets of 12 hyperthreaded cores
81. Conclusions
STM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated
When it works, HTM is ~4-5× faster than STM
Open nesting increases the envelope of effectiveness for HTM
Production VM would need deeper modification