YouTube version: https://www.youtube.com/watch?v=-A6mcD4FbKA
Software transactional memory (STM) improves both ease of use and concurrency, and is considered state of the art for scaling parallel applications on modern multicore hardware. However, in certain situations STM performs even worse than traditional locks. At hotspots, where most threads contend over a few pieces of shared data, running transactionally causes excessive conflicts and aborts that degrade performance. We present a new adaptive thread scheduler design that manages concurrency when the system is about to enter or leave hotspots. The scheduler controls the number of threads spawning new transactions according to the live commit throughput. We implemented two feedback-control policies, Throttle and Probe, to realize this adaptive scheduling. Performance evaluation with the STAMP benchmarks shows that enabling Throttle and Probe obtains best-case speedups of 87.5% and 108.7%, respectively.
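As a rough illustration of what "controlling the number of threads according to the live commit throughput" can look like, here is a minimal hill-climbing sketch. The class name `AdaptiveQuota`, the one-thread step and the reversal rule are assumptions made for illustration; they are not the Throttle or Probe policies themselves.

```python
class AdaptiveQuota:
    """Hypothetical feedback controller: adjusts how many threads may
    start new transactions, based on the measured commit rate."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.quota = max_threads   # start fully concurrent
        self.step = -1             # first move: try one thread fewer
        self.last_rate = None      # commit rate of the previous interval

    def update(self, commits, seconds):
        """Feed the commits observed in the last interval; return the new quota."""
        rate = commits / seconds
        if self.last_rate is not None and rate < self.last_rate:
            # Throughput dropped after the last adjustment: reverse direction.
            self.step = -self.step
        self.last_rate = rate
        # Keep the quota within [1, max_threads].
        self.quota = max(1, min(self.max_threads, self.quota + self.step))
        return self.quota


if __name__ == "__main__":
    controller = AdaptiveQuota(max_threads=8)
    for commits in (100, 120, 90, 95, 99):
        print(controller.update(commits, seconds=1.0))
```

In a real runtime, worker threads whose index is at or above the current quota would wait before beginning a new transaction; that gating is left out here to keep the sketch short.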
Adaptive Thread Scheduling Techniques for Improving Scalability of Software Transactional Memory
1. Adaptive Thread Scheduling Techniques for Improving Scalability of Software Transactional Memory
Kinson Chan, King Tin Lam, Cho-Li Wang
Presenter: Kinson Chan
Date: 16 February 2010
PDCN 2011, Innsbruck, Austria
DEPARTMENT OF COMPUTER SCIENCE
THE UNIVERSITY OF HONG KONG
2. Outline
• Motivation –
‣ hardware trend and software transactional memory
• Background –
‣ performance scalability
‣ ratio-based concurrency control and its myth
• Solution –
‣ our rate-based heuristic, Probe.
• Evaluation –
‣ performance comparison
3. Motivation
What is the current computing hardware trend,
and why is software transactional memory relevant?
4. Hardware trend: multicores
• Multicore processors
‣ a.k.a. chip multiprocessing
‣ multiple cores on a processor die
‣ cores share a common cache
‣ faster data sharing among threads
‣ more running threads per cabinet
• Chip Multithreading
‣ e.g. Hyper-Threading, CoolThreads
‣ more than one thread per core
‣ hide the data load latency
[Figure: a typical modern processor. Hardware threads 1 to 8 sit above four L1 caches, four L2 caches and one shared L3 cache; callouts highlight the multiple cores and the multiple threads per core.]
7. Now and future multicores
Microarchitecture | Clock rate | Cores | Threads per core | Threads per package | Shared cache | Memory arrangement
IBM Power 7 | ~3 GHz | 4 to 8 | 4 | 32 max | 4 MB shared L3 | NUMA
Sun Niagara2 | 1.2 to 1.6 GHz | 4 to 8 | 8 | 64 max | 4 MB shared L2 | NUMA
Intel Westmere | ~2 GHz | 4 to 8 | 2 | 16 max | 12 to 24 MB shared L3 | NUMA
Intel Harpertown | ~3 GHz | 2 x 2 | 2 | 8 | 2 x 6 MB shared L3 | UMA
AMD Bulldozer | ~2 GHz | 2 x 6 to 2 x 8 | 1 | 16 max | 8 MB shared L3 | NUMA
AMD Magny-Cours | ~3 GHz | 8 modules | 2 per module | 16 max | 8 MB shared L3 | NUMA
Intel Terascale | ~4 GHz | 80? | 1? | 80? | 80 x 2 KB dist. cache | NUCA
How can we scale our programs to this many threads?
11. Multi-threading and synchronization
• Coarse-grain locking
‣ Easy / correct (few locks, predictable)
‣ Difficult to scale (excessive mutual exclusion)
• Fine-grain locking
‣ Error-prone (deadlock, forgetting to lock, ...)
‣ Scales better (allows more parallelism)
Do we have anything in between: easy / correct, and scales well?
13. STM optimistic execution
[Figure: two timelines, each showing Threads 1 to 3.
Left: one transaction runs x=x+4; y=y-4 while another runs x=x+2; y=y-2. Both begin and proceed; conflict detection at commit lets one succeed, and the other retries and then commits successfully.
Right: one transaction runs x=x+4; y=y-4 while another runs w=w+5; z=w. Both begin, proceed and commit successfully.]
Transactions shown:
begin; x=x+4; y=y-4; commit;
begin; x=x+2; y=y-2; commit;
begin; w=w+5; z=w; commit;
case 1: two transactions conflict:
roll back and retry one of them.
case 2: two transactions do not conflict:
they execute together,
achieving better parallelism.
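The two cases above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the names `ToyStore`, `atomically`, `transfer` and `copy_w` are hypothetical, each variable carries a version counter, writes are buffered, and a single commit lock guards validation. It is not how TinySTM or any other real STM is implemented; it only shows "proceed optimistically, detect conflicts at commit, roll back and retry".

```python
import threading


class ToyStore:
    """Toy optimistic store with one version counter per variable (assumption)."""

    def __init__(self, **initial):
        self.values = dict(initial)
        self.versions = {name: 0 for name in initial}
        self.commit_lock = threading.Lock()
        self.aborts = 0  # how many attempts were rolled back

    def atomically(self, body):
        """Run body(read, write) optimistically and retry it on conflict."""
        while True:
            read_versions = {}  # variable -> version seen at first read
            writes = {}         # buffered updates, invisible until commit

            def read(name):
                if name in writes:
                    return writes[name]
                # Record the version before the value, so a commit that
                # slips in between is caught by the validation below.
                read_versions.setdefault(name, self.versions[name])
                return self.values[name]

            def write(name, value):
                writes[name] = value

            body(read, write)
            with self.commit_lock:
                # Conflict detection: has anything we read been committed since?
                if all(self.versions[n] == v for n, v in read_versions.items()):
                    for name, value in writes.items():
                        self.values[name] = value
                        self.versions[name] += 1
                    return
                self.aborts += 1  # roll back (drop the buffer) and retry


def transfer(amount):
    """Build the transaction: x = x + amount; y = y - amount."""
    def body(read, write):
        write("x", read("x") + amount)
        write("y", read("y") - amount)
    return body


def copy_w(read, write):
    """The transaction: w = w + 5; z = w."""
    write("w", read("w") + 5)
    write("z", read("w"))


def run_demo(rounds=1000):
    """Three threads: two conflict on x and y, the third touches only w and z."""
    store = ToyStore(x=0, y=0, w=0, z=0)

    def worker(body):
        for _ in range(rounds):
            store.atomically(body)

    threads = [threading.Thread(target=worker, args=(b,))
               for b in (transfer(4), transfer(2), copy_w)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store
```

Calling `run_demo()` returns a store whose `x` and `y` reflect every transfer exactly once, however the threads interleave; `store.aborts` counts the retries, which depend on timing.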
17. STM is easy
Source: C. J. Rossbach, O. S. Hofmann and Emmett Witchel, Is transactional programming actually easier, In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 45–56, 2010.
• At the University of Texas at Austin, 237 students taking operating system courses were instructed to program the same problem with coarse locks, fine-grained locks, monitors and transactions...
• Measures compared: development time, errors, code complexity
‣ Development time (scale labels: Long, Short): TM, Coarse, Fine
‣ Code complexity (scale labels: Simple, Complex): TM, Coarse, Fine
‣ Errors (scale labels: Less, More): TM, Coarse, Fine
22. STM is a research toy
STM, SXM, OSTM, DSTM, ASTM, TL2, TinySTM, TLRW, SwissTM, NOrec, TML, RingTM, InvalTM, DeuceTM, D2STM
STM is not a research toy:
Company | Products | Research
Sun | Dynamic STM library | DSTM, TL2, TLRW, Rock processor, ...
Intel | Intel C++ STM compiler | McRT-STM, ...
IBM | C/C++ for TM on AIX | STM extension on X10, ...
Microsoft | STM.NET | STM on Haskell, ...
AMD | ASF instruction set extension |
24. Background
What affects the transactional memory performance?
How can we adjust concurrency for best performance?
28. Threads and performance
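[Figure: three small plots against thread count, combined as attempts × commit probability ∝ performance]
• Attempts vs thread count: more threads, more transactional attempts
• Commit probability vs thread count: more threads, a smaller portion of transactions commit
• Performance vs thread count: a concave curve of performance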
• The peak of the concave performance curve marks the optimal thread count.
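A small model makes the concave curve concrete. The sketch below is an illustration only, and the model is an assumption: every thread attempts transactions at the same pace, and each extra concurrent thread independently spoils a commit with a hypothetical probability `conflict_chance`. Under that model, throughput is proportional to threads × commit probability, which first rises and then falls.

```python
def commit_probability(threads, conflict_chance):
    """Assumed model: each of the other threads spoils a commit independently."""
    return (1.0 - conflict_chance) ** (threads - 1)


def relative_throughput(threads, conflict_chance):
    """Attempts grow with the thread count; only the committed share counts."""
    return threads * commit_probability(threads, conflict_chance)


def optimal_threads(max_threads, conflict_chance):
    """Thread count at the peak of the concave curve."""
    return max(range(1, max_threads + 1),
               key=lambda n: relative_throughput(n, conflict_chance))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(relative_throughput(n, 0.3), 3))
```

With `conflict_chance=0.3` the model peaks at three threads; a lower conflict chance moves the peak to a higher thread count, which is why no fixed thread count suits every workload.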
38. Ratio- vs rate-based concurrency controls
• Concurrency control in STM:
‣ achieve optimal performance by means of scheduling.
• Different concepts:
‣ commit ratio-based heuristics
✴ ratio = commits / (commits + aborts)
✴ reduce concurrency when the ratio gets too low
✴ relax concurrency when the ratio rises above a threshold
‣ commit rate-based heuristics
✴ rate = commits / time
‣ queuing after winner transactions
✴ kernel-level programming, conditional waiting...
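The two measures defined above can be written down directly. In this minimal sketch the function names and the choice of seconds as the time unit are assumptions made for illustration.

```python
def commit_ratio(commits, aborts):
    """ratio = commits / (commits + aborts); defined as 0.0 with no attempts."""
    attempts = commits + aborts
    return commits / attempts if attempts else 0.0


def commit_rate(commits, elapsed_seconds):
    """rate = commits / time."""
    return commits / elapsed_seconds
```

The ratio is always between 0 and 1 and ignores time; the rate has no upper bound and depends on how long each transaction takes.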
39. Ratio-based solutions
• Ansari et al.:
‣ introduce total commit [ratio] (TCR)
‣ increase / decrease threads by comparing TCR with a set-point (70%)
• Yoo and Lee:
‣ introduce per-thread contention intensity (CI)
‣ (likelihood of a thread encountering contention)
‣ a thread stalls to acquire a mutex when CI rises above a value (70%)
• Dolev et al.:
‣ activate hotspot detection when CI rises above a value (40%)
‣ a thread stalls to acquire a mutex when a hotspot is detected
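A set-point rule of the first kind can be sketched as follows. This is an illustration in the spirit of the description above, not the published algorithm of any of the cited authors: the function name, the step of one thread per decision and the `max_threads` cap are assumptions; only the 70% set-point comes from the slide.

```python
def ratio_based_step(threads, commits, aborts, set_point=0.70, max_threads=16):
    """Return the thread count to use next, judged by commit ratio alone."""
    attempts = commits + aborts
    if attempts == 0:
        return threads                         # nothing observed, keep going
    ratio = commits / attempts
    if ratio < set_point:
        return max(1, threads - 1)             # too many aborts: serialize more
    if ratio > set_point:
        return min(max_threads, threads + 1)   # plenty of commits: relax
    return threads
```

Note that the rule never looks at how many commits were achieved per unit time, only at their share of the attempts.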
40. Myths of ratio-based heuristics
• We want an application to finish faster
‣ i.e. more transactions committed per unit time
‣ (assumption: constant number of transactions)
• High commit ratio ≠ high performance
‣ 1 thread + 100% commit ratio vs 4 threads + 40% commit ratio
‣ engine rotation ≠ vehicle velocity
• Watching commit ratio is an inexact science
‣ it happens to be a close estimate, though
• Drawbacks
‣ over-serialization when the commit ratio is low
‣ over-relaxation when the commit ratio is high
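The "1 thread + 100% vs 4 threads + 40%" comparison can be worked through numerically. The sketch assumes, purely for illustration, that every thread attempts transactions at the same pace (`attempts_per_thread`, a hypothetical parameter) regardless of how many threads run.

```python
def commits_per_unit_time(threads, commit_ratio, attempts_per_thread=1.0):
    """Commits per unit time if every thread attempts at the same pace."""
    return threads * attempts_per_thread * commit_ratio


serial = commits_per_unit_time(threads=1, commit_ratio=1.00)
parallel = commits_per_unit_time(threads=4, commit_ratio=0.40)
speedup = parallel / serial
```

Under this assumption the four-thread run commits about 1.6 times as many transactions per unit time as the single-thread run, even though its commit ratio is far lower.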
42. Challenges with rate-based solution
• At any instant, we can only pick one thread count
‣ “what if” questions are not allowed at run time
• What is high and what is low?
‣ commit ratio goes between 0% and 100%
‣ commit rate depends on transaction lengths
• Changing patterns
‣ transaction nature may change along execution:
‣ getting longer / shorter
‣ getting more / less contention
• Pre-defined bounds are not acceptable
‣ the optimal spot changes across the execution timeline.
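One generic way to live with these constraints is hill climbing: change the concurrency limit by one step, observe the commit rate of the next time-slice, and turn around when the rate drops. The sketch below is a textbook-style illustration under that assumption; it is not the authors' Throttle or Probe policy, and the names `hill_climb_step` and `simulate` are hypothetical.

```python
def hill_climb_step(quota, direction, last_rate, rate, max_quota):
    """Return (new_quota, new_direction) after observing one time-slice."""
    if rate < last_rate:
        direction = -direction      # the last move hurt: turn around
    if not 1 <= quota + direction <= max_quota:
        direction = -direction      # bounce off the bounds to keep probing
    new_quota = min(max_quota, max(1, quota + direction))
    return new_quota, direction


def simulate(rate_of, slices, max_quota):
    """Drive the rule against a rate function; return the quota per slice."""
    quota, direction, last_rate = 1, +1, 0.0
    history = []
    for _ in range(slices):
        rate = rate_of(quota)       # only the current quota can be observed
        history.append(quota)
        quota, direction = hill_climb_step(
            quota, direction, last_rate, rate, max_quota)
        last_rate = rate
    return history
```

No absolute threshold for "high" or "low" appears anywhere: the rule compares each time-slice only with the previous one, so it keeps working when transaction lengths change. The price is that it keeps oscillating around the peak instead of settling on it.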
43. Commit ratio vs commit rate
[Figure: commit ratio in % (green line) and commit rate (red line) plotted together]
Callouts on the figure:
‣ more threads result in better performance
‣ excessive threads kill performance
‣ changing application natures
‣ excessive threads yield fluctuations
‣ dropping ratio, increasing rate, shorter time
‣ low commit ratio, but still scalable...
53. Solution
Now we know ratio-based solutions are not right.
What shall we do for a rate-based alternative?
54. Counting variables
• commits: number of commits in a time-slice
• aborts: number of aborts in a time-slice
• quota: maximum concurrency
• entered: currently active transactions
• peak: peak concurrency in a time-slice
How the counters are used:
‣ commits per time-slice: commit rate
‣ commits and aborts together: commit ratio
‣ quota ≤ entered: no more new transactions; stall newcomers with pthread_sched();
‣ quota ≥ peak: unused quota; reduce quota for a tight limit...
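The five counters can be gathered into one small admission gate. This is a minimal sketch under stated assumptions: the class name `ConcurrencyGate` and its methods are hypothetical, and newcomers are stalled on a Python condition variable rather than by the yielding call named on the slide.

```python
import threading


class ConcurrencyGate:
    """Hypothetical admission gate built on the five counters."""

    def __init__(self, quota):
        self.cond = threading.Condition()
        self.quota = quota    # maximum concurrency
        self.entered = 0      # currently active transactions
        self.peak = 0         # peak concurrency in this time-slice
        self.commits = 0      # commits in this time-slice
        self.aborts = 0       # aborts in this time-slice

    def enter(self, blocking=True):
        """Admit a new transaction; stall (or refuse) while quota <= entered."""
        with self.cond:
            while self.entered >= self.quota:
                if not blocking:
                    return False
                self.cond.wait()
            self.entered += 1
            self.peak = max(self.peak, self.entered)
            return True

    def leave(self, committed):
        """Record the outcome of a transaction and wake one stalled newcomer."""
        with self.cond:
            self.entered -= 1
            if committed:
                self.commits += 1
            else:
                self.aborts += 1
            self.cond.notify()

    def set_quota(self, quota):
        """Change the limit; called by whichever policy drives the gate."""
        with self.cond:
            self.quota = max(1, quota)
            self.cond.notify_all()

    def end_time_slice(self):
        """Return (commits, aborts, peak) and start a fresh time-slice."""
        with self.cond:
            snapshot = (self.commits, self.aborts, self.peak)
            self.commits = self.aborts = 0
            self.peak = self.entered
            return snapshot
```

A policy would call `end_time_slice()` periodically, derive the commit rate from the returned commits, and call `set_quota()` with its decision; comparing the returned peak with the quota shows whether part of the quota went unused.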
61. System architecture
[Figure: layered system architecture.
Transactional threads: user code running four transactions (Txn), each with an activity logger.
Transactional memory system: concurrency control unit, conflict detection engine, shared memory.
Operating system: thread creation, thread scheduling, memory management, input / output system.]
‣ Transactions execute as normal, with conflict detection.
‣ The concurrency control unit is added as a hook, monitoring the performance.
‣ The scheduler is invoked to stall some new transactions, if appropriate.
82. Evaluation Platform
• Dell PowerEdge M610 Blade Server
‣ 2x Intel “Nehalem” Xeon E5540 2.53 GHz (8 cores, 16 threads)
‣ ECC DDR3-1066 36 GB main memory
• STAMP Benchmark
‣ original from Stanford University – http://stamp.stanford.edu/
‣ modified version for TinySTM: http://www.tinystm.org/
• TinySTM 0.9.5
‣ open-source version – http://www.tinystm.org/
• Yoo’s and Shrink concurrency control
‣ from EPFL Distributed Programming Laboratory –
http://lpd.epfl.ch/site/research/tmeval/
83. Probing in effect vs other heuristics
[Figure 3. Commit Ratio, Commit Rate and Number of Stalled Threads of Some TM Applications.
Panels: kmeans-2 (16 threads) and yada (8 threads), each run under throttle2, probe2, basic, yoo and shrink (labelled dolev on the slide).
Axes: commit ratio / % (green dotted line), commit rate / transactions per second (red solid line), stalled threads (blue dashed line).]
Callouts on the figure:
‣ Original TinySTM: low commit ratio
‣ Probe, rate-based concurrency control: mild adjustment, higher commit rate
‣ Yoo’s and Dolev’s ratio-based concurrency control: aims for a high ratio, large adjustment, even lower rate...
Paper text visible beside the figure: We have found Probe more favourable than Throttle, as well as two other concurrency control policies. We have also found our solutions are sensitive to the cache sharing, and refined our implementation accordingly.
103. Conclusions
• The multicore trend urges us to write parallel programs
• Software transactional memory is part of future computation
‣ easier to program, fewer errors, neater code
‣ but it needs concurrency control for the best performance
• Ratio-based vs rate-based concurrency heuristics
‣ ratio-based heuristics are inexact approximations
‣ watching the ratio alone causes over-reaction / over-relaxation
• The rate-based concurrency heuristic, Probe, outperforms
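The ratio-based versus rate-based contrast in the bullets above can be illustrated with a minimal Python sketch. This is not the Throttle or Probe implementation from the talk: the names `ratio_step` and `RateProbe`, the 0.5/0.9 thresholds and the one-thread step size are all assumptions made for illustration. The ratio-based controller looks only at commits versus aborts, while the rate-based one climbs towards whatever thread quota gives the highest measured commit rate.

```python
def ratio_step(quota, commits, aborts, max_threads, low=0.5, high=0.9):
    """Ratio-based control (hypothetical thresholds): react to the commit ratio only."""
    total = commits + aborts
    ratio = commits / total if total else 1.0
    if ratio < low:                       # many aborts: stall one more thread
        return max(1, quota - 1)
    if ratio > high:                      # few aborts: admit one more thread
        return min(max_threads, quota + 1)
    return quota                          # in between: leave the quota alone


class RateProbe:
    """Rate-based hill climbing (illustrative sketch, not the talk's Probe).

    The quota keeps moving in the current direction while the measured
    commit rate does not fall, and reverses direction when it does.
    """

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.quota = max_threads          # start with every thread admitted
        self.direction = -1               # first probe: try fewer threads
        self.last_rate = None

    def step(self, commit_rate):
        """Feed one commit-rate sample (commits per interval); return the new quota."""
        if self.last_rate is not None and commit_rate < self.last_rate:
            self.direction = -self.direction   # the last move hurt: turn around
        self.last_rate = commit_rate
        self.quota = min(self.max_threads, max(1, self.quota + self.direction))
        return self.quota
```

In a real runtime, `step` would be driven by a periodic sample of the live commit counter. The sketch moves the quota on every sample, so it keeps probing around the best value instead of settling on it.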