SlideShare a Scribd company logo
1 of 18
Remote Core Locking:
Migrating Critical-Section Execution
to Improve the Performance of Multithreaded Applications
Written by Jean-Pierre Lozi Florian David Gaël Thomas Julia Lawall Gilles Muller
Presented by Andrea Lombardo
Problem:
Some applications that work well on a small number of cores do not scale to the
number of cores available in today's multicore architectures.
Performance in lock algorithms is influenced:
1. access contention
• Solution: Reduce the number of threads that, simultaneously, require
the access to the critical section
2. cache misses
• Solution: improve locality
REMOTE CORE LOCKING 2
EXAMPLE OF THE PROBLEM
MEMCACHE IS AN EXAMPLE OF APPLICATION WHICH HAS
THIS PROBLEM WHERE WORKS IN BEST PERFORMANCE :
•FOR A GET OPERATION WITH 10 CORES
•FOR A SET OPERATION WITH 2 CORES
ONE OF THE BOTTLENECK OF THIS APPLICATION ARE
CRITICAL SECTIONS, WHICH THE INFORMATION SHOULD BE
ACCESSED IN ATOMIC WAY AND THEY’RE PROTECTED BY
LOCKS.
HIGH CONTENTION MEANS MORE PROCESSING TIME AND SO
IT ‘S MORE EXPENSIVE.
IT’S A PROBLEM WHEN THE NUMBER OF CORES START
INCREASING ..
REMOTE CORE LOCKING 3
Test system: Opteron 6172 with 48-core running at 3.0.0 Linux kernel with glibc 2.13
1)Time spent in critical section
2)number of cache miss
3)Others measurements.
RCL performance are better
than other lock algorithms in
the case of increasing number
of clients
Memcached application
performance:
-no Flat combining because it
periodically blocks on
condition variables, which Flat
Combining does not support.
REMOTE CORE LOCKING 4
Performance application
Other studies for optimizing the execution of critical sections on multicore
architectures
Software solution where the server is an ordinary client thread and the role of server is distributed
between client threads, approach produces overhead for the management of the server
Hardware-based solution whose introduces new instructions to perform the transfer of control,
and uses a special fast core to execute critical sections
Insert a fast transfer of control from other client cores to the server, to reduce access contention
Execution a succession of critical sections on a single server core to improve cache locality
In the last 20 years, several approaches have been
developped for optimizing critical sections execution
on multi-core architectures:
REMOTE CORE LOCKING 5
Real motivations of low performance of lock-based approaches:
1) Cache misses when execute critical section
2) Bus saturation caused by spinlock because induces frequent broadcast on bus.
RCL is introduced to address both issues simultaneously
Solution: Design better locks
REMOTE CORE LOCKING 6
RCL key features:
Goal:
Improve performance
execution of critical
section into legacy
applications that run on
top of multicore
architectures.
1
Developed entirely in
software on x86
architecture
2
Works better than other
kind of locks works better.
-POSIX
-CAS SPINLOCK
-MCS
-FLAT COMBINING
3
REMOTE CORE LOCKING 7
Replace the management
of critical section with an
optimized remote
procedure which call to a
dedicated server core.
Shared information in the
server core’s cache
No need to transfer data
between a core to another
core
How it works
REMOTE CORE LOCKING 8
Overview
Transfer the execution and management of the critical
section to a server core, choosen according through
profiler, the client with the most frequent lock usage
Client as an handler locks implemented as a remote
procedure calls of critical sections
REMOTE CORE LOCKING 9
Core algorithm
• The remote call is transformed into a clients and server communicate done through an array of
request structures of CL dimension which is unique for each server .
• C is the max number of clients
• L is the size of the hardware cache line and represents a request done by a client to the server
• Each request is mapped into a single cache line
Each request contain in order:
1. Address of the lock associated with the critical section
2. Address of the structure refered to the context
3. Address of the function that include critical section
1. Client has requested the access
2. NULL not request.
REMOTE CORE LOCKING 10
Server side
• A thread analyze all the request and wait
those that have an address refers to a
critical section.
• Iterate for each entry:
• If function value is an address and lock is free,
server thread acquires the lock and executes
the critical
• server reset the element
• resume the iteration.
• After writing the entry cache line with all the
informations
• it waits that the address of the function is point to NULL.
• In case the number of client is less than the number of
cores available: it’s used SSE3 monitor/mwait routine for
sleeping the client sleeps until the server answer.
Client side
REMOTE CORE LOCKING 11
Profiler: it’s developed by authors to detect the information locks:
Lock frequency usage
Time spent in critical section
These information are used for identify the core in which running the
server and locks need replacing from POSIX to RCL
A tool Coccinelle used for transforming critical section to remote
procedure call.
Critical section looks like separate functions:
PROBLEMS:
Shared variable
Additional elements:
REMOTE CORE LOCKING 12
Implementation of the RCL Runtime(supported by Posix thread)
The runtime ensure responsiveness and liveness respectively avoiding the block of thread at OS
level or inversion priority and managing at run time a pool of threads for each server :
-if the servicing thread is blocked/waited, replace it with another in the pool.
The management thread used for management the pool of threads:
- Highest priority
- Check the progress threads every time is woken up
1) modify the priority
2) nothing change
The backup thread used when all threads are blocked at OS so it woke up the management thread.
1) The runtime implement a POSIX FIFO scheduling policy to execute a thread until blocked by
processor:
1.1) could induce priority inversion between threads
2)Reduce the delay minimizing the length FIFO queue
REMOTE CORE LOCKING 13
There’re situations to avoid which generate a deadlock because the server is unable to execute critical section
of other locks. Core algorithm is applied to a thread and it requires that the thread is never blocked at the OS
level and never spin into a waitloop. Now we focus on runtime RCL liveness and responsiveness: different
situation.
The thread could be blocked at the OS level
The thread could spin if the critical section try to acquire a spinlock
The thread could be preempted at the OS level
• Critical sections every time is executed in all cores, execept one that manages the lifecyle of the thread
• Vary the degree of contention on the lock by varying the delay between the execution of the critical
section
• Locality of the critical section varying the number of shared cache lines each one accesses.
• Cache access line are not pipelined: construct the address of the next memory access from the previously
read value.
REMOTE CORE LOCKING 14
Comparison when varying degree contention
average of 30 runs
False serialization
• For adapting Berkley DB application to the usage of RCL you need to allocate the
2 most used lock and then other 9. All 11 locks should be implemented as RCLs on
the same server. Their critical sections (refer to 11 locks) are artificially serialized
• Now we focus the impact of the serialization with two metrics:
• Use rate:
• The use rate measures the server workload.
• False serialization rate:
• The false serialization rate a ratio of the number of
iterations over the request array
• It’s important how change the rate between one or 2 different server:
• High rate with 1 and elimination of false serialization and increasing throughput of an amount 50 %
REMOTE CORE LOCKING 15
Analysis of performance
• Execution time incurred when each critical section accesses
5 cache lines.
• The average number of L2 cache
misses(top)
• The average execution time (bottom)
• When a critical section over 5000
iterations when critical sections access
one shared cache line
REMOTE CORE LOCKING 16
Conclusion
• Rcl is techinque focus on reducing lock acquisition time and improving execution
speed of critical sections through increased data locality and the migration of
execution to the server core.
• RCL powerful is when an application relies on highly conteded locks
REMOTE CORE LOCKING 17
Future work
DESIGN NEW APPLICATION WITH THESE
STRATEGIES
CONSIDER THE DESIGN AND THE
IMPLEMENTATION OF AN ADAPTIVE
RCL RUNTIME.
SYSTEM ABLE TO DYNAMICALLY
SWITCH BETWEEN LOCKING
STRATEGIES
CAPABILITY TO MIGRATE LOCKS
BETWEEN MULTIPLE SERVERS FOR
BALANCING DYNAMICALLY THE LOAD
AND AVOID FALSE SERIALIZATION.
REMOTE CORE LOCKING 18

More Related Content

What's hot

Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting TracebacksJames Denton
 
VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)Pragnya Dash
 
Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user spaceSusant Sahani
 
remote procedure calls
  remote procedure calls  remote procedure calls
remote procedure callsAshish Kumar
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUBNusrat Mary
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsMarkTaylorIBM
 
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Jeff Holoman
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processorMuhammad Ishaq
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with PipeliningAneesh Raveendran
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge MigrationJames Denton
 
RPC communication,thread and processes
RPC communication,thread and processesRPC communication,thread and processes
RPC communication,thread and processesshraddha mane
 
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable Processors
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable ProcessorsVeryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable Processors
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable ProcessorsSelvaraj Balasubramanian
 
Remote Procedure Call (RPC) Server creation semantics & call semantics
Remote Procedure Call (RPC) Server creation semantics & call semanticsRemote Procedure Call (RPC) Server creation semantics & call semantics
Remote Procedure Call (RPC) Server creation semantics & call semanticssvm
 
Lac2006 Lee Revell Slides
Lac2006 Lee Revell SlidesLac2006 Lee Revell Slides
Lac2006 Lee Revell Slidesrlrevell
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Continuent
 

What's hot (20)

Superscalar processors
Superscalar processorsSuperscalar processors
Superscalar processors
 
Troubleshooting Tracebacks
Troubleshooting TracebacksTroubleshooting Tracebacks
Troubleshooting Tracebacks
 
Lec1 final
Lec1 finalLec1 final
Lec1 final
 
VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)VLIW(Very Long Instruction Word)
VLIW(Very Long Instruction Word)
 
Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user space
 
remote procedure calls
  remote procedure calls  remote procedure calls
remote procedure calls
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
 
message passing
 message passing message passing
message passing
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUB
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platforms
 
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
 
Demystifying openvswitch
Demystifying openvswitchDemystifying openvswitch
Demystifying openvswitch
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
 
RPC communication,thread and processes
RPC communication,thread and processesRPC communication,thread and processes
RPC communication,thread and processes
 
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable Processors
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable ProcessorsVeryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable Processors
Veryx Launches Virtual Service Assurance Using Intel® Xeon® Scalable Processors
 
Remote Procedure Call (RPC) Server creation semantics & call semantics
Remote Procedure Call (RPC) Server creation semantics & call semanticsRemote Procedure Call (RPC) Server creation semantics & call semantics
Remote Procedure Call (RPC) Server creation semantics & call semantics
 
Lac2006 Lee Revell Slides
Lac2006 Lee Revell SlidesLac2006 Lee Revell Slides
Lac2006 Lee Revell Slides
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
 

Similar to Remote core locking-Andrea Lombardo

Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Continuent
 
Linux kernel development_ch9-10_20120410
Linux kernel development_ch9-10_20120410Linux kernel development_ch9-10_20120410
Linux kernel development_ch9-10_20120410huangachou
 
Linux kernel development chapter 10
Linux kernel development chapter 10Linux kernel development chapter 10
Linux kernel development chapter 10huangachou
 
SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design ApproachA B Shinde
 
VMworld 2014: Extreme Performance Series
VMworld 2014: Extreme Performance Series VMworld 2014: Extreme Performance Series
VMworld 2014: Extreme Performance Series VMworld
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.pptshreesha16
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors pptSiddhartha Anand
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architecturesYoung Alista
 
Reduced instruction set computers
Reduced instruction set computersReduced instruction set computers
Reduced instruction set computersSyed Zaid Irshad
 
Advanced processor principles
Advanced processor principlesAdvanced processor principles
Advanced processor principlesDhaval Bagal
 
Network architecure (3).pptx
Network architecure (3).pptxNetwork architecure (3).pptx
Network architecure (3).pptxKaythry P
 
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...Docker, Inc.
 
Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Sneeker Yeh
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Dharma Shukla
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 

Similar to Remote core locking-Andrea Lombardo (20)

Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
Webinar Slides: Tungsten Connector / Proxy – The Secret Sauce Behind Zero-Dow...
 
Linux kernel development_ch9-10_20120410
Linux kernel development_ch9-10_20120410Linux kernel development_ch9-10_20120410
Linux kernel development_ch9-10_20120410
 
Linux kernel development chapter 10
Linux kernel development chapter 10Linux kernel development chapter 10
Linux kernel development chapter 10
 
SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design Approach
 
VMworld 2014: Extreme Performance Series
VMworld 2014: Extreme Performance Series VMworld 2014: Extreme Performance Series
VMworld 2014: Extreme Performance Series
 
Module2 MultiThreads.ppt
Module2 MultiThreads.pptModule2 MultiThreads.ppt
Module2 MultiThreads.ppt
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Remote core locking (rcl)
Remote core locking (rcl)Remote core locking (rcl)
Remote core locking (rcl)
 
Factored Operating Systems paper review
Factored Operating Systems paper reviewFactored Operating Systems paper review
Factored Operating Systems paper review
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
Reduced instruction set computers
Reduced instruction set computersReduced instruction set computers
Reduced instruction set computers
 
Advanced processor principles
Advanced processor principlesAdvanced processor principles
Advanced processor principles
 
Network architecure (3).pptx
Network architecure (3).pptxNetwork architecure (3).pptx
Network architecure (3).pptx
 
Difficulties in Pipelining
Difficulties in PipeliningDifficulties in Pipelining
Difficulties in Pipelining
 
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
 
Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 

Recently uploaded

HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...Call girls in Ahmedabad High profile
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Recently uploaded (20)

HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

Remote core locking-Andrea Lombardo

  • 1. Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications Written by Jean-Pierre Lozi Florian David Gaël Thomas Julia Lawall Gilles Muller Presented by Andrea Lombardo
  • 2. Problem: Some applications that work well on a small number of cores do not scale to the number of cores available in today's multicore architectures. Performance in lock algorithms is influenced: 1. access contention • Solution: Reduce the number of threads that, simultaneously, require the access to the critical section 2. cache misses • Solution: improve locality REMOTE CORE LOCKING 2
  • 3. EXAMPLE OF THE PROBLEM MEMCACHE IS AN EXAMPLE OF APPLICATION WHICH HAS THIS PROBLEM WHERE WORKS IN BEST PERFORMANCE : •FOR A GET OPERATION WITH 10 CORES •FOR A SET OPERATION WITH 2 CORES ONE OF THE BOTTLENECK OF THIS APPLICATION ARE CRITICAL SECTIONS, WHICH THE INFORMATION SHOULD BE ACCESSED IN ATOMIC WAY AND THEY’RE PROTECTED BY LOCKS. HIGH CONTENTION MEANS MORE PROCESSING TIME AND SO IT ‘S MORE EXPENSIVE. IT’S A PROBLEM WHEN THE NUMBER OF CORES START INCREASING .. REMOTE CORE LOCKING 3 Test system: Opteron 6172 with 48-core running at 3.0.0 Linux kernel with glibc 2.13
  • 4. 1)Time spent in critical section 2)number of cache miss 3)Others measurements. RCL performance are better than other lock algorithms in the case of increasing number of clients Memcached application performance: -no Flat combining because it periodically blocks on condition variables, which Flat Combining does not support. REMOTE CORE LOCKING 4 Performance application
  • 5. Other studies for optimizing the execution of critical sections on multicore architectures Software solution where the server is an ordinary client thread and the role of server is distributed between client threads, approach produces overhead for the management of the server Hardware-based solution whose introduces new instructions to perform the transfer of control, and uses a special fast core to execute critical sections Insert a fast transfer of control from other client cores to the server, to reduce access contention Execution a succession of critical sections on a single server core to improve cache locality In the last 20 years, several approaches have been developped for optimizing critical sections execution on multi-core architectures: REMOTE CORE LOCKING 5
  • 6. Real motivations of low performance of lock-based approaches: 1) Cache misses when execute critical section 2) Bus saturation caused by spinlock because induces frequent broadcast on bus. RCL is introduced to address both issues simultaneously Solution: Design better locks REMOTE CORE LOCKING 6
  • 7. RCL key features: Goal: Improve performance execution of critical section into legacy applications that run on top of multicore architectures. 1 Developed entirely in software on x86 architecture 2 Works better than other kind of locks works better. -POSIX -CAS SPINLOCK -MCS -FLAT COMBINING 3 REMOTE CORE LOCKING 7
  • 8. Replace the management of critical section with an optimized remote procedure which call to a dedicated server core. Shared information in the server core’s cache No need to transfer data between a core to another core How it works REMOTE CORE LOCKING 8
  • 9. Overview Transfer the execution and management of the critical section to a server core, choosen according through profiler, the client with the most frequent lock usage Client as an handler locks implemented as a remote procedure calls of critical sections REMOTE CORE LOCKING 9
  • 10. Core algorithm • The remote call is transformed into a clients and server communicate done through an array of request structures of CL dimension which is unique for each server . • C is the max number of clients • L is the size of the hardware cache line and represents a request done by a client to the server • Each request is mapped into a single cache line Each request contain in order: 1. Address of the lock associated with the critical section 2. Address of the structure refered to the context 3. Address of the function that include critical section 1. Client has requested the access 2. NULL not request. REMOTE CORE LOCKING 10
  • 11. Server side • A thread analyze all the request and wait those that have an address refers to a critical section. • Iterate for each entry: • If function value is an address and lock is free, server thread acquires the lock and executes the critical • server reset the element • resume the iteration. • After writing the entry cache line with all the informations • it waits that the address of the function is point to NULL. • In case the number of client is less than the number of cores available: it’s used SSE3 monitor/mwait routine for sleeping the client sleeps until the server answer. Client side REMOTE CORE LOCKING 11
  • 12. Profiler: it’s developed by authors to detect the information locks: Lock frequency usage Time spent in critical section These information are used for identify the core in which running the server and locks need replacing from POSIX to RCL A tool Coccinelle used for transforming critical section to remote procedure call. Critical section looks like separate functions: PROBLEMS: Shared variable Additional elements: REMOTE CORE LOCKING 12
  • 13. Implementation of the RCL Runtime(supported by Posix thread) The runtime ensure responsiveness and liveness respectively avoiding the block of thread at OS level or inversion priority and managing at run time a pool of threads for each server : -if the servicing thread is blocked/waited, replace it with another in the pool. The management thread used for management the pool of threads: - Highest priority - Check the progress threads every time is woken up 1) modify the priority 2) nothing change The backup thread used when all threads are blocked at OS so it woke up the management thread. 1) The runtime implement a POSIX FIFO scheduling policy to execute a thread until blocked by processor: 1.1) could induce priority inversion between threads 2)Reduce the delay minimizing the length FIFO queue REMOTE CORE LOCKING 13 There’re situations to avoid which generate a deadlock because the server is unable to execute critical section of other locks. Core algorithm is applied to a thread and it requires that the thread is never blocked at the OS level and never spin into a waitloop. Now we focus on runtime RCL liveness and responsiveness: different situation. The thread could be blocked at the OS level The thread could spin if the critical section try to acquire a spinlock The thread could be preempted at the OS level
  • 14. • Critical sections every time is executed in all cores, execept one that manages the lifecyle of the thread • Vary the degree of contention on the lock by varying the delay between the execution of the critical section • Locality of the critical section varying the number of shared cache lines each one accesses. • Cache access line are not pipelined: construct the address of the next memory access from the previously read value. REMOTE CORE LOCKING 14 Comparison when varying degree contention average of 30 runs
  • 15. False serialization • For adapting Berkley DB application to the usage of RCL you need to allocate the 2 most used lock and then other 9. All 11 locks should be implemented as RCLs on the same server. Their critical sections (refer to 11 locks) are artificially serialized • Now we focus the impact of the serialization with two metrics: • Use rate: • The use rate measures the server workload. • False serialization rate: • The false serialization rate a ratio of the number of iterations over the request array • It’s important how change the rate between one or 2 different server: • High rate with 1 and elimination of false serialization and increasing throughput of an amount 50 % REMOTE CORE LOCKING 15
  • 16. Analysis of performance • Execution time incurred when each critical section accesses 5 cache lines. • The average number of L2 cache misses(top) • The average execution time (bottom) • When a critical section over 5000 iterations when critical sections access one shared cache line REMOTE CORE LOCKING 16
  • 17. Conclusion • Rcl is techinque focus on reducing lock acquisition time and improving execution speed of critical sections through increased data locality and the migration of execution to the server core. • RCL powerful is when an application relies on highly conteded locks REMOTE CORE LOCKING 17
  • 18. Future work DESIGN NEW APPLICATION WITH THESE STRATEGIES CONSIDER THE DESIGN AND THE IMPLEMENTATION OF AN ADAPTIVE RCL RUNTIME. SYSTEM ABLE TO DYNAMICALLY SWITCH BETWEEN LOCKING STRATEGIES CAPABILITY TO MIGRATE LOCKS BETWEEN MULTIPLE SERVERS FOR BALANCING DYNAMICALLY THE LOAD AND AVOID FALSE SERIALIZATION. REMOTE CORE LOCKING 18