Elastic multicore scheduling with the XiTAO runtime
Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs
Chalmers University of Technology
Embedded Multicore Programming - Industrial state-of-the-art and future directions
Edinburgh, April 17th, 2019
22/01/2019 HiPEAC CSW Spring 2019 2
Heterogeneous-Parallel Platforms
Heterogeneity + Parallelism common in embedded platforms
● Power-efficiency, battery-constrained devices
● Examples:
  – ARM big.LITTLE
  – Nvidia Jetson TX2 (Denver2/A57/Pascal)
  – Dynamic heterogeneity: DVFS, interference, cache partitioning
Platforms shown: HiKEY 960, Nvidia Jetson TX2
Heterogeneity as a dynamic property
Heterogeneity: cores in the system have different performance, energy-efficiency etc.
Two types of heterogeneity: static and dynamic
● Static:
  – big.LITTLE, CPU-GPU
● Dynamic:
  – DVFS, cache partitioning, interference
  – Interference:
    ● Intra-process: cache, memory oversubscription
    ● Inter-process: cache, memory, processor timesharing
● Heterogeneity needs to be addressed dynamically by the runtime!
EU LEGaTO Project
• Create software-stack support for energy-efficient heterogeneous computing
EU LEGaTO Project
XiTAO

Mixed-mode parallelism
● Many applications can be expressed as mixed-mode parallel applications := external task parallelism + internal data parallelism
● Naturally supports hierarchy/heterogeneity in modern architectures
● Challenge: how to schedule? how many resources?
Figure annotation: "#pragma omp parallel for..." can be generalized to other forms of parallelism!
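As a rough illustration of the mixed-mode pattern, the sketch below combines an outer loop over independent tasks with an inner data-parallel loop. It is not XiTAO code: the function names `scale_slice` and `run_mixed_mode`, the scaling workload, and the use of OpenMP pragmas for both levels are assumptions made for this example. Nested parallel regions only take effect if the OpenMP runtime allows more than one active level (for example via the `OMP_MAX_ACTIVE_LEVELS` environment variable).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Internal data parallelism: one task scales the slice [begin, end) using up
// to `width` threads. If OpenMP is not enabled, the pragma is ignored and the
// loop runs sequentially with the same result.
void scale_slice(std::vector<double>& data, std::size_t begin, std::size_t end,
                 double factor, int width) {
    assert(width > 0 && begin <= end && end <= data.size());
    const long first = static_cast<long>(begin);
    const long last = static_cast<long>(end);
    #pragma omp parallel for num_threads(width)
    for (long i = first; i < last; ++i) {
        data[static_cast<std::size_t>(i)] *= factor;
    }
}

// External task parallelism: the array is cut into `tasks` independent
// slices, and each slice is processed by one data-parallel task.
void run_mixed_mode(std::vector<double>& data, int tasks, int width, double factor) {
    assert(tasks > 0);
    const std::size_t n = data.size();
    const std::size_t count = static_cast<std::size_t>(tasks);
    #pragma omp parallel for num_threads(tasks) schedule(static, 1)
    for (int t = 0; t < tasks; ++t) {
        const std::size_t k = static_cast<std::size_t>(t);
        scale_slice(data, n * k / count, n * (k + 1) / count, factor, width);
    }
}
```

The two knobs `tasks` and `width` correspond to the scheduling question on the slide: how much external parallelism to expose, and how many resources to give each task.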

XiTAO mixed-mode runtime
1. Schedule external task parallelism via work stealing + locally expand internal parallel tasks across multiple cores
2. Reduce inter-task interference by decoupling internal parallelism from resources: Task Assembly Objects (TAO)
● Improves parallel slackness
● Bulk creation of parallelism (low overhead)
● Interference avoidance
● Constructive sharing
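One way to picture the decoupling in point 2 is an object that only describes how its work splits among `width` workers, leaving the choice of `width` to the runtime. The sketch below is hypothetical and is not the XiTAO API: `TaoSketch`, `VectorAddTao` and `run_in_place` are names invented for this example, and `run_in_place` simply calls the workers one after another instead of running them on separate cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical TAO-like interface (not the XiTAO API): the object describes
// how its work is divided among `width` workers; the runtime picks `width`.
class TaoSketch {
public:
    virtual ~TaoSketch() = default;
    // Called once per worker, with rank in [0, width).
    virtual void execute(int rank, int width) = 0;
};

// Example TAO: element-wise vector addition, split into contiguous chunks.
class VectorAddTao : public TaoSketch {
public:
    VectorAddTao(const std::vector<int>& a, const std::vector<int>& b,
                 std::vector<int>& out)
        : a_(a), b_(b), out_(out) {
        assert(a.size() == b.size() && b.size() == out.size());
    }
    void execute(int rank, int width) override {
        assert(width > 0 && rank >= 0 && rank < width);
        const std::size_t n = out_.size();
        const std::size_t r = static_cast<std::size_t>(rank);
        const std::size_t w = static_cast<std::size_t>(width);
        for (std::size_t i = n * r / w; i < n * (r + 1) / w; ++i) {
            out_[i] = a_[i] + b_[i];
        }
    }
private:
    const std::vector<int>& a_;
    const std::vector<int>& b_;
    std::vector<int>& out_;
};

// Stand-in for a resource container: runs the ranks one after another.
void run_in_place(TaoSketch& tao, int width) {
    assert(width > 0);
    for (int rank = 0; rank < width; ++rank) tao.execute(rank, width);
}
```

Because the TAO never names specific cores, the same object gives the same result for any width, which is what lets a scheduler resize its resources freely.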
XiTAO application
● Example of 2D stencil execution on XiTAO
Figure labels: Application, w=2, w=1
Elastic Places: Adaptivity
● Example: Cilksort reduction on 48 cores. Dynamically resize places as external parallelism decreases and TAO working set increases
● Each colored box is a resource container, executing one TAO
Quick generation of parallelism, low overheads and good isolation + constructive sharing
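To make the idea of a resizable place concrete, the sketch below maps a core and a requested width to a block of consecutive cores. The alignment rule (a place starts at a multiple of its width) and the names `Place` and `place_for` are assumptions for illustration; the slides do not state how XiTAO computes places.

```cpp
#include <algorithm>
#include <cassert>

// A place is modelled here as a run of consecutive cores.
struct Place {
    int leader;  // first core of the place
    int width;   // number of consecutive cores in the place
};

// Assumed rule: the place containing `core` starts at the nearest lower
// multiple of `width`, and is clipped at the last core of the machine.
Place place_for(int core, int width, int total_cores) {
    assert(width > 0 && core >= 0 && core < total_cores);
    const int leader = (core / width) * width;
    return Place{leader, std::min(width, total_cores - leader)};
}
```

Under this rule, a runtime could start with many width-1 places while external parallelism is plentiful and move to wider places as it drains, for example from `place_for(5, 1, 48)` (core 5 alone) to `place_for(5, 4, 48)` (cores 4 to 7).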
XiTAO implementation
● XiTAO is fully implemented in C++11
● Decentralized design targeting scalability
XiTAO API
  – Basic TAO class (XiTAO)
  – User-level API for defining TAOs
  – User-level API for defining TAO-DAGs + locality-awareness
Heterogeneous scheduling
Main idea: map to high-performance cores only those tasks that benefit, due to criticality or due to performance characteristics
Heterogeneous platforms: HiKEY 960, Nvidia Jetson TX2
Figure labels: external task DAG; critical path; Task Assembly Object (TAO); internal DAG; fixed resource container (cores, caches, ...); Faster Cores; Slower Cores; Performance Monitor; PTT ("Performance Trace Table"); schedule
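The criticality part of this idea can be sketched as a longest-path computation over the task DAG. The code below is an illustration under stated assumptions, not XiTAO's scheduler: tasks are assumed to be numbered in topological order, every task counts as one unit of work, and the functions `criticality` and `pick_for_fast_core` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// successors[i] lists the tasks that depend on task i. Tasks are assumed to
// be numbered in topological order (every edge goes to a higher index).
// Returns, for each task, the number of tasks on the longest path from it to
// the end of the DAG (itself included).
std::vector<int> criticality(const std::vector<std::vector<int>>& successors) {
    const int n = static_cast<int>(successors.size());
    std::vector<int> crit(successors.size(), 1);
    for (int i = n - 1; i >= 0; --i) {
        for (int s : successors[i]) {
            assert(s > i && s < n);
            crit[i] = std::max(crit[i], crit[s] + 1);
        }
    }
    return crit;
}

// Hypothetical policy: among the ready tasks, the one with the highest
// criticality is the candidate for the faster cores (first one wins ties).
int pick_for_fast_core(const std::vector<int>& crit, const std::vector<int>& ready) {
    assert(!ready.empty());
    return *std::max_element(ready.begin(), ready.end(),
                             [&crit](int a, int b) { return crit[a] < crit[b]; });
}
```

For a DAG with edges 0→1, 0→2, 1→3, 2→4 and 3→4, task 1 lies on the longer branch, so it would be preferred over task 2 for a fast core when both are ready.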
Performance Trace Table (PTT)
• Function: record the execution time observed on each core for each resource width
• Aim: determine the best core and the best width for executing a task on the available resources, so that resources are used efficiently
• Implementation: table of size core_number * resource_width
1 PTT for each task type (in XiTAO: for each TAO type)
Resource width := number of cores that execute a TAO
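A table of this shape might look like the sketch below. Only the dimensions (cores by widths) come from the slide; the class name `PttSketch`, the 4:1 weighted-average update, and the selection rule that minimizes time multiplied by width are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Choice {
    int core;   // leader core of the chosen place
    int width;  // number of cores used
};

// One table per task type: rows are cores, columns are resource widths.
class PttSketch {
public:
    PttSketch(int cores, std::vector<int> widths)
        : cores_(cores), widths_(std::move(widths)),
          time_ms_(static_cast<std::size_t>(cores) * widths_.size(), 0.0) {
        assert(cores > 0 && !widths_.empty());
    }

    double& at(int core, std::size_t w) {
        return time_ms_[static_cast<std::size_t>(core) * widths_.size() + w];
    }

    // Assumed update rule: the first sample is stored as is, later samples
    // are blended in with a 4:1 weighted average.
    void update(int core, std::size_t w, double measured_ms) {
        double& entry = at(core, w);
        entry = (entry == 0.0) ? measured_ms : (4.0 * entry + measured_ms) / 5.0;
    }

    // Assumed selection rule: minimize time * width. Entries that are still
    // 0 look cheapest, so unexplored configurations get tried first.
    Choice best() {
        Choice choice{0, widths_[0]};
        double best_cost = at(0, 0) * widths_[0];
        for (int c = 0; c < cores_; ++c) {
            for (std::size_t w = 0; w < widths_.size(); ++w) {
                const double cost = at(c, w) * widths_[w];
                if (cost < best_cost) {
                    best_cost = cost;
                    choice = Choice{c, widths_[w]};
                }
            }
        }
        return choice;
    }

private:
    int cores_;
    std::vector<int> widths_;
    std::vector<double> time_ms_;
};
```

Weighting time by width favors a wider place only when the speedup pays for the extra cores. Because entries are refreshed after every execution, a core that slows down gradually stops being chosen.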
Random DAGs
[Figure: two panels showing throughput (TAOs/s, scale 500 to 1500) over task number (250, 500, 1000, 2000, 4000) and average DAG parallelism (1, 2, 4, 8, 16). Panels: Performance-based Scheduler (PTT-based) and Homogeneous Scheduler (random work stealing)]
● Runtime assessment of resource partitions + criticality-aware scheduling
Interference-awareness
● Detects interference episodes and migrates critical tasks
[Figure: schedule per thread (0 to 9) over elapsed time (0 to 14 s), with PTT value [ms] (8 to 20) on a second axis. Annotations: tasks with multiple resources; critical task schedules; interference episode; PTT evolution for core=0 & width=1]
Current directions: VGG-16
● Porting VGG-16 in Darknet framework to XiTAO
[Figure labels: VGG-16 layers (CONV3-64, CONV3-128, CONV3-256, CONV3-512, maxpool, FC-4096, FC-1000, softmax); GEMM and maxpool kernels; TAO 0, TAO 1, ..., TAO N; XiTAO]
● PTT automatically finds best widths to execute VGG-16 on the dual-socket Intel platform (20 cores)
Percentage of TAOs w.r.t. TAO width, by number of threads:
Threads   width 1   width 2   width 4   width 8   width 16
2         69.06     30.94     -         -         -
4         90.89      5.83      3.28     -         -
8         66.67      3.38      0.74     29.21     -
16        53.81      1.68     14.76     29.31     0.45
Future Directions
● Front-ends for XiTAO
  – OmpSs to XiTAO
  – Array (tensor) programming
● Low-energy runtime optimizations
● Automatic DAG partitioning for generation of mixed-mode computations
Thank you!
Acknowledgements:
The XiTAO team
Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs
