Elastic multicore scheduling with the XiTAO runtime
Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs
Chalmers University of Technology
Embedded Multicore Programming - Industrial state-of-the-art and future directions
Edinburgh, April 17th, 2019
22/01/2019 HiPEAC CSW Spring 2019 2
Heterogeneous-Parallel Platforms
Heterogeneity + Parallelism common in embedded platforms
● Power-efficiency, battery-constrained devices
● Examples:
– ARM big.LITTLE
– Nvidia Jetson TX2 (Denver2/A57/Pascal)
– Dynamic heterogeneity: DVFS, interference, cache partitioning
[Figure: HiKEY 960 and Nvidia Jetson TX2 boards]
Heterogeneity as a dynamic property
Heterogeneity: cores in the system have different performance, energy-efficiency etc.
Two types of heterogeneity: static and dynamic
● Static:
– big.LITTLE, CPU-GPU
● Dynamic:
– DVFS, cache partitioning, interference
– Interference:
  ● Intra-process: cache, memory oversubscription
  ● Inter-process: cache, memory, processor timesharing
● Heterogeneity needs to be addressed dynamically by the runtime!
EU LEGaTO Project
• Create software-stack support for energy-efficient heterogeneous computing
EU LEGaTO Project
XiTAO

Mixed-mode parallelism
● Many applications can be expressed as mixed-mode parallel applications := external task parallelism + internal data parallelism
● Naturally supports hierarchy/heterogeneity in modern architectures
● Challenge: how to schedule? how many resources?
#pragma omp parallel for... can be generalized to other forms of parallelism!
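The mixed-mode structure (external task parallelism + internal data parallelism) can be sketched in plain C++11. This is a minimal illustration and not XiTAO code: the names `sum_block` and `sum_all`, and the choice of `std::async` and `std::thread` instead of OpenMP, are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Internal data parallelism: sum one block of data using `width` threads.
// Each thread reduces one contiguous chunk; partial sums are combined at the end.
long sum_block(const std::vector<int>& block, std::size_t width) {
    assert(width > 0);
    std::vector<long> partial(width, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (block.size() + width - 1) / width;
    for (std::size_t t = 0; t < width; ++t) {
        workers.emplace_back([&block, &partial, chunk, t] {
            const std::size_t begin = std::min(t * chunk, block.size());
            const std::size_t end = std::min(begin + chunk, block.size());
            partial[t] = std::accumulate(
                block.begin() + static_cast<std::ptrdiff_t>(begin),
                block.begin() + static_cast<std::ptrdiff_t>(end), 0L);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

// External task parallelism: one asynchronous task per block.
long sum_all(const std::vector<std::vector<int>>& blocks, std::size_t width) {
    std::vector<std::future<long>> tasks;
    for (const auto& block : blocks) {
        tasks.push_back(std::async(std::launch::async,
                                   [&block, width] { return sum_block(block, width); }));
    }
    long total = 0;
    for (auto& task : tasks) total += task.get();
    return total;
}
```

Here the number of blocks sets the external parallelism and `width` sets the internal parallelism. Choosing both well is the scheduling question raised on the slide.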

XiTAO mixed-mode runtime
1. Schedule external task parallelism via work stealing + locally expand internal parallel tasks across multiple cores
2. Reduce inter-task interference by decoupling internal parallelism from resources: Task Assembly Objects (TAO)
● Improves parallel slackness
● Bulk creation of parallelism (low overhead)
● Interference-avoidance
● Constructive sharing
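Work stealing, as named in point 1, can be illustrated with one queue per worker. The sketch below is a simplified, mutex-based version under stated assumptions: `WorkQueue` and `next_task` are hypothetical names, tasks are plain integers, and victims are scanned round-robin to keep the example deterministic. It does not reproduce XiTAO's queues.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// One queue per worker. The owner pushes and pops at the back (newest task
// first); thieves take from the front (oldest task first).
class WorkQueue {
public:
    void push(int task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(task);
    }
    bool pop(int& task) {    // called by the owning worker
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.back();
        tasks_.pop_back();
        return true;
    }
    bool steal(int& task) {  // called by other workers
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.front();
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<int> tasks_;
};

// Worker `self` first tries its own queue, then the other queues in order.
bool next_task(std::vector<WorkQueue>& queues, std::size_t self, int& task) {
    assert(self < queues.size());
    if (queues[self].pop(task)) return true;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        if (queues[(self + i) % queues.size()].steal(task)) return true;
    }
    return false;
}
```

An idle worker calls `next_task` with its own index; it only touches another worker's queue when its own is empty, which keeps the scheduling decentralized.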
XiTAO application
● Example of 2D stencil execution on XiTAO
[Figure: 2D stencil application mapped to TAOs of width w=1 and w=2]
Elastic Places: Adaptivity
● Example: Cilksort reduction on 48 cores. Dynamically resize places as external parallelism decreases and TAO working set increases
● Each colored box is a resource container, executing one TAO
Quick generation of parallelism, low overheads and good isolation + constructive sharing
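One way to picture a resizable place is as a (leader core, width) pair computed at scheduling time. The sketch below assumes aligned places, meaning a place of width w starts at a core id that is a multiple of w. That alignment rule and the names `Place`, `place_for`, and `cores_in` are assumptions made for illustration, not taken from the slides.

```cpp
#include <cassert>
#include <vector>

// A resource container: `width` consecutive cores starting at `leader`.
struct Place {
    int leader;
    int width;
};

// Assumption: places are aligned, so the leader is the largest multiple of
// `width` that is not greater than `core`.
Place place_for(int core, int width) {
    assert(core >= 0 && width > 0);
    return Place{core - core % width, width};
}

// List the core ids that belong to a place.
std::vector<int> cores_in(const Place& place) {
    std::vector<int> cores;
    for (int c = place.leader; c < place.leader + place.width; ++c) {
        cores.push_back(c);
    }
    return cores;
}
```

Under this scheme, resizing a place only means calling `place_for` with a different width for the next TAO: core 5 maps to the place {4, 5, 6, 7} at width 4 and to {5} at width 1.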
XiTAO implementation
● XiTAO is fully implemented in C++11
● Decentralized design targeting scalability
XiTAO API:
– Basic TAO class (XiTAO)
– User-level API for defining TAOs
– User-level API for defining TAO-DAGs + locality-awareness
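The layering (a basic TAO class, user-defined TAOs, and TAO-DAGs) might look roughly as follows. Every name here (`Tao`, `TaoDag`, `LogTao`, `add`, `edge`, `run`) is hypothetical and is not the XiTAO API; the DAG is executed sequentially in dependency order to keep the sketch short, where a real runtime would run the ranks of a TAO on different cores.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical TAO base class: internal parallel work plus a resource width.
class Tao {
public:
    explicit Tao(int width) : width_(width) { assert(width > 0); }
    virtual ~Tao() = default;
    // Called once per participating core, with rank in [0, width).
    virtual void execute(int rank) = 0;
    int width() const { return width_; }
private:
    int width_;
};

// Hypothetical TAO-DAG: nodes are TAOs, edges are dependencies.
class TaoDag {
    struct Node {
        Tao* tao;
        std::vector<std::size_t> successors;
        int predecessors;
    };
public:
    std::size_t add(Tao* tao) {
        nodes_.push_back(Node{tao, {}, 0});
        return nodes_.size() - 1;
    }
    void edge(std::size_t from, std::size_t to) {
        assert(from < nodes_.size() && to < nodes_.size());
        nodes_[from].successors.push_back(to);
        ++nodes_[to].predecessors;
    }
    // Runs each TAO after all of its predecessors; returns the execution order.
    std::vector<std::size_t> run() {
        std::vector<int> pending;
        for (const Node& n : nodes_) pending.push_back(n.predecessors);
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < nodes_.size(); ++i)
            if (pending[i] == 0) ready.push(i);
        std::vector<std::size_t> order;
        while (!ready.empty()) {
            const std::size_t id = ready.front();
            ready.pop();
            for (int r = 0; r < nodes_[id].tao->width(); ++r) nodes_[id].tao->execute(r);
            order.push_back(id);
            for (std::size_t s : nodes_[id].successors)
                if (--pending[s] == 0) ready.push(s);
        }
        return order;
    }
private:
    std::vector<Node> nodes_;
};

// Example user-defined TAO: every rank appends id * 100 + rank to a shared log.
class LogTao : public Tao {
public:
    LogTao(int id, int width, std::vector<int>& log) : Tao(width), id_(id), log_(log) {}
    void execute(int rank) override { log_.push_back(id_ * 100 + rank); }
private:
    int id_;
    std::vector<int>& log_;
};
```

A user would derive from `Tao`, register instances with `add`, connect them with `edge`, and call `run`.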
Heterogeneous scheduling
Main Idea: map only those tasks to high performance cores that benefit due to criticality or due to performance characteristics
Heterogeneous Platforms: HiKEY 960, Nvidia Jetson TX2
[Figure: an external task DAG with its critical path; each Task Assembly Object (TAO) holds an internal DAG and runs in a fixed resource container (cores, caches, ...). A Performance Monitor feeds the PTT (“Performance Trace Table”), which guides the schedule onto faster cores and slower cores.]
Performance Trace Table (PTT)
• Function: record the running time on each core for each resource width
• Aim: determine the best core and the best width for execution within the available resources, for efficient resource usage
• Implementation: table of size core_number * resource_width
1 PTT for each task type (in XiTAO: for each TAO type)
Resource width := number of cores that execute a TAO
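A table of this shape can be sketched as below, with one instance per TAO type. The slides give only the table dimensions; the 4:1 weighted-average update, the `time * width` cost used to pick an entry, and the names `Ptt`, `update`, `get`, and `best` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a Performance Trace Table for one TAO type: one entry per
// (core, width) pair holding a smoothed execution time in milliseconds.
class Ptt {
public:
    struct Choice {
        std::size_t core;
        int width;
    };

    // widths: the resource widths the runtime may use, e.g. {1, 2, 4}.
    Ptt(std::size_t cores, std::vector<int> widths)
        : cores_(cores), widths_(std::move(widths)),
          time_ms_(cores_ * widths_.size(), 0.0) {
        assert(cores_ > 0 && !widths_.empty());
    }

    // Assumption: the first sample is stored as-is; later samples are folded
    // in with a 4:1 weighted average to smooth out noise.
    void update(std::size_t core, int width, double ms) {
        double& entry = time_ms_[slot(core, width)];
        entry = (entry == 0.0) ? ms : (4.0 * entry + ms) / 5.0;
    }

    double get(std::size_t core, int width) const { return time_ms_[slot(core, width)]; }

    // Assumption: pick the entry with the lowest time * width. Entries that
    // were never measured hold 0.0, so they are tried first.
    Choice best() const {
        Choice choice{0, widths_[0]};
        double best_cost = -1.0;
        for (std::size_t c = 0; c < cores_; ++c) {
            for (std::size_t w = 0; w < widths_.size(); ++w) {
                const double cost = time_ms_[c * widths_.size() + w] * widths_[w];
                if (best_cost < 0.0 || cost < best_cost) {
                    best_cost = cost;
                    choice = Choice{c, widths_[w]};
                }
            }
        }
        return choice;
    }

private:
    // Map (core, width) to a position in the flat table.
    std::size_t slot(std::size_t core, int width) const {
        assert(core < cores_);
        for (std::size_t w = 0; w < widths_.size(); ++w)
            if (widths_[w] == width) return core * widths_.size() + w;
        assert(false && "unknown width");
        return 0;
    }

    std::size_t cores_;
    std::vector<int> widths_;
    std::vector<double> time_ms_;
};
```

Weighting the time by the width penalizes wide executions that occupy many cores for little speedup; a scheduler that only wants the fastest completion could compare the raw times instead.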
Random DAGs
[Figure: two panels, "Performance-based Scheduler (PTT-based)" and "Homogeneous Scheduler (random work stealing)", each showing throughput (TAOs/s, scale 500 to 1500) as a function of task number (250, 500, 1000, 2000, 4000) and average DAG parallelism (1, 2, 4, 8, 16)]
● Runtime assessment of resource partitions + criticality-aware scheduling
Interference-awareness
● Detects interference episodes and migrates critical tasks
[Figure: execution trace per thread (0 to 9) over elapsed time [s] (0 to 14), marking tasks with multiple resources, critical task schedules, and the interference episode, together with the PTT value [ms] (8 to 20) evolution for core=0 & width=1]
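The migration of critical tasks during an interference episode can be illustrated with the width-1 PTT entries alone. In this sketch, `observe` and `core_for_critical_task` are hypothetical names, and the 4:1 weighted-average update is an assumption; the slides state only that interference is detected and that critical tasks migrate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fold a new measurement (ms) for `core` into its width-1 PTT entry.
// Assumption: a 4:1 weighted average, so a single slow sample shifts the
// entry only slightly while a sustained episode raises it clearly.
void observe(std::vector<double>& ptt_ms, std::size_t core, double ms) {
    assert(core < ptt_ms.size());
    ptt_ms[core] = (4.0 * ptt_ms[core] + ms) / 5.0;
}

// A critical task is placed on the core with the lowest recorded time.
std::size_t core_for_critical_task(const std::vector<double>& ptt_ms) {
    assert(!ptt_ms.empty());
    std::size_t best = 0;
    for (std::size_t c = 1; c < ptt_ms.size(); ++c) {
        if (ptt_ms[c] < ptt_ms[best]) best = c;
    }
    return best;
}
```

With entries of 10 ms for core 0 and 12 ms for core 1, two 20 ms samples on core 0 raise its entry to 13.6 ms, so the next critical task goes to core 1. Once the episode ends, samples from non-critical tasks that still run on core 0 pull its entry back down and critical tasks return.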
Current directions: VGG-16
● Porting VGG-16 in Darknet framework to XiTAO
[Figure: VGG-16 layers (CONV3-64, CONV3-128, CONV3-256, CONV3-512, maxpool, FC-4096, FC-1000, softmax). Each of the 16 CONV and FC layers maps to a GEMM, and each GEMM is split into TAOs (TAO 0, TAO 1, ..., TAO N) that run on XiTAO.]
● PTT automatically finds best widths to execute VGG-16 on the dual-socket Intel platform (20 cores)
Percentage of TAOs w.r.t. TAO-width, by number of threads:
Number of threads   Width 1   Width 2   Width 4   Width 8   Width 16
2                   69.06     30.94     -         -         -
4                   90.89     5.83      3.28      -         -
8                   66.67     3.38      0.74      29.21     -
16                  53.81     1.68      14.76     29.31     0.45
Future Directions
● Front-ends for XiTAO
– OmpSs to XiTAO
– Array (tensor) programming
● Low-energy runtime optimizations
● Automatic DAG partitioning for generation of mixed-mode computations
Thank you!
Acknowledgements:
The XiTAO team
Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs

More Related Content

What's hot

SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...LEGATO project
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019VMware Tanzu
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...Martin Hamilton
 
KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)Martin Toshev
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyMartin Hamilton
 
OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up Ganesan Narayanasamy
 
HPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsHPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsMartin Hamilton
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKInfluxData
 

What's hot (10)

SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
SRDS18: Security, Performance and Energy Trade-offs of Hardware-assisted Memo...
 
Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019Greenplum for Kubernetes - Greenplum Summit 2019
Greenplum for Kubernetes - Greenplum Summit 2019
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
 
KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)KDB database (EPAM tech talks, Sofia, April, 2015)
KDB database (EPAM tech talks, Sofia, April, 2015)
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case Study
 
OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up OpenPOWER Application Optimisation meet up
OpenPOWER Application Optimisation meet up
 
HPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC MidlandsHPC Midlands Launch - Introduction to HPC Midlands
HPC Midlands Launch - Introduction to HPC Midlands
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 

Similar to Elastic multicore scheduling with the XiTAO runtime

LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...Matteo Ferroni
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...LEGATO project
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...LEGATO project
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed_Hat_Storage
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
TWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingTWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingThoughtworks
 
Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptxachakracu
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesDataWorks Summit
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNvenkatraman227
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdfRioCarthiis
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1blewington
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityAPNIC
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Ashley Carter
 

Similar to Elastic multicore scheduling with the XiTAO runtime (20)

LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use Cases
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
TWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable ComputingTWISummit 2019 - Return of Reconfigurable Computing
TWISummit 2019 - Return of Reconfigurable Computing
 
Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptx
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-series
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGN
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdf
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and Flexibility
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...
 

More from LEGATO project

A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGATO project
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXLEGATO project
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataLEGATO project
 

More from LEGATO project (20)

A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
 

Recently uploaded

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

Elastic multicore scheduling with the XiTAO runtime

  • 1. Elastic multicore scheduling with the XiTAO runtime
    Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs
    Chalmers University of Technology
    Embedded Multicore Programming - Industrial state-of-the-art and future directions
    Edinburgh, April 17th, 2019
  • 2. Heterogeneous-Parallel Platforms (22/01/2019 HiPEAC CSW Spring 2019)
    Heterogeneity + parallelism are common in embedded platforms
    ● Power-efficiency, battery-constrained devices
    ● Examples:
      – ARM big.LITTLE
      – Nvidia Jetson TX2 (Denver2/A57/Pascal)
      – Dynamic heterogeneity: DVFS, interference, cache partitioning
    [Figures: HiKEY 960, Nvidia Jetson TX2]
  • 3. Heterogeneity as a dynamic property
    Heterogeneity: cores in the system differ in performance, energy efficiency, etc.
    Two types of heterogeneity: static and dynamic
    ● Static:
      – big.LITTLE, CPU-GPU
    ● Dynamic:
      – DVFS, cache partitioning, interference
      – Interference:
        ● Intra-process: cache, memory oversubscription
        ● Inter-process: cache, memory, processor timesharing
    ● Heterogeneity needs to be addressed dynamically by the runtime!
  • 4. EU LEGaTO Project
    ● Create software stack support for energy-efficient heterogeneous computing
  • 5. EU LEGaTO Project: XiTAO
  • 6. Mixed-mode parallelism
    ● Many applications can be expressed as mixed-mode parallel applications := external task parallelism + internal data parallelism
    ● Naturally supports hierarchy/heterogeneity in modern architectures
    ● Challenge: how to schedule? How many resources?
    [Figure annotation: "#pragma omp parallel for..." can be generalized to other forms of parallelism!]
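
The mixed-mode pattern on slide 6 can be sketched as follows. This is a minimal illustration using plain C++11 threads, not the XiTAO API: independent outer tasks run concurrently (external task parallelism), and each task splits its own loop across `width` worker threads (internal data parallelism). The function names `parallel_sum` and `mixed_mode_sum` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Internal data parallelism: sum `data` using `width` threads (width >= 1),
// each thread handling one contiguous chunk.
long parallel_sum(const std::vector<int>& data, std::size_t width) {
    std::vector<long> partial(width, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + width - 1) / width;
    for (std::size_t w = 0; w < width; ++w) {
        workers.emplace_back([&, w] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            partial[w] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0L);
        });
    }
    for (auto& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

// External task parallelism: every input is an independent task launched
// asynchronously; each task then expands internally over `width` threads.
long mixed_mode_sum(const std::vector<std::vector<int>>& inputs, std::size_t width) {
    std::vector<std::future<long>> tasks;
    for (const auto& in : inputs) {
        tasks.push_back(std::async(std::launch::async,
                                   [&in, width] { return parallel_sum(in, width); }));
    }
    long total = 0;
    for (auto& t : tasks) total += t.get();
    return total;
}
```

Here the width is fixed by the caller. The scheduling challenge named on the slide is exactly the choice of that width and of the cores each task should occupy.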
  • 7. XiTAO mixed-mode runtime
    1. Schedule external task parallelism via work stealing + locally expand internal parallel tasks across multiple cores
    2. Reduce inter-task interference by decoupling internal parallelism from resources: Task Assembly Objects (TAO)
    ● Improves parallel slackness
    ● Bulk creation of parallelism (low overhead)
    ● Interference avoidance
    ● Constructive sharing
  • 8. XiTAO application
    ● Example of 2D stencil execution on XiTAO
    [Figure labels: Application, w=2, w=1]
  • 9. Elastic Places: Adaptivity
    ● Example: Cilksort reduction on 48 cores. Dynamically resize places as external parallelism decreases and TAO working set increases
    ● Each colored box is a resource container, executing one TAO
    Quick generation of parallelism, low overheads and good isolation + constructive sharing
  • 10. XiTAO implementation
    ● XiTAO is fully implemented in C++11
    ● Decentralized design targeting scalability
    XiTAO API:
      – Basic TAO class (XiTAO)
      – User-level API for defining TAOs
      – User-level API for defining TAO-DAGs + locality-awareness
  • 11. Heterogeneous scheduling
    Main idea: map to high-performance cores only those tasks that benefit, due to criticality or due to performance characteristics
    Heterogeneous platforms: HiKEY 960, Nvidia Jetson TX2
    [Figure labels: external task DAG, critical path, Task Assembly Object (TAO), internal DAG, fixed resource container (cores, caches, ...), faster cores, slower cores, performance monitor, PTT ("Performance Trace Table"), schedule]
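
One possible way to derive the criticality mentioned on slide 11 is to mark the tasks that lie on a longest path of the external task DAG. The sketch below assumes unit cost per task and task ids that are already topologically ordered (every edge goes from a lower to a higher id). The names `Dag` and `critical_tasks` are hypothetical, and this is not taken from the XiTAO sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// successors[v] lists the tasks that depend on task v.
// Assumption: ids are topologically ordered (edges go from lower to higher id).
using Dag = std::vector<std::vector<std::size_t>>;

// Returns, for each task, whether it lies on a longest (critical) path,
// counting every task as one unit of work.
std::vector<bool> critical_tasks(const Dag& successors) {
    const std::size_t n = successors.size();
    std::vector<std::size_t> down(n, 1);  // longest path starting at v, in tasks
    std::vector<std::size_t> up(n, 1);    // longest path ending at v, in tasks
    // Visit tasks in reverse topological order to fill `down`.
    for (std::size_t v = n; v-- > 0;)
        for (std::size_t s : successors[v]) down[v] = std::max(down[v], 1 + down[s]);
    // Visit tasks in topological order to fill `up`.
    for (std::size_t v = 0; v < n; ++v)
        for (std::size_t s : successors[v]) up[s] = std::max(up[s], 1 + up[v]);
    std::size_t longest = 0;
    for (std::size_t v = 0; v < n; ++v) longest = std::max(longest, down[v]);
    // A task is critical if the longest path through it equals the critical path.
    std::vector<bool> critical(n);
    for (std::size_t v = 0; v < n; ++v) critical[v] = (up[v] + down[v] - 1 == longest);
    return critical;
}
```

A scheduler could then steer tasks with `critical[v] == true` towards the faster cores and leave the remaining tasks to the slower ones.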
  • 12. Performance Trace Table (PTT)
    ● Function: record the execution time observed on each core for each resource width
    ● Aim: identify the best core and the best width for execution among the available resources, for efficient resource usage
    ● Implementation: table of size core_number * resource_width; 1 PTT for each task type (in XiTAO: for each TAO type)
    Resource width := number of cores that execute a TAO
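
A table of this shape can be sketched as below, under several stated assumptions. The class name `PerformanceTraceTable`, the restriction to power-of-two widths, the smoothing rule (a 4:1 weighted average of old and new samples) and the selection policy (minimize time for critical tasks, minimize time * width otherwise) are illustrative choices and not necessarily what XiTAO implements. One such table would exist per TAO type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Rows are cores, columns are widths 1, 2, 4, ... up to max_width.
class PerformanceTraceTable {
public:
    PerformanceTraceTable(std::size_t cores, std::size_t max_width) : cores_(cores) {
        for (std::size_t w = 1; w <= max_width; w *= 2) widths_.push_back(w);
        time_.assign(cores_ * widths_.size(), 0.0);  // 0.0 means "not measured yet"
    }

    // Record a measured execution time (seconds) for a run on `core` with `width`.
    void record(std::size_t core, std::size_t width, double seconds) {
        double& entry = time_[index(core, width)];
        // Assumed smoothing: first sample stored as is, later samples blended in.
        entry = (entry == 0.0) ? seconds : (4.0 * entry + seconds) / 5.0;
    }

    double lookup(std::size_t core, std::size_t width) const {
        return time_[index(core, width)];
    }

    // Returns the (core, width) pair with the lowest cost. Assumed policy:
    // critical tasks minimize time, other tasks minimize time * width.
    // Unmeasured entries have cost 0, so they are explored first.
    std::pair<std::size_t, std::size_t> best(bool critical) const {
        std::size_t best_core = 0, best_width = widths_[0];
        double best_cost = cost(0, 0, critical);
        for (std::size_t c = 0; c < cores_; ++c) {
            for (std::size_t k = 0; k < widths_.size(); ++k) {
                const double candidate = cost(c, k, critical);
                if (candidate < best_cost) {
                    best_cost = candidate;
                    best_core = c;
                    best_width = widths_[k];
                }
            }
        }
        return {best_core, best_width};
    }

private:
    double cost(std::size_t core, std::size_t k, bool critical) const {
        const double t = time_[core * widths_.size() + k];
        return critical ? t : t * static_cast<double>(widths_[k]);
    }

    std::size_t index(std::size_t core, std::size_t width) const {
        assert(core < cores_);
        for (std::size_t k = 0; k < widths_.size(); ++k)
            if (widths_[k] == width) return core * widths_.size() + k;
        assert(false && "width must be a power of two <= max_width");
        return 0;
    }

    std::size_t cores_;
    std::vector<std::size_t> widths_;
    std::vector<double> time_;
};
```

Because every new sample is blended into the stored value, a core that suddenly becomes slower (for example under interference) sees its entries rise over a few executions, and `best` starts preferring other placements.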
  • 13. Random DAGs
    [Figure: throughput (TAOs/s, scale 500 to 1500) as a function of task number (250 to 4000) and average DAG parallelism (1 to 16). Panels: Performance-based Scheduler (PTT-based); Homogeneous Scheduler (random work stealing)]
    ● Runtime assessment of resource partitions + criticality-aware scheduling
  • 14. Interference-awareness
    ● Detects interference episodes and migrates critical tasks
    [Figure: schedule per thread (0 to 9) over elapsed time (0 to 14 s), together with the PTT value (8 to 20 ms), showing the PTT evolution for core=0 & width=1. Annotations: tasks with multiple resources, critical task schedules, interference episode]
  • 15. Current directions: VGG-16
    ● Porting VGG-16 in Darknet framework to XiTAO
    [Figure: the VGG-16 layers (CONV3-64 x2, CONV3-128 x2, CONV3-256 x3, CONV3-512 x6, FC-4096 x2, FC-1000, with maxpool and softmax stages), one GEMM per layer, and TAO 0, TAO 1, ..., TAO N in XiTAO]
    ● PTT automatically finds best widths to execute VGG-16 on the dual-socket Intel platform (20 cores)
    Percentage of TAOs w.r.t. TAO width, by number of threads:
      – 2 threads: width 1: 69.06, width 2: 30.94
      – 4 threads: width 1: 90.89, width 2: 5.83, width 4: 3.28
      – 8 threads: width 1: 66.67, width 2: 3.38, width 4: 0.74, width 8: 29.21
      – 16 threads: width 1: 53.81, width 2: 1.68, width 4: 14.76, width 8: 29.31, width 16: 0.45
  • 16. Future Directions
    ● Front-ends for XiTAO
      – OmpSs to XiTAO
      – Array (tensor) programming
    ● Low-energy runtime optimizations
    ● Automatic DAG partitioning for generation of mixed-mode computations
  • 17. Thank you!
    Acknowledgements: The XiTAO team (Jing Chen, Pirah Noor, Mustafa Abduljabbar, Miquel Pericàs)