
Device Data Directory and Asynchronous execution: A path to heterogeneous computing with OmpSs-2

Poster presented by Rubén Cano at the LEGaTO Final Event: 'Low-Energy Heterogeneous Computing Workshop'

The LEGaTO project has received funding from the European Union’s
Horizon 2020 research and innovation programme under the grant
agreement No 780681.
www.legato-project.eu
Device Data Directory and Asynchronous execution:
A path to heterogeneous computing with OmpSs-2
Rubén Cano, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell
Barcelona Supercomputing Center and Universitat Politècnica de Catalunya
Benchmarking
• Device: GPU. Algorithm: matrix multiply. Configuration: matrix size 16384.
• Device: FPGA. Algorithm: matrix multiply. Configuration: block size 256.
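The poster does not show how the matrix multiply is decomposed. A common decomposition, assumed here, splits the matrices into bs × bs blocks so that each block product can be issued as a separate task (for example, to one of the 256×256 FPGA matmul accelerators). The pure-Python sketch below shows only the tiling; `blocked_matmul` and `task_count` are hypothetical names, and no OmpSs-2 runtime is involved.

```python
def blocked_matmul(a, b, n, bs):
    """Multiply two n x n matrices (lists of lists) using bs x bs blocks.

    Hypothetical sketch: each (bi, bj, bk) iteration is the unit of work that a
    task-based runtime would schedule, with in(A[bi][bk]), in(B[bk][bj]) and
    inout(C[bi][bj]) dependences.
    """
    assert n % bs == 0, "matrix size must be a multiple of the block size"
    c = [[0.0] * n for _ in range(n)]
    nb = n // bs
    for bi in range(nb):
        for bj in range(nb):
            for bk in range(nb):
                # One block product: C[bi][bj] += A[bi][bk] * B[bk][bj]
                for i in range(bi * bs, (bi + 1) * bs):
                    for k in range(bk * bs, (bk + 1) * bs):
                        aik = a[i][k]
                        for j in range(bj * bs, (bj + 1) * bs):
                            c[i][j] += aik * b[k][j]
    return c


def task_count(n, bs):
    """Number of block-product tasks for an n x n matrix with bs x bs blocks."""
    nb = n // bs
    return nb ** 3
```

Because every block product only reads two input blocks and updates one output block, the dependences are expressed per block, which is what lets a runtime overlap copies of upcoming blocks with computation on the current one.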
Conclusions
• Better performance than the hardware-based CUDA Unified Memory approach.
• Improved support for range dependences and multi-device copies, with the same performance as the OmpSs runtime.
• A framework that makes it easy to add support for new devices to the OmpSs-2 runtime.
References
• OmpSs-2 Programming Model: https://pm.bsc.es/ompss-2
• Nanos6 repository: https://github.com/bsc-pm/nanos6
• OmpSs@FPGA: https://pm.bsc.es/ompss-at-fpga
Architecture diagram (components):
Task
• Depends: in(x[0; N]), out(y[0; N])
• Kind (FPGA, CUDA…)
• Instance (device ID)
• Function (saxpy, matmul…)
Accelerator
• Allocation Engine: allocates, reallocates, and frees device memory.
• Directory: host-device range-based mapping cache. Keeps track of the validity status of any given region in any device, and can translate addresses from the host to any device for a given region.
• Copy Engine: generates copies between devices.
• Symbol-aware mapping: dependencies that are part of the same symbol but are non-contiguous in memory keep the same offsets between them in the device mapping.
• Stream: COPIES → TASK EXECUTION → TASK FINALIZATION
Dependency System
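To make the Directory and symbol-aware mapping descriptions concrete, here is a minimal sketch, assuming one directory object per host symbol, string names for address spaces ("host", "gpu0"), and one contiguous device mirror per symbol. The class `RangeDirectory` and its methods are hypothetical illustrations, not the Nanos6 implementation. Validity is tracked per byte range rather than per page, and translation adds the offset from the symbol base, so non-contiguous dependencies of the same symbol keep their relative offsets on the device.

```python
class RangeDirectory:
    """Hypothetical range-based validity tracker for one host symbol (allocation)."""

    def __init__(self, host_base, size):
        self.host_base = host_base
        # Sorted, non-overlapping [start, end) intervals covering the symbol,
        # each paired with the set of address spaces that hold a valid copy.
        self.intervals = [(host_base, host_base + size, {"host"})]
        self.device_base = {}  # device name -> base address of its mirror buffer

    def _split(self, addr):
        """Insert an interval boundary at addr if it falls inside an interval."""
        for idx, (s, e, v) in enumerate(self.intervals):
            if s < addr < e:
                self.intervals[idx:idx + 1] = [(s, addr, set(v)), (addr, e, set(v))]
                return

    def _covering(self, start, end):
        """Intervals exactly covering [start, end) after splitting at its edges."""
        self._split(start)
        self._split(end)
        return [iv for iv in self.intervals if start <= iv[0] and iv[1] <= end]

    def missing(self, start, end, device):
        """Sub-ranges not valid on device, each with a valid source to copy from."""
        return [(s, e, sorted(v)[0])
                for s, e, v in self._covering(start, end) if device not in v]

    def mark_valid(self, start, end, device):
        """Record that a copy of [start, end) to device has completed."""
        for _, _, v in self._covering(start, end):
            v.add(device)

    def mark_written(self, start, end, device):
        """Record a write by a task on device: every other copy becomes stale."""
        for _, _, v in self._covering(start, end):
            v.clear()
            v.add(device)

    def translate(self, host_addr, device):
        """Host -> device address; offsets from the symbol base are preserved."""
        return self.device_base[device] + (host_addr - self.host_base)
```

For example, with a 1024-byte symbol at host address 0x1000 mirrored at 0x9000 on "gpu0", `missing(0x1000, 0x1100, "gpu0")` reports that only the first 256 bytes must be copied from the host, and `translate(0x1100, "gpu0")` yields 0x9100. Tracking byte ranges instead of pages is what avoids the false sharing described in the Problem section.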
This work has been supported by the Ministry of Science
and Innovation, under the project "Computación de
Altas Prestaciones VIII" (PID2019-107255GB).
Problem
1. Memory management and communication across different devices is challenging and error-prone.
2. Hardware approaches relax the memory model, but not all devices support these mechanisms.
3. These mechanisms are usually page-based, which can cause severe performance degradation due to false sharing.
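The false-sharing point can be checked with a little page arithmetic. The sketch below assumes a 4096-byte page (the real granularity is system-dependent) and uses hypothetical helper names. Two tasks that write disjoint halves of a 4 KiB buffer never touch the same bytes, yet both ranges fall on the same page, so a page-based mechanism would have to move that page between devices for each writer.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the actual value is system-dependent


def pages(start, length, page_size=PAGE_SIZE):
    """Set of page indices touched by the byte range [start, start + length)."""
    first = start // page_size
    last = (start + length - 1) // page_size
    return set(range(first, last + 1))


def falsely_shared(r1, r2, page_size=PAGE_SIZE):
    """True if two byte ranges (start, length) are disjoint but share a page."""
    (s1, n1), (s2, n2) = r1, r2
    disjoint = s1 + n1 <= s2 or s2 + n2 <= s1
    return disjoint and bool(pages(s1, n1, page_size) & pages(s2, n2, page_size))
```

For instance, `falsely_shared((0, 2048), (2048, 2048))` is true, while page-aligned halves of an 8 KiB buffer, `(0, 4096)` and `(4096, 4096)`, do not conflict. A range-based directory sidesteps the issue because it reasons about the exact byte ranges each task declares.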
• The task is ready to be executed.
• A hardware accelerator is selected.
• Ensure symbol validity: if a symbol is not already valid on the destination device, enqueue all the copies from a valid address space into the destination device memory.
• Set up a symbol-translation table to translate host pointers to the destination device.
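The steps above can be sketched as follows, using the saxpy-style dependences in(x[0; N]) and out(y[0; N]) from the architecture diagram. This is a hypothetical model: validity is tracked per whole symbol for brevity, device buffers are assumed to be allocated already, and `prepare_task` and `drain` are illustrative names. Skipping the copy-in for `out`-only symbols is an assumption of this sketch, since their old contents are overwritten. Operations are queued in order (copies, execution, finalization) and applied when the queue is drained.

```python
from collections import deque


def prepare_task(task_name, deps, device, valid, host_ptr, dev_ptr, stream):
    """Queue the work needed to run a ready task on the selected device.

    deps:     list of (symbol, mode) with mode in {"in", "out", "inout"}
    valid:    dict symbol -> set of address spaces holding a valid copy
    host_ptr: dict symbol -> host address
    dev_ptr:  dict (symbol, device) -> device address (assumed already allocated)
    stream:   deque of pending operations for this device, run in FIFO order
    Returns the host -> device translation table for the task.
    """
    table = {}
    for sym, mode in deps:
        # Ensure symbol validity: copy in data the task reads but the device lacks.
        if mode != "out" and device not in valid[sym]:
            src = sorted(valid[sym])[0]  # any valid address space will do
            stream.append(("copy", sym, src, device))
        # Symbol-translation table: host pointer -> device pointer.
        table[host_ptr[sym]] = dev_ptr[(sym, device)]
    stream.append(("exec", task_name, device))
    written = tuple(sym for sym, mode in deps if mode in ("out", "inout"))
    stream.append(("finalize", task_name, device, written))
    return table


def drain(stream, valid):
    """Apply queued operations in order, updating validity (no data is moved)."""
    executed = []
    while stream:
        op = stream.popleft()
        if op[0] == "copy":
            _, sym, _src, dst = op
            valid[sym].add(dst)  # the destination now holds a valid copy
        elif op[0] == "finalize":
            _, _name, dev, written = op
            for sym in written:
                valid[sym] = {dev}  # every other copy is now stale
        executed.append(op[0])
    return executed
```

After running a saxpy-like task on "gpu0", `x` is valid on both the host and the GPU, while `y` is valid only on the GPU until some later task needs it elsewhere.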
Can a software Unified Memory model be faster than current hardware-based solutions?
Research
Proof-of-concept solution
• Device Directory: unifies the memory model by managing device memories explicitly and ensuring that data is available before a task executes.
• Stream: unifies the execution model of every device as an asynchronous queue of sequential operations.
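The Stream abstraction, an asynchronous queue of sequential operations, can be sketched with the Python standard library. This is a hypothetical illustration: operations are plain callables run by one worker thread per stream, whereas for a GPU the natural counterpart would be a CUDA stream and for an FPGA a device-specific command queue. `Stream`, `submit`, and `synchronize` are illustrative names.

```python
import queue
import threading


class Stream:
    """Hypothetical sketch: an asynchronous FIFO of operations run in submission order."""

    def __init__(self, name):
        self.name = name
        self.errors = []  # exceptions raised by operations, kept for inspection
        self._ops = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # A single worker guarantees that operations run one after another.
        while True:
            op = self._ops.get()
            try:
                op()
            except Exception as exc:  # keep the worker alive; record the failure
                self.errors.append(exc)
            finally:
                self._ops.task_done()

    def submit(self, op):
        """Enqueue op and return immediately (asynchronous for the caller)."""
        self._ops.put(op)

    def synchronize(self):
        """Block until every operation submitted so far has completed."""
        self._ops.join()
```

Submitting a copy, a task execution, and a finalization returns control to the caller right away; the three operations then run strictly in that order, and `synchronize()` waits for all of them. Giving every device the same interface is what lets one runtime drive GPUs and FPGAs uniformly.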
Benchmarking platforms:
• FPGA: Xilinx Zynq UltraScale+ 9EG with 3 matmul accelerators (256×256).
• GPU: IBM Power9 8335-GTH with 1 NVIDIA V100.
We would like to thank the Xilinx University Program for software and board donations.
(Figure: benchmark results comparing OmpSs and OmpSs-2.)
 
Elbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow jointElbow joint - Anatomy of the Elbow joint
Elbow joint - Anatomy of the Elbow joint
 
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsOpen Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
 
Cytotoxic Activity of Linum usitatissimum L. Essential oil against Lung Adeno...
Cytotoxic Activity of Linum usitatissimum L. Essential oil against Lung Adeno...Cytotoxic Activity of Linum usitatissimum L. Essential oil against Lung Adeno...
Cytotoxic Activity of Linum usitatissimum L. Essential oil against Lung Adeno...
 
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
 
1.0 - The Light Miscroscope.ppt microscopy
1.0 - The Light Miscroscope.ppt microscopy1.0 - The Light Miscroscope.ppt microscopy
1.0 - The Light Miscroscope.ppt microscopy
 
Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)Introduction to Chromatography (Column chromatography)
Introduction to Chromatography (Column chromatography)
 
Weak-lensing detection of intracluster filaments in the Coma cluster
Weak-lensing detection of intracluster filaments in the Coma clusterWeak-lensing detection of intracluster filaments in the Coma cluster
Weak-lensing detection of intracluster filaments in the Coma cluster
 
Chemical Bonding and it's Types 001.pptx
Chemical Bonding and it's Types 001.pptxChemical Bonding and it's Types 001.pptx
Chemical Bonding and it's Types 001.pptx
 
Microbial Fermentation(Strain Improvement)
Microbial  Fermentation(Strain Improvement)Microbial  Fermentation(Strain Improvement)
Microbial Fermentation(Strain Improvement)
 
Seminario Biologia Molecular Nicole Michel Rojas Torres
Seminario Biologia Molecular Nicole Michel Rojas TorresSeminario Biologia Molecular Nicole Michel Rojas Torres
Seminario Biologia Molecular Nicole Michel Rojas Torres
 
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsOpen Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
 
green chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.pptgreen chemistry, clean sustainable environment.ppt
green chemistry, clean sustainable environment.ppt
 
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
 
American Eclipse A Nation’s Epic Race to Catch the_240225_095603
American Eclipse A Nation’s Epic Race to Catch the_240225_095603American Eclipse A Nation’s Epic Race to Catch the_240225_095603
American Eclipse A Nation’s Epic Race to Catch the_240225_095603
 
Salesforce Starter Package Presentation.
Salesforce Starter Package Presentation.Salesforce Starter Package Presentation.
Salesforce Starter Package Presentation.
 
Seminario biología molecular Lina Charris
Seminario biología molecular Lina CharrisSeminario biología molecular Lina Charris
Seminario biología molecular Lina Charris
 
Presentacion Mariana Arango- biología molecular
Presentacion Mariana Arango- biología molecularPresentacion Mariana Arango- biología molecular
Presentacion Mariana Arango- biología molecular
 
discussion on the endocrine system for science grade10.pptx
discussion on the endocrine system for science grade10.pptxdiscussion on the endocrine system for science grade10.pptx
discussion on the endocrine system for science grade10.pptx
 
LIGHT Community Medicine LIGHT IS A SOURCE OF ENERGY THERE ARE TWO TYPE OF S...
LIGHT  Community Medicine LIGHT IS A SOURCE OF ENERGY THERE ARE TWO TYPE OF S...LIGHT  Community Medicine LIGHT IS A SOURCE OF ENERGY THERE ARE TWO TYPE OF S...
LIGHT Community Medicine LIGHT IS A SOURCE OF ENERGY THERE ARE TWO TYPE OF S...
 

Device Data Directory and Asynchronous execution: A path to heterogeneous computing with OmpSs-2

Rubén Cano, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell
Barcelona Supercomputing Center and Universitat Politècnica de Catalunya

Problem
Memory management and communication across different devices are challenging and error-prone. Hardware approaches relax the memory model, but not all devices support these mechanisms. These mechanisms are usually page-based, which can cause severe performance degradation due to false sharing.

Research
Can a software Unified Memory model be faster than current hardware-based solutions?

Solution
• Device Directory: unifies the memory model by managing device memories explicitly and ensuring that the data is available before a task executes.
• Stream: unifies the execution model of any device as an asynchronous queue of sequential operations.

Proof-of-concept
1. The task is ready to be executed.
2. A hardware accelerator is selected.
3. Symbol validity is ensured: a symbol-translation table is set up to translate host pointers to the destination device. If a symbol is not already valid, all the copies from a valid address space into the destination device memory are enqueued.

Runtime components:
• Task: Depends, e.g. in(x[0; N]) and out(y[0; N]); Accelerator, described by its Kind (FPGA, CUDA…), Instance (device id) and Function (saxpy, matmul…).
• Allocation Engine: allocates, reallocates and frees device memory.
• Directory, a host-device range-based mapping cache: keeps track of the validity status of any given region on any device, and can translate addresses from the host to any device for a given region.
• Copy Engine: generates copies between devices.
• Symbol-aware mapping: dependencies that are part of the same symbol but are non-contiguous in memory keep the same offsets between them in the device mapping.
• Stream: COPIES → TASK EXECUTION → TASK FINALIZATION.
• Dependency System.

Benchmarking
Platforms:
• GPU: IBM Power9 8335-GTH with one NVIDIA V100.
• FPGA: Zynq UltraScale+ 9EG with three 256×256 matmul accelerators.
[Charts: OmpSs, OmpSs-2]
• Device: GPU. Algorithm: Matrix Multiply. Configuration: Matrix Size [16384].
• Device: FPGA. Algorithm: Matrix Multiply. Configuration: Block Size [256].

Conclusions
• Better performance than the hardware-based CUDA Unified Memory approach.
• Improved support for range dependences and multi-device copies, with the same performance as the OmpSs runtime.
• A framework that makes it easy to adapt any new device to the OmpSs-2 runtime.

References
• OmpSs-2 Programming Model: https://pm.bsc.es/ompss-2
• Nanos6 repository: https://github.com/bsc-pm/nanos6
• OmpSs@FPGA: https://pm.bsc.es/ompss-at-fpga

Acknowledgements
The LEGaTO project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780681 (www.legato-project.eu). This work has been supported by the Ministry of Science and Innovation under the project "Computación de Altas Prestaciones VIII" (PID2019-107255GB). We would like to thank the Xilinx University Program for software and board donations.
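The poster describes the Directory only as a host-device range-based mapping cache. It tracks the validity of every region on every device and translates host addresses. The Python model below is a minimal sketch of that idea, not the Nanos6 code. It assumes that addresses are plain integers and that address space 0 stands for the host. `Directory`, `Region`, `missing_on`, `mark_copied`, `mark_written` and `translate` are hypothetical names. Regions are split at access boundaries, so a task that touches part of a buffer affects only that part.

```python
from dataclasses import dataclass, field

HOST = 0  # assumed id of the host address space; devices use 1, 2, ...


@dataclass
class Region:
    start: int  # host address, inclusive
    end: int    # host address, exclusive
    valid_on: set = field(default_factory=lambda: {HOST})  # spaces holding up-to-date data
    device_base: dict = field(default_factory=dict)         # device id -> device addr of `start`


class Directory:
    """Hypothetical range-based host-device mapping cache."""

    def __init__(self):
        self.regions = []  # non-overlapping Regions, sorted by start

    def register(self, start, size):
        # A newly registered host buffer is valid only on the host.
        self.regions.append(Region(start, start + size))
        self.regions.sort(key=lambda r: r.start)

    def _split_at(self, addr):
        # Make `addr` a region boundary so partial accesses touch only their part.
        for i, r in enumerate(self.regions):
            if r.start < addr < r.end:
                offset = addr - r.start
                tail = Region(addr, r.end, set(r.valid_on),
                              {dev: base + offset for dev, base in r.device_base.items()})
                r.end = addr
                self.regions.insert(i + 1, tail)
                return

    def _covering(self, start, end):
        # Regions that lie exactly inside [start, end) after splitting.
        self._split_at(start)
        self._split_at(end)
        return [r for r in self.regions if start <= r.start and r.end <= end]

    def map(self, device, start, end, device_addr):
        # Record that host [start, end) is backed by device memory at device_addr.
        for r in self._covering(start, end):
            r.device_base[device] = device_addr + (r.start - start)

    def missing_on(self, device, start, end):
        # Subranges that must be copied to `device` before a task can read them.
        return [(r.start, r.end) for r in self._covering(start, end)
                if device not in r.valid_on]

    def mark_copied(self, device, start, end):
        # A completed copy adds `device` to the set of valid holders.
        for r in self._covering(start, end):
            r.valid_on.add(device)

    def mark_written(self, device, start, end):
        # After an out()/inout() access, only the writer holds valid data.
        for r in self._covering(start, end):
            r.valid_on = {device}

    def translate(self, device, host_addr):
        # Host address -> device address, using the mapping of the enclosing region.
        for r in self.regions:
            if r.start <= host_addr < r.end and device in r.device_base:
                return r.device_base[device] + (host_addr - r.start)
        raise KeyError(f"{host_addr:#x} is not mapped on device {device}")
```

For example, once device 1 has received the first half of a 1 KiB buffer, `missing_on(1, ...)` reports only the second half as still needing a copy. The sketch never merges adjacent regions back together. A production directory would probably want to do that to keep lookups short.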
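The Stream gives every device the same execution model: an asynchronous queue whose operations run one after another, in the order COPIES, TASK EXECUTION, TASK FINALIZATION. The sketch below assumes that one worker thread per stream can stand in for a device's native queue, such as a CUDA stream. `Stream` and `launch_task` are hypothetical names, and the operations are plain Python callables.

```python
import queue
import threading


class Stream:
    """Hypothetical per-device stream: asynchronous for the submitter,
    strictly sequential (FIFO) among its own operations."""

    def __init__(self, name):
        self.name = name
        self._ops = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            op = self._ops.get()
            try:
                if op is None:  # sentinel enqueued by close()
                    return
                op()
            finally:
                self._ops.task_done()

    def enqueue(self, op):
        self._ops.put(op)  # returns immediately; the caller does not wait

    def synchronize(self):
        self._ops.join()  # block until every enqueued op has run

    def close(self):
        self._ops.put(None)
        self._worker.join()


def launch_task(stream, copies, kernel, finalize):
    """Enqueue one task's three stages; none of them runs on the caller's thread."""
    for copy in copies:       # 1. COPIES of ranges not yet valid on the device
        stream.enqueue(copy)
    stream.enqueue(kernel)    # 2. TASK EXECUTION
    stream.enqueue(finalize)  # 3. TASK FINALIZATION (e.g. release dependences)
```

`launch_task` returns as soon as the three stages are enqueued. The FIFO order still guarantees that the kernel never starts before its input copies have finished, and that finalization runs only after the kernel completes. A real backend could use completion events instead of `synchronize` to notify the dependency system.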
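Symbol-aware mapping requires that dependencies belonging to the same symbol keep the same offsets between them on the device, even when they are not contiguous in memory. One simple way to obtain that property is assumed here for illustration. The runtime allocates a single device buffer spanning from the lowest to the highest dependency of the symbol, then shifts every host address by the same amount. `symbol_aware_mapping` and `translate` are hypothetical helpers.

```python
def translate(host_addr, lo, device_base):
    # Every address of the symbol is shifted by the same amount, so the
    # distance between any two dependencies is identical on host and device.
    return device_base + (host_addr - lo)


def symbol_aware_mapping(deps, device_base):
    """deps: (host_start, size) pairs that all belong to one symbol.
    Returns (bytes to allocate on the device, {host_start: device_start})."""
    lo = min(start for start, _ in deps)
    hi = max(start + size for start, size in deps)
    mapping = {start: translate(start, lo, device_base) for start, _ in deps}
    return hi - lo, mapping
```

Consider two 256-byte blocks placed 2 KiB apart. The device allocation must also cover the 0x700-byte gap between them. In exchange, a single offset translates any pointer into the symbol, so pointer arithmetic inside a kernel still reaches the right element.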