oneAPI DPC++ Workshop
9th December 2020
Intel Confidential 2
Agenda
• Intel® oneAPI
  • Introduction
• DPC++
  • Introduction
  • DPC++ “Hello world”
  • Lab
• Intel® DPC++ Compatibility Tool
  • Introduction
  • Demo
Introduction to Intel® oneAPI
Programming Challenges
• Growth in specialized workloads
• Variety of data-centric hardware required
• No common programming language or APIs
• Inconsistent tool support across platforms
• Each platform requires unique software investment
[Diagram: Application workloads need diverse hardware. Middleware / frameworks and language & libraries sit above scalar, vector, matrix, and spatial architectures, which map to XPUs: CPU, GPU, FPGA, and other accelerators.]
Introducing oneAPI
Unified programming model to simplify development across diverse architectures
• Unified and simplified language and libraries for expressing parallelism
• Uncompromised native high-level language performance
• Based on industry standards and open specifications
• Interoperable with existing HPC programming models
[Diagram: oneAPI (industry initiative and Intel product) sits between middleware / frameworks and the scalar, vector, matrix, and spatial architectures of the XPUs: CPU, GPU, FPGA, and other accelerators.]
Data Parallel C++
Subarnarekha Ghosal
Introduction
Intel® oneAPI DPC++ Overview
[Diagram: DPC++ is built from C++17, the latest available SYCL spec, and "SYCL Next" (Intel extensions).]
Intel® oneAPI DPC++ Overview
1. Data Parallel C++ is a high-level language designed to target heterogeneous architectures and take advantage of data parallelism.
2. Reuse code across CPUs and accelerators while performing custom tuning.
3. The open-source implementation on GitHub helps to incorporate ideas from end users.
Before we start
Lambda Expressions
#include <algorithm>
#include <cmath>
void abssort(float* x, unsigned n) {
    std::sort(x, x + n,
        // Lambda expression
        [](float a, float b)                      // capture clause, parameter list
        {
            return (std::abs(a) < std::abs(b));   // lambda body
        }
    );
}
• A convenient way of defining an anonymous function object right at the location where it is invoked or passed as an argument to a function
• Lambda functions can be used to define kernels in SYCL
• The kernel lambda MUST use copy for all its captures (i.e., [=])
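The capture rule above is easier to see in a small host-only comparison. The sketch below is plain C++ (no SYCL needed); the function names `scale_with_copy_capture` and `scale_with_reference_capture` are made up for illustration. It shows that a `[=]` closure keeps its own copy of a variable, while a `[&]` closure observes later changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales each element with a lambda that captures `factor` by copy ([=]),
// the capture mode the slide requires for SYCL kernel lambdas.
std::vector<float> scale_with_copy_capture(const std::vector<float>& in, float factor) {
    auto scale = [=](float v) { return v * factor; };  // closure holds its own copy of factor
    factor = 0.0f;  // changing the original afterwards does not affect the closure
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = scale(in[i]);
    }
    return out;
}

// For contrast: a by-reference capture ([&]) observes later changes to the variable.
float scale_with_reference_capture(float v) {
    float factor = 2.0f;
    auto scale = [&](float x) { return x * factor; };
    factor = 5.0f;  // the closure sees this update
    return scale(v);
}
```

Calling `scale_with_copy_capture({1, 2, 3}, 2)` still doubles every element even though `factor` is zeroed after the lambda is created, which is the behaviour a kernel needs once it has been handed to a device.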
DPC++ Program Flow
• HOST: query for the available device(s).
• QUEUE: dispatches kernels to the device. The queue executes the commands on the device.
• COMMAND GROUP HANDLER: command groups control execution on the device.
• Kernel model: send a kernel (lambda) for execution. parallel_for will execute in parallel across the compute elements of the device.
• Buffers and accessors manage memory across host and device. In the diagram, ACC A and ACC B read from BUF A and BUF B, and ACC C writes to BUF C.
DPC++ “Hello world”
Step 1
#include <CL/sycl.hpp>
using namespace cl::sycl;
Step 2
buffer bufA (A, range(SIZE) );
buffer bufB (B, range (SIZE) );
buffer bufC (C, range (SIZE) );
Step 3
gpu_selector deviceSelector;
queue myQueue(deviceSelector);
• The device selector can be default_selector, cpu_selector, gpu_selector, or intel::fpga_selector.
• If no device is specified when the command queue is created, the runtime selects one for you.
• It is good practice to specify the selector to make sure the right device is chosen.
Step 4
myQueue.submit([&](handler& cgh) {
Step 5
auto A = bufA.get_access(cgh, read_only);
auto B = bufB.get_access(cgh, read_only);
auto C = bufC.get_access(cgh);
Step 6
cgh.parallel_for<class vector_add>(N, [=](auto i) {
C[i] = A[i] + B[i];});
• Each iteration (work-item) will have a separate index id (i)
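To make the "one index per work-item" idea concrete without a SYCL toolchain, here is a host-only sketch. `emulate_parallel_for` is a hypothetical stand-in written for this example, not a DPC++ API, and it runs the kernel sequentially; a real device may run the invocations concurrently and in any order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for parallel_for: calls `kernel` once per index in [0, n).
template <typename Kernel>
void emulate_parallel_for(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);  // each call plays the role of one work-item
    }
}

// Vector addition written in the same shape as the kernel in Step 6.
std::vector<float> vector_add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    const float* A = a.data();
    const float* B = b.data();
    float* C = c.data();
    // Same kernel body as the slide; the pointers are captured by copy ([=]).
    emulate_parallel_for(c.size(), [=](std::size_t i) { C[i] = A[i] + B[i]; });
    return c;
}
```

Because each invocation touches only its own index `i`, the result does not depend on the order in which the invocations run, which is what allows a device to execute them in parallel.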
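DPC++ “Hello World”: Vector Addition Entire Code
#include <CL/sycl.hpp>
#include <iostream>
using namespace cl::sycl;
constexpr int N = 1024;   // N is not defined on the slide; 1024 is an assumed value
int main() {
    float A[N], B[N], C[N];
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; }   // assumed input data
    {   // buffer scope
        buffer bufA (A, range(N));
        buffer bufB (B, range(N));
        buffer bufC (C, range(N));
        queue myQueue;
        myQueue.submit([&](handler& cgh) {
            auto A = bufA.get_access(cgh, read_only);
            auto B = bufB.get_access(cgh, read_only);
            auto C = bufC.get_access(cgh);
            cgh.parallel_for<class vector_add>(N, [=](auto i) {
                C[i] = A[i] + B[i];});
        });
    }   // buffers are destroyed here; results are copied back to C
    for (int i = 0; i < 5; i++){
        std::cout << "C[" << i << "] = " << C[i] << std::endl;
    }
    return 0;
}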
Anatomy of a DPC++ Application
• Host code: everything in main() outside the kernel lambda, including buffer creation, queue creation, command group submission, and the final print loop.
• Accelerator device code: the kernel lambda passed to parallel_for (C[i] = A[i] + B[i]).
DPC++ basics
• At the closing brace of the inner scope, the buffers (including the write buffer bufC) go out of scope. Buffer destruction waits for the kernel to complete and copies the results back, so the host pointer C has a consistent view of the output before the print loop reads it.
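The scope rule above follows the usual C++ RAII pattern. The sketch below is a toy model in plain C++, assuming nothing about how a real SYCL runtime is implemented: `ToyBuffer` is a hypothetical class that keeps a separate "device" copy and writes it back to the host pointer in its destructor. A real buffer additionally waits for outstanding kernels before copying.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not the SYCL buffer class): owns a separate "device" copy of the
// host data and copies it back to the host pointer when destroyed.
class ToyBuffer {
public:
    ToyBuffer(float* host, std::size_t n) : host_(host), device_(host, host + n) {}
    ~ToyBuffer() {
        for (std::size_t i = 0; i < device_.size(); ++i) host_[i] = device_[i];
    }
    ToyBuffer(const ToyBuffer&) = delete;
    ToyBuffer& operator=(const ToyBuffer&) = delete;
    float* device_data() { return device_.data(); }

private:
    float* host_;
    std::vector<float> device_;
};

// Returns true if the host array only saw the result after the buffer's scope ended.
bool host_sees_result_after_scope() {
    float c[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    bool unchanged_inside_scope = false;
    {
        ToyBuffer buf(c, 4);
        for (std::size_t i = 0; i < 4; ++i) buf.device_data()[i] = 1.0f;  // "kernel" writes the device copy
        unchanged_inside_scope = (c[0] == 0.0f);  // host array not updated yet
    }  // destructor runs here and writes the device copy back to c
    return unchanged_inside_scope && c[0] == 1.0f && c[3] == 1.0f;
}
```

This is why the vector-addition example wraps the buffers in an extra pair of braces: reading `C` on the host is only safe once the buffer that owns it has been destroyed (or the data has been accessed through a host accessor).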
DPCPP Demo session
Intel® oneAPI DPC++ Heterogeneous Platform
[Diagram: a CPU acts as the host; devices can be a GPU, an FPGA, another accelerator, or a CPU.]
For code samples on all these concepts, visit:
https://github.com/oneapi-src/oneAPI-samples/
DPC++ Summary
• DPC++ is an open, standards-based programming model for heterogeneous platforms.
• It can target different accelerators from different vendors.
• Single-source programming model.
• oneAPI specifications are publicly available:
https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions
Feedback and active participation encouraged
Intel® DPC++ Compatibility Tool
What is the Intel® DPC++ Compatibility Tool?
• Helps developers migrate a portion of their existing CUDA code to DPC++.
• Our experience has shown that this can vary greatly, but on average, about 80-90% of CUDA code in applications can be migrated by this tool.
• Completing and verifying the final code is expected to be a manual process done by the developer.
https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-dpcpp-compatibility-tool/top.html
DPCT* Demo session
Backup
DPC++ Deep Dive
Execution Flow
• A DPC++ application consists of host code, executed on the host (CPU), and device code, executed on a device (GPU, MIC, FPGA, …).
• Host code submits command groups to command queues. A command group can contain:
  • Synchronization commands
  • Data movement operations
  • User-defined kernels
[Diagram: the host has host memory; the device has global/constant memory, compute units (CUs) each with local memory, and private memory.]
Execution Flow Contd.
Execution of Kernel Instances
• Kernel instance = kernel object & nd_range & work-group decomposition
• Kernel instances are enqueued from the command queues into a work-pool and executed on the compute units (CUs) of the device (GPU, FPGA, …).
Memory Model
Hardware Architecture
Memory Model
• Global memory:
  • Accessible to all work-items in all work-groups.
  • Reads and writes may be cached.
  • Persistent across kernel invocations.
• Constant memory:
  • A region of global memory that remains constant during the execution of a kernel.
• Local memory:
  • Memory region shared between work-items in a single work-group.
• Private memory:
  • Region of memory private to a work-item. Variables defined in one work-item’s private memory are not visible to another work-item.
[Diagram: a device (GPU, FPGA, …) with global/constant memory, compute units (CUs) each with local memory, and private memory.]
DPC++ - device memory model
[Diagram: each device contains global memory, constant memory, and multiple work-groups. Each work-group has its own local memory and contains multiple work-items, each with its own private memory.]
Unified Shared Memory
• The SYCL 1.2.1 specification offers buffers and accessors for tracking and managing memory transfers and guaranteeing data consistency across the host and DPC++ devices.
• Many HPC and enterprise applications use pointers to manage data.
• DPC++ adds an extension for pointer-based programming, Unified Shared Memory (USM): device kernels can access data using pointers.
Types of USM
USM allocation:
• Device (explicit data movement)
• Host (data sent over a bus, such as PCIe)
• Shared (data can migrate between host and device)
Kernel Model
Kernel Execution Model
• Kernel parallelism
• Multi-dimensional kernel
• ND-range
• Sub-group
• Work-group
• Work-item
Kernel Execution Model
• Explicit ND-range for control, similar to programming models such as OpenCL, SYCL, and CUDA.
[Diagram: an ND-range spans the global work size and is divided into work-groups, each made up of work-items.]
nd_range & nd_item
• Example: Process every pixel in a 1920x1080 image
• Each pixel needs processing, so the kernel is executed on each pixel (work-item)
• 1920 x 1080 = 2,073,600 (about 2M) pixels = global size
• Not all 2M work-items can run in parallel on the device; there are hardware resource limits
• We have to split the global size into smaller groups of pixel blocks = local size (work-group)
• Either let the compiler determine the work-group size OR specify the work-group size using nd_range()
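The arithmetic behind the global size / local size split can be checked with a few lines of host code. The sketch below is plain C++ and does not use SYCL; the `Decomposition` struct and `decompose` function are names invented for this example, and it assumes the local size divides the global size evenly in both dimensions.

```cpp
#include <cassert>
#include <cstddef>

struct Decomposition {
    std::size_t groups_x;         // work-groups along dimension 0
    std::size_t groups_y;         // work-groups along dimension 1
    std::size_t items_per_group;  // work-items in each work-group
};

// Splits a 2-D global size into work-groups of the given local size.
// Assumes the local size divides the global size evenly in both dimensions.
Decomposition decompose(std::size_t global_x, std::size_t global_y,
                        std::size_t local_x, std::size_t local_y) {
    assert(local_x > 0 && local_y > 0);
    assert(global_x % local_x == 0 && global_y % local_y == 0);
    return {global_x / local_x, global_y / local_y, local_x * local_y};
}
```

For the 1920x1080 image with an 8x8 local size, `decompose(1920, 1080, 8, 8)` gives 240 x 135 work-groups of 64 work-items each, which together cover all 2,073,600 pixels.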
nd_range & nd_item
• Example: Process every pixel in a 1920x1080 image
• Let compiler determine work-group size
h.parallel_for(range<2>(1920,1080), [=](id<2> item){
    // CODE THAT RUNS ON DEVICE
});
• Programmer specifies work-group size
h.parallel_for(nd_range<2>(range<2>(1920,1080),   // global size
                           range<2>(8,8)),        // local size (work-group size)
    [=](nd_item<2> item){                         // nd_range kernels receive an nd_item
    // CODE THAT RUNS ON DEVICE
});
nd_range & nd_item
• Example: Process every pixel in a 1920x1080 image
• How do we choose work-group size?
  • Work-group size of 8x8 divides 1920x1080 evenly (GOOD)
  • Work-group size of 9x9 does not divide 1920x1080 evenly
    • The runtime will throw an error (invalid work-group size)
  • Work-group size of 10x10 divides 1920x1080 evenly
    • Works, but a multiple of 8 is generally better for resource utilization
  • Work-group size of 24x24 divides 1920x1080 evenly
    • 24x24 = 576 work-items, which will fail assuming the GPU's maximum work-group size is 256
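The checks in the list above can be written down as a small host-side helper. This is a minimal sketch in plain C++, not a DPC++ API: `WorkGroupCheck` and `check_work_group` are hypothetical names, and `max_work_group_size` stands in for a limit that a real program would query from the device (256 is the value assumed on the slide).

```cpp
#include <cassert>
#include <cstddef>

enum class WorkGroupCheck { Ok, DoesNotDivide, ExceedsDeviceLimit };

// Checks a 2-D work-group size against a global size and a device limit.
// `max_work_group_size` is an assumed device property (256 in the slide's example).
WorkGroupCheck check_work_group(std::size_t global_x, std::size_t global_y,
                                std::size_t local_x, std::size_t local_y,
                                std::size_t max_work_group_size) {
    if (local_x == 0 || local_y == 0 ||
        global_x % local_x != 0 || global_y % local_y != 0) {
        return WorkGroupCheck::DoesNotDivide;
    }
    if (local_x * local_y > max_work_group_size) {
        return WorkGroupCheck::ExceedsDeviceLimit;
    }
    return WorkGroupCheck::Ok;
}
```

With a 1920x1080 global size and a limit of 256, the helper accepts 8x8 and 10x10, rejects 9x9 because it does not divide the image evenly, and rejects 24x24 because 576 work-items exceed the limit. It does not capture the slide's preference for multiples of 8, which is a performance hint rather than a validity rule.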

3 Ways Machine Learning Facilitates Fraud Detection
Tyrone Systems
 
Four ways to digitally transform with HPC in the cloud
Four ways to digitally transform with HPC in the cloudFour ways to digitally transform with HPC in the cloud
Four ways to digitally transform with HPC in the cloud
Tyrone Systems
 
How to Secure Containerized Environments?
How to Secure Containerized Environments?How to Secure Containerized Environments?
How to Secure Containerized Environments?
Tyrone Systems
 
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone Systems
 
Top 5 Benefits of Hyper-Converged Infrastructure
Top 5 Benefits of Hyper-Converged InfrastructureTop 5 Benefits of Hyper-Converged Infrastructure
Top 5 Benefits of Hyper-Converged Infrastructure
Tyrone Systems
 

OneAPI dpc++ Virtual Workshop 9th Dec-20

  • 2. Agenda: Intel® oneAPI (Introduction); DPC++ (Introduction, DPC++ “Hello world”, Lab); Intel® DPC++ Compatibility Tool (Introduction, Demo).
  • 4. Programming Challenges: growth in specialized workloads; variety of data-centric hardware required; no common programming language or APIs; inconsistent tool support across platforms; each platform requires unique software investment. Diagram labels: Application Workloads Need Diverse Hardware; Middleware / Frameworks; Language & Libraries; Scalar, Vector, Matrix, Spatial; XPUs: CPU, GPU, FPGA, other accelerators.
  • 5. Introducing oneAPI: a unified programming model to simplify development across diverse architectures. Unified and simplified language and libraries for expressing parallelism; uncompromised native high-level language performance; based on industry standards and open specifications; interoperable with existing HPC programming models. Diagram labels: Industry Initiative, Intel Product; Application Workloads Need Diverse Hardware; Middleware / Frameworks; oneAPI; Scalar, Vector, Matrix, Spatial; XPUs: CPU, GPU, FPGA, other accelerators.
  • 8. Intel® oneAPI DPC++ Overview: DPC++ builds on C++17, the latest available SYCL spec, and SYCL Next (Intel extensions).
  • 9. Intel® oneAPI DPC++ Overview: 1. Data Parallel C++ is a high-level language designed to target heterogeneous architectures and take advantage of data parallelism. 2. Reuse code across CPUs and accelerators while performing custom tuning. 3. The open-source implementation on GitHub helps incorporate ideas from end users.
  • 10. Before we start: Lambda Expressions
        #include <algorithm>
        #include <cmath>
        void abssort(float* x, unsigned n) {
          std::sort(x, x + n,
            // Lambda expression
            [ ](float a, float b) {
              return (std::abs(a) < std::abs(b));
            }
          );
        }
      A lambda has three parts: the capture clause ([ ]), the parameter list ((float a, float b)) and the lambda body ({ ... }).
      • A lambda is a convenient way of defining an anonymous function object right at the location where it is invoked or passed as an argument to a function.
      • Lambda functions can be used to define kernels in SYCL.
      • The kernel lambda MUST use copy for all its captures (i.e., [=]).
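The capture rule on this slide can be illustrated in plain C++, without any SYCL headers. The sketch below reuses the slide's abssort idea (adapted here to take a std::vector) and adds two hypothetical helpers, scaled_by_copy and scaled_by_reference, that are not part of the workshop material. They only show the difference between [=] and [&]: a copy capture takes a snapshot of the variable, which is the behaviour a kernel lambda relies on.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sort values by absolute magnitude, following the slide's abssort example.
void abssort(std::vector<float>& x) {
    std::sort(x.begin(), x.end(),
              [](float a, float b) { return std::abs(a) < std::abs(b); });
}

// Hypothetical helper: capture by copy ([=]).
// The lambda keeps its own snapshot of `scale`, so the later
// assignment to `scale` is not visible inside the lambda.
float scaled_by_copy(float value) {
    float scale = 2.0f;
    auto kernel_like = [=](float v) { return v * scale; };
    scale = 10.0f;
    return kernel_like(value);  // uses scale == 2.0f
}

// Hypothetical helper: capture by reference ([&]).
// The lambda refers to the original variable and sees the new value.
float scaled_by_reference(float value) {
    float scale = 2.0f;
    auto host_only = [&](float v) { return v * scale; };
    scale = 10.0f;
    return host_only(value);  // uses scale == 10.0f
}
```

A reference capture would point at host memory, which is one way to see why kernels are restricted to copy captures.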
  • 11. DPC++ Program Flow. Host: query for the available device(s). Queue: dispatches kernels to the device and executes the commands on it. Command group handler: command groups control execution on the device. Kernel model: send a kernel (lambda) for execution; parallel_for will execute in parallel across the compute elements of the device. Buffers and accessors manage memory across host and device: buffers BUF A, BUF B, BUF C are reached through accessors ACC A (read), ACC B (read), ACC C (write).
  • 12. DPC++ “Hello world”
  • 13. Step 1:
        #include <CL/sycl.hpp>
        using namespace cl::sycl;
  • 14. Step 2:
        buffer bufA (A, range(SIZE));
        buffer bufB (B, range(SIZE));
        buffer bufC (C, range(SIZE));
  • 15. Step 3:
        gpu_selector deviceSelector;
        queue myQueue(deviceSelector);
      • The device selector can be the default selector, a CPU or GPU selector, or intel::fpga_selector.
      • If no device is specified when the command queue is created, the runtime selects one for you.
      • It is good practice to specify the selector to make sure the right device is chosen.
  • 16. Step 4:
        myQueue.submit([&](handler& cgh) {
  • 17. Step 5:
        auto A = bufA.get_access(cgh, read_only);
        auto B = bufB.get_access(cgh, read_only);
        auto C = bufC.get_access(cgh);
  • 18. Step 6:
        cgh.parallel_for<class vector_add>(N, [=](auto i) {
          C[i] = A[i] + B[i];
        });
      Each iteration (work-item) will have a separate index id (i).
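The statement that each work-item receives its own index can be mimicked on the host in standard C++. The sketch below is not SYCL: emulate_parallel_for is a hypothetical stand-in that calls the kernel once per index, sequentially, and vector_add is an illustrative wrapper. On a real device the calls may run concurrently and in any order, which is safe here because every index writes a different element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for parallel_for: invokes the kernel
// once for every index in [0, n). A device would run these invocations
// in parallel; this sketch runs them one after another.
template <typename Kernel>
void emulate_parallel_for(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Vector addition written in the same shape as the slide's kernel.
// Assumes a and b have the same length.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> c(a.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    // The kernel captures the three pointers by copy ([=]), as a SYCL
    // kernel captures its accessors by copy.
    emulate_parallel_for(a.size(), [=](std::size_t i) { pc[i] = pa[i] + pb[i]; });
    return c;
}
```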
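  • 19. DPC++ “Hello World”: Vector Addition, entire code:
        #include <CL/sycl.hpp>
        #include <iostream>
        using namespace cl::sycl;
        constexpr int N = 1024;  // assumed size; the slide does not define N
        int main() {
          float A[N], B[N], C[N];
          for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2 * i; }  // illustrative inputs, not on the slide
          {
            // Buffers wrap the host arrays for the lifetime of this scope
            buffer bufA (A, range(N));
            buffer bufB (B, range(N));
            buffer bufC (C, range(N));
            queue myQueue;  // no selector given, so the runtime picks a device
            myQueue.submit([&](handler& cgh) {
              // Accessors declare how the kernel uses each buffer
              auto A = bufA.get_access(cgh, read_only);
              auto B = bufB.get_access(cgh, read_only);
              auto C = bufC.get_access(cgh);
              // Kernel: one work-item per element
              cgh.parallel_for<class vector_add>(N, [=](auto i) {
                C[i] = A[i] + B[i];
              });
            });
          }  // buffers go out of scope here
          for (int i = 0; i < 5; i++) {
            std::cout << "C[" << i << "] = " << C[i] << std::endl;
          }
          return 0;
        }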
  • 20. Anatomy of a DPC++ Application (same vector-addition listing as slide 19), host code: the array declarations, the buffer and queue setup, the myQueue.submit call and the final printing loop all run on the host.
  • 21. Anatomy of a DPC++ Application (same vector-addition listing as slide 19), accelerator device code: the kernel lambda passed to cgh.parallel_for, that is C[i] = A[i] + B[i], runs on the device. The code before and after it is host code.
  • 22. DPC++ basics (same vector-addition listing as slide 19): when the inner scope closes, the buffers go out of scope. The write buffer is now out of scope, so the kernel completes and the host pointer C has a consistent view of the output.
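The scoping rule on this slide can be pictured with a plain C++ analogy. The class below, WriteBackBuffer, is hypothetical and is not how a SYCL runtime is implemented: it simply works on a private copy and writes it back to the host array in its destructor. It is meant only to show why the host should read C after the buffer's scope has closed, not before.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical analogy for a buffer: holds a private copy of the host
// data and copies it back to the host pointer when destroyed.
class WriteBackBuffer {
public:
    WriteBackBuffer(float* host, std::size_t n)
        : host_(host), device_copy_(host, host + n) {}
    ~WriteBackBuffer() {
        std::copy(device_copy_.begin(), device_copy_.end(), host_);
    }
    WriteBackBuffer(const WriteBackBuffer&) = delete;
    WriteBackBuffer& operator=(const WriteBackBuffer&) = delete;

    float& operator[](std::size_t i) { return device_copy_[i]; }

private:
    float* host_;
    std::vector<float> device_copy_;
};

// Reads the host array while the buffer is still alive:
// the write has not been copied back yet.
float host_value_while_buffer_alive() {
    float c[1] = {0.0f};
    WriteBackBuffer buf(c, 1);
    buf[0] = 42.0f;
    return c[0];
}

// Reads the host array after the buffer's scope has closed:
// the destructor has copied the result back.
float host_value_after_scope() {
    float c[1] = {0.0f};
    {
        WriteBackBuffer buf(c, 1);
        buf[0] = 42.0f;
    }
    return c[0];
}
```

In the slide's listing the same role is played by the braces around the buffer declarations: the printing loop is placed after the closing brace for exactly this reason.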
  • 25. Intel® oneAPI DPC++ Heterogeneous Platform: CPU (Host) with devices GPU (Device), FPGA (Device), Other Accelerator (Device), CPU (Device).
  • 26. For code samples on all these concepts, visit: https://github.com/oneapi-src/oneAPI-samples/
  • 27. DPC++ Summary
      • DPC++ is an open, standards-based programming model for heterogeneous platforms.
      • It can target different accelerators from different vendors.
      • It is a single-source programming model.
      • oneAPI specifications are available publicly: https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions
      Feedback and active participation are encouraged.
  • 29. What is the Intel® DPC++ Compatibility Tool?
      • It helps developers migrate a portion of their existing CUDA code to DPC++.
      • Our experience has shown that this can vary greatly, but on average about 80-90% of the CUDA code in an application can be migrated by the tool.
      • Completing and verifying the final code is expected to be a manual process done by the developer.
      https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-dpcpp-compatibility-tool/top.html
  • 34. Execution Flow: a DPC++ application consists of host code and device code. Host code is executed on the host (CPU) and submits command groups to command queues; device code is executed on the device (GPU, MIC, FPGA, …). A command group can contain synchronization commands, data movement operations and user-defined kernels. Memory shown in the diagram: host memory on the host; global/constant memory on the device; local memory in each compute unit (CU); private memory.
  • 35. Execution Flow Contd.: execution of kernel instances. Kernel instances are enqueued from the command queues into a work-pool and run on the compute units (CUs) of the device (GPU, FPGA, …). Kernel instance = kernel object & nd_range & work-group decomposition.
  • 38. Memory Model
      • Global memory: accessible to all work-items in all work-groups. Reads and writes may be cached. Persistent across kernel invocations.
      • Constant memory: a region of global memory that remains constant during the execution of a kernel.
      • Local memory: a memory region shared between work-items in a single work-group.
      • Private memory: a region of memory private to a work-item. Variables defined in one work-item’s private memory are not visible to another work-item.
  • 39. DPC++ device memory model: each device has global memory and constant memory and contains multiple work-groups. Each work-group has its own local memory and contains multiple work-items; each work-item has its own private memory.
  • 40. Unified Shared Memory
      • The SYCL 1.2.1 specification offers buffers and accessors for tracking and managing memory transfers and for guaranteeing data consistency across the host and DPC++ devices.
      • Many HPC and enterprise applications use pointers to manage data.
      • DPC++ extension for pointer-based programming: Unified Shared Memory (USM), with which device kernels can access data using pointers.
  • 41. Types of USM allocation: Device (explicit data movement); Host (data sent over a bus, such as PCIe); Shared (data can migrate between host and device).
  • 43. Kernel Execution Model: kernel parallelism; multi-dimensional kernel; ND-range; sub-group; work-group; work-item.
  • 44. Kernel Execution Model: an explicit ND-range gives control, similar to programming models such as OpenCL, SYCL and CUDA. Diagram labels: ND-range (global work size), work-group, work-item.
  • 45. nd_range & nd_item. Example: process every pixel in a 1920x1080 image.
      • Each pixel needs processing, so the kernel is executed on each pixel (work-item).
      • 1920 x 1080 = 2M pixels = global size.
      • Not all 2M work-items can run in parallel on the device; there are hardware resource limits.
      • We have to split the image into smaller groups of pixel blocks = local size (work-group).
      • Either let the compiler determine the work-group size, or specify the work-group size using nd_range().
  • 46. Intel Confidential 46 Example: Process every pixel in a 1920x1080 image  Let compiler determine work-group size   Programmer specifies work-group size h.parallel_for(nd_range<2>(range<2>(1920,1080),range<2>(8,8)), [=](id<2> item){ // CODE THAT RUNS ON DEVICE }) h.parallel_for(range<2>(1920,1080), [=](id<2> item){ // CODE THAT RUNS ON DEVICE }); nd_range & nd_item global size local size (work-group size)
  • 47. nd_range & nd_item. Example: process every pixel in a 1920x1080 image. How do we choose the work-group size?
      • A work-group size of 8x8 divides 1920x1080 evenly (good).
      • A work-group size of 9x9 does not divide 1920x1080 evenly (1920 is not a multiple of 9). An invalid work-group size error is reported when the kernel is launched.
      • A work-group size of 10x10 divides 1920x1080 evenly. It works, but it is better to use a multiple of 8 for better resource utilization.
      • A work-group size of 24x24 divides 1920x1080 evenly, but 24x24 = 576 work-items, so the launch will fail assuming the GPU's maximum work-group size is 256.
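The four cases on this slide can be checked with a small host-side helper before launching a kernel. The sketch below is standard C++, not a SYCL API: check_work_group and WorkGroupCheck are hypothetical names, and the limit of 256 is the slide's own assumption about the GPU rather than a queried device property.

```cpp
#include <array>
#include <cstddef>

enum class WorkGroupCheck { Ok, DoesNotDivide, TooLarge };

// Hypothetical pre-launch check for a 2D nd_range:
// 1. the local size must divide the global size in every dimension;
// 2. the total work-items per group must not exceed the device limit.
WorkGroupCheck check_work_group(const std::array<std::size_t, 2>& global,
                                const std::array<std::size_t, 2>& local,
                                std::size_t max_work_group_size) {
    for (std::size_t d = 0; d < 2; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0) {
            return WorkGroupCheck::DoesNotDivide;
        }
    }
    if (local[0] * local[1] > max_work_group_size) {
        return WorkGroupCheck::TooLarge;
    }
    return WorkGroupCheck::Ok;
}
```

In a real program the limit would come from the device rather than a constant; the helper only mirrors the reasoning in the bullets above.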