Introduction to OpenCL By Hammad Ghulam Mustafa


A brief introduction to OpenCL, including a demonstration based on the source code shown in these slides. Prepared by: 1. Hammad Ghulam Mustafa 2. Hafiz Muhammad Noman Zahid Gujjar 3. Muhammad Abdullah Ijaz Rhandawa 4. Malik Waqas Bashir Abid 5. Muhammad Umar Arshad 6. Umair Javaid Chaudhary 7. Suleman Khan 8. Ali Islal Chaudhary


INTRODUCTION
Hammad Ghulam Mustafa
Hafiz Muhammad Noman Zahid
Muhammad Abdullah Ijaz
Malik Waqas Bashir Abid
Muhammad Umar Arshad
Umair Javaid
Suleman Khan
Ali Islal

• Introduction
• Programming Basics
• OpenCL Execution Model
• “Hello World”
• Conclusion

• OpenCL is a standard for the development of data-parallel applications
• Mostly used for the development of GPGPU applications
• GPGPU: General-Purpose computing on Graphics Processing Units
• A GPU comprises hundreds of compute cores
• Specialized for massively data-parallel computation

• GPGPU: Take advantage of the GPU's computing power to build massively parallel applications
• Parallel applications with huge acceleration in Molecular Dynamics, Image Processing, Evolutionary Computation, ...
• All cases are based on data parallelism: each thread processes a subset of the data
• For example, a vector addition:
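
The slide's original illustration is not available here, so the following is only a minimal sketch in plain C of what "each thread processes a subset of the data" could look like for a vector addition. The names `add_chunk` and `vector_add_chunked`, and the choice of contiguous chunks, are assumptions made for this illustration. The chunks are run one after another; on a GPU they would run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one hypothetical thread: add the elements in [begin, end). */
static void add_chunk(const int *a, const int *b, int *c,
                      size_t begin, size_t end)
{
    for (size_t i = begin; i < end; ++i)
        c[i] = a[i] + b[i];
}

/* Split n elements into num_threads contiguous chunks.
 * The outer loop stands in for the threads: here the chunks run
 * sequentially, whereas a parallel device would run them at once. */
void vector_add_chunked(const int *a, const int *b, int *c,
                        size_t n, size_t num_threads)
{
    assert(num_threads > 0);
    size_t chunk = (n + num_threads - 1) / num_threads; /* ceiling division */
    for (size_t t = 0; t < num_threads; ++t) {
        size_t begin = t * chunk;
        size_t end = begin + chunk < n ? begin + chunk : n;
        if (begin < end) /* trailing threads may have nothing to do */
            add_chunk(a, b, c, begin, end);
    }
}
```

Because no chunk reads or writes an element owned by another chunk, the chunks are independent, which is what makes the problem data-parallel.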

• Furthermore, OpenCL provides portability: the same code can run on different architectures
• For Example:

• Provides the following abstraction: a compute device is composed of compute units
• OpenCL platform: Host + Compute Devices
• Each manufacturer provides an SDK:
• NVIDIA SDK for GPUs
• AMD APP for CPUs/GPUs
• Intel for CPUs
• IBM for PowerPC and Cell/B.E.

• Kernel: function that defines the behavior of each thread
• For example, a kernel for vector addition:
    __kernel void sumKernel(__global int* a, __global int* b, __global int* c)
    {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }
• Written in OpenCL C: ANSI C plus a set of built-in kernel functions, e.g.:
• get_global_id: obtains the thread index
• barrier: synchronizes the threads of a work-group
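
To see how the kernel above is used, it can help to emulate a launch on the CPU. The following is a rough sketch in plain C, not OpenCL itself: `emulated_global_id` and `launch_sum_kernel` are hypothetical names introduced here, and `get_global_id` is a stand-in for the OpenCL C built-in of the same name (dimension 0 only). The kernel body is run once per thread index.

```c
#include <assert.h>
#include <stddef.h>

/* Emulation state: the index of the thread currently "running". */
static size_t emulated_global_id;

/* Stand-in for the OpenCL C built-in (this sketch supports dimension 0 only). */
static size_t get_global_id(unsigned dim)
{
    assert(dim == 0);
    return emulated_global_id;
}

/* Same body as the kernel, without the __kernel/__global qualifiers. */
static void sumKernel(const int *a, const int *b, int *c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

/* Emulated launch: run the kernel body once for every thread index.
 * A real device would run these invocations in parallel. */
void launch_sum_kernel(const int *a, const int *b, int *c, size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        emulated_global_id = gid;
        sumKernel(a, b, c);
    }
}
```

Note that the kernel contains no loop of its own: the loop over elements is replaced by the launch, which creates one thread per element.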

• An OpenCL application consists of host code and one or more kernels
• Basic host application flow:
a. Load and compile the kernel
b. Copy data from host to device (e.g. from CPU to GPU)
c. Execute the kernel
d. Copy data from device to host
e. Release kernels and data from device memory
• Execution is driven through a command queue on each device

• Host code: programmed using the OpenCL API
• API calls, such as:
• clCreateProgramWithSource: Load kernel source from a char* string
• clBuildProgram: Compile the program
• clSetKernelArg: Set a kernel argument
• clEnqueueWriteBuffer / clEnqueueReadBuffer: Copy data to / from the device
• clEnqueueNDRangeKernel: Launch the kernel on the device
• API types, such as:
• cl_mem: Handle to a device memory object
• cl_program: Program object
• cl_float / cl_int / cl_uint: Fixed-size redefinitions of C types

• Kernel
• Basic unit of executable code, similar to a C function
• Data-parallel or task-parallel
• A whole routine such as H.264Encode is not a kernel
• A kernel should be a small, separate function, e.g. SAD (sum of absolute differences)
• Program
• Collection of kernels and other functions
• Analogous to a dynamic library
• Applications queue kernel execution instances
• Queued in-order
• Executed in-order or out-of-order

• Define N-dimensional computation domain (N = 1, 2 or 3)
• Each independent element of execution in N-D domain is called a work-item
• The N-D domain defines the total number of work-items that execute in parallel
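
As a rough illustration of the N-D domain, the sketch below computes in plain C how many work-items a domain contains and which per-dimension ID belongs to a given work-item. The function names `ndrange_total` and `ndrange_ids` are hypothetical, and letting dimension 0 vary fastest is a convention chosen for this sketch, not something stated in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of work-items in an N-D domain (N = 1, 2 or 3). */
size_t ndrange_total(const size_t *global_size, unsigned dims)
{
    assert(dims >= 1 && dims <= 3);
    size_t total = 1;
    for (unsigned d = 0; d < dims; ++d)
        total *= global_size[d];
    return total;
}

/* Recover the per-dimension ID of a work-item from a flat index,
 * with dimension 0 varying fastest (an assumption of this sketch). */
void ndrange_ids(size_t flat, const size_t *global_size, unsigned dims,
                 size_t *ids)
{
    assert(dims >= 1 && dims <= 3);
    assert(flat < ndrange_total(global_size, dims));
    for (unsigned d = 0; d < dims; ++d) {
        ids[d] = flat % global_size[d];
        flat /= global_size[d];
    }
}
```

For a 4 x 3 domain, `ndrange_total` gives 12 work-items, and flat index 5 maps to the ID (1, 1). Inside a real kernel, the corresponding values would come from get_global_id(0) and get_global_id(1).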

• Create a program
• Input: String (source code) or precompiled binary
• Analogous to a dynamic library: A collection of kernels
• Compile the program
• Specify the devices for which kernels should be compiled
• Pass in compiler flags
• Check for compilation/build errors
• Create the kernels
• Returns a kernel object used to hold arguments for a given execution

• OpenCL provides code portability but not performance portability: a kernel tuned for one device may run slowly on another
• Alternative to NVIDIA CUDA:
Programming paradigm for NVIDIA GPU cards
• Combinable with other parallel programming models:
OpenMP for SMPs / MPI for MPPs
• Large ecosystem around OpenCL, e.g. OpenACC:
develop GPGPU applications using directives
    #pragma acc kernels
    for (i = 0; i < N; i++)
        c[i] = b[i] + a[i];


