OpenCL
Host Programming



   Fast Forward Your Development   www.dsp-ip.com
OPENCL™ EXECUTION MODEL




OpenCL™ Execution Model
•Kernel
  ▫ Basic unit of executable code - similar to a C function
  ▫ Data-parallel or task-parallel
  ▫ A whole encoder such as H.264Encode is too large to be a kernel
  ▫ A kernel should be a small, separate function, e.g. SAD (sum of absolute differences)
•Program
  ▫ Collection of kernels and other functions
  ▫ Analogous to a dynamic library
•Applications queue kernel execution instances
  ▫ Queued in-order
  ▫ Executed in-order or out-of-order


Data-Parallelism in OpenCL™
  •Define an N-dimensional computation domain (N = 1, 2 or 3)
     ▫ Each independent element of execution in the N-D domain is called a work-item
     ▫ The N-D domain defines the total number of work-items that execute in parallel
  •Example: 1024 x 1024 image
     ▫ Problem dimensions: 1024 x 1024
     ▫ One kernel execution per pixel: 1,048,576 total executions

Scalar:

void scalar_mul(int n,
                const float *a,
                const float *b,
                float *result)
{
    int i;
    for (i=0; i<n; i++)
        result[i] = a[i] * b[i];
}

Data-Parallel:

kernel void dp_mul(global const float *a,
                   global const float *b,
                   global float *result)
{
    int id = get_global_id(0);
    result[id] = a[id] * b[id];
}
// execute dp_mul over "n" work-items
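
To check that the two versions compute the same result, the data-parallel kernel can be emulated on the host in plain C++: a loop stands in for the OpenCL runtime and calls the kernel body once per global ID. This is only a sketch of the execution model, not OpenCL code. The names `dp_mul_item`, `run_ndrange_1d` and `linear_index` are hypothetical, and the row-major image layout in `linear_index` is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Scalar version from the slide: one call processes all n elements.
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    for (int i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}

// Body of dp_mul for a single work-item. The global ID is passed in
// explicitly here; in a real kernel it comes from get_global_id(0).
void dp_mul_item(std::size_t id, const float *a, const float *b, float *result)
{
    result[id] = a[id] * b[id];
}

// Hypothetical stand-in for the runtime: visits every point of a 1-D domain.
// A real device runs the work-items in parallel, in no guaranteed order.
void run_ndrange_1d(std::size_t global_size,
                    const float *a, const float *b, float *result)
{
    for (std::size_t id = 0; id < global_size; ++id)
        dp_mul_item(id, a, b, result);
}

// Maps a 2-D global ID (x, y) to a linear index, assuming a row-major
// image that is `width` pixels wide.
std::size_t linear_index(std::size_t x, std::size_t y, std::size_t width)
{
    assert(x < width);
    return y * width + x;
}
```

For the 1024 x 1024 image, `run_ndrange_1d` would be called with `global_size = 1024 * 1024`, i.e. one work-item per pixel.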


Compiling Kernels
• Create a program
  ▫ Input: String (source code) or precompiled binary
  ▫ Analogous to a dynamic library: A collection of
    kernels
• Compile the program
  ▫ Specify the devices for which kernels should be
    compiled
  ▫ Pass in compiler flags
  ▫ Check for compilation/build errors
• Create the kernels
  ▫ Returns a kernel object used to hold arguments for
    a given execution
EX-1: OPENCL "HELLO WORLD"




BASIC Program structure
         Include
         Get Platform Info
         Create Context
         Load & compile program
         Create Queue
         Load and Run Kernel
Includes
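• Make sure to include all the required OpenCL header files

#include <cstdio>
#include <cstdlib>
#include <cstring>    // strcmp, used when matching the platform vendor
#include <iostream>
// SDK*.hpp: helper headers that ship with the SDK sample code
#include <SDKFile.hpp>
#include <SDKCommon.hpp>
#include <SDKApplication.hpp>
#include <CL/cl.hpp>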

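Get Platform Info
• Enumerates the OpenCL platforms installed in the system. Each platform
  exposes its "Devices":
   ▫ CPUs, GPUs & DSPs

cl_int err;
std::vector<cl::Platform> platforms;
err = cl::Platform::get(&platforms);
if (err != CL_SUCCESS)
{
    std::cerr << "Platform::get() failed (" << err << ")" << std::endl;
    return SDK_FAILURE;
}
// Look for the AMD platform by its vendor string.
std::vector<cl::Platform>::iterator i;
for (i = platforms.begin(); i != platforms.end(); ++i)
{
    if (!strcmp((*i).getInfo<CL_PLATFORM_VENDOR>(&err).c_str(),
                "Advanced Micro Devices, Inc."))
    {
        break;
    }
}
// Stop here if no matching platform exists; *i is dereferenced later.
if (i == platforms.end())
{
    std::cerr << "AMD platform not found" << std::endl;
    return SDK_FAILURE;
}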


Create Context
• A context is the environment in which command queues are created and
  memory objects are shared between devices

cl_context_properties cps[3] =
    { CL_CONTEXT_PLATFORM, (cl_context_properties)(*i)(), 0 };
std::cout << "Creating a context on the AMD platform\n";
cl::Context context(CL_DEVICE_TYPE_CPU, cps, NULL, NULL, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Context::Context() failed (" << err << ")\n";
    return SDK_FAILURE;
}


Load Program
• Reads the kernel source file (*.cl) and creates a program object from it

std::cout << "Loading and compiling CL source\n";
streamsdk::SDKFile file;
if (!file.open("HelloCL_Kernels.cl"))
{
    std::cerr << "We couldn't load CL source code\n";
    return SDK_FAILURE;
}
cl::Program::Sources sources(1,
    std::make_pair(file.source().data(), file.source().size()));
cl::Program program = cl::Program(context, sources, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Program::Program() failed (" << err << ")\n";
    return SDK_FAILURE;
}

Compile program
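• The host program compiles the kernel program for each device.
• Why compile at run time? As with Java, the target device is not known
  until the program runs. The host can decide at run time, for example
  based on load balancing, which device to run on.

// Devices that belong to the context (assumption: the slide uses
// `devices` without showing where it is declared).
std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
err = program.build(devices);
if (err != CL_SUCCESS)
{
    if (err == CL_BUILD_PROGRAM_FAILURE)
    {
        // Handle the build error, e.g. report the compiler output
    }
    std::cerr << "Program::build() failed (" << err << ")\n";
    return SDK_FAILURE;
}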
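
The numeric `err` value printed by these checks is hard to read on its own. A minimal sketch of a helper that maps a few common OpenCL status codes to their names follows. `cl_error_name` is a hypothetical helper, not part of the OpenCL API. The numeric values are copied from `CL/cl.h` so that the block compiles without the OpenCL headers, and only a handful of codes are covered.

```cpp
#include <string>

// Hypothetical helper: returns the name of an OpenCL status code.
// The values match the definitions in CL/cl.h; the list is not complete.
std::string cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    default:  return "unknown OpenCL error " + std::to_string(err);
    }
}
```

An error message could then print `cl_error_name(err)` next to the raw number.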


Create Kernel with program
• Associate a Kernel object with our loaded and compiled program

cl::Kernel kernel(program, "hello", &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Kernel::Kernel() failed (" << err << ")\n";
    return SDK_FAILURE;
}
// Kernel arguments, if any, are set with kernel.setArg(index, value);
// check the returned error code after each call.


Create Queue per device & Run it
• Enqueues the kernel for execution on the device. Execution does not
  have to start immediately
• Attention: enqueueNDRangeKernel() is an asynchronous call. Its return
  does not imply that the kernel has executed, or even started to execute

cl::CommandQueue queue(context, devices[0], 0, &err);
std::cout << "Running CL program\n";
err = queue.enqueueNDRangeKernel(…..);
// finish() blocks until every queued command has completed.
err = queue.finish();
if (err != CL_SUCCESS) {
    std::cerr << "CommandQueue::finish() failed (" << err << ")\n";
}
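
The asynchronous behaviour can be illustrated without a device. The sketch below is a toy in-order queue in plain C++: `enqueue` only records a command and returns, and nothing runs until `finish` is called. `ToyQueue` is a hypothetical class for illustration only. A real `cl::CommandQueue` may start executing commands at any time after they are enqueued, and its `finish()` blocks until all of them have completed.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy model of an in-order command queue (not the OpenCL API).
class ToyQueue {
public:
    // Records the command and returns immediately; nothing is executed yet.
    void enqueue(std::function<void()> command)
    {
        pending_.push_back(std::move(command));
    }

    // Runs all pending commands in the order they were enqueued and
    // returns only when every one of them has completed.
    void finish()
    {
        while (!pending_.empty()) {
            std::function<void()> command = std::move(pending_.front());
            pending_.pop_front();
            command();
        }
    }

    // Number of commands that have been enqueued but not yet executed.
    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::function<void()>> pending_;
};
```

Reading a result between `enqueue` and `finish` shows the pitfall the slide warns about: the command is queued, but its effect is not visible yet.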




And That's All, Folks?
• Not quite. We still need to learn:
  ▫ Writing kernel functions
  ▫ Synchronizing kernel functions
  ▫ Setting arguments to kernel functions
  ▫ Passing data from/to the host




References
• “OpenCL Hello World” is an ATI OpenCL SDK
  programming exercise
• ATI OpenCL slides




DSP-IP Contact information
Download slides at: www.dsp-ip.com

Course materials & lecture requests:
Yossi Cohen
Mail: info@dsp-ip.com
Phone: +972-9-8850956
Fax: +972-50-8962910
