Productive OpenCL Programming An Introduction to OpenCL Libraries with ArrayFire COO Oded Green

In this webinar presentation, ArrayFire COO Oded Green demonstrates best practices to help you quickly get started with OpenCL™ programming. Learn how to get the best performance from AMD hardware in various programming languages using ArrayFire. Oded discusses the latest advancements in the OpenCL™ ecosystem, including cutting-edge OpenCL™ libraries such as clBLAS, clFFT, clMAGMA and ArrayFire. Examples are shown in real code for common application domains.
Watch the webinar here: http://bit.ly/1obT0M2
For more developer resources, visit:
http://arrayfire.com/
http://developer.amd.com/
Follow us on Twitter: https://twitter.com/AMDDevCentral
See info in the slides for more contact information and resource links!

Published in: Technology, Education

  1. 1. An Introduction to OpenCL Libraries Productive OpenCL Programming
  2. 2. ● We make code run faster ○ Started in 2007 by Georgia Tech researchers ○ 1000s of paying customers
  3. 3. ● We build an acceleration library ○ for really cool science, engineering, and finance applications ○ for mobile computing
  4. 4. Libraries are Great!
  5. 5. Eliminate Hidden Costs
  6. 6. Library Types ● Specialized GPU Libs ○ Targeted at a specific set of operators (functionality) ○ Optimized for specific systems ○ C-like interface ○ Raw pointer interface ● General GPU Libs ○ Manage GPU resources using containers ○ Applicable to a large set of applications and domains ○ Portable across multiple architectures ○ Higher level functions ○ C++ interface (supports templates)
  7. 7. Specialized GPU Libraries ● Fast Fourier Transforms ○ clFFT ● Random Number Generation ○ Random123 ● Linear Algebra ○ clBLAS ○ MAGMA ● Signal and Image Processing ○ OpenCLIPP
  8. 8. Specialized GPU Libraries ● C Interface ○ Use pointers to reference data ● Memory management is programmer responsibility ● Mimic existing libraries ○ clBLAS ≈ BLAS ○ MAGMA ≈ BLAS + LAPACK ○ clFFT ≈ FFTW ● Simplifies GPU integration of specialized scientific libraries ○ Still requires setting up the GPU
  9. 9. clFFT ● 1D, 2D and 3D transforms ● CPU and GPU backends ● Supports ○ Real and complex data types ○ Single and double-precision ○ Execution of multiple transformations concurrently
  10. 10. Random123 ● Counter-based RNG ● Passed SmallCrush, Crush and BigCrush tests ● Four RNG families ○ Threefry ○ Philox ○ AESNI ○ ARS ● Not suitable for cryptography
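The counter-based idea on the Random123 slide can be sketched in plain C++. This is a minimal illustration, not Random123's API: `mix64` is the SplitMix64 finalizer used as a stand-in for Threefry or Philox, and `counter_rng` and `to_unit_interval` are hypothetical helper names. The point it shows is that the n-th value depends only on the pair (counter, key), so no generator state is shared between threads.

```cpp
#include <cstdint>

// Stand-in mixing function (the SplitMix64 finalizer). It is NOT Threefry,
// Philox, AESNI, or ARS; it only illustrates the counter-based pattern.
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: the value at position `counter` of stream `key`.
// No state is kept between calls, so any work-item can compute any element.
inline std::uint64_t counter_rng(std::uint64_t counter, std::uint64_t key) {
    return mix64(counter ^ mix64(key));
}

// Map a 64-bit value to a double in [0, 1) using its top 53 bits.
inline double to_unit_interval(std::uint64_t x) {
    return static_cast<double>(x >> 11) * (1.0 / 9007199254740992.0);  // 2^53
}
```

A work-item with global id `gid` could call `counter_rng(gid, seed)` and get a reproducible value without coordinating with any other work-item.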
  11. 11. MAGMA & clBLAS ● Implement many popular linear algebra routines ● Support ○ Real and complex data types ○ Single and double-precision
  12. 12. OpenCLIPP ● Supports multiple image types ● Similar to Intel IPP ● Primitives ○ Arithmetic and logic ○ LUT ○ Morphology ○ Transform ○ Resize ○ Histogram ○ Many more… ● C and C++ interface
  13. 13. General-Purpose GPU Libraries ● Bolt ● OpenCV ● ArrayFire Images taken from: http://wordlesstech.com/2012/10/12/leatherman-oht-multi-tool/
  14. 14. Bolt ● GPU library which resembles C++ STL ○ STL like data structures ○ Iterators ○ Fully interoperable with OpenCL ● Parallel vector operation methods ○ Reductions ○ Sorting ○ Prefix-Sum ● Customizable GPU kernels using functors ● Some functions only supported on AMD GPUs
  15. 15. Bolt - Data Structures ● Built around the device_vector ● Supports the same data types as C++ ○ device_vector<float> data(2e6); ● Useful when performing multiple operations on a vector ● Can be passed into STL algorithms ○ Always interoperable ○ Data transfer will be costly
  16. 16. Bolt - Algorithms ● Uses a C++ STL like interface ○ Pass the begin and end iterators ● Accept functors which allow you to run custom operations on OpenCL devices ● Multiple backends ○ OpenCL, C++AMP, and TBB ○ Not all algorithms implemented across all backends ● Works on vector and device_vector
  17. 17. OpenCV ● Open source computer vision library ● C++ interface with many language wrappers ● Hundreds of CV functions
  18. 18. OpenCV ArrayFire Interop ● Helper Functions ○ https://github.com/arrayfire-community/arrayfire_opencv.git
          Mat R;
          Rodrigues(poses(Rect(0, 0, 1, 3)), R);
          af::array af_R = mat_to_array(R);
  19. 19. ArrayFire - Data Structures ● Built around a flexible data structure named "array" ○ Lightweight wrapper around the data on the compute device ○ Manages the data and basic metadata such as size, type and dimensions ● You can transfer data into an array using constructors ● Column major
          float hA[6] = {0, 1, 2, 3, 4, 5};
          array A(2, 3, hA);
  20. 20. ArrayFire - Indexing
          #include <arrayfire.h>
          #include <af/utils.h>
          using namespace af;
          void af_example() {
              float f[8] = {1, 2, 4, 8, 16, 32, 64, 128};
              array a(2, 4, f);                      // 2 rows x 4 cols, filled with f in column-major order
              array sumSecondCol = sum(a(span, 1));  // reduce-sum over the second column
              print(sumSecondCol);                   // 12
          }
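The column-major layout behind the two preceding slides can be checked without a GPU. The sketch below is plain C++ rather than ArrayFire; `Matrix`, `at`, and `column_sum` are hypothetical names. Element (r, c) of a `rows` x `cols` array lives at offset `r + c * rows`, which is why the second column of the 2 x 4 example holds 4 and 8.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal column-major 2D container (illustrative, not af::array).
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;  // element (r, c) is data[r + c * rows]

    float at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r + c * rows];
    }
};

// Sum of one column, the host-side analogue of sum(a(span, c)).
inline float column_sum(const Matrix& m, std::size_t c) {
    float s = 0.0f;
    for (std::size_t r = 0; r < m.rows; ++r) s += m.at(r, c);
    return s;
}
```

With the slide's data, `column_sum` over column 1 adds 4 and 8, matching the 12 printed in the ArrayFire example.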
  21. 21. ArrayFire Example - swap R and B
      Using ArrayFire:
          array tmp = img(span,span,0);          // save the R channel
          img(span,span,0) = img(span,span,2);   // R channel gets values of B
          img(span,span,2) = tmp;                // B channel gets values of R
      Can also do it this way:
          array swapped = join(2, img(span,span,2),   // blue
                                  img(span,span,1),   // green
                                  img(span,span,0));  // red
      Or simply:
          array swapped = img(span,span,seq(2,-1,0));
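For readers without ArrayFire installed, the same channel swap can be sketched on a raw buffer. This assumes a planar, column-major layout in which channel k occupies the contiguous range starting at `k * rows * cols` (the layout implied by indexing `img(span,span,k)`); `swap_red_blue` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Swap channel planes 0 (R) and 2 (B) in place.
// Assumes a rows x cols x 3 image stored plane after plane.
inline void swap_red_blue(std::vector<float>& img, std::size_t rows, std::size_t cols) {
    assert(img.size() == 3 * rows * cols);
    const auto plane = static_cast<std::ptrdiff_t>(rows * cols);
    std::swap_ranges(img.begin(), img.begin() + plane, img.begin() + 2 * plane);
}
```

The green plane in the middle is untouched, which mirrors the `join(2, blue, green, red)` form on the slide.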
  22. 22. ArrayFire Functions
      Using ArrayFire:
          array img = loadimage("image.jpg", false);  // load grayscale image from disk to device
          array img_T = img.T();                      // transpose
  23. 23. Original
  24. 24. Grayscale
  25. 25. Box filter blur
  26. 26. Gaussian blur
  27. 27. Image Negative
  28. 28. Erosion ArrayFire
          // erode an image, 8-neighbor connectivity
          array mask8 = constant(1, 3, 3);
          array img_out = erode(img_in, mask8);
          // erode an image, 4-neighbor connectivity
          const float h_mask4[] = { 0.0, 1.0, 0.0,
                                    1.0, 1.0, 1.0,
                                    0.0, 1.0, 0.0 };
          array mask4 = array(3, 3, h_mask4);
          array img_out = erode(img_in, mask4);
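The difference between the two masks can be made concrete with a small CPU reference. This is a sketch of grayscale erosion (minimum over the pixels selected by the mask), not ArrayFire's implementation. It assumes row-major storage for readability and skips neighbors that fall outside the image, which may differ from how ArrayFire treats borders. `erode3x3` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Erode a rows x cols row-major image with a 3x3 mask: each output pixel is
// the minimum of the in-bounds input pixels under the non-zero mask entries.
inline std::vector<float> erode3x3(const std::vector<float>& img, int rows, int cols,
                                   const float (&mask)[9]) {
    assert(rows >= 0 && cols >= 0);
    assert(img.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    std::vector<float> out(img.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float m = std::numeric_limits<float>::infinity();
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    if (mask[(dr + 1) * 3 + (dc + 1)] == 0.0f) continue;  // not in the mask
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;  // border
                    m = std::min(m, img[static_cast<std::size_t>(rr * cols + cc)]);
                }
            }
            out[static_cast<std::size_t>(r * cols + c)] = m;
        }
    }
    return out;
}
```

With a dark pixel in one corner of a 3 x 3 image, the all-ones mask pulls the center down to that value, while the cross-shaped mask ignores the diagonal neighbor and leaves the center unchanged.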
  29. 29. Erosion
  30. 30. Filtering ArrayFire
          array R = convolve(img, ker);         // 1, 2 and 3d convolution filter
          array R = convolve(fcol, frow, img);  // Separable convolution
          array R = filter(img, ker);           // 2d correlation filter
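The slide distinguishes convolution from correlation. A 1D reference in plain C++ shows the only difference: convolution flips the kernel, correlation does not. The function names below are hypothetical and the "full" output length (signal + kernel - 1) is a choice made for this sketch, not a statement about ArrayFire's output size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Full 1D convolution: out[n] = sum over k of signal[n - k] * kernel[k].
inline std::vector<float> convolve1d(const std::vector<float>& signal,
                                     const std::vector<float>& kernel) {
    if (signal.empty() || kernel.empty()) return {};
    std::vector<float> out(signal.size() + kernel.size() - 1, 0.0f);
    for (std::size_t i = 0; i < signal.size(); ++i)
        for (std::size_t k = 0; k < kernel.size(); ++k)
            out[i + k] += signal[i] * kernel[k];
    return out;
}

// Correlation slides the kernel without flipping it, which is the same as
// convolving with the reversed kernel.
inline std::vector<float> correlate1d(const std::vector<float>& signal,
                                      std::vector<float> kernel) {
    std::reverse(kernel.begin(), kernel.end());
    return convolve1d(signal, kernel);
}
```

For a symmetric kernel such as a box or Gaussian the two results coincide; for an antisymmetric one such as {1, 0, -1} they differ in sign.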
  31. 31. Histograms ArrayFire
          int nbins = 256;
          array hist = histogram(img, nbins);
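As a CPU reference for what a histogram call computes, the sketch below counts values into equal-width bins. Unlike the two-argument call on the slide, it takes the value range explicitly; that signature, and the choice to ignore values outside the range, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count pixel values into nbins equal-width bins covering [min_val, max_val).
// Values outside the range are ignored; a value equal to max_val is placed
// in the last bin.
inline std::vector<unsigned> histogram(const std::vector<float>& pixels, int nbins,
                                       float min_val, float max_val) {
    assert(nbins > 0 && max_val > min_val);
    std::vector<unsigned> hist(static_cast<std::size_t>(nbins), 0u);
    const float scale = static_cast<float>(nbins) / (max_val - min_val);
    for (float p : pixels) {
        if (p < min_val || p > max_val) continue;
        int bin = static_cast<int>((p - min_val) * scale);
        if (bin >= nbins) bin = nbins - 1;  // guard the upper edge
        ++hist[static_cast<std::size_t>(bin)];
    }
    return hist;
}
```

For 8-bit image data, 256 bins over the range 0 to 256 give one bin per intensity level.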
  32. 32. Transforms ArrayFire
          array half = resize(0.5, img);
          array rot90 = rotate(img, af::Pi/2);
          array warped = approx2(img, xLocations, yLocations);
  33. 33. Image smoothing ArrayFire
          array S = bilateral(I, sigma_r, sigma_c);
          array M = meanshift(I, sigma_r, sigma_c, iter);
          array R = medfilt(img, 3, 3);
          // Gaussian blur
          array gker = gaussiankernel(ncols, ncols);
          array res = convolve(img, gker);
  34. 34. FFT ArrayFire
          array R1 = fft2(I);                                 // 2d fft. check fft, fft3
          array R2 = fft2(I, M, N);                           // fft2 with padding
          array R3 = ifft2(fft2(I, M, N) * fft2(K, M, N));    // convolve using fft2
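The last line of the FFT slide relies on the convolution theorem: multiplying two spectra and transforming back gives a convolution, and padding both inputs to a common size avoids wrap-around. A 1D sketch in plain C++ follows. It uses a naive O(n^2) DFT instead of an FFT to stay short, and `dft` and `dft_convolve` are hypothetical names.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive DFT. sign = -1 gives the forward transform; sign = +1 gives the
// inverse, including the 1/n normalization.
inline cvec dft(const cvec& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>(k * t)
                                 / static_cast<double>(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        out[k] = (sign > 0) ? sum / static_cast<double>(n) : sum;
    }
    return out;
}

// Linear convolution via the convolution theorem: zero-pad both inputs to
// length a.size() + b.size() - 1, multiply the spectra, transform back.
inline std::vector<double> dft_convolve(const std::vector<double>& a,
                                        const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    const std::size_t n = a.size() + b.size() - 1;
    cvec fa(n), fb(n);  // value-initialized to zero, which provides the padding
    for (std::size_t i = 0; i < a.size(); ++i) fa[i] = a[i];
    for (std::size_t i = 0; i < b.size(); ++i) fb[i] = b[i];
    fa = dft(fa, -1);
    fb = dft(fb, -1);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];
    const cvec r = dft(fa, +1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = r[i].real();
    return out;
}
```

The padded length plays the role of the `M, N` arguments on the slide: without it, the product of spectra would yield a circular convolution.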
  35. 35. ArrayFire Capabilities ● Hundreds of parallel functions for multi-disciplinary work ○ Image processing ○ Machine learning ○ Graphics ○ Sets ● Support for multiple languages ○ C/C++, Fortran, Java and R ● Linux, Windows, Mac OS X
  36. 36. ArrayFire Capabilities ● OpenGL based graphics ● JIT ○ Combine multiple operations into one kernel ● GFOR - data parallel loop ○ Allows concurrent execution over multiple data sets (for example images)
  37. 37. ArrayFire Functions ● Supports hundreds of parallel functions ○ Building blocks ■ Reductions ■ Scan ■ Set operations ■ Sorting ■ Statistics ■ Basic matrix manipulation Images taken from: http://technogems.blogspot.com/2011/06/sorting-included-files-by-importance.html http://www.cmsoft.com.br/tutorialOpenCL/CLMatrixMultExplanationSubMatrixes.png
  38. 38. ArrayFire Functions ● Hundreds of highly-optimized parallel functions ○ Signal/image processing ■ Convolution ■ FFT ■ Histograms ■ Interpolation ■ Connected components ○ Linear Algebra ■ Matrix multiply ■ Linear system solving ■ Factorization
  39. 39. GFOR: What is it? • Data-Parallel for loop, e.g. • Serial matrix-vector multiplications (3 kernel launches): for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B; • Parallel matrix-vector multiplications (1 kernel launch): gfor (array i, 3) C(span,span,i) = A(span,span,i) * B;
  40. 40. Example: Matrix Multiply • Data-Parallel for loop, e.g. for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B; Serial matrix-vector multiplications (3 kernel launches) • Iteration i = 1: C(,,1) = A(,,1) * B
  41. 41. Example: Matrix Multiply • Data-Parallel for loop, e.g. for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B; Serial matrix-vector multiplications (3 kernel launches) • Iteration i = 1: C(,,1) = A(,,1) * B • Iteration i = 2: C(,,2) = A(,,2) * B
  42. 42. Example: Matrix Multiply • Data-Parallel for loop, e.g. for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B; Serial matrix-vector multiplications (3 kernel launches) • Iteration i = 1: C(,,1) = A(,,1) * B • Iteration i = 2: C(,,2) = A(,,2) * B • Iteration i = 3: C(,,3) = A(,,3) * B
  43. 43. Example: Matrix Multiply gfor (array i, 3) C(span,span,i) = A(span,span,i) * B; Parallel matrix multiplications (1 kernel launch) • Simultaneous iterations i = 1:3: C(,,1) = A(,,1) * B; C(,,2) = A(,,2) * B; C(,,3) = A(,,3) * B
  44. 44. Example: Matrix Multiply gfor (array i, 3) C(span,span,i) = A(span,span,i) * B; Parallel matrix multiplications (1 kernel launch) • Simultaneous iterations i = 1:3: C(,,1:3) = A(,,1:3) * B • Think of GFOR as compiling 1 stacked kernel with all iterations.
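What makes the GFOR loop on these slides safe to run as one launch is that each slice of C depends only on the matching slice of A and on B. The host-side sketch below spells out that computation in plain C++ with column-major storage. It is a reference for the result, not for how ArrayFire schedules the work, and `batched_matmul` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C(:,:,i) = A(:,:,i) * B for every slice i.
// A is m x k x batch, B is k x n, C is m x n x batch, all column-major.
inline std::vector<float> batched_matmul(const std::vector<float>& A,
                                         const std::vector<float>& B,
                                         std::size_t m, std::size_t k,
                                         std::size_t n, std::size_t batch) {
    assert(A.size() == m * k * batch && B.size() == k * n);
    std::vector<float> C(m * n * batch, 0.0f);
    // The outer loop has no cross-iteration dependencies: these are the
    // iterations a data-parallel loop can execute together.
    for (std::size_t i = 0; i < batch; ++i) {
        for (std::size_t c = 0; c < n; ++c) {
            for (std::size_t r = 0; r < m; ++r) {
                float acc = 0.0f;
                for (std::size_t p = 0; p < k; ++p)
                    acc += A[r + p * m + i * m * k] * B[p + c * k];
                C[r + c * m + i * m * n] = acc;
            }
        }
    }
    return C;
}
```

With three 1 x 2 slices of A and a 2 x 1 B, the function returns three independent dot products, one per slice.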
  45. 45. JIT Code Generation ● Run time kernel generation ● Combines multiple element wise operations into one kernel ● Reduces kernel launching overhead ● Intermediate data not allocated ● Improves cache performance
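The benefit listed on the JIT slide, one kernel and no intermediate buffers, can be illustrated on the host. The sketch below computes `a*a + b*b` twice: once as three separate passes with temporaries, as a library without fusion would, and once as a single fused loop. It does not generate code at run time; the two functions are hand-written stand-ins with hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: every element-wise operation is its own pass over memory and
// allocates an intermediate array (t1 and t2).
inline std::vector<float> sum_of_squares_unfused(const std::vector<float>& a,
                                                 const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> t1(a.size()), t2(a.size()), out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) t1[i] = a[i] * a[i];
    for (std::size_t i = 0; i < a.size(); ++i) t2[i] = b[i] * b[i];
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = t1[i] + t2[i];
    return out;
}

// Fused: one pass, each input element read once, no intermediates.
inline std::vector<float> sum_of_squares_fused(const std::vector<float>& a,
                                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * a[i] + b[i] * b[i];
    return out;
}
```

On a GPU, each separate pass in the unfused version would correspond to a kernel launch plus a round trip through device memory, which is the overhead the slide says JIT fusion removes.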
  46. 46. Success Stories (Field: Application, Speedup) ● Academia: Power Systems Simulations, 35x ● Finance: Option Pricing, 52x ● Government: Radar Image Formation, 45x ● Life Sciences: Pathology Advances, > 100x ● Manufacturing: Tomography of Vegetation, 10x ● Media & Computer Vision: Digital Holography, 17x ● Oil & Gas: Ground Water Simulations, > 20x
  47. 47. Future capabilities ● We are interested in Big Data applications ● Create capabilities for ○ Streaming video ○ Large number of images ○ Machine learning ○ Data analysis ○ Dynamic data ● Faster rendering utilities for Big Data
  48. 48. Comments on Open Source ● https://github.com/arrayfire-community
  49. 49. Q & A Speaker: Oded Green (oded@arrayfire.com) Engineers: Umar Arshad (umar@ArrayFire.com) Pavan Yalamanchili (pavan@ArrayFire.com) Sales: Scott Blakeslee (scott@ArrayFire.com)
  50. 50. Look us up www.ArrayFire.com For language wrappers and examples https://github.com/ArrayFire
