MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko

Presentation MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko at the AMD Developer Summit (APU13) November 11-13, 2013.

  1. OPENCV OPENCL™ ACCELERATED COMPUTER VISION
  2. OpenCV Introduction: Andrey Pavlenko, Itseez
     Heterogeneous Compute and OpenCV: Dr. Harris Gasparakis, AMD
     OpenCV 3.0: Vadim Pisarevsky, Itseez
  3. OpenCV introduction, Andrey Pavlenko
      1. Features
      2. History
      3. Development process
      4. Performance
  4. Open-source Computer Vision Library
      1. 2,500+ algorithms and functions
      2. Cross-platform
      3. Liberal BSD license
      4. High performance
      5. Professionally developed
      6. 7M+ downloads
  5. Functionality overview
      • Image processing: filters, transformations, edges and contours, robust features, segmentation
      • Video, stereo, 3D: calibration, pose estimation, optical flow, detection and recognition, depth
  6. Industrial applications
      • Street View Panorama, etc. (Google)
      • Vision system of the PR2 robot (Willow Garage)
      • Robots for Mars exploration (NASA)
      • Quality control of the production of coins (China)
  7. OpenCV history
      [Chart: popularity, contributors and core team over time]
      Timeline: 2000 (first public release), 2008, 2009 (v2.0, C++ API), 2012 (@github), 2013 (v2.4.3, OpenCL present)
  8. Contribution/patch workflow: see the OpenCV wiki
      OpenCV infrastructure:
      • build.opencv.org: buildbot with 50+ builders; 50+ builds nightly!
      • github.com/itseez/opencv
      • pullrequest.opencv.org: every patch to OpenCV must pass 7 builders!
  9. OpenCV resources
      1. Home: opencv.org
      2. Docs and tutorials: docs.opencv.org
      3. Q&A forum: answers.opencv.org
      4. Wiki and issues: code.opencv.org
      5. Develop: https://github.com/Itseez/opencv
      6. Packages: sourceforge.net/projects/opencvlibrary/
  10. OpenCL™ in OpenCV 2.4
      • ‘ocl’ is a separate module (e.g. cv::ocl::resize())
      • runs on various OpenCL-compliant devices and OSes
      • 2.4.7 release on November 6:
        – official Windows binary pack with OpenCL enabled
        – OpenCV pre-commit check includes OpenCL tests
        – 200+ pull requests since 2.4.6 (the most actively developed part of OpenCV)
        – dynamic OpenCL runtime loading
        – default OpenCL device can be set via an environment variable
        – ~800 optimized kernels, covering ~30% of the most commonly used functionality
        – 8000+ accuracy tests and ~500 performance tests
        – can be built without the OpenCL SDK installed
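Setting the default device through an environment variable only works if the value has a predictable shape. The sketch below assumes the variable is `OPENCV_OPENCL_DEVICE` and uses the `<Platform>:<CPU|GPU|ACCELERATOR|empty>:<DeviceName or ID>` layout documented for later OpenCV releases. Whether 2.4.7 accepts exactly this layout should be checked against its own documentation. `DeviceSelector` and `parseDeviceSelector` are hypothetical helpers that only split the string; they are not OpenCV's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result of splitting an assumed device-selection string of the
// form "<Platform>:<CPU|GPU|ACCELERATOR|empty>:<DeviceName or ID>".
struct DeviceSelector {
    std::string platform;    // e.g. "AMD"; empty means no platform preference
    std::string deviceType;  // "CPU", "GPU", "ACCELERATOR", or empty
    std::string device;      // device name or numeric index; may be empty
};

// Hypothetical helper, not OpenCV's parser: returns false unless the string
// contains exactly two ':' separators; otherwise fills `out` with the fields.
bool parseDeviceSelector(const std::string& s, DeviceSelector& out) {
    const std::size_t first = s.find(':');
    if (first == std::string::npos) return false;
    const std::size_t second = s.find(':', first + 1);
    if (second == std::string::npos) return false;
    if (s.find(':', second + 1) != std::string::npos) return false;
    out.platform = s.substr(0, first);
    out.deviceType = s.substr(first + 1, second - first - 1);
    out.device = s.substr(second + 1);
    return true;
}
```

In a program, the input would typically come from `std::getenv("OPENCV_OPENCL_DEVICE")`. A null result means the user expressed no preference.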
  11. OpenCL™ performance in OpenCV 2.4 (benchmark chart not reproduced)
      Test system: AMD A10-6800K (with Radeon HD 8670D) + Radeon HD 7790
  12. HETEROGENEOUS COMPUTE AND OPENCV
      • The OpenCL™ module in OpenCV
      • Heterogeneous compute and computer vision
      • Compute paths and data representations
      • Future roadmap: transparent API
      OPENCV-CL | NOVEMBER 12, 2013 | DR. HARRIS GASPARAKIS | CONFIDENTIAL
  13. OPENCV’S OPENCL™ MODULE
      • Enables taking advantage of OpenCL™ acceleration, but currently only as an explicit path that a developer chooses to call. All OpenCL memory buffer types are supported, but they are not automatically optimized.
        – But stay tuned for OpenCV 3.0’s transparent API.
      • Initial release: OpenCV 2.4.3 [11/2012]
      • Currently ~800 kernels:
        – Image processing: pixel-wise operations, geometric transforms, pixel transforms (filtering, edges, corners, etc.)
        – Feature detection and matching: SURF, HOG, Haar, brute-force matching, kNN, template matching
        – Object recognition: SVM (support vector machine)
      • Applications, including:
        – Face detection
        – Optical flow
        – Stereo matching
  14. COMPILING FROM SOURCE
      • OpenCL™ is enabled by default in CMake.
  15. COMPILING FROM SOURCE: BROWSE/BUILD CODE IN AN IDE
      • OpenCL™ module (2.4.x): rebuild it even if you only change a kernel.
      • OpenCL kernels: a script converts these to kernels.cpp, which is why you must rebuild whenever you change a kernel.
      • OpenCL samples: after building them, go to [ROOT]\bin\[CONFIG] and look for ocl-example-*.exe.
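The slide explains why changing a kernel forces a rebuild: the `.cl` text ends up inside a generated C++ source file. Below is a minimal sketch of that embedding step. `embedKernel` is a hypothetical helper, and the layout of its output is an assumption. OpenCV's own build script may format `kernels.cpp` differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical build-time helper: turns the text of an OpenCL .cl file into a
// C++ definition `const char* <name> = "...";` so the kernel source is compiled
// into the library. Because the kernel then lives in a .cpp file, editing the
// .cl file means regenerating and recompiling that .cpp.
std::string embedKernel(const std::string& name, const std::string& clSource) {
    std::string out = "const char* " + name + " =\n\"";
    for (char c : clSource) {
        switch (c) {
            case '\\': out += "\\\\"; break;        // escape backslashes
            case '"':  out += "\\\""; break;        // escape double quotes
            case '\n': out += "\\n\"\n\""; break;   // close this literal line, open the next
            default:   out += c;
        }
    }
    out += "\";\n";  // adjacent string literals are concatenated by the compiler
    return out;
}
```

Splitting the literal at every newline keeps the generated file readable and avoids compiler limits on very long single-line literals.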
  16. INCORPORATING OPENCV INTO YOUR OWN CODE
      • The AMD APP SDK provides 3 examples; integration is very easy!
      • With fewer than 15 lines of code you can have a minimal program that reads video frames, passes them to the OpenCL™ device, and runs your own simple kernel!
      • OpenCV-CL:
        – takes care of all the OpenCL plumbing;
        – compiles the kernels, caches them at runtime, and saves the OpenCL binaries to disk (the user can modify this default behavior);
        – allows specifying an OpenCL device/platform via an environment variable;
        – allows plugging your own kernels into OpenCV-CL, using the OpenCV-CL data structures.
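The runtime caching described above can be pictured as a map keyed by kernel source plus build options, so that each program is built at most once per process. In this sketch, `Program` and `ProgramCache` are hypothetical. A real implementation would call `clBuildProgram` in the build step and could also persist the binary to disk; this version keeps everything in memory and omits both.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a built OpenCL program object.
struct Program {
    std::string source;
    std::string options;
};

// Hypothetical compile-once cache keyed by (kernel source, build options).
// This sketch is in-memory only and not thread-safe.
class ProgramCache {
public:
    using BuildFn = std::function<std::shared_ptr<Program>(const std::string&, const std::string&)>;

    explicit ProgramCache(BuildFn build) : build_(std::move(build)) {}

    std::shared_ptr<Program> get(const std::string& source, const std::string& options) {
        auto key = std::make_pair(source, options);
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // cache hit: no rebuild
        }
        std::shared_ptr<Program> program = build_(source, options);  // expensive step, once per key
        cache_.emplace(std::move(key), program);
        return program;
    }

private:
    BuildFn build_;
    std::map<std::pair<std::string, std::string>, std::shared_ptr<Program>> cache_;
};
```

The build options are part of the key because the same source compiled with different `-D` defines yields a different program.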
  17. INCORPORATING OPENCV INTO YOUR OWN CODE
      Some code from the APP SDK 2.9 gesture sample, showcasing OpenNI® integration:

        // Wrap the OpenNI depth buffer in a cv::Mat header (no copy) and,
        // in one line, populate an image on the GPU
        cv::Mat depthImgClamp = cv::Mat(SIZEY, SIZEX, CV_8UC1, openniBuffer);
        cv::ocl::oclMat oclDepthImgClamp(depthImgClamp);

        // In one command, add your own kernel launch acting on OpenCV-CL data structures.
        // Each argument is a (size, pointer-to-value) pair; buffers are passed as cl_mem handles.
        vector<pair<size_t, const void *> > args;
        args.push_back(make_pair(sizeof(cl_mem), (void *)&src.data));
        args.push_back(make_pair(sizeof(cl_mem), (void *)&oclDst.data));
        openCLExecuteKernelInterop(oclDst.clCxt, &depthConvertSrcStr,
                                   "convertDepthToWorldCoordinates",
                                   globalThreads, localThreads, args,
                                   -1, -1, "", false, false, true);
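Each entry in the `args` vector on this slide pairs an argument's size with a pointer to its value. This matches the `arg_size` and `arg_value` parameters of the OpenCL call `clSetKernelArg(kernel, arg_index, arg_size, arg_value)`. The sketch below shows how a launcher might walk such a list. `KernelArgs`, `ArgSetter` and `applyArgs` are hypothetical names. A recording callback replaces the real `clSetKernelArg`, so the code runs without an OpenCL device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Same shape as the slide's argument list: (size in bytes, pointer to value).
using KernelArgs = std::vector<std::pair<std::size_t, const void*>>;

// Hypothetical callback with the shape of clSetKernelArg minus the kernel
// handle: (argument index, size, pointer to value). Returns 0 on success.
using ArgSetter = std::function<int(unsigned, std::size_t, const void*)>;

// Binds each (size, pointer) pair to consecutive argument slots, stopping at
// the first failure. Returns the number of arguments successfully bound.
unsigned applyArgs(const KernelArgs& args, const ArgSetter& setArg) {
    unsigned index = 0;
    for (const auto& arg : args) {
        if (setArg(index, arg.first, arg.second) != 0) break;
        ++index;
    }
    return index;
}
```

For a buffer argument, the pointer refers to the `cl_mem` handle itself, not to the buffer contents. That is why the slide passes `sizeof(cl_mem)` together with the address of the handle.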
  18. HETEROGENEOUS COMPUTE AND COMPUTER VISION
      Webcams everywhere; heterogeneous compute everywhere; real-time computer vision everywhere.
      • Heterogeneous compute mission: to take optimal advantage of the full capabilities of the underlying platform:
        – APU / HSA APU
        – Discrete GPU
        – CPU
        – FPGA, DSP, etc.
      • Many code paths? Possibly interleaving execution between different devices.
      • Many data representations?
  19. DATA REPRESENTATIONS: DISCRETE GPUS; APUS WITH OPENCL™ 1.2
      • Copy data to/from the GPU.
      • Use “device memory” for data that is passed between GPU kernels.
      • Map/unmap using pinned memory.
        – True for all generations: special memory that the GPU can read and write fast.
        – On APUs: physically part of main memory, possibly with special paths.
        – But: device memory cannot be read/written very fast from the CPU.
      • Zero copy (map/unmap): the best path for data written by the CPU and read by the GPU, or vice versa.
      • CPU and GPU work cannot be mixed and matched well (bouncing data back and forth between them is costly).
      • Small kernels are typically a bad idea.
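The advice that small kernels are typically a bad idea follows from a simple cost model: every dispatch pays a fixed launch overhead on top of its compute time. Both the model and the numbers are illustrative assumptions, not measurements. Take 20 µs per dispatch and 100 µs of total compute. Splitting the work into 10 kernels costs 10 × 20 + 100 = 300 µs, two thirds of which is overhead. One fused kernel costs 20 + 100 = 120 µs. `pipelineTimeUs` and `overheadFraction` are hypothetical helpers.

```cpp
#include <cassert>

// Assumed cost model: `launches` dispatches that together perform `workUs`
// microseconds of compute, each dispatch paying `launchOverheadUs`.
double pipelineTimeUs(int launches, double workUs, double launchOverheadUs) {
    return launches * launchOverheadUs + workUs;
}

// Fraction of the total time spent on dispatch overhead rather than compute.
double overheadFraction(int launches, double workUs, double launchOverheadUs) {
    const double total = pipelineTimeUs(launches, workUs, launchOverheadUs);
    return launches * launchOverheadUs / total;
}
```

Faster dispatch, such as user-mode enqueueing, reduces `launchOverheadUs`. This is what makes finer-grained kernels more attractive.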
  20. H1’14: APUS, OPENCL™ 1.2 + HSA EXTENSIONS OR OPENCL 2.0
      • You can still use “device memory” for data that is passed between GPU kernels, and zero copy is still available.
      • However, SVM (shared virtual memory) can be written to and read from by both the CPU and the GPU fast “enough”:
        – enables ping-pong (producer/consumer) between CPU and GPU;
        – enables concurrent producer/consumer between CPU and GPU (platform atomics);
        – makes it much easier to port a vision pipeline using HSA: you can incrementally pick and choose which parts of the pipeline to accelerate and which to leave to the CPU;
        – on HSA APUs, using SVM is a reasonable default (and better than the current defaults), significantly simplifying code.
      • User-mode enqueueing: much faster kernel dispatch means less performance degradation for small kernels. You can feed the GPU smaller computational tasks quickly and (busy-)wait for the results on the CPU.
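From the programmer's side, the concurrent producer/consumer pattern that SVM and platform atomics enable looks like ordinary shared-memory threading. This is a CPU-only analogy, and it rests on assumptions: two `std::thread`s stand in for the CPU and the GPU, and a `std::vector` stands in for a fine-grained SVM allocation. The consumer reads each result in place as soon as the producer publishes it through an atomic counter, without copying the buffer. `pingPongSum` is a hypothetical name, and no OpenCL call is made.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// CPU-only analogy (an assumption, not OpenCL code): the producer thread plays
// the GPU, the calling thread plays the CPU, and `shared` plays an SVM buffer.
long long pingPongSum(int n) {
    std::vector<int> shared(n);       // stands in for an SVM allocation
    std::atomic<int> published{0};   // how many elements are ready to read

    std::thread producer([&] {
        for (int i = 0; i < n; ++i) {
            shared[i] = i * i;                                  // produce a result
            published.store(i + 1, std::memory_order_release);  // publish it
        }
    });

    long long sum = 0;
    for (int i = 0; i < n; ++i) {
        while (published.load(std::memory_order_acquire) <= i) {
            std::this_thread::yield();  // busy-wait for the next result
        }
        sum += shared[i];  // consume in place, no copy
    }
    producer.join();
    return sum;
}
```

The release/acquire pair guarantees that once the consumer sees the counter pass `i`, it also sees the value written to `shared[i]`.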
  21. COMPUTE PATHS: OpenCV 2.4.x face detection on the CPU

        // initialization
        VideoCapture vcap(...);
        CascadeClassifier fd("haar_ff.xml");
        Mat frame, frameGray;
        vector<Rect> faces;

        for (;;) {  // processing loop
            vcap >> frame;
            cvtColor(frame, frameGray, COLOR_BGR2GRAY);
            equalizeHist(frameGray, frameGray);
            fd.detectMultiScale(frameGray, faces, ...);
            // draw rectangles …
            // show image …
        }
  22. COMPUTE PATHS: OpenCV 2.4.x face detection with OpenCL™

        // initialization
        VideoCapture vcap(...);
        ocl::OclCascadeClassifier fd("haar_ff.xml");
        ocl::oclMat frame, frameGray;
        Mat frameCpu;
        vector<Rect> faces;

        for (;;) {  // processing loop
            vcap >> frameCpu;
            frame = frameCpu;  // upload the captured frame to the OpenCL device
            ocl::cvtColor(frame, frameGray, COLOR_BGR2GRAY);
            ocl::equalizeHist(frameGray, frameGray);
            fd.detectMultiScale(frameGray, faces, ...);
            // draw rectangles …
            // show image …
        }
  23. FUTURE ROADMAP
      – Incorporate OpenCL™ 1.2 with HSA extensions, and OpenCL 2.0.
      – Shared Virtual Memory (SVM) significantly simplifies the programming model in general, and allows reusing existing memory as SVM.
        – In SVM, “a pointer is a pointer”.
        – Pass your tree/linked list/graph data structure to the GPU, and have threads explore sub-branches or paths on a graph.
      – Transparent API:
        – One code path: OpenCV will choose the best execution path at runtime, given the platform.
        – Changes of data locality should be implemented by the framework.
        – This includes applying heuristics appropriate for the underlying hardware (dGPU, APU, HSA APU).
        – Eventually it should be self-optimizing:
          – reasonably define the optimal memory type “under the hood”;
          – detect data-flow dependencies in the pipeline, and automatically represent them as OpenCL events.
      – Starting with OpenCV 3.0.
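Detecting data-flow dependencies in a pipeline, as the roadmap proposes, comes down to tracking which operation last wrote each buffer and which operations have read it since. Below is a minimal sketch under stated assumptions. Buffers are identified by name, and each returned set is what would become an operation's OpenCL event wait list. `Op` and `computeDependencies` are hypothetical, not an OpenCV API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one pipeline step: the buffers it reads and writes.
struct Op {
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// For every op, returns the indices of earlier ops it must wait for:
//  - read-after-write:  the last writer of each buffer it reads
//  - write-after-write: the last writer of each buffer it writes
//  - write-after-read:  every op that read that buffer since its last write
std::vector<std::set<int>> computeDependencies(const std::vector<Op>& ops) {
    std::map<std::string, int> lastWriter;
    std::map<std::string, std::vector<int>> readersSinceWrite;
    std::vector<std::set<int>> deps(ops.size());

    for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
        for (const std::string& buf : ops[i].reads) {
            auto w = lastWriter.find(buf);
            if (w != lastWriter.end()) deps[i].insert(w->second);
        }
        for (const std::string& buf : ops[i].writes) {
            auto w = lastWriter.find(buf);
            if (w != lastWriter.end()) deps[i].insert(w->second);
            for (int r : readersSinceWrite[buf]) deps[i].insert(r);
        }
        // Update the tracking state only after this op's dependencies are known.
        for (const std::string& buf : ops[i].reads) readersSinceWrite[buf].push_back(i);
        for (const std::string& buf : ops[i].writes) {
            lastWriter[buf] = i;
            readersSinceWrite[buf].clear();
        }
    }
    return deps;
}
```

Applied to the face-detection loop (capture, cvtColor, in-place equalizeHist, detectMultiScale, next capture), the next capture must wait for the previous capture and for cvtColor, which still reads the old frame.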
  24. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
  25. OpenCV 3.0, Vadim Pisarevsky
      1. Transparent API
      2. UMat
      3. Under the hood
  26. OpenCV 3.0
      • OpenCV 3.0 is scheduled for Q1 2014.
      • Based on 2.x, but with:
        – a transparent API and more efficient, platform-specific OpenCL™ code paths (including better zero-copy and SVM support);
        – API cleanup;
        – a lot of new algorithms.
  27. Transparent API
      • The same code can run on the CPU or the GPU:
        – no specialized cv::ocl::Canny vs. cv::Canny;
        – no recompilation is needed.
      • Includes the following key components:
        – a new data structure, UMat (Universal Mat);
        – a simple and robust mechanism for asynchronous processing;
        – a convenient API for custom algorithm implementations.
      • Minimal or no changes to existing code:
        – CPU-only processing: no changes required.
  28. UMat
      • Mat => UMat is the only change needed.
      • Sometimes, on some platforms (HSA), not even that change is needed!

        // initialization
        VideoCapture vcap(...);
        CascadeClassifier fd("haar_ff.xml");
        UMat frame, frameGray;
        vector<Rect> faces;

        for (;;) {  // processing loop
            vcap >> frame;
            cvtColor(frame, frameGray, COLOR_BGR2GRAY);
            equalizeHist(frameGray, frameGray);
            fd.detectMultiScale(frameGray, faces, ...);
            // draw rectangles …
            // show image …
        }
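One way to picture how a single container can serve both CPU and OpenCL code is a buffer that keeps a host copy and a device copy and records which one is current. It copies only when the other side asks for the data. This is a simplified model and an assumption. It is not how OpenCV's `UMat` is implemented, since the real one also handles zero-copy and SVM. `DualBuffer` is hypothetical, and the "device" copy is ordinary host memory so the sketch runs anywhere. For simplicity, every access is treated as a potential write.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dual-copy buffer with lazy synchronization.
class DualBuffer {
public:
    explicit DualBuffer(std::size_t n) : host_(n), device_(n) {}

    // CPU access: download first if the device copy is newer.
    std::vector<float>& host() {
        if (state_ == State::DeviceNewer) { host_ = device_; ++transfers; }
        state_ = State::HostNewer;  // caller may write through the reference
        return host_;
    }

    // Device access: upload first if the host copy is newer.
    std::vector<float>& device() {
        if (state_ == State::HostNewer) { device_ = host_; ++transfers; }
        state_ = State::DeviceNewer;
        return device_;
    }

    int transfers = 0;  // number of simulated host<->device copies

private:
    enum class State { InSync, HostNewer, DeviceNewer };
    std::vector<float> host_;
    std::vector<float> device_;  // simulates GPU memory
    State state_ = State::InSync;
};
```

Consecutive operations on the same side cost no transfers. This is the property that lets a chain of OpenCL-backed calls on a `UMat` stay on the device.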
  29. Transparent API: under the hood

        bool _ocl_cvtColor(InputArray src, OutputArray dst, int code) {
            static ocl::ProgramSource oclsrc("//cvtcolor.cl source code\n …");
            UMat src_ocl = src.getUMat(), dst_ocl = dst.getUMat();
            if (code == COLOR_BGR2GRAY) {
                // get the kernel; the kernel is compiled only once and cached
                ocl::Kernel kernel("bgr2gray", oclsrc, <compile_flags>);
                // pass the 2 arrays to the kernel and run it
                return kernel.args(src_ocl, dst_ocl).run(0, 0, false);
            } else if (code == COLOR_BGR2YUV) {
                …
            }
            return false;
        }

        void _cpu_cvtColor(const Mat& src, Mat& dst, int code) { … }

        // transparent API dispatcher function
        void cvtColor(InputArray src, OutputArray dst, int code) {
            dst.create(src.size(), …);
            if (useOpenCL(src, dst) && _ocl_cvtColor(src, dst, code))
                return;
            // getMat() uses zero-copy if available; with SVM it is a no-op
            Mat src_cpu = src.getMat();
            Mat dst_cpu = dst.getMat();
            _cpu_cvtColor(src_cpu, dst_cpu, code);
        }
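The slide leaves the CPU fallback elided. For reference only, the function below is a plain CPU conversion for the BGR-to-gray case. It uses the standard luma weights Y = 0.299 R + 0.587 G + 0.114 B, which OpenCV documents for this conversion. `bgrToGray` is a hypothetical name. It works on a flat, interleaved byte vector instead of `cv::Mat` so that it runs without OpenCV, and it is not OpenCV's implementation. OpenCV uses fixed-point arithmetic, so individual pixels may differ slightly from this floating-point version.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: converts interleaved 8-bit BGR pixels to 8-bit
// gray with Y = 0.299 R + 0.587 G + 0.114 B, rounded to the nearest integer.
std::vector<std::uint8_t> bgrToGray(const std::vector<std::uint8_t>& bgr) {
    std::vector<std::uint8_t> gray(bgr.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const double b = bgr[3 * i];
        const double g = bgr[3 * i + 1];
        const double r = bgr[3 * i + 2];
        const double y = 0.299 * r + 0.587 * g + 0.114 * b;  // always within [0, 255]
        gray[i] = static_cast<std::uint8_t>(std::lround(y));
    }
    return gray;
}
```

A reference like this is also useful for checking an OpenCL kernel's output, much as the accuracy tests in OpenCV compare its OpenCL path against its CPU path.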
  30. OpenCV+OpenCL™ execution model
      [Diagram: CPU threads, each with its own cv::ocl::Queue, bound to cv::ocl::Device objects]
      • One OpenCL queue and one OpenCL device per CPU thread.
      • OpenCL kernels are executed asynchronously.
      • cv::ocl::finish() puts a barrier in the current CPU thread; .getMat() calls it automatically.
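The rule of one OpenCL queue per CPU thread can be modelled with `thread_local` storage. Each thread lazily creates its own queue on first use and reuses it afterwards, so threads never share a queue and never need to lock one. `Queue`, `queuesCreated` and `currentQueue` are hypothetical stand-ins, not `cv::ocl::Queue` or any OpenCV function.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an OpenCL command queue.
struct Queue {
    int id;
};

// Counts how many queues have been created across all threads.
std::atomic<int> queuesCreated{0};

// Returns the calling thread's queue, creating it on that thread's first call.
// thread_local gives every CPU thread its own instance, so no locking is needed.
Queue& currentQueue() {
    thread_local Queue queue{queuesCreated.fetch_add(1)};
    return queue;
}
```

With a per-thread queue, a barrier such as `cv::ocl::finish()` only needs to wait for the work that the calling thread submitted.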
  31. Summary & future directions
      • OpenCL™ is a great tool for boosting the performance of vision algorithms; OpenCV unleashes its potential for the CV community.
      • The OpenCV 3.0 transparent API makes it even easier and … more transparent.
      • Possible directions: pipelines, memory allocation optimization, more algorithms ported to OpenCL.
  32. The first results
