
Open Standards for ADAS: Andrew Richards, Codeplay, at AutoSens 2016

The open standards enabling vision processing in ADAS
Andrew Richards, CEO, Codeplay
AutoSens September 2016

Building autonomous vehicles: How do we build the software and platforms that enable the intelligence for self-driving cars and all the intermediate levels of autonomy?
We don't (yet) know the right algorithms or approach, so how do we start developing the software in a way that can deliver the safety, performance, power consumption and correctness needed for everything from ADAS to full autonomy?

  1. The open standards enabling vision processing in ADAS
     Andrew Richards, CEO, Codeplay
     AutoSens, September 2016
  2. We have SAE Levels to climb
     © 2016 Codeplay Software Ltd.
     How do we get from here… … to here?
     Level 0
     • Warnings
     Level 1
     • Adaptive
     • Assist
     Level 2
     • Execute
     • Automated manoeuvres
     Level 3
     • Limited overall control
     Level 4
     • Deep self control
     • All conditions
     Level 5
     • Autonomous
     • Stages from very local to extensive journeys
  3. We have a mountain to climb
     How do we get to the top?
     • When we don’t know what the top looks like...
     • … and we want to get there in safe, manageable, affordable steps…
     • … without getting lost on our own…
     • … or climbing the wrong mountain
  4. This presentation will focus on:
     • The hardware and software platforms that will be able to deliver the results
     • The software tools to build up the solutions for those platforms
     • The open standards that will enable solutions to interoperate
     • How we build the platforms and tools
  5. Where do we need to go?
     “On a 100 millimetre-squared chip, Google needs something like 50 teraflops of performance”
     - Daniel Rosenband (Google’s self-driving car project) at HotChips 2016
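
As a quick sense of scale for that target, here is the compute density implied by the two quoted figures. This is plain arithmetic on the quote and assumes nothing about how such a chip would be built:

```latex
\frac{50\ \text{TFLOPS}}{100\ \text{mm}^2}
  = 0.5\ \text{TFLOPS/mm}^2
  = 500\ \text{GFLOPS/mm}^2
```
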
  6. Performance trends
     [Chart: GFLOPS (logarithmic axis, 1 to 65,536) against year of introduction, with trend lines for smartphone CPU, smartphone GPU, integrated GPU, desktop CPU and desktop GPU, and the Google target marked.]
     These trend lines seem to violate the rules of physics…
  7. What will our platform be?
     • Will we build self-driving cars on graphics processors designed for videogames?
     • Or will we take the successful design decisions from GPUs and apply them to processors designed specially for autonomous driving?
     The two options: build on GPUs, or special-purpose vision processors
  8. How do we get there from here?
     1. We need to write software today for platforms that cannot be built yet
     2. We need to start with simpler systems that are not fully autonomous
     3. We need to validate the systems as safe
  9. Two models of software development
     • Design a model -> Validate model -> Select platform -> Implement on platform
     • Write software -> Select platform -> Optimize for platform -> Validate whole platform -> Design next version
     Which method can get us all the way to autonomous vehicles?
  10. The different levels of programming model
      Device-specific programming
      • Assembly language
      • VHDL
      • Device-specific C-like programming models
      Higher-level language enabler
      • NVIDIA PTX
      • HSA
      • OpenCL SPIR
      • SPIR-V
      C level programming
      • OpenCL C
      • DSP C
      • MCAPI/MTAPI
      C++ level programming
      • SYCL
      • CUDA
      • HCC
      • C++ AMP
      Graph programming
      • OpenCV
      • OpenVX
      • Halide
      • VisionCpp
      • TensorFlow
      • Caffe
  11. Device-specific programming
      • Not a route to full autonomy
      • Can deliver quick results today
      • Can … hand-optimize directly for the device
      • Cannot … develop software today for future platforms
  12. The route to full autonomy
      • Graph programming
        • This is the most widely adopted approach to machine vision and machine learning
      • Open standards
        • This lets you develop today for future architectures
  13. Why graph programming?
      When you scale the number of cores:
      • You don’t scale the number of memory ports
      • Your compute performance increases
      • But your off-chip memory bandwidth does not
      Therefore:
      • You need to reduce off-chip memory traffic by processing everything on-chip
      • This is achieved by tiling
      However, writing tiled image pipelines is hard.
      If we build up a graph of operations (e.g. convolutions) and then have a runtime system split it into fused, tiled operations across an entire system-on-chip, we get great performance.
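
The fusion argument on this slide can be sketched in plain C++. The three operations `op_a`, `op_b` and `op_c` and the `buffer_writes` counter are hypothetical stand-ins, not code from the talk. The point is only that the unfused version writes a full-size buffer after every operation, while the fused, tiled version writes each output pixel once. The sketch covers pointwise operations only; stencil operations such as convolutions would also need overlapping tile borders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Three illustrative per-pixel operations (hypothetical stand-ins for real
// image-processing nodes such as colour conversions).
inline float op_a(float x) { return x * 0.5f; }
inline float op_b(float x) { return x + 1.0f; }
inline float op_c(float x) { return x * x; }

// Unfused: each operation is a separate pass over the whole image, so two
// full-size intermediate buffers are written and read back between passes.
std::vector<float> run_unfused(const std::vector<float>& in,
                               std::size_t& buffer_writes) {
  std::vector<float> t1(in.size()), t2(in.size()), out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) t1[i] = op_a(in[i]);
  for (std::size_t i = 0; i < in.size(); ++i) t2[i] = op_b(t1[i]);
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = op_c(t2[i]);
  buffer_writes = 3 * in.size();  // t1, t2 and out are each written once
  return out;
}

// Fused and tiled: the image is walked one tile at a time and all three
// operations are applied to a pixel before moving on, so intermediates stay
// in registers and only the final result is written. tile must be > 0.
std::vector<float> run_fused(const std::vector<float>& in, std::size_t tile,
                             std::size_t& buffer_writes) {
  std::vector<float> out(in.size());
  for (std::size_t start = 0; start < in.size(); start += tile) {
    const std::size_t end = std::min(start + tile, in.size());
    for (std::size_t i = start; i < end; ++i)
      out[i] = op_c(op_b(op_a(in[i])));
  }
  buffer_writes = in.size();  // only out is written
  return out;
}
```

Both functions return the same pixels; what changes is how much data crosses the memory interface, which is the resource the slide identifies as the one that does not scale with core count.
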
  14. Graph programming: some numbers
      [Chart: "Optimization across the whole graph". Kernel (ms) and overhead (ms), axis 0 to 45, for OpenCV, Halide and SYCL.]
      Without fusion, each operation takes roughly the same amount of time on the accelerator (an AMD APU in this case), but the overhead varies a little.
      OpenCV does not fuse in this case, but Halide and SYCL do. The fused kernels are significantly faster than the non-fused ones when using C++ programming to achieve fusion.
      [Chart: "Graph execution of individual nodes". Kernel (ms) and overhead (ms), axis 0 to 100, for OpenCV, Halide and SYCL, broken down by operation: RGB to HSV, channel masking, HSV to RGB.]
  15. Graph programming: some numbers
      [Chart: "Effect of combining graph nodes on performance". Kernel time (ms) and overhead time (ms), axis 0 to 100, for OpenCV (nodes), OpenCV (graph), Halide (nodes), Halide (graph), SYCL (nodes) and SYCL (graph).]
      In this example, we perform 3 image processing operations on an accelerator and compare 3 systems when executing individual nodes or a whole graph.
      Halide and SYCL use kernel fusion, whereas OpenCV does not. For all 3 systems, the performance of the whole graph is significantly better than that of individual nodes executed on their own.
      The system is an AMD APU and the operations are: RGB->HSV, channel masking, HSV->RGB.
  16. Graph programming
      • For both machine vision algorithms and machine learning, graph programming is the most widely adopted approach
      • Two styles of graph programming that we commonly see:
      C-style graph programming
      • OpenVX
      • OpenCV
      C++ style graph programming
      • Halide
      • RapidMind
      • Eigen (also in TensorFlow)
      • VisionCpp
  17. C style graph programming
      OpenVX: open standard
      • Can be implemented by vendors
      • Create a graph with C API, then map to an entire SoC
      OpenCV: open source
      • Implemented on OpenCL
      • Implemented on device-specific accelerators
      • Create a graph with C API, then execute
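
The "create a graph with a C API, then execute" pattern can be sketched as follows. Every name here (`graph_create`, `graph_add_node`, `graph_verify`, `graph_process`, `graph_release`) is hypothetical and invented for illustration. This is not the OpenVX or OpenCV API, only a minimal model of the same shape: nodes are added at run time through opaque handles, a verification step gives the runtime a chance to inspect the whole graph, and only then is the graph executed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style graph API (NOT OpenVX or OpenCV).
// A node is just a per-pixel function pointer in this sketch.
typedef float (*pixel_fn)(float);

struct graph {
  std::vector<pixel_fn> nodes;  // nodes in execution order
  bool verified = false;
};

graph* graph_create() { return new graph(); }

void graph_add_node(graph* g, pixel_fn fn) {
  g->nodes.push_back(fn);
  g->verified = false;  // any change invalidates an earlier verification
}

// Stand-in for the step where a vendor runtime would check the graph and
// decide how to map it onto the hardware.
bool graph_verify(graph* g) {
  for (pixel_fn fn : g->nodes)
    if (fn == nullptr) return false;
  g->verified = !g->nodes.empty();
  return g->verified;
}

// Runs every node on every pixel; refuses to run an unverified graph.
bool graph_process(const graph* g, const float* in, float* out,
                   std::size_t n) {
  if (!g->verified) return false;
  for (std::size_t i = 0; i < n; ++i) {
    float v = in[i];
    for (pixel_fn fn : g->nodes) v = fn(v);
    out[i] = v;
  }
  return true;
}

void graph_release(graph* g) { delete g; }

// Two example nodes.
float halve(float x) { return x * 0.5f; }
float add_one(float x) { return x + 1.0f; }
```

Because the graph only exists at run time, the application is limited to node types the runtime can represent, here a bare function pointer.
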
  18. & Device-Specific Programming
      • How do we adapt it for all the graph nodes we need?
      • Runtime systems can automatically optimize the graphs
      • Can … develop software today for future platforms
      • What happens if we invent our own graph nodes?
  19. C++ style graph programming
      Examples in machine vision/machine learning
      • Halide
      • RapidMind
      • Eigen (also in TensorFlow)
      • VisionCpp
      C++ compilers that support this style
      • CUDA
      • C++ OpenMP
      • C++ 17 Parallel STL
      • SYCL
  20. C++ single-source programming
      • C++ lets us build up graphs at compile-time
      • This means we can map a graph to the processors offline
      • C++ lets us write custom nodes ourselves
      • This approach is called a C++ Embedded Domain-Specific Language
      • Very widely used, e.g. Eigen, Boost, TensorFlow, RapidMind, Halide
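
A minimal sketch of what "building up graphs at compile-time" can look like, using the expression-template technique. The names `Leaf`, `Node`, `make_node`, `execute`, `Halve` and `AddOne` are hypothetical and chosen for this example; they are not the VisionCpp or Eigen API. Each node's operation and input are template parameters, so the whole graph is part of the C++ type and the compiler can inline it into a single fused loop.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Leaf node: reads a pixel from an input buffer.
struct Leaf {
  const float* data;
  float eval(std::size_t i) const { return data[i]; }
};

// Custom per-pixel operations, written by the user as plain functors.
struct Halve  { float operator()(float x) const { return x * 0.5f; } };
struct AddOne { float operator()(float x) const { return x + 1.0f; } };

// Interior node: the operation and its input are template parameters, so
// the structure of the graph is encoded in the type itself.
template <typename Op, typename In>
struct Node {
  In in;
  float eval(std::size_t i) const { return Op()(in.eval(i)); }
};

// Helper so that only the operation has to be spelled out at the call site.
template <typename Op, typename In>
Node<Op, In> make_node(In in) { return Node<Op, In>{in}; }

// Evaluates the whole graph in one pass. Each eval call is resolved at
// compile time, so there is one loop and no intermediate buffer.
template <typename Expr>
std::vector<float> execute(const Expr& expr, std::size_t n) {
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = expr.eval(i);
  return out;
}
```

Used as `auto c = make_node<AddOne>(make_node<Halve>(Leaf{ptr}));`, the type of `c` is `Node<AddOne, Node<Halve, Leaf>>`. Adding a custom node is a matter of writing another functor, which is the contrast with a fixed run-time node set.
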
  21. Combining: open standards, C++ and graph programming
      • SYCL combines C++ single-source with OpenCL acceleration
      • OpenCL lets us run on a very wide range of accelerators now and in the future
      • Single-source is the most widely adopted machine learning programming model
      • C++ single-source lets us create customizable graph models
  22. Putting it all together: Building it
  23. The same graph in C++ and in SYCL

      C++ (leaf type Host, no queue):

          #include <opencv2/opencv.hpp>
          #include <visioncpp.hpp>
          int main() {
            auto in = cv::imread("input.jpg");
            // leaf node: 512x512 sRGB input held in host memory
            auto a = Node<sRGB, 512, 512, Host>(in.data);
            auto b = Node<sRGB2lRGB>(a);      // sRGB -> linear RGB
            auto c = Node<lRGB2lHSV>(b);      // linear RGB -> HSV
            auto d = Node<Constant>(0.1);     // scaling coefficient
            auto e = Node<lHSV2Scale>(c, d);  // scale in HSV space
            auto f = Node<lHSV2lRGB>(e);      // HSV -> linear RGB
            auto g = Node<lRGB2sRGB>(f);      // linear RGB -> sRGB
            auto h = execute<fuse>(g);        // fuse the graph and run it
            auto ptr = h.get_data();
            auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
            cv::imshow("Display Image", output);
            return 0;
          }

      SYCL (leaf type Image, with a queue):

          #include <opencv2/opencv.hpp>
          #include <visioncpp.hpp>
          int main() {
            auto in = cv::imread("input.jpg");
            auto q = get_queue<gpu_selector>();  // queue on a GPU device
            // leaf node: 512x512 sRGB input held as a device image
            auto a = Node<sRGB, 512, 512, Image>(in.data);
            auto b = Node<sRGB2lRGB>(a);
            auto c = Node<lRGB2lHSV>(b);
            auto d = Node<Constant>(0.1);
            auto e = Node<lHSV2Scale>(c, d);
            auto f = Node<lHSV2lRGB>(e);
            auto g = Node<lRGB2sRGB>(f);
            auto h = execute<fuse>(g, q);        // fuse and run on the queue
            auto ptr = h.get_data();
            auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
            cv::imshow("Display Image", output);
            return 0;
          }

      [Diagram: expression tree in -> sRGB2lRGB -> lRGB2lHSV -> lHSV2Scale (with Coef) -> lHSV2lRGB -> lRGB2sRGB -> out, with nodes labelled a to h. Callouts on the two listings: "Leaf Type", and "Queue" / "No Queue".]
  24. Backend structure: SYCL and OpenMP

      SYCL (accessor, SYCL parallel_for):

          template <typename Expr, typename... Acc>
          void sycl(handler& cgh, Expr expr, Acc... acc) {
            // SYCL accessor for accessing data on the device
            auto outPtr = expr.out->template get_accessor<write>(cgh);
            // SYCL range representing the valid range of the data
            auto rng = range<2>(Expr::Rows, Expr::Cols);
            // SYCL parallel_for parallelising execution across the range
            cgh.parallel_for<Type>(rng, [=](item<2> itemID) {
              // rebuilding the accessor tuple on the device
              auto tuple = make_tuple(acc...);
              // calling the eval function for each pixel
              outPtr[itemID] = expr.eval(itemID, tuple);
            });
          }

      C++/OpenMP (pointer, parallel for):

          template <typename Expr, typename... Acc>
          void cpp(Expr expr, Acc... acc) {
            // output pointer for accessing data on the host
            auto outPtr = expr.out->get();
            // valid range for accessing data on the host
            auto rng = range(Expr::Rows, Expr::Cols);
            // rebuilding the tuple of input pointers on the host
            auto tuple = make_tuple(acc...);
            // OpenMP directive parallelising the outer loop
            #pragma omp parallel for
            for (size_t i = 0; i < rng.rows; i++)
              for (size_t j = 0; j < rng.cols; j++)
                // calling the eval function for each pixel (row-major output)
                outPtr[i * rng.cols + j] = expr.eval(index(i, j), tuple);
          }
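
For readers who want something that compiles on its own, here is a host-only sketch of the same backend shape: pack the inputs into a tuple, loop over rows and columns, and call `eval` once per pixel. `Index`, `AddExpr` and `run_on_host` are hypothetical names for this example and are not part of VisionCpp. The loop is left serial; an OpenMP build could place `#pragma omp parallel for` above the outer loop, as the slide does.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Pixel coordinate passed to eval.
struct Index { std::size_t row, col; };

// Hypothetical expression: adds two input images pixel by pixel. The image
// size is part of the type, mirroring Expr::Rows and Expr::Cols on the slide.
template <std::size_t R, std::size_t C>
struct AddExpr {
  static constexpr std::size_t Rows = R;
  static constexpr std::size_t Cols = C;

  template <typename Tuple>
  float eval(Index idx, const Tuple& inputs) const {
    const std::size_t i = idx.row * C + idx.col;  // row-major offset
    return std::get<0>(inputs)[i] + std::get<1>(inputs)[i];
  }
};

// Host backend: rebuilds the tuple of input pointers from the parameter
// pack, then evaluates the expression once for every pixel.
template <typename Expr, typename... In>
void run_on_host(const Expr& expr, float* out, In... in) {
  auto inputs = std::make_tuple(in...);
  for (std::size_t i = 0; i < R_of(expr); ++i)
    for (std::size_t j = 0; j < C_of(expr); ++j)
      out[i * C_of(expr) + j] = expr.eval(Index{i, j}, inputs);
}

// Small helpers that read the compile-time size from the expression type.
template <std::size_t R, std::size_t C>
constexpr std::size_t R_of(const AddExpr<R, C>&) { return R; }
template <std::size_t R, std::size_t C>
constexpr std::size_t C_of(const AddExpr<R, C>&) { return C; }
```

A device backend keeps the same `eval` call and changes only how data is reached and how the loop is launched, which is the comparison the slide draws between the two listings.
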
  25. Higher level programming enablers
      NVIDIA PTX
      • NVIDIA CUDA-only
      HSA
      • Royalty-free open standard
      • HSAIL is the IR
      • Provides a single address space, with virtual memory
      • Low-latency communication
      OpenCL SPIR
      • Defined for OpenCL v1.2
      • Based on Clang/LLVM (the open-source compiler)
      SPIR-V
      • Open standard
      • Defined by Khronos
      • Supports compute and graphics (OpenCL, Vulkan and OpenGL)
      • Not tied to any compiler
      Open standard intermediate representations enable tools to be built on top and support a wide range of platforms
  26. HSA
      • One of the big problems of offloading to accelerators is the high cost of offload:
        • Moving data to the accelerator
        • May need to translate addresses and pointer sizes between CPU and accelerator
        • Going into OS kernel-level driver to start work or synchronize
        • High cost of compilation, especially for JIT languages
      • In a multi-accelerator system, this may get even more costly
      • HSA solves this with user-mode queueing, user-mode synchronization, and shared virtual memory
      • It also provides open-source software to help implement HSA
      • And we’re going even further, working towards standardization of advanced tools such as profiling and debugging
      [Diagram: CPU, GPU and DSP, each running device-optimized code, with HSAIL shown for each device.]
      The HSA Runtime gives very low-latency communication
  27. Which model should we choose?
      Device-specific programming
      • Assembly language
      • VHDL
      • Device-specific C-like programming models
      Higher-level language enabler
      • NVIDIA PTX
      • HSA
      • OpenCL SPIR
      • SPIR-V
      C level programming
      • OpenCL C
      • DSP C
      • MCAPI/MTAPI
      C++ level programming
      • SYCL
      • CUDA
      • HCC
      • C++ AMP
      Graph programming
      • OpenCV
      • OpenVX
      • Halide
      • VisionCpp
      • TensorFlow
      • Caffe
  28. They are not alternatives, they are layers
      Graph programming
      • OpenCV
      • OpenVX
      • Halide
      • VisionCpp
      • TensorFlow
      • Caffe
      C/C++ level programming
      • SYCL
      • CUDA
      • HCC
      • C++ AMP
      • OpenCL
      Higher-level language enabler
      • NVIDIA PTX
      • HSA
      • OpenCL SPIR
      • SPIR-V
      Device-specific programming
      • Assembly language
      • VHDL
      • Device-specific C-like programming models
  29. Can specify, test and validate each layer
      Graph programming
      • Validate graph models
      • Validate the code using standard tools
      C/C++ level programming
      • OpenCL/SYCL specs
      • CLsmith testsuite
      • Conformance testsuites
      • Wide range of other testsuites
      Higher-level language enabler
      • SPIR/SPIR-V/HSAIL specs
      • Conformance testsuites
      Device-specific programming
      • Device-specific specification
      • Device-specific testing and validation
  30. For Codeplay, these are our layer choices
      Graph programming
      • TensorFlow
      • OpenCV
      C/C++ level programming
      • SYCL
      Higher-level language enabler
      • OpenCL SPIR
      Device-specific programming
      • LLVM
      We have chosen a layer of standards, based on current market adoption:
      • TensorFlow and OpenCV
      • SYCL
      • OpenCL (with SPIR)
      • LLVM as the standard compiler back-end
      The actual choice of standards may change based on market dynamics, but by choosing widely adopted standards and a layering approach, it is easy to adapt
  31. For Codeplay, these are our products
      Graph programming
      • TensorFlow
      • OpenCV
      C/C++ level programming
      • SYCL
      Higher-level language enabler
      • OpenCL SPIR
      Device-specific programming
      • LLVM
  32. Codeplay
      Standards bodies
      • HSA Foundation: Chair of software group, spec editor of runtime and debugging
      • Khronos: chair & spec editor of SYCL. Contributors to OpenCL, Safety Critical, Vulkan
      • ISO C++: Chair of Low Latency, Embedded WG; Editor of SG1 Concurrency TS
      • EEMBC: members
      Research
      • Members of EU research consortiums: PEPPHER, LPGPU, LPGPU2, CARP
      • Sponsorship of PhDs and EngDs for heterogeneous programming: HSA, FPGAs, ray-tracing
      • Collaborations with academics
      • Members of HiPEAC
      Open source
      • HSA LLDB Debugger
      • SPIR-V tools
      • RenderScript debugger in AOSP
      • LLDB for Qualcomm Hexagon
      • TensorFlow for OpenCL
      • C++ 17 Parallel STL for SYCL
      • VisionCpp: C++ performance-portable programming model for vision
      Presentations
      • Building an LLVM back-end
      • Creating an SPMD Vectorizer for OpenCL with LLVM
      • Challenges of Mixed-Width Vector Code Gen & Scheduling in LLVM
      • C++ on Accelerators: Supporting Single-Source SYCL and HSA
      • LLDB Tutorial: Adding debugger support for your target
      Company
      • Based in Edinburgh, Scotland
      • 57 staff, mostly engineering
      • License and customize technologies for semiconductor companies
      • ComputeAorta and ComputeCpp: implementations of OpenCL, Vulkan and SYCL
      • 15+ years of experience in heterogeneous systems tools
      Codeplay build the software platforms that deliver massive performance
  33. Further information
      • OpenCL: https://www.khronos.org/opencl/
      • OpenVX: https://www.khronos.org/openvx/
      • HSA: http://www.hsafoundation.com/
      • SYCL: http://sycl.tech
      • OpenCV: http://opencv.org/
      • Halide: http://halide-lang.org/
      • VisionCpp: https://github.com/codeplaysoftware/visioncpp
  34. Questions?
      My contact details: /codeplaysoft, @codeplaysoft, codeplay.com
