
"Can We Have Both Safety and Performance in AI for Autonomous Vehicles?," a Presentation from Codeplay Software



For the full video of this presentation, please visit:

For more information about embedded vision, please visit:

Andrew Richards, CEO and Co-founder of Codeplay Software, presents the "Can We Have Both Safety and Performance in AI for Autonomous Vehicles?" tutorial at the May 2019 Embedded Vision Summit.

The need for ensuring safety in AI subsystems within autonomous vehicles is obvious. How to achieve it is not. Standard safety engineering tools are designed for software that runs on general-purpose CPUs. But AI algorithms require more performance than CPUs provide, and the specialized processors employed to achieve this performance are very difficult to qualify for safety.

How can we achieve the redundancy and very strict testing required to achieve safety, while also using specialized processors to achieve AI performance? How can ISO 26262 be applied to AI accelerators? How can standard automotive practices like coverage checking and MISRA coding guidelines be used?

Codeplay believes that safe autonomous vehicle AI subsystems are achievable, but only with cross-industry collaboration. In this presentation, Richards examines the challenges of implementing safe autonomous vehicle AI subsystems and explains the most promising approaches for overcoming these challenges, including leveraging standards bodies such as Khronos, MISRA and AUTOSAR.

Published in: Technology

  1. © 2019 Codeplay Software Ltd. Can We Have Both Safety and Performance in AI for Autonomous Vehicles? Andrew Richards, Codeplay, May 2019.
  2. Outline. About Codeplay. What is functional safety? What does an automotive AI system look like in terms of architecture (the wide variety of compute-intensive algorithms)? Why do we need high performance for safety, and why are accelerators the only way to get to high performance? Requirements for safe engineering. Challenges in bringing existing CPU safe-engineering practices to accelerators.
  3. Functional safety. Safety doesn’t mean the system never fails; it means the system fails safely. How do you know if the system has failed? You have to detect the failure with a high level of accuracy, and both incorrect results and late results count as failures. What do you do if the system fails? You have to define a safe state to return to.
  4. Functional safety. Definition: “Absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical/electronic systems”. The standard requires the development of the product to be “state of the art”. The functional safety lifecycle is a top-down approach, from the vehicle down to IP blocks and software components. Safety compliance runs from project initiation to project decommissioning.
  5. Safety failure types. Systematic failures result from a failure in design or manufacturing, often because best practices were not followed; their rate can be reduced through continual and rigorous process improvement. Random failures result from random defects inherent to the process or usage conditions; their rate cannot generally be reduced, so the focus must be on detecting and handling them in the application.
  6. SOTIF: Safety Of The Intended Function. Systems or subsystems can cause hazards through erroneous decisions about the environment, not only through malfunction of electrical/electronic components (the latter is addressed by ISO 26262). SOTIF answers the question “How do you intend to behave?” by using the PAS guidance on design, verification and validation. SOTIF is intended to address sensor limitations (e.g. bad reflection, snow), decision algorithms (environment, location, highway construction, etc.) and misuse by drivers. [Figure: scenario areas 1 to 4 on known/unknown and safe/unsafe axes.] Reducing the scenarios in areas 2 and 3 is the key, by turning them into known scenarios. [Figure: SAE levels for autonomous vehicles against SOTIF (ISO 21448, PAS 21448): Level 5, fully autonomous; Level 4, deep self control; Level 3, limited overall control; Level 2, execute automated manoeuvres; Level 1, adaptive assist; Level 0, warnings.]
  7. Safety of autonomous driving needs high performance. High performance makes safety hard.
  8. From sensing to control. Sensors and perception (deep learning on the front camera, machine vision and SLAM on the surround cameras, LIDAR, RADAR) feed sensor fusion, then path planning, then car control. Redundancy is achieved by having multiple independent sensors and perception algorithms combined via sensor fusion.
  9. Performance cannot be achieved with CPUs. Pipeline: camera, frame capture, semantic segmentation, 3D mapping, sensor fusion, object trajectory tracking/prediction, path planning, car control. Workload figures: 625 million pixels per second; 1.5-7.5 TOPS for each deep learning algorithm; 250 million cells updated per frame per sensor; sensor fusion has to combine all the data together and check it. This is far beyond the processing power of a multi-core CPU. This level of processing can only be achieved with a different AI accelerator designed for each class of algorithm and sensor. Passive (fanless) cooling allows no more than 8 W to 15 W per processor. Adding a fan is a safety challenge and also adds a lot of cost.
  10. Types of AI accelerator. Deep learning inference accelerator: fixed-point precision (8-bit or 16-bit); can execute fast convolutions and some basic CNN layers; very high performance, but low programmability. Programmable accelerator (vision tasks, e.g. SLAM): mix of programmable and fixed-function; mix of fixed-point and floating-point; highly data-parallel with on-chip memory; throughput-optimized. Sensor fusion accelerator: very programmable; floating-point; on-chip memory and caches; latency-optimized; complex algorithms. Fixed-function accelerator: simpler LIDAR and radar processing; some machine vision tasks, e.g. scaling.
  11. Requirements for safety
  12. Requirements for safe engineering: redundancy (multiple systems); fault detection (both timing and accuracy); fault handling; fault injection (to test fault detection and handling); coverage checking (to ensure test coverage); coding guidelines (e.g. MISRA); little or no dynamic memory management. How do we bring these capabilities to accelerators?
  13. Redundancy: systematic vs random faults. [Diagram: a sensor feeds Processor #1 and Processor #2, whose outputs go to fusion.] This architecture allows Processor #1 to fail and Processor #2 to take over. But what if the reason Processor #1 fails is a fault that also applies to Processor #2, e.g. a failure in software that both processors run? A random fault may be solvable with two identical redundant systems, but a systematic fault can only be solved with two fundamentally different redundant systems.
  14. Redundancy. Redundancy is much easier to achieve with sensors and perception than with sensor fusion, planning and control. Redundancy from two identical systems does not solve systematic faults, only transient faults. Standard programming models make redundancy much easier to achieve, because components from different suppliers can be mixed and matched to avoid systematic faults. They also make it much easier to integrate tools from multiple vendors, e.g. static checkers or memory checkers. The OpenCL SC (“Safety Critical”), Vulkan SC and SYCL SC working groups are working towards defining safer versions of these standards.
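The point about systematic faults can be illustrated with a minimal sketch of diverse redundancy. It assumes two hypothetical, independently written implementations of the same computation (`mean_iterative` and `mean_builtin` are illustrative names, not from the slides) and a comparator that treats any disagreement as a fault.

```python
import math

def mean_iterative(values):
    """Implementation A: running mean (hypothetical 'supplier 1')."""
    mean = 0.0
    for i, v in enumerate(values, start=1):
        mean += (v - mean) / i
    return mean

def mean_builtin(values):
    """Implementation B: accurate sum, then divide (hypothetical 'supplier 2')."""
    return math.fsum(values) / len(values)

def checked_mean(values, rel_tol=1e-9):
    """Run both implementations and report any disagreement as a fault."""
    a = mean_iterative(values)
    b = mean_builtin(values)
    if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
        raise RuntimeError("redundant implementations disagree")
    return a

result = checked_mean([1.0, 2.0, 3.0, 4.0])
```

Two copies of the same function would agree even when both are wrong; only the differently written pair can expose a defect that is specific to one implementation.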
  15. Fault detection. Timing faults can be detected with a watchdog timer, so all operations must have a maximum timeout. The quantity of processing required by perception algorithms can vary with the scene: e.g. the more potential pedestrians are discovered, the more regions of an image pedestrian classification has to run on. One solution is to periodically pass known input data into each algorithm and check the result against known-correct output data. The algorithms used must be deterministic (always give the same outputs for the same inputs), which is not true of all parallel algorithms.
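A minimal sketch of the two checks on this slide, a timeout and a known-answer test, might look as follows. The golden data, the function names and the use of a worker thread as a stand-in for an accelerator are all assumptions made for illustration; a real watchdog would be a hardware or RTOS facility.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

KNOWN_INPUT = [3, 1, 2]      # hypothetical golden input
KNOWN_OUTPUT = [1, 2, 3]     # known-correct output for that input

def self_test(algorithm, timeout_s):
    """Return 'ok', 'wrong_result' or 'timeout' for one known-answer check."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(algorithm, list(KNOWN_INPUT))
    try:
        result = future.result(timeout=timeout_s)
    except FutureTimeout:
        return "timeout"                 # a late result is a failure
    finally:
        pool.shutdown(wait=False)
    return "ok" if result == KNOWN_OUTPUT else "wrong_result"

def slow_sort(xs):
    """Simulated timing fault: correct answer, delivered too late."""
    time.sleep(0.2)
    return sorted(xs)

status_good = self_test(sorted, timeout_s=1.0)
status_bad = self_test(lambda xs: xs, timeout_s=1.0)
status_late = self_test(slow_sort, timeout_s=0.05)
```

The comparison against `KNOWN_OUTPUT` only works because `sorted` is deterministic, which is the requirement the slide places on the algorithms under test.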
  16. Fault handling. Handling faults in highly parallel software is a surprisingly tough challenge. Faults detected asynchronously need to be stored somewhere and then processed; they can’t be handled immediately without consuming resources asynchronously, which is a safety challenge. Massively parallel software can create large numbers of faults at once: how should they be handled? Most parallel programming models handle faults very badly. It’s a much harder challenge than people expect.
  17. Multi-threaded error handling. Errors triggered on an accelerator are asynchronous. Error handling can’t be executed on the accelerator. When does the main CPU thread process the error(s)? [Diagram: over time, the main CPU thread offloads a kernel to a group of accelerator ‘threads’, one of which raises an error, and then waits before a handler runs.] Accelerator threads are grouped. Where does the failing accelerator thread store the error? The main CPU thread waits for the accelerator to complete: is that fast enough to process the error?
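One way to picture the "where does the error go?" question is a fixed-capacity error store that is allocated before any work starts and drained by the main thread afterwards. The sketch below is an assumption-laden illustration: Python threads stand in for accelerator threads, and `ErrorLog`, `worker` and the error code `E_SIM` are hypothetical names.

```python
import threading

class ErrorLog:
    """Fixed-capacity error store, allocated once before any work starts."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._count = 0
        self.dropped = 0
        self._lock = threading.Lock()

    def record(self, worker_id, code):
        with self._lock:
            if self._count < len(self._slots):
                self._slots[self._count] = (worker_id, code)
                self._count += 1
            else:
                self.dropped += 1    # overflow is itself a fault to report

    def drain(self):
        with self._lock:
            errors = self._slots[:self._count]
            self._count = 0
            return errors

def worker(worker_id, log):
    if worker_id % 2 == 1:           # odd workers simulate a fault
        log.record(worker_id, "E_SIM")

log = ErrorLog(capacity=3)
threads = [threading.Thread(target=worker, args=(i, log)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()                         # the main thread waits for completion
errors = log.drain()
```

With five simulated faults and three slots, two errors are dropped. This is the "large numbers of faults at once" problem in miniature: the store needs a defined overflow policy, and errors are only seen after the join, which may be too late.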
  18. Pre-emption and independent forward progress. Most accelerators are groups of SIMD/SIMT (“Single Instruction Multiple Data/Thread”) units, which gives high performance per watt. Each thread in a group executes the same instruction in “lock-step”. Some threads may be inside the false branch of a conditional: they are “predicated” so that the effects of instructions are not applied until the conditional ends. This means that if one “thread” in a group goes into an infinite loop, the others in the group will also stall indefinitely. The accelerator does not complete until all groups complete. [Diagram: a grid of 16 accelerator ‘threads’.]
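The lock-step behaviour can be simulated in a few lines. This is a toy model, not a description of any particular hardware: each lane of a hypothetical group runs a loop with its own trip count, a predication mask disables finished lanes, and the group only retires when every lane is done.

```python
def run_group(trip_counts, max_steps=1000):
    """Simulate one lock-step group; lane i loops trip_counts[i] times.

    Returns (steps, useful_lane_steps), or (None, useful_lane_steps) if the
    group has not completed within max_steps.
    """
    remaining = list(trip_counts)
    steps = 0
    useful = 0
    while any(r > 0 for r in remaining):
        if steps >= max_steps:
            return None, useful                   # group never completed
        active = [r > 0 for r in remaining]       # predication mask
        useful += sum(active)
        remaining = [r - 1 if a else r for r, a in zip(remaining, active)]
        steps += 1
    return steps, useful

steps, useful = run_group([1, 2, 3, 10])
hung_steps, _ = run_group([1, 2, 3, float("inf")], max_steps=50)
```

In the first call the group takes as long as its slowest lane, so three lanes spend most of their time masked off. In the second, one lane that never terminates keeps the whole group from completing, which is why a per-group timeout is needed.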
  19. Putting an accelerator in a safe state. With a CPU thread, you stop the thread by no longer giving it CPU cycles. Stopping a CPU thread is instant; stopping an accelerator thread is not. Stopping one group of accelerator threads doesn’t necessarily stop other groups of threads. You can’t safely free accelerator-accessed memory until all accelerator threads have safely stopped, and you can’t easily predict how long this will take. Simple solution: shut down the whole chip. [Diagram: killing accelerator ‘threads’ that access an accelerator memory buffer.]
  20. Fault injection. You can only test fault handling properly if you can inject faults into a system during testing. Fault injection must happen asynchronously to be sure of finding bugs, needs to work across multiple AI accelerators, and must be included in continuous test processes. Faults to consider: transient hardware faults; overheating causing throttling; threading errors; algorithms taking an unusually long time to complete due to complex input data. We need fault injection tools for accelerators (e.g. NVIDIA SASSIFI).
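As a small software-only illustration of the idea, the sketch below injects single-bit flips (a stand-in for transient hardware faults) into a buffer and counts how many the detection mechanism catches. The CRC-protected buffer and all function names are assumptions for this example; this is not how SASSIFI or any specific tool works.

```python
import random
import zlib

def protect(data: bytes):
    """Pair a buffer with a CRC-32 checksum computed while it is known good."""
    return data, zlib.crc32(data)

def is_intact(data: bytes, checksum: int) -> bool:
    return zlib.crc32(data) == checksum

def inject_bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with one randomly chosen bit inverted."""
    corrupted = bytearray(data)
    index = rng.randrange(len(corrupted))
    corrupted[index] ^= 1 << rng.randrange(8)
    return bytes(corrupted)

rng = random.Random(1234)            # fixed seed so the campaign is repeatable
payload, crc = protect(bytes(range(64)))
detected = sum(
    not is_intact(inject_bit_flip(payload, rng), crc) for _ in range(200)
)
```

Running such a campaign inside continuous testing checks the detector as well as the code it protects: if `detected` ever falls below the number of injected faults, the fault-detection path itself has regressed.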
  21. Coverage checking. A standard ISO 26262 process is to require line coverage and condition coverage for test suites, i.e. to check that each line (or condition combination) is exercised by the test suite. This is commonly supported on CPUs, but what about AI accelerators? Compilers for AI accelerators typically perform transformations, such as the data-parallel vectorization used with GPUs, that significantly change the control flow of the program relative to the source code. How do we define coverage checking for AI accelerators?
  22. Coverage checking in a heterogeneous environment. Coverage checking applies a metric to a test suite: does the test suite test every line in a program? Stricter coverage checking ensures every condition in a conditional is also tested. In a heterogeneous environment, a single source line may be compiled for different accelerator cores, and each core may execute that line in a slightly different way. How do we define coverage in an accelerator model? How do we test coverage in an accelerator model? If a SIMT compiler has transformed the code, what does coverage mean?
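The coverage question can be made concrete with a toy example. The sketch below contrasts a scalar function containing a branch with a hand-written "if-converted" form of the kind a data-parallel compiler might produce; both functions and the `hits` bookkeeping are illustrative assumptions, not output from a real compiler or coverage tool.

```python
def clamp_scalar(x, limit, hits):
    """Source-level view: one branch with two outcomes to cover."""
    if x > limit:
        hits.add("then")
        return limit
    hits.add("else")
    return x

def clamp_if_converted(xs, limit):
    """Data-parallel view: both sides are computed for every lane and a
    mask selects the result, so no branch remains to be 'covered'."""
    mask = [x > limit for x in xs]
    then_vals = [limit for _ in xs]
    else_vals = list(xs)
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

hits = set()
inputs = [1, 2, 3]                    # no input exceeds the limit
scalar_out = [clamp_scalar(x, 5, hits) for x in inputs]
vector_out = clamp_if_converted(inputs, 5)
branch_coverage = len(hits) / 2
```

On these inputs the scalar version reports 50% branch coverage because the "then" side never runs. The transformed version executes every instruction for every lane, so an instruction-level metric would look complete even though the clamping case was never exercised.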
  23. Coding guidelines: MISRA C++. Standardized coding guidelines for writing safe software, which can be checked with static source-code checking tools. Originated by the automotive industry, for the automotive industry, but applicable to any industry that requires high-integrity software. Originally MISRA suggested (in its vision) use in safety-related software; it now suggests (in its vision) applicability to any application with high-integrity or high-reliability requirements. The MISRA C++ group is updating the MISRA C++ standard to support accelerator programming, in collaboration with AUTOSAR. This is being written as an update to MISRA C++ 2008, and it is where the AI and SYCL accelerator support will go for autonomous driving coding guidelines.
  24. Dynamic memory management. Accelerator programming models rely extensively on dynamic memory management. This is a real challenge for AI accelerators. How do we define a standard way of statically allocating memory for AI acceleration? How do we free memory safely in a fault situation? How do we isolate different safety domains in a program, so that memory allocated in one domain cannot be corrupted by another?
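A common pattern behind "little or no dynamic memory management" is a pool whose buffers are all created at start-up. The sketch below shows the pattern only; Python itself allocates dynamically under the hood, and `BufferPool` and its methods are hypothetical names rather than part of any accelerator API.

```python
class BufferPool:
    """All buffers are created once at start-up; none are created later."""

    def __init__(self, buffer_count, buffer_size):
        self._buffers = [bytearray(buffer_size) for _ in range(buffer_count)]
        self._free = list(range(buffer_count))

    def acquire(self):
        if not self._free:
            return None              # exhaustion is reported, never hidden
        index = self._free.pop()
        return index, self._buffers[index]

    def release(self, index):
        if index in self._free:
            raise ValueError("double release")
        buf = self._buffers[index]
        buf[:] = bytes(len(buf))     # scrub so data cannot leak between users
        self._free.append(index)

    def release_all(self):
        """Fault path: return every buffer to a known, zeroed state."""
        for index in range(len(self._buffers)):
            if index not in self._free:
                self.release(index)

    def free_count(self):
        return len(self._free)

pool = BufferPool(buffer_count=2, buffer_size=16)
first = pool.acquire()
second = pool.acquire()
third = pool.acquire()               # pool exhausted
first[1][0] = 0xFF                   # simulate use of the buffer
pool.release_all()
```

Because capacity is fixed, worst-case memory use is known before the system runs, and the fault path (`release_all`) has a bounded amount of work to do. On a real accelerator it could only run once all accelerator threads had stopped.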
  25. Accelerator memory management. Accelerators have a much more direct view of memory than a CPU. The simplest approach is pinned memory at a known physical address. Accelerators have much simpler memory protection than a CPU. [Diagram: the CPU reaches physical memory (e.g. DDR) and storage (e.g. hard disk) through the operating system’s virtual memory management system; the accelerator accesses physical memory more directly. There may be a memory management unit on the accelerator path, but it is usually much simpler than a CPU’s.]
  26. Virtualization. Virtualization is well defined for CPUs and can contribute to safety isolation, but for accelerators it is not clearly defined. You can’t switch instantly between accelerator threads, you can’t shut down an accelerator thread instantly, and memory protection isn’t the same as on a CPU. [Diagram: on the CPU side, a hypervisor (“Virtualization goes here”) sits with the operating system and virtual memory management system in front of physical memory (e.g. DDR) and storage (e.g. hard disk); on the accelerator side the open question is “How does virtualization go here?”]
  27. Pragmatic solutions. (1) Package non-safety-qualified systems via decomposition: ISO 26262 defines “Quality Managed” (“QM”); QM systems can adopt the latest technologies without being developed to full safety standards; we can wrap QM systems inside ASIL systems; we need to monitor the running system and be able to shut down a faulty system, which requires the ability to detect failures. [Diagram: input and output pass through a QM AI system wrapped by an ASIL B monitoring system.] (2) Build full safety-qualified systems from the ground up: a safe RTOS that supports accelerators; safe programming models; safety analysis tools; independent testing and validation. [Diagram: safety monitoring for AI, on safe heterogeneous programming tools, on a safe RTOS, on CPU and AI accelerator.] (3) Multiple, independent, redundant systems: if we independently develop systems to perform specific tasks, we can achieve fully safe redundancy. [Diagram: System #1 from Dev Team #1 and System #2 from Dev Team #2 feed a “combine & check results” stage.]
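The first approach, wrapping a QM component in a monitor, can be sketched as below. Everything here is a hypothetical stand-in: `qm_planner` plays the unqualified AI system, the plausibility limits and `SAFE_STATE` values are invented for illustration, and a real ASIL B monitor would also check timing.

```python
SAFE_STATE = {"steering_deg": 0.0, "brake": 1.0}    # hypothetical safe output

def qm_planner(obstacle_distance_m):
    """Stand-in for an unqualified (QM) AI component."""
    if obstacle_distance_m < 0:                     # simulated defect
        return {"steering_deg": 900.0, "brake": -1.0}
    brake = 1.0 if obstacle_distance_m < 10 else 0.0
    return {"steering_deg": 0.0, "brake": brake}

def plausible(cmd):
    """Range checks simple enough to be developed to a higher integrity level."""
    return -45.0 <= cmd["steering_deg"] <= 45.0 and 0.0 <= cmd["brake"] <= 1.0

def monitored(component, *args):
    """Pass plausible outputs through; otherwise force the safe state."""
    try:
        cmd = component(*args)
    except Exception:
        return SAFE_STATE, False
    if not plausible(cmd):
        return SAFE_STATE, False
    return cmd, True

normal_cmd, normal_ok = monitored(qm_planner, 50.0)
faulty_cmd, faulty_ok = monitored(qm_planner, -1.0)
```

The monitor is deliberately much simpler than the component it wraps, which is what makes it realistic to qualify it separately from the AI system.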
  28. Summary. 1. We need to use a range of AI accelerators to achieve AI in automotive: we can’t just assume CPU safety processes transfer easily to accelerators, and we need all the safety tools we have on CPUs brought to accelerators. 2. There are a lot of unexpected challenges. 3. Standards are critical for building out these tools and the ecosystem: industry-wide standards are being developed, but we need to get more people involved to deliver safe solutions.
  29. About Codeplay. Accelerator silicon enablement: OpenCL and Vulkan implementations with the ComputeAorta product for customers’ processors; custom LLVM compiler back-ends and runtime drivers; accelerator processor optimizations. Open accelerator ecosystem: open standards and open-source ecosystem for AI acceleration; SYCL ecosystem, the open alternative to CUDA; TensorFlow, Eigen; SYCL-BLAS, SYCL-DNN, SYCL-M; open-source accelerator libraries: clSPV, SPIR-V tools. Automotive AI tools: support for Renesas R-Car and Imagination Technologies PowerVR; optimized SYCL-BLAS and SYCL-DNN libraries for automotive AI processors; profiler to analyse performance; working towards ISO 26262 ASIL B standards-based acceleration. 70+ expert AI and graphics acceleration engineers in Edinburgh, Scotland, UK, ready to provide all the tech and services to deliver ground-breaking AI technologies.
  30. Resources. SYCL standard and ecosystem. MISRA: MISRA C and C++ standards body. Codeplay automotive tools: at the Codeplay booth, see our tools on Renesas and Imagination Technologies ADAS accelerator processors. Khronos Workshop at EVS: will cover OpenVX, Vulkan, OpenCL, NNEF and SYCL in much more detail, Thursday May 23rd, 9am-5pm (embedded-vision-summit).
  31. Backup
  32. Tesla FSD chip.
        Block   Area (mm²)   GOPS
        GPU     40.9         600
        CPU     22.1         211
        NNA     15.4         72,000
        SRAM    67.6
        Cache   18.6
        Total   260          72,811
      NNA: fast, low-precision convolutions. SRAM: needed to keep processors supplied with data. CPU: highly general-purpose at lower performance. GPU: most of the programmable performance.
  33. From sensing to control. Pipeline: camera, frame capture, semantic segmentation, 3D mapping, sensor fusion, trajectory tracking, path planning, car control. These systems typically operate at 15-25 frames per second (depending on maximum speed and safety requirements). Roughly 8 input frames are required to make a processing decision; this includes tracking movement over several frames and pipelining for higher throughput. At 70 mph (112 km/h), braking distance is 75 m and “thinking distance” (for a human) is 21 m, or 1.5 seconds.
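The frame-rate and frame-count figures above imply a decision latency, which can be worked out directly. This is a back-of-envelope sketch that assumes latency is simply frames divided by frame rate and ignores any other delays; the helper names are illustrative.

```python
MPH_TO_M_PER_S = 1609.344 / 3600.0    # metres per mile / seconds per hour

def decision_latency_s(frames, fps):
    """Time to collect the frames needed for one decision."""
    return frames / fps

def distance_travelled_m(latency_s, speed_mph):
    """Distance covered at constant speed during that latency."""
    return latency_s * speed_mph * MPH_TO_M_PER_S

latency_fast = decision_latency_s(8, 25)     # 8 frames at 25 fps
latency_slow = decision_latency_s(8, 15)     # 8 frames at 15 fps
distance_fast = distance_travelled_m(latency_fast, 70)
distance_slow = distance_travelled_m(latency_slow, 70)
```

Under these assumptions the vehicle travels roughly 10 m (at 25 fps) to 17 m (at 15 fps) at 70 mph before a decision based on 8 frames is available, which is why the frame rate depends on the maximum speed.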
  34. Frame capture. If a camera can capture a complete view of a 2 m pedestrian at 2 m distance, then a pedestrian at 100 m distance will cover no more than 1/50th of the height of the image, or 1/2,500th of its area. If an algorithm can recognize a pedestrian with 100 pixels, the camera must be 25 megapixels to recognize a pedestrian at 100 m, which is required to drive at 70 mph.
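The 25-megapixel figure can be reproduced with a short calculation, under assumptions that are not stated on the slide: "100 pixels" is read as 100 pixels of height, the image is taken to be square, and 25 frames per second is used for the pixel rate.

```python
def required_image_height_px(target_height_px, near_m, far_m):
    """A subject that fills the frame at near_m covers near_m / far_m of the
    image height at far_m, so the image must be far_m / near_m times taller
    than the pixels needed on the subject."""
    return target_height_px * far_m / near_m

height_px = required_image_height_px(100, near_m=2, far_m=100)
megapixels = height_px * height_px / 1e6     # assumes a square image
pixels_per_second = megapixels * 1e6 * 25    # assumes 25 frames per second
```

With these assumptions the image is 5,000 pixels high, giving 25 megapixels, and 625 million pixels per second at 25 fps.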
  35. Semantic segmentation. 60-300 GFLOPs per frame. At 25 fps that is 1.5 TFLOPS to 7.5 TFLOPS, but inference can often be done in fixed point, in which case the figure is TOPS, not TFLOPS. Source: L. McIntosh, N. Maheswaranathan, D. Sussillo, J. Shlens (Stanford University and Google Brain), “Recurrent Segmentation for Variable Computational Budgets”, arXiv:1711.10151v2 [cs.CV], 15 Mar 2018.
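The conversion from per-frame cost to sustained throughput is a single multiplication, shown here as a small helper (the function name is illustrative).

```python
def sustained_tera_ops(giga_ops_per_frame, fps):
    """Convert a per-frame cost in giga-operations to tera-operations per second."""
    return giga_ops_per_frame * fps / 1000.0

low = sustained_tera_ops(60, 25)
high = sustained_tera_ops(300, 25)
```

This gives 1.5 and 7.5 tera-operations per second. Whether they count as TFLOPS or TOPS depends on whether the network runs in floating point or fixed point.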
  36. 3D mapping. Each sensor (cameras, LIDAR, radar) and each perception algorithm (deep learning, SLAM, point cloud, etc.) needs to generate a 3D map of the environment it detects and a list of objects (pedestrians, cars, etc.) to track. A 100 m × 100 m × 10 m occupancy grid of 10 cm × 10 cm × 10 cm cells contains 100,000,000 cells updated every frame.
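The cell count of an occupancy grid depends strongly on the cell size, as this short calculation shows. Dimensions are given in centimetres to keep the arithmetic in integers; the function name and the 25 fps update rate are assumptions for illustration.

```python
def cell_count(extent_cm, cell_cm):
    """Number of cubic cells of side cell_cm in a box of size extent_cm."""
    count = 1
    for side in extent_cm:
        count *= side // cell_cm
    return count

GRID_CM = (10_000, 10_000, 1_000)            # 100 m x 100 m x 10 m

cells_10cm = cell_count(GRID_CM, 10)         # 10 cm cells
cells_1m = cell_count(GRID_CM, 100)          # 1 m cells
updates_per_second = cells_10cm * 25         # assumes 25 frames per second
```

The 100,000,000-cell figure corresponds to 10 cm cells; 1 m cells over the same volume would give only 100,000. At 25 frames per second, the finer grid means 2.5 billion cell updates per second for each sensor or algorithm that maintains one.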
  37. Sensor fusion. Sensor fusion combines data from all sensors and perception algorithms, so it needs to process all data from all perception algorithms combined. It looks for inconsistencies between different sensors in order to detect errors. This is where the redundancy in the sensors is used to achieve safety. But how do you achieve redundancy in the sensor fusion itself?
  38. To achieve performance, create a pipeline. The stages (camera, frame capture, semantic segmentation, 3D mapping, sensor fusion, object trajectory tracking/prediction, path planning, car control) are pipelined to achieve maximum throughput, with several frames in flight at different stages at once. It can also take at least 3 frames to track movement.