Vision BOF - SIGGRAPH 2014


Published on

Vision slide presentation from the 2014 SIGGRAPH BOF, including OpenKCAM, OpenVX, EGL and StreamInput

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Vision BOF - SIGGRAPH 2014

  1. 1. Open Standard APIs for Vision and Camera Processing © Copyright Khronos Group 2014 - Page 1 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group
  2. 2. Khronos Connects Software to Silicon © Copyright Khronos Group 2014 - Page 2 Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration Defining the roadmap for low-level silicon interfaces needed on every platform Graphics, compute, rich media, vision, sensor and camera processing Rigorous specifications AND conformance tests for cross-vendor portability Acceleration APIs BY the Industry FOR the Industry Well over a BILLION people use Khronos APIs Every Day…
  3. 3. © Copyright Khronos Group 2014 - Page 3 Khronos Standards Visual Computing - 3D Graphics - Heterogeneous Parallel Computing 3D Asset Handling - 3D authoring asset interchange - 3D asset transmission format with compression Over 100 companies defining royalty-free Acceleration in HTML5 - 3D in browser – no Plug-in APIs to connect software to silicon - Heterogeneous computing for JavaScript Sensor Processing - Vision Acceleration - Camera Control - Sensor Fusion
  4. 4. Visual Computing = Graphics PLUS Vision Real-time GPU Compute Research project on CUDA-enabled laptop High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria © Copyright Khronos Group 2014 - Page 4 Imagery Data Vision Processing Enhanced sensor Graphics Processing and vision capability deepens the interaction between real and virtual worlds
  5. 5. Mobile Visual Computing = New Experiences Augmented Reality © Copyright Khronos Group 2014 - Page 5 Face, Body and Gesture Tracking Computational Photography and Videography 3D Scene/Object Reconstruction Need for advanced sensors and the GPU throughput to process them
  6. 6. Vision Pipeline Challenges and Opportunities Sensor Proliferation Diverse sensor awareness of the user and surroundings • Light / Proximity • 2 cameras • 3 microphones • Touch • Position - GPS - WiFi (fingerprint) - Cellular trilateration - NFC/Bluetooth Beacons • Accelerometer • Magnetometer • Gyroscope • Pressure / Temp / Humidity © Copyright Khronos Group 2014 - Page 6 19 Growing Camera Diversity Capturing color, range and lightfields • Camera sensors >20MPix • Novel sensor configurations • Stereo pairs • Plenoptic Arrays • Active Structured Light • Active TOF Diverse Vision Processors Driving for high performance and low power • Camera ISPs • Dedicated vision IP blocks • DSPs and DSP arrays • Programmable GPUs • Multi-core CPUs Flexible sensor and camera control to generate required image stream Use best processing available for image stream processing – with code portability Control/fuse vision data by/with all other sensor data on device
  7. 7. Dedicated Hardware GPU Compute Multi-core X100 CPU X1 © Copyright Khronos Group 2014 - Page 7 Vision Processing Power Efficiency • Depth sensors = significant processing - Generate/use environmental information • Wearables will need ‘always-on’ vision - With smaller thermal limit / battery than phones! • GPUs has x10 CPU imaging power efficiency - GPUs architected for efficient pixel handling • Traditional cameras have dedicated hardware - ISP = Image Signal Processor – on all SOCs today • SOCs have space for more transistors - But can’t turn on at same time = Dark Silicon • Potential for dedicated sensor/vision silicon - Can trigger full CPU/GPU complex Power Efficiency Computation Flexibility X10 Advanced Sensors Wearables But how to program specialized processors? Performance and Functional Portability
  8. 8. OpenVX – Power Efficient Vision Acceleration • Out-of-the-Box vision acceleration framework © Copyright Khronos Group 2014 - Page 8 - Low-power, real-time, mobile and embedded • Performance portability for diverse hardware - ISPs, Dedicated vision blocks, DSPs and DSP arrays, GPUs, Multi-core CPUs • Suited for low-power, always-on acceleration - Can run solely on dedicated vision hardware • Foundational API for vision acceleration - Can be used by middleware or applications • Complementary to OpenCV - Which is great for prototyping • Khronos open source sample implementation - To be released with final specification Open source sample implementation Hardware vendor implementations OpenCV open source library Other higher-level CV libraries Application
  9. 9. OpenVX Graphs – The Key to Efficiency • Vision processing directed graphs for power and performance efficiency - Each Node can be implemented in software or accelerated hardware - Nodes may be fused by the implementation to eliminate memory transfers - Processing can be tiled to keep data entirely in local memory/cache © Copyright Khronos Group 2014 - Page 9 • VXU Utility Library for access to single nodes - Easy way to start using OpenVX by calling each node independently • EGLStreams can provide data and event interop with other Khronos APIs - BUT use of other Khronos APIs are not mandated OpenVX Node OpenVX Node OpenVX Node OpenVX Node Downstream Application Processing Native Camera Control Example OpenVX Graph
  10. 10. © Copyright Khronos Group 2014 - Page 10 OpenVX 1.0 Function Overview • Core data structures - Images and Image Pyramids - Processing Graphs, Kernels, Parameters • Image Processing - Arithmetic, Logical, and statistical operations - Multichannel Color and BitDepth Extraction and Conversion - 2D Filtering and Morphological operations - Image Resizing and Warping • Core Computer Vision - Pyramid computation - Integral Image computation • Feature Extraction and Tracking - Histogram Computation and Equalization - Canny Edge Detection - Harris and FAST Corner detection - Sparse Optical Flow OpenVX 1.0 defines framework for creating, managing and executing graphs Focused set of widely used functions that are readily accelerated Implementers can add functions as extensions Widely used extensions adopted into future versions of the core OpenVX Specification Evolution
  11. 11. Example Graph - Stereo Machine Vision © Copyright Khronos Group 2014 - Page 11 Camera 1 Compute Depth Map (User Node) Detect and track objects (User Node) Camera 2 Image Pyramid Stereo Rectify with Remap Stereo Rectify with Remap Compute Optical Flow Object coordinates OpenVX Graph Delay Tiling extension enables user nodes (extensions) to also optimally run in local memory
  12. 12. OpenVX and OpenCV are Complementary © Copyright Khronos Group 2014 - Page 12 Governance Community driven open source with no formal specification Formal specification defined and implemented by hardware vendors Conformance No conformance tests for consistency and every vendor implements different subset Full conformance test suite / process creates a reliable acceleration platform Portability APIs can vary depending on processor Hardware abstracted for portability Scope Very wide 1000s of imaging and vision functions Multiple camera APIs/interfaces Tight focus on hardware accelerated functions for mobile vision Use external camera API Efficiency Memory-based architecture Each operation reads and writes memory Graph-based execution Optimizable computation, data transfer Use Case Rapid experimentation Production development & deployment
  13. 13. OpenVX Participants and Timeline • Provisional 1.0 specification released November 2013 for industry feedback © Copyright Khronos Group 2014 - Page 13 - An update to the provisional spec published in July • OpenVX 1.0 final release planned for 2014 - With conformance tests • Itseez is working group chair (the convener of OpenCV) - Qualcomm and TI are specification editors
  14. 14. Applications and Middleware VisionWorks Primitives CUDA Libraries © Copyright Khronos Group 2014 - Page 14 Tegra K1 Classifier Corner Detection 3rd Party Vision Pipeline Samples Object Detection 3rd Party Pipelines … … SLAM VisionWorks Framework NVIDIA VisionWorks Uses OpenVX • VisionWorks library contains diverse vision and imaging primitives • Leverages OpenVX for optimized primitive execution • Can extend VisionWorks nodes through CUDA accelerated primitives • Provided with sample library of fully accelerated pipelines
  15. 15. OpenCL – Portable Heterogeneous Computing • Portable Heterogeneous programming of diverse compute resources - Targeting supercomputers -> embedded systems -> mobile devices • One code tree can be executed on CPUs, GPUs, DSPs and hardware - Dynamically interrogate system load and balance work across available processors © Copyright Khronos Group 2014 - Page 15 • OpenCL = Two APIs and C-based Kernel language - Platform Layer API to query, select and initialize compute devices - Kernel language - Subset of ISO C99 + language extensions - C Runtime API to build and execute kernels across multiple devices OpenCL Kernel Code OpenCL Kernel Code OpenCL Kernel Code OpenCL Kernel Code GPU DSP CPU CPU HW
  16. 16. OpenCL as Parallel Language Backend Harlan High level language for GPU programming Compiler directives for Fortran, C and C++ © Copyright Khronos Group 2014 - Page 16 JavaScript binding for initiation of OpenCL C kernels River Trail Language extensions to JavaScript MulticoreWare open source project on Bitbucket OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources Java language extensions for parallelism PyOpenCL Python wrapper around OpenCL Language for image processing and computational photography Embedded array language for Haskell SPIR Standard Portable Intermediate Representation (extending LLVM for parallel computation) SPIR 2.0 Released here at SIGGRAPH
  17. 17. Mixamo - Avatar Videoconferencing • Real time facial animation capture on mobile – ported directly from PC • Animate an avatar while conferencing • Full GPU acceleration of vision processing using OpenCL © Copyright Khronos Group 2014 - Page 17 NVIDIA Tegra K1 Development Board
  18. 18. © Copyright Khronos Group 2014 - Page 18 Khronos APIs for Vision Processing GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1) Pervasively available on almost any mobile device or OS Easy integration into graphics apps – no compute API interop needed Program in GLSL not C Limited to acceleration on a single GPU General Purpose Heterogeneous Programming Framework Flexible, low-level access to any devices with OpenCL compiler Single programming and run-time framework for CPUs, GPUs, DSPs, hardware Open standard for any device or OS – being used as backed by many languages and frameworks Needs full compiler stack and IEEE precision Out of the Box Vision Framework - Operators and graph framework library Can run on dedicated hardware – no compiler needed Easier performance portability to diverse hardware Suited for low-power, always-on acceleration Fixed set of operators – but can be extended It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices
  19. 19. © Copyright Khronos Group 2014 - Page 19 Kari Pulli, NVIDIA Research
  20. 20. Advanced Camera Control Use Cases • High-dynamic range (HDR) and computational flash photography - High-speed burst with individual frame control over exposure and flash © Copyright Khronos Group 2014 - Page 20 • Subject isolation and depth detection - High-speed burst with individual frame control over focus • Rolling shutter elimination - High-precision intra-frame synchronization between camera and motion sensor • Augmented Reality - 60Hz, low-latency capture with motion sensor synchronization - Multiple Region of Interest (ROI) capture - Synchronized stereo sensors for scene scaling - Detailed feedback on camera operation per frame • Time-of-flight or structured light depth camera processing - Aligned stacking of data from multiple sensors
  21. 21. © Copyright Khronos Group 2014 - Page 21 Typical Imaging Pipeline Pre-processing Image Signal Processor (ISP) • Pre-processing is non-existent in basic use-cases • Pre- and Post-processing can be done on CPU, GPU, DSP… • ISP controls camera via 3A algorithms Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF) Post-processing CMOS sensor Color Filter Array Lens Bayer RGB YUV App Lens, sensor, aperture control
  22. 22. High Dynamic Range (HDR) • HDR works by combining differing exposures into the same image • A variety of methods for HDR, based on application © Copyright Khronos Group 2014 - Page 22 - Multiple frame HDR (requires frame memory) - Interlace HDR - Multiple Zone HDR Short exposure • HDR requires Long exposure Optional mid exposure HDR processing - Precise control over camera parameters (exposure) - Fast capture and processing of multiple images - Note: with interlace HDR, only 1 image is needed
  23. 23. © Copyright Khronos Group 2014 - Page 23 Image stitching, panoramic images • Made with • Requires processing of multiple images • Requires position / geometry information • Requires control of camera (e.g. AE lock)
  24. 24. Typical Burst Sequence Applications © Copyright Khronos Group 2014 - Page 24
  25. 25. © Copyright Khronos Group 2014 - Page 25 Pipelined Sensor Model • Traditional one-shot sensor model - Need to know which parameters were used -  reset pipeline between shots  slow • Viewfinding / video mode: - Pipelined, high frame rate - Settings changes take effect later • Need new model for Computational Photography - Need parameterized SEQUENCE of images to feed advanced algorithms • Real image sensors are pipelined - While one frame exposing - Next one is being prepared - Previous one is being read out
  26. 26. Need for Camera Control API - OpenKCAM • Advanced control of ISP and camera subsystem – with cross-platform portability - Generate sophisticated image stream for advanced imaging & vision apps © Copyright Khronos Group 2014 - Page 26 • No platform API currently fulfills all developer requirements - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays - Cross sensor synch: e.g. synch of camera and MEMS sensors - Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing Image Signal Processor (ISP) Image/Vision Applications Defines control of Sensor, Color Filter Array Lens, Flash, Focus, Aperture Auto Exposure (AE) Auto White Balance (AWB) Auto Focus (AF) EGLStreams
  27. 27. OpenKCAM API Requirements • Provide functional portability for advanced camera applications - Reduce extreme fragmentation for ISVs wanting more than point and shoot © Copyright Khronos Group 2014 - Page 27 • Application control over ISP processing (including 3A) - Including multiple, re-entrant ISPs • Control multiple sensors with synch and alignment - E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras • Enhanced per frame detailed control - Format flexibility, Region of Interest (ROI) selection • Global timing & synchronization - E.g. Between cameras and MEMS sensors • Flexible processing/streaming - Multiple input and output streams with RAW, Bayer or YUV Processing - Streaming of rows (not just frames) Enable advanced camera functionality not available on current platforms
  28. 28. © Copyright Khronos Group 2014 - Page 28 OpenKCAM is FCAM-based • FCAM (2010) Stanford/Nokia, open source • Capture stream of camera images with precision control - A pipeline that converts requests into image stream - All parameters packed into the requests - no visible state - Programmer has full control over sensor settings for each frame in stream • Control over focus and flash - No hidden daemon running • Control ISP - Can access supplemental statistics from ISP if available • No global state - State travels with image requests - Every pipeline stage may have different state - Enables fast, deterministic state changes
  29. 29. © Copyright Khronos Group 2014 - Page 29 OpenKCAM Design Philosophy • C-language API starting from proven designs - e.g. FCAM, Android camera platform • Design alignment with widely used hardware standards - e.g. MIPI CSI • Focus on mobile, power-limited devices - But do not preclude other use cases such as automotive, DSLR… • Minimize overlap and maximize interoperability with other Khronos APIs - But other Khronos APIs are not required • Provide support for vendor-specific extensions
  30. 30. © Copyright Khronos Group 2014 - Page 30 Potential Adoption on Android • Android Exposes Java camera APIs to developers - Controls underlying Camera HAL • Camera HAL v1 API simplified basic point and shoot apps - Difficult or impossible to do much else • Camera HAL v3 API is a fundamentally different API - Streams-based to enable more sophisticated camera applications Open source project developed by Nokia and Stanford Camera API OpenKCAM builds on FCAM with a goal of being forward compatible with Android HAL V3 adopts many FCAM ideas and can use EGL in its implementation architecture OpenKCAM may be used to IMPLEMENT Android Camera HAL – and provide an advanced native camera API in NDK
  31. 31. Participating Companies and Milestones Specification ratification © Copyright Khronos Group 2014 - Page 31 Apr13 Group charter approved Jul13 3Q14 Sample implementation and tests 1Q15
  32. 32. OpenKCAM Working Group • Royalty free API for portable access to advanced mobile camera functionality - Reduce fragmentation and encourage more advanced camera applications • Control for the new wave of sensors to enable advanced imaging and vision © Copyright Khronos Group 2014 - Page 32 - Multiple sensors, depth cameras, synchronized sensors • Provide sophisticated camera functionality not available on today’s platforms - But work to enable easy adoption by platform vendors • Eager to contribute? Join Khronos OpenKCAM WG! - • Mikaël Bourges-Sévenier -
  33. 33. © Copyright Khronos Group 2014 - Page 33 Neil Trevett, NVIDIA
  34. 34. © Copyright Khronos Group 2014 - Page 34 Sensor Industry Fragmentation …
  35. 35. © Copyright Khronos Group 2014 - Page 35 Sensor Types • Basic sensor data: - Acceleration, Magnetic Field, Angular Rates - Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light - Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer • Sensor fusion - Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration - Position • Context awareness - Device Motion: general movement of the device: still, free-fall, … - Carry: how the device is being held by a user: in pocket, in hand, … - Posture: how the body holding the device is positioned: standing, sitting, step, … - Transport: about the environment around the device: in elevator, in car, …
  36. 36. © Copyright Khronos Group 2014 - Page 36 Low-level Sensor Abstraction API Apps Need Sophisticated Access to Sensor Data Without coding to specific sensor hardware Apps request semantic sensor information StreamInput defines possible requests, e.g. Read Physical or Virtual Sensors e.g. “Game Quaternion” Context detection e.g. “Am I in an elevator?” StreamInput processing graph provides optimized sensor data stream High-value, smart sensor fusion middleware can connect to apps in a portable way Apps can gain ‘magical’ situational awareness Advanced Sensors Everywhere Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors Sensor Discoverability Sensor Code Portability
  37. 37. © Copyright Khronos Group 2014 - Page 37 StreamInput: Platform Integration OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion) Applications Low-level native API defines portable access to fused sensor data stream and context-awareness … Sensor Sensor Sensor HSuebn sor Hub Middleware (E.g. Context-awareness engines, gaming engines) Flexible native API to integrate where needed depending on existing platform sensor stacks
  38. 38. Sensor OSP Announcement • Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput © Copyright Khronos Group 2014 - Page 38 - Sensor Platforms is StreamInput Spec Editor
  39. 39. Application Portability EGL abstracts graphics context management, surface and buffer binding and rendering © Copyright Khronos Group 2014 - Page 39 EGL 1.5 Released at GDC 2014 • EGL 1.5 brings functionality from multiple extensions into core - Increased reliability and portability • EGLImages - Sharing textures and renderbuffers • Context Robustness - Defending against malicious code • EGLSync objects - Improved OpenGL /OpenCL interop • Platform extensions - Standardized interactions for multiple OS e.g. Android and 64-bit platforms • sRGB colorspace rendering Applications OS and Display Platforms synchronization API Interop EGL provides efficient transfer of data and events between Khronos APIs
  40. 40. © Copyright Khronos Group 2014 - Page 40 Potential EGL Future Directions • EGLImageStream extensions are very powerful today - But need wider implementation in drivers - Stream other types of data – unformatted buffers for metadata and more - GPU-to-GPU streaming and invoking client API activities directly from other client APIs without CPU intervention • Separation of traditional context/surface functionality from “hub” functionality • Support for new Khronos APIs where appropriate - Streaming video + image processing + display use case
  41. 41. Khronos APIs for Augmented Reality Audio Rendering Application on CPUs, GPUs and DSPs © Copyright Khronos Group 2014 - Page 41 AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together Advanced Camera Control and stream generation 3D Rendering and Video Composition On GPU Sensor Fusion Vision Processing MEMS Sensors EGLStream - stream data between APIs Precision timestamps on all sensor samples
  42. 42. Summary • Khronos is building a family of interoperating APIs for portable and power-efficient vision processing • OpenVX 1.0 has been provisionally released and non-members are invited to provide feedback on the forums - • OpenKCAM and StreamInput APIs are currently in design and complement and integrate with OpenVX • Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing! - $15K annual membership fee for access to all Khronos API working groups - Well-defined IP framework protects your IP and conformant implementations © Copyright Khronos Group 2014 - Page 42 • -