Khronos Overview Nov13



Following our successful participation at SIGGRAPH Asia 2012 in Singapore, the Khronos Group is excited to demonstrate Khronos APIs and educate attendees about them at SIGGRAPH Asia 2013 in Hong Kong. This presentation by Neil Trevett is the Khronos Overview: the state of the art in open standards for visual computing.

Published in: Technology

  1. Khronos Overview: The State of the Art in Open Standards for Visual Computing. Neil Trevett, Khronos President; Vice President Mobile Content, NVIDIA. © Copyright Khronos Group 2013 - Page 1
  2. Khronos Connects Software to Silicon
     • ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
     • Low level silicon to software interfaces needed on every platform
     • Graphics, video, audio, compute, vision, sensor and camera processing
     • Defines the forward looking roadmap for the silicon community
     • Shipping on billions of devices across multiple operating systems
     • Rigorous conformance tests for cross-vendor consistency
     • Khronos is OPEN for any company to join and participate
     • Acceleration APIs BY the Industry FOR the Industry
  3. Making a Difference – One API at a Time
     • Well over 1 BILLION people are using what the Khronos members have created together, every day
  4. Khronos Standards
     • Over 100 companies defining royalty-free APIs to connect software to silicon
     • 3D Asset Handling
       - Advanced authoring pipelines
       - 3D asset transmission format with streaming and compression
     • Visual Computing
       - Object and terrain visualization
       - Advanced scene construction
     • Sensor Processing
       - Mobile vision acceleration
       - On-device sensor fusion
     • Acceleration in the Browser
       - WebGL for 3D in browsers
       - WebCL: heterogeneous computing for the web
     • Slide callouts: glTF cooperation with MPEG for 3D Asset Compression! OpenCL 2.0 Finalized! Camera Control API. OpenVX 1.0 Provisional Released! WebGL and WebCL Momentum!
  5. OpenCL Milestones
     • 24 month cadence for major OpenCL 2.0 update
       - Slightly longer than 18 month cadence between versions of OpenCL 1.X
     • Significant feedback from the developer community on Provisional Specification
       - Many suggestions were incorporated into the final 2.0 specification
       - Other feedback will be considered for future specification versions
     • Timeline
       - Dec08: OpenCL 1.0 released. Conformance tests released
       - Jun10: OpenCL 1.1 Specification and conformance tests released
       - Nov11: OpenCL 1.2 Specification and conformance tests released
       - Jul13: OpenCL 2.0 Provisional Specification released for public review
       - Nov13: OpenCL 2.0 Specification finalized and conformance tests released
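The cadence figures on the milestones slide can be checked with a little date arithmetic. The sketch below assumes the release months Dec08, Jun10, Nov11 and Nov13 for OpenCL 1.0, 1.1, 1.2 and 2.0 (read from the slide's timeline labels); the helper name `months_between` is made up for this illustration.

```python
from datetime import date

# Release months assumed from the slide's timeline; the day is fixed to 1
# because only month-level precision is given.
RELEASES = [
    ("OpenCL 1.0", date(2008, 12, 1)),
    ("OpenCL 1.1", date(2010, 6, 1)),
    ("OpenCL 1.2", date(2011, 11, 1)),
    ("OpenCL 2.0", date(2013, 11, 1)),
]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Gap (in months) between each release and the one before it.
gaps = [
    (name, months_between(prev_date, this_date))
    for (_, prev_date), (name, this_date) in zip(RELEASES, RELEASES[1:])
]

for name, months in gaps:
    print(f"{name}: {months} months after the previous release")
```

Under these assumed dates the 1.X releases are 17 to 18 months apart and 2.0 follows 1.2 by 24 months, which matches the slide's wording.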
  6. Key OpenCL 2.0 Features
     • Shared Virtual Memory
       - Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices
     • Nested Parallelism
       - Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks
     • Generic Address Space
       - Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application
  7. Broad OpenCL Implementer Adoption
     • Multiple conformant implementations shipping on desktop and mobile
       - For CPUs and GPUs on multiple OS
     • Android ICD extension released in latest extension specification
       - OpenCL implementations can be discovered and loaded as a shared object
     • Multiple implementations shipping in Android NDK
       - ARM, Imagination, Vivante, Qualcomm, Samsung …
  8. OpenCL as Parallel Compute Foundation
     • 100+ tool chains and languages leveraging OpenCL
       - Heterogeneous solutions emerging for the most popular programming languages
     • Examples shown on the slide
       - C++ AMP (Shevlin Park): uses Clang and LLVM
       - OpenCL HLM: C++ syntax/compiler extensions
       - WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
       - Aparapi: Java language extensions for parallelism
       - River Trail: language extensions to JavaScript
       - PyOpenCL: Python wrapper around OpenCL
       - Harlan: high level language for GPU programming
       - (name not present in the transcript): compiler directives for Fortran, C and C++
     • OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
  9. Widespread Developers Leveraging OpenCL
     • Broad uptake of OpenCL in commercial applications
       - For desktop and increasingly mobile apps
     • Searching for “OpenCL” on Sourceforge, Github, Google Code and BitBucket finds over 2,000 projects
       - x264
       - Handbrake
       - FFMPEG
       - JPEG
       - VLC
       - OpenCV
       - GIMP
       - ImageMagick
       - IrfanView
       - Hadoop, Memcached
       - Aparapi: a parallel API (for Java)
       - Bolt: a Unified Heterogeneous Library
       - Sumatra: next generation of compute enabled Java
       - WinZip
       - Crypto++
       - Bullet physics library
       - Etc.
  10. OpenCL Academic Traction
      • OpenCL at over 100 universities worldwide
        - Teaching multi-faceted programming courses
        - Research with top-tier universities globally
      • Complete University Kits available
        - Presentation with instructor and speaker notes
        - Example code and sample application
      • Growing textbook ecosystem
        - US, Japan, Europe, China and India
      • Number of papers referencing OpenCL on Google Scholar is growing rapidly
        - Over 2000 papers in 2012
      • Commercial OpenCL training courses
  11. Leveraging Proven Native APIs into HTML5
      • Khronos and W3C liaison
        - Leverage proven native API investments into the Web
        - Fast API development and deployment
        - Designed by the hardware community
        - Familiar foundation reduces developer learning curve
      • Diagram (native APIs alongside their JavaScript counterparts): HTML Canvas and Path Rendering; WebVX? and Vision Processing; WebCAM(!) and Camera Control (camera control and video processing); WebStream? and Sensor Fusion
      • Diagram legend
        - Native APIs shipping or Khronos working group
        - JavaScript API shipping, acceleration being developed or work underway
        - Possible future JavaScript APIs or acceleration
  12. Mobile Web is a Real Time Application
      • Apple iPhone: 320x480, 153K pixels, 163 DPI
      • Apple iPad: 1024x768, 786K pixels, 132 DPI
      • Apple iPad Mini: 2048x1536, 3100K pixels, 326 DPI
      • Buttery smooth touch interaction needs continuous 60Hz updates
      • In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
      • Need GPU Acceleration for everything Web!
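The "factor of TWENTY" claim follows directly from the three screen resolutions quoted on the slide. A quick check, with the 60 Hz frame budget added for context; the dictionary keys are only labels for this illustration:

```python
# Screen resolutions quoted on the slide (width, height in pixels).
SCREENS = {
    "320x480": (320, 480),
    "1024x768": (1024, 768),
    "2048x1536": (2048, 1536),
}

# Total pixels per frame for each screen.
pixels = {name: w * h for name, (w, h) in SCREENS.items()}

# Growth from the smallest to the largest screen.
growth = pixels["2048x1536"] / pixels["320x480"]

# At 60 Hz every frame has to be produced within this many milliseconds.
frame_budget_ms = 1000 / 60

# Pixels that must be touched per second to redraw the largest screen at 60 Hz.
pixels_per_second = pixels["2048x1536"] * 60

for name, count in pixels.items():
    print(f"{name}: {count:,} pixels")
print(f"growth: {growth:.2f}x, frame budget: {frame_budget_ms:.1f} ms")
```

The exact ratio is 20.48, and redrawing the largest screen at 60 Hz means touching roughly 189 million pixels per second.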
  13. WebGL Availability in Browsers
      • Much WebGL content uses the three.js library
      • Microsoft: “where you have IE11, you have WebGL – turned on by default and working all the time”
      • Microsoft: WebGL also enabled for Windows applications (web app framework and web view)
      • Apple: WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
      • Chrome OS: WebGL is the only cross-platform API to program the GPU
      • Google I/O announcement: Chrome on Android will soon launch with WebGL
  14. Microsoft PhotoSynth2
      • Demonstrated at Build 2013 (1:50)
  15. Cross-OS Portability
      • HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% of mobile systems
      • Preferred development environments are not designed for portability
      • Native code is portable, but apps must cope with different available APIs and libraries
      • Diagram labels: HTML/CSS (shown for three platforms), SDK C/C++, Dalvik (Java), Objective C, C#, DirectX
  16. OpenGL 3D API Family Tree
      • WebGL 2.0 is in development now and will bring OpenGL ES 3.0 functionality to the Web
      • ES3 is backward compatible so new features can be added incrementally
      • OpenGL 4.4 is a superset of DX11
      • Mobile 3D: OpenGL ES 1.0, OpenGL ES 1.1 (fixed function 3D pipeline), OpenGL ES 2.0 (programmable vertex and fragment shaders), OpenGL ES 3.0, ES-Next
      • Web: WebGL 1.0 (OpenGL ES 2.0 content), WebGL 2.0 (OpenGL ES 3.0 content)
      • Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
      • Timeline axis: 2002 to 2013
  17. OpenGL ES 3.0 Highlights
      • Better looking, faster performing games and apps, at lower power
        - Incorporates proven features from OpenGL 3.3 / 4.x
        - 32-bit integers and floats in shader programs
        - NPOT, 3D textures, depth textures, texture arrays
        - Multiple Render Targets for deferred rendering, Occlusion Queries
        - Instanced Rendering, Transform Feedback …
      • Make life better for the programmer
        - Tighter requirements for supported features to reduce implementation variability
      • Backward compatible with OpenGL ES 2.0
        - OpenGL ES 2.0 apps continue to run unmodified
      • Standardized Texture Compression
        - #1 developer request!
  18. 3D Needs a Transmission Format!
      • Compression and streaming of 3D assets becoming essential
        - Mobile and connected devices need access to increasingly large asset databases
      • 3D is the last media type to define a compressed format
        - 3D is more complex: diverse asset types and use cases
      • Needs to be royalty-free
        - Avoid an ‘internet video codec war’ scenario
      • Eventually enable hardware implementations of successful codecs
        - High-performance and low power, but pragmatic adoption strategy is key
      • Media types and their codecs: Audio = MP3, Video = H.264, Images = JPEG, 3D = ?
      • An effective and widely adopted codec ignites previously unimagined opportunities for a media type
  19. glTF – OpenGL Transmission Format
      • Binary file format for efficient transmission of 3D assets
        - Reduce network bandwidth and minimize client processing overhead
      • Run-time neutral
        - DOES NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
        - Can be used by any app or run-time, usually WebGL accelerated
      • Scalable to handle compression and streaming
        - Though baseline format does not include compression
      • ‘Direct load efficiency’ for WebGL
        - Little or NO processing to drop glTF data into WebGL client
      • Carry conditioned data from any authoring format
        - Prototyping and optimizing efficient handling of COLLADA assets
      • From authoring to playback: a standards-based content pipeline for rich native and Web 3D applications
  20. COLLADA and glTF Open Source Ecosystem
      • OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests, on GitHub
      • Tool interop, including other authoring formats
      • COLLADA2GLTF Translator
      • Web-based tools
      • Pervasive WebGL deployment
      • Three.js glTF Importer
      • Rest3D initiative
  21. WebGL as Test-bed for 3D Asset Compression
      • Integrating and benchmarking 3D geometry compression formats with glTF
        - Baseline is GZIP
      • Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
        - Royalty-free graphics compression technology from MPEG (MIT License)
        - Open3DGC is an efficient JavaScript and C/C++ implementation
        - Converter using Open3DGC to compress 3D meshes, skinning and animations
      • WebGL-loader is Google's lightweight compression for WebGL content
      • OpenCTM uses LZMA compression
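To make the "baseline is GZIP" idea concrete, here is a minimal sketch that packs vertex positions as raw 32-bit floats and measures how much gzip alone shrinks them. The flat 32x32 grid mesh and the helper name `pack_positions` are assumptions made for this illustration; they are not taken from the benchmark in the slides.

```python
import gzip
import struct


def pack_positions(vertices):
    """Pack (x, y, z) tuples as little-endian 32-bit floats, 12 bytes per vertex."""
    return b"".join(struct.pack("<3f", x, y, z) for x, y, z in vertices)


# Hypothetical test mesh: a flat 32x32 grid of vertices.
vertices = [(float(x), float(y), 0.0) for y in range(32) for x in range(32)]

raw = pack_positions(vertices)
# Level 6 is gzip's usual default and the level quoted in the results slide.
compressed = gzip.compress(raw, compresslevel=6)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1f}x")
```

A regular grid like this compresses far better than real scanned or CAD geometry would, so the ratio printed here says nothing about the datasets in the presentation; it only shows how a generic byte-level baseline is measured.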
  22. Initial Compression Results
      • Compression Efficiency
        - Gzip (default level=6)
        - OpenCTM (default settings)
        - Open3DGC and Webgl-loader: positions on 14 bits, normals and texCoords on 10 bits
      • [Chart: size in MBytes for Gzip, OpenCTM, Webgl-loader + Gzip, Open3DGC-ASCII + Gzip and Open3DGC-Binary across three datasets: CAD (3748 models), 3D Scanned (78 models), MPEG dataset (1211 models)]
      • Open3DGC is 5x-9x more efficient than Gzip, 1.3x-2.4x more efficient than OpenCTM, and 1.2x-1.5x more efficient than webgl-loader
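The "positions on 14 bits, normals and texCoords on 10 bits" settings refer to quantization: each float attribute is mapped to a small integer before entropy coding. The sketch below shows plain uniform quantization, which is an assumption for illustration; the codecs named in the slides may use different schemes, and the function names are made up.

```python
def quantize(value, lo, hi, bits):
    """Map a float in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside the range
    return round(t * levels)


def dequantize(q, lo, hi, bits):
    """Map the integer code back to the centre of its quantization step."""
    levels = (1 << bits) - 1
    return lo + (q / levels) * (hi - lo)


def max_error(lo, hi, bits):
    """Worst-case round-trip error: half of one quantization step."""
    return (hi - lo) / ((1 << bits) - 1) / 2


# A position coordinate in a bounding box of [-1, 1], stored on 14 bits.
q_pos = quantize(0.3, -1.0, 1.0, 14)
# A normal component in [-1, 1], stored on 10 bits.
q_nrm = quantize(0.3, -1.0, 1.0, 10)

print(q_pos, dequantize(q_pos, -1.0, 1.0, 14), max_error(-1.0, 1.0, 14))
print(q_nrm, dequantize(q_nrm, -1.0, 1.0, 10), max_error(-1.0, 1.0, 10))
```

Three 14-bit coordinates take 42 bits instead of the 96 bits of three 32-bit floats, before any entropy coding is applied.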
  23. OpenVX – Power Efficient Vision Processing
      • Acceleration API for real-time vision
        - Focus on mobile and embedded systems
      • Diversity of efficient implementations
        - From programmable processors, through GPUs to dedicated hardware pipelines
      • Tightly specified API with conformance
        - Portable, production-grade vision functions
      • Complementary to OpenCV
        - Which is great for prototyping
      • Software stack shown on the slide: Application; OpenCV open source library and other higher-level CV libraries; OpenVX (open source sample implementation, hardware vendor implementations)
      • Acceleration for power-efficient vision processing
  24. OpenVX Graphs
      • Vision processing directed graphs for power and performance efficiency
        - Each Node can be implemented in software or accelerated hardware
        - Nodes may be fused by the implementation to eliminate memory transfers
        - Tiling extension enables user nodes (extensions) to also run in local memory
      • VXU Utility Library for access to single nodes
        - Easy way to start using OpenVX
      • EGLStreams can provide data and event interop with other APIs
        - BUT use of other Khronos APIs is not mandated
      • Example graph and flow: Native Camera Control feeds a chain of OpenVX Nodes, followed by Heterogeneous Processing
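The graph idea can be illustrated without any vision library: nodes are declared up front with their inputs, and the runtime decides the execution order. The sketch below is a toy model in plain Python and is not the OpenVX API; the class `Graph`, its methods and the three example nodes are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Graph:
    """Toy dataflow graph: each node is a function plus the names of its inputs."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, tuple of input names)

    def add_node(self, name, func, inputs=()):
        self.nodes[name] = (func, tuple(inputs))

    def run(self, sources):
        """Execute every node once, in dependency order, and return all values."""
        # Only edges between nodes matter for ordering; external sources
        # (such as a camera frame) are already available.
        deps = {
            name: {i for i in inputs if i in self.nodes}
            for name, (_, inputs) in self.nodes.items()
        }
        values = dict(sources)
        for name in TopologicalSorter(deps).static_order():
            func, inputs = self.nodes[name]
            values[name] = func(*(values[i] for i in inputs))
        return values


# Hypothetical three-node pipeline over a tiny grayscale frame.
graph = Graph()
graph.add_node("count", lambda img: sum(map(sum, img)), inputs=["threshold"])
graph.add_node("threshold", lambda img: [[int(p > 100) for p in row] for row in img],
               inputs=["brighten"])
graph.add_node("brighten", lambda img: [[p + 10 for p in row] for row in img],
               inputs=["camera"])

result = graph.run({"camera": [[95, 80], [120, 91]]})
print(result["count"])
```

Because the whole graph is known before execution starts, an implementation is free to reorder, place or merge nodes. That is the property the slide relies on when it says nodes may be fused to eliminate memory transfers; this toy version does no fusion itself.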
  25. OpenVX 1.0 Function Overview
      • Core data structures
        - Images and Image Pyramids
        - Processing Graphs, Kernels, Parameters
      • Image Processing
        - Arithmetic, Logical, and statistical operations
        - Multichannel Color and BitDepth Extraction and Conversion
        - 2D Filtering and Morphological operations
        - Image Resizing and Warping
      • Core Computer Vision
        - Pyramid computation
        - Integral Image computation
      • Feature Extraction and Tracking
        - Histogram Computation and Equalization
        - Canny Edge Detection
        - Harris and FAST Corner detection
        - Sparse Optical Flow
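As an example of one listed function, an integral image (summed-area table) lets the sum over any rectangle be read with four lookups. The sketch below is a plain Python reference version written for illustration, not an OpenVX call; the function names are made up.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where entry [y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above this row plus this row's running total.
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
print(box_sum(table, 0, 0, 3, 3))  # whole image
print(box_sum(table, 1, 1, 3, 3))  # bottom-right 2x2 block
```

The constant-time box sum is why integral images appear as a building block for box filters and many feature detectors.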
  26. OpenVX Participants and Timeline
      • Aiming for specification finalization by mid-2014
      • Itseez is working group chair
      • Qualcomm and TI are specification editors
  27. OpenVX and OpenCV are Complementary
      • Governance
        - OpenCV: open source, community driven, no formal specification
        - OpenVX: formal specification and conformance tests, implemented by hardware vendors
      • Scope
        - OpenCV: very wide, 1000s of functions of imaging and vision, multiple camera APIs/interfaces
        - OpenVX: tight focus on hardware accelerated functions for mobile vision, uses external camera API
      • Conformance
        - OpenCV: no conformance testing, every vendor implements a different subset
        - OpenVX: full conformance test suite / process, reliable acceleration platform
      • Use Case
        - OpenCV: rapid prototyping
        - OpenVX: production deployment
      • Efficiency
        - OpenCV: memory-based architecture, each operation reads and writes memory
        - OpenVX: graph-based execution, optimizable computation and data transfer
      • Portability
        - OpenCV: APIs can vary depending on processor
        - OpenVX: hardware abstracted for portability
  28. OpenVX and OpenCL are Complementary
      • Use Case
        - OpenCL: general heterogeneous programming
        - OpenVX: domain targeted, vision processing
      • Architecture
        - OpenCL: language-based, needs online compilation
        - OpenVX: library-based, no online compiler required
      • Target Hardware
        - OpenCL: ‘exposed’ architected memory model, can impact performance portability
        - OpenVX: abstracted node and memory model, diverse implementations can be optimized for power and performance
      • Precision
        - OpenCL: full IEEE floating point mandated
        - OpenVX: minimal floating point requirements, optimized for vision operators
      • Ease of Use
        - OpenCL: focus on general-purpose math libraries with no built-in vision functions
        - OpenVX: fully implemented vision operators and framework ‘out of the box’
  29. Typical Imaging Pipeline
      • Pre- and Post-processing can be done on CPU, GPU, DSP…
      • ISP controls camera via 3A algorithms: Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
      • ISP may be a separate chip or within Application Processor
      • Pipeline shown on the slide: Lens, Color Filter Array, CMOS sensor (Bayer output), Pre-processing, Image Signal Processor (ISP, RGB/YUV output), Post-processing, App; 3A feeds back into lens, sensor and aperture control
      • Need for advanced camera control API:
        - to drive more flexible app camera control
        - over more types of camera sensors
        - with tighter integration with the rest of the system
  30. Khronos Camera API
      • Catalyze camera functionality not available on any current platform
        - Open API that aligns with future platform direction for easy adoption
        - E.g. could be used to implement future versions of Android Camera HAL
      • More detailed control per frame
        - Focus, flash, format, Region of Interest (ROI) selection
      • Global Timing & Synchronization
        - E.g. between cameras and MEMS sensors
      • Application control over ISP processing (including 3A)
        - Including multiple, re-entrant ISPs
      • Control multiple sensors with synch and alignment
        - Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
      • Flexible processing/streaming
        - Multiple output streams and streaming rows (not just frames)
        - RAW, Bayer and YUV Processing
  31. Camera API Design Philosophy
      • C-language API starting from proven designs
        - e.g. FCAM, Android Camera HAL V3
      • Design alignment with widely used hardware standards
        - e.g. MIPI CSI
      • Focus on mobile, power-limited devices
        - But do not preclude other use cases such as automotive, surveillance, DSLR…
      • Minimize overlap and maximize interoperability with other Khronos APIs
        - But other Khronos APIs are not required
      • Provide support for vendor-specific extensions
      • Timeline (the pairing of dates and milestones is not recoverable from the transcript)
        - Milestones: Group charter approved, First draft specification, Provisional specification, Sample implementation and tests, Specification ratification
        - Dates shown: Apr13, Jul13, 4Q13, 1Q14, 2Q14, 3Q14
  32. ‘Always On’ Camera and Sensor Processing
      • Visual sensor revolution: driving need for significant vision acceleration
        - Multi-sensors: Stereo pairs -> Plenoptic arrays -> Active depth cameras
      • Devices should be always environmentally-aware, e.g. ‘wave to wake’
        - BUT many sensor use cases consume too much power to actually run 24/7
      • Smart use of sensors to trigger levels of processing capability
        - ‘Scanners’: very low power, always on, detect events in the environment
      • Processing tiers shown on the slide
        - ARM 7: 1 MIP and accelerometers can detect someone in the vicinity
        - DSP / Hardware: low power activation of camera to detect someone in field of view
        - GPU / Hardware: maximum acceleration for processing full depth sensor capability
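The tiered triggering described on this slide can be sketched as a small state machine: the cheapest tier runs all the time and each tier only wakes the next one when it sees something. The tier names, the two boolean inputs and the transition rules below are assumptions chosen for illustration; the slide does not define them.

```python
from enum import Enum


class Tier(Enum):
    SCANNER = 1       # very low power, always on (e.g. accelerometer watch)
    CAMERA_CHECK = 2  # low power camera activation
    FULL_DEPTH = 3    # maximum acceleration, full depth sensing


def next_tier(tier, motion_nearby, person_in_view):
    """Pick the processing tier for the next step from the current observations."""
    if not motion_nearby:
        return Tier.SCANNER        # nothing around: drop back to the cheapest tier
    if tier is Tier.SCANNER:
        return Tier.CAMERA_CHECK   # motion detected: wake the camera
    if person_in_view:
        return Tier.FULL_DEPTH     # camera confirms a person: use full processing
    return Tier.CAMERA_CHECK       # motion but nobody in view yet: keep checking


# Hypothetical sequence of (motion_nearby, person_in_view) observations.
events = [(False, False), (True, False), (True, True), (True, True), (False, False)]

tier = Tier.SCANNER
trace = []
for motion, person in events:
    tier = next_tier(tier, motion, person)
    trace.append(tier)

print([t.name for t in trace])
```

The point of the structure is that the expensive tier is only reachable through the cheap ones, so the device spends most of its time in the lowest-power state.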
  33. Sensor Industry Fragmentation …
  34. StreamInput - Sensor Fusion
      • Defines access to high-quality fused sensor stream and context changes
        - Implementers can optimize and innovate generation of the sensor stream
      • Applications: platforms can provide increased access to improved sensor data stream, driving faster, deeper sensor usage by applications
      • OS sensor APIs (e.g. Android SensorManager or iOS CoreMotion)
      • Middleware (e.g. Augmented Reality engines, gaming engines): middleware engines need platform-portable access to native, low-level sensor data stream
      • StreamInput: low-level native API defines access to fused sensor data stream and context-awareness
        - StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection, enabling sensor subsystem vendors to increase ADDED VALUE
        - Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
      • Hardware layer: sensors connected through sensor hubs
  35. Khronos APIs for Augmented Reality
      • AR needs not just advanced sensor processing, vision acceleration, computation and rendering, but also for all these subsystems to work efficiently together
      • MEMS Sensors and Sensor Fusion: precision timestamps on all sensor samples
      • Camera Control API: advanced camera control and stream generation
      • Vision Processing
      • Application on CPUs, GPUs and DSPs
      • EGLStream: stream data between APIs
      • Audio Rendering
      • 3D Rendering and Video Composition on GPU
  36. Khronos DevU In Depth Sessions Today