
2013 Korean tour Daegu


Khronos toured Korea in November 2013. Erik Noreke, VP of Business Development, visited the "Human Care Center Workshop" and the "Interaction Standard Workshop". Additional details may be found on the Khronos Group event page: https://www.khronos.org/news/events/korean-tour-2013


  1. Graphics Technology Update
     Presented by: Erik Noreke, Khronos Group Vice President of Business Development
     November 2013
     © Copyright Khronos Group, 2013
  2. Khronos Connects Software to Silicon
     ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
     Low level silicon to software interfaces needed on every platform
     Graphics, video, audio, compute, vision, sensor and camera processing
     Defines the forward looking roadmap for the silicon community
     Shipping on billions of devices across multiple operating systems
     Rigorous conformance tests for cross-vendor consistency
     Khronos is OPEN for any company to join and participate
     Acceleration APIs BY the Industry FOR the Industry
  3. Power is the New Limit to Performance
     • GPUs are much more power efficient than CPUs for data parallelism
       - When exploiting data parallelism, GPUs can be 10x as efficient – but can go further…
     • Lots of space for transistors on SOC – but can’t turn them all on at the same time!
       - Would exceed Thermal Design Point
     • Dark Silicon - specialized hardware – only turned on when needed
       - Dedicated units can increase locality and parallelism of computation
     [Chart: power efficiency (X1, X10, X100) against computation flexibility, with regions labelled Multi-core CPU, GPU Compute and Dedicated Hardware]
     Enabling new mobile use cases requires pushing computation onto GPUs and dedicated hardware
     How do we provide access to this diversity of processors and hardware without horrible platform fragmentation? Standards!
  4. OpenCL – Heterogeneous Computing
     • Native framework for programming diverse parallel computing resources
       - CPU, GPU, DSP – as well as hardware blocks(!)
     • Powerful, low-level flexibility
       - Foundational access to compute resources for higher-level engines, frameworks and languages
     • Embedded profile
       - No need for a separate “ES” spec
       - Reduces precision requirements
     A cross-platform, cross-vendor standard for harnessing all the compute resources in an SOC
     [Diagram: OpenCL kernel code dispatched to CPU, CPU, GPU, DSP and HW blocks]
     One code tree can be executed on CPUs, GPUs, DSPs and hardware. Dynamically interrogate system load and load balance work across available processors
  5. OpenCL Overview
     • C Platform Layer API
       - Query, select and initialize compute devices
     • Kernel Language Specification
       - Subset of ISO C99 with language extensions
       - Well-defined numerical accuracy: IEEE 754 rounding with specified max error
       - Rich set of built-in functions: cross, dot, sin, cos, pow, log …
     • C Runtime API
       - Runtime or build-time compilation of kernels
       - Execute compute kernels across multiple devices
     • Memory management is explicit
       - Application must move data from host → global → local and back
       - Implementations can optimize data movement in unified memory systems
  6. OpenCL: Execution Model
     • Kernel
       - Basic unit of executable code ~ C function
       - Data-parallel or task-parallel
     • Program
       - Collection of kernels and functions ~ dynamic library with run-time linking
     • Command Queue
       - Applications queue kernels & data transfers
       - Performed in-order or out-of-order
     • Work-item
       - An execution of a kernel by a processing element ~ thread
     • Work-group
       - A collection of related work-items that execute on a single compute unit ~ core
     [Figure: example of parallelism types]
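The relationship between kernels, work-items and work-groups on this slide can be illustrated with a small pure-Python simulation. This is only a sketch of the indexing scheme, not OpenCL host code: `run_ndrange` and `vector_add` are hypothetical names, the loop runs serially where a device would run work-items in parallel, and the requirement that the global size be a multiple of the work-group size is a simplification made for the sketch.

```python
from collections import defaultdict


def run_ndrange(kernel, global_size, local_size, *args):
    """Call `kernel` once per work-item and record which work-group ran it.

    Hypothetical helper: a serial stand-in for enqueueing a 1-D kernel.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    groups = defaultdict(list)
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # Each work-item derives a unique global index from its
            # work-group index and its position inside the group.
            global_id = group_id * local_size + local_id
            kernel(global_id, *args)
            groups[group_id].append(global_id)
    return dict(groups)


def vector_add(gid, a, b, out):
    """Kernel body: one work-item handles one element."""
    out[gid] = a[gid] + b[gid]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
out = [0] * len(a)
groups = run_ndrange(vector_add, len(a), 4, a, b, out)
print(out)
print(groups)
```

Eight work-items are split into two work-groups of four; on real hardware each group would be scheduled onto one compute unit.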
  7. OpenCL Built-in Kernels
     • Used to control non-OpenCL C-capable resources on an SOC – ‘Custom Devices’
       - E.g. Video encode/decode, Camera ISP …
     • Represent functions of Custom Devices as an OpenCL kernel
       - Can enqueue Built-in Kernels to Custom Devices alongside standard OpenCL kernels
     • OpenCL run-time is a powerful coordinating framework for ALL SOC resources
       - Programmable and custom devices controlled by one run-time
     Built-in kernels enable control of specialized processors and hardware from OpenCL run-time
  8. OpenCL Roadmap
     • OpenCL HLM (High Level Model)
       - High-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities
     • OpenCL 2.0 (OpenCL 2.0 Provisional released!)
       - Significant enhancements to memory and execution models to expose emerging hardware capabilities and provide increased flexibility, functionality and performance to developers
     • OpenCL SPIR (Standard Parallel Intermediate Representation) (OpenCL SPIR 1.2 Provisional released!)
       - LLVM-based, low-level Intermediate Representation for IP Protection and as target back-end for alternative high-level languages
  9. OpenCL Milestones
     • 24 month cadence for major OpenCL 2.0 update
       - Slightly longer than 18 month cadence between versions of OpenCL 1.X
     • Provisional Specification enables public review
       - Warning! The spec may change before final release!
     [Timeline]
       - Dec08: OpenCL 1.0 released. Conformance tests released Dec08
       - Jun10: OpenCL 1.1 Specification and conformance tests released
       - Nov11: OpenCL 1.2 Specification and conformance tests released
       - Jul13: OpenCL 2.0 Provisional Specification released for public review
       - Within 6 months (depends on feedback): OpenCL 2.0 Specification finalized and conformance tests released
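The cadence figures quoted on this slide can be checked with simple month arithmetic. The sketch below assumes the milestone pairing OpenCL 1.0 in Dec08, 1.1 in Jun10, 1.2 in Nov11 and the 2.0 provisional specification in Jul13; `months_between` is a hypothetical helper name.

```python
def months_between(start, end):
    """Whole months from one (year, month) pair to another."""
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0)


# Assumed pairing of slide dates to releases (see lead-in).
milestones = [
    ("OpenCL 1.0", (2008, 12)),
    ("OpenCL 1.1", (2010, 6)),
    ("OpenCL 1.2", (2011, 11)),
    ("OpenCL 2.0 provisional", (2013, 7)),
]

gaps = {
    f"{name_a} -> {name_b}": months_between(date_a, date_b)
    for (name_a, date_a), (name_b, date_b) in zip(milestones, milestones[1:])
}
for step, months in gaps.items():
    print(f"{step}: {months} months")
```

Under that pairing the 1.X releases are 18 and 17 months apart, and the provisional 2.0 specification follows 1.2 by 20 months, which leaves room for the finalized specification to land near the 24 month mark the slide mentions.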
  10. Mobile OpenCL Shipping
     • Android ICD extension released in latest extension specification
       - OpenCL implementations can be discovered and loaded as a shared object
     • Multiple implementations shipping in Android NDK
       - ARM, Imagination, Vivante, Qualcomm, Samsung …
  11. Key OpenCL 2.0 Features
     • Shared Virtual Memory
       - Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices
     • Dynamic Parallelism
       - Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks
     • Generic Address Space
       - Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application
  12. Key OpenCL 2.0 Features – continued…
     • Images
       - Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop
     • C11 Atomics
       - Subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between OpenCL device and host
     • Pipes
       - Pipes are memory objects that store data organized as a FIFO. OpenCL 2.0 provides built-in functions for kernels to read from or write to pipes, providing straightforward programming that can be highly optimized by implementers
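The FIFO behaviour described for pipes can be sketched in plain Python. This models only the semantics (a bounded queue of packets written by one kernel and read by another); the `Pipe` class, its methods and the `producer`/`consumer` functions are hypothetical stand-ins, not the OpenCL 2.0 built-in functions.

```python
from collections import deque


class Pipe:
    """Bounded FIFO standing in for a pipe memory object (hypothetical)."""

    def __init__(self, max_packets):
        self.max_packets = max_packets
        self._packets = deque()

    def write(self, packet):
        """Append a packet; return False if the pipe is already full."""
        if len(self._packets) >= self.max_packets:
            return False
        self._packets.append(packet)
        return True

    def read(self):
        """Return (True, packet) in FIFO order, or (False, None) if empty."""
        if not self._packets:
            return False, None
        return True, self._packets.popleft()


def producer(pipe, values):
    """Producer 'kernel': write squares, return inputs that did not fit."""
    return [v for v in values if not pipe.write(v * v)]


def consumer(pipe):
    """Consumer 'kernel': drain the pipe in arrival order."""
    received = []
    while True:
        ok, packet = pipe.read()
        if not ok:
            return received
        received.append(packet)


pipe = Pipe(max_packets=4)
rejected = producer(pipe, [1, 2, 3, 4, 5])
received = consumer(pipe)
print(rejected, received)
```

The fifth write fails because the pipe holds at most four packets, and the consumer sees the packets in the order they were written.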
  13. OpenCL as Parallel Compute Foundation
     [Diagram: languages and frameworks layered on OpenCL]
     - C++ AMP: Shevlin Park uses Clang and LLVM
     - OpenCL HLM: C++ syntax/compiler extensions
     - WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
     - Aparapi: Java language extensions for parallelism
     - River Trail: Language extensions to JavaScript
     - PyOpenCL: Python wrapper around OpenCL
     - Harlan: High level language for GPU programming
     - Compiler directives for Fortran, C and C++
     OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
  14. OpenGL 3D API Family Tree
     [Diagram: API family tree on a 2002 to 2013 timeline]
     - Mobile 3D: OpenGL ES 1.0, OpenGL ES 1.1, OpenGL ES 2.0, OpenGL ES 3.0, ES-Next
     - Web: WebGL 1.0, WebGL-Next
     - Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
     - Content labels: OpenGL ES 1.1 Content, OpenGL ES 2.0 Content, OpenGL ES 3.0 Content
     - Callouts: Fixed function 3D Pipeline; Programmable vertex and fragment shaders
     ES3 is backward compatible so new features can be added incrementally
     OpenGL 4.4 is a superset of DX11
  15. OpenGL ES 3.0 Highlights
     • Better looking, faster performing games and apps – at lower power
       - Incorporates proven features from OpenGL 3.3 / 4.x
       - 32-bit integers and floats in shader programs
       - NPOT, 3D textures, depth textures, texture arrays
       - Multiple Render Targets for deferred rendering, Occlusion Queries
       - Instanced Rendering, Transform Feedback …
     • Make life better for the programmer
       - Tighter requirements for supported features to reduce implementation variability
     • Backward compatible with OpenGL ES 2.0
       - OpenGL ES 2.0 apps continue to run unmodified
     • Standardized Texture Compression
       - #1 developer request!
  16. Accelerating OpenGL Innovation
     Bringing state-of-the-art functionality to cross-platform graphics
     [Timeline, 2004 to 2013: OpenGL 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3 and 4.4 plotted against DirectX 9.0c, 10.0, 10.1, 11 and 11.1]
  17. OpenGL 4.3 Compute Shaders
     • Execute algorithmically general-purpose GLSL shaders
       - Can operate on uniforms, images and textures
     • Process graphics data in the context of the graphics pipeline
       - Easier than interoperating with a compute API IF processing ‘close to the pixel’
     • Standard part of all OpenGL 4.3 implementations
       - Matches DX11 DirectCompute functionality
     Example uses: Physics, AI, Simulation, Ray Tracing, Imaging, Global Illumination
  18. OpenCL and OpenGL Compute Shaders
     • OpenGL compute shaders and OpenCL support distinctly different use cases
       - OpenCL provides a significantly more powerful and complete compute solution
     OpenCL (pure compute apps touching no pixels):
       1. Full ANSI C programming of heterogeneous CPUs and GPUs
       2. Utilize multiple processors
       3. Precisely defined IEEE accuracy
     Compute Shaders (enhanced 3D graphics apps, “Shaders++”):
       1. Fine grain compute operations inside OpenGL
       2. GLSL Shading Language
       3. Execute on single GPU only
     [Diagram: Imaging, Video, Physics and AI workloads placed between the two]
  19. OpenGL 4.4 reference pages
     Huge thanks to Graham Sellers!!!
  20. OpenGL Conformance Test Suite released!
     • Conformance submissions are required for GL 4.4 implementations
       - encouraged for earlier driver versions
     • Shared codebase with OpenGL ES 3.0 CTS
       - additional desktop-specific tests
       - Core profile functionality
       - Enhancements underway to add more coverage
  21. Leveraging Proven Native APIs into HTML5
     • Khronos and W3C liaison
       - Leverage proven native API investments into the Web
       - Fast API development and deployment
       - Designed by the hardware community
       - Familiar foundation reduces developer learning curve
     [Diagram: JavaScript and Native API layers, with boxes labelled HTML Canvas, Path Rendering, WebVX? (Vision Processing), WebCAM(!) (Camera Control), WebStream? (Sensor Fusion), and Camera control and video processing]
     Legend:
       - Native APIs shipping or Khronos working group
       - JavaScript API shipping, acceleration being developed or work underway
       - Possible future JavaScript APIs or acceleration
  22. Zygote Body, formerly Google Body
     • Rendering in Zygote Body uses WebGL
     www.zygote.com
     www.zygotebody.com
  23. WebGL Implementation Anatomy
     [Diagram: layered stack]
     - Content (JavaScript, HTML, CSS, ...): content downloaded from the Web
     - JavaScript Middleware: middleware can make WebGL accessible to non-expert 3D programmers
     - Browser (HTML5, JavaScript, CSS): provides WebGL functionality alongside other HTML5 technologies - no plug-in required
     - OS Provided Drivers (OpenGL ES 2.0, OpenGL, DX9/Angle): WebGL on Windows can use Google Angle to create conformant OpenGL ES 2.0 over DX9
  24. WebGL Availability in Browsers
     - Microsoft – “where you have IE11, you have WebGL – turned on by default and working all the time”
     - Microsoft - WebGL also enabled for Windows applications - web app framework and web view
     - Apple - WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
     - Chrome OS - WebGL is the only cross-platform API to program the GPU
     - Google IO announcement - Chrome on Android will soon launch with WebGL
     Much WebGL content uses the three.js library: http://threejs.org/
  25. Sectional Anatomy: MR Knee
     • //sectional-anatomy.org/
  26. Sectional Anatomy: MR Knee
     • //sectional-anatomy.org/
  27. Cross-OS Portability
     [Diagram: per-platform development stacks labelled HTML/CSS, SDK, C/C++, Dalvik (Java), Objective C, C# and DirectX]
     - Preferred development environments are not designed for portability
     - Native code is portable, but apps must cope with different available APIs and libraries
     - HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% of mobile systems
  28. WebGL First Wave Application Categories
     • Maps and Navigation
     • Modeling Tools and Repositories
     • Games
     • 3D Printing
     • Visualization
     • Music Videos and Promotion
     • Education
     • Photo Editors
     • Music Visualizers
     • Vision/Video Processing
  29. WebCL – Parallel Computing for the Web
     • JavaScript bindings to OpenCL APIs
       - Enables initiation of Kernels written in OpenCL C within the browser
     http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
  30. 3D Needs a Transmission Format!
     • Compression and streaming of 3D assets becoming essential
       - Mobile and connected devices need access to increasingly large asset databases
     • 3D is the last media type to define a compressed format
       - 3D is more complex – diverse asset types and use cases
     • Needs to be royalty-free
       - Avoid an ‘internet video codec war’ scenario
     • Eventually enable hardware implementations of successful codecs
       - High-performance and low power – but pragmatic adoption strategy is key
     [Table: media type and its codec: Audio = MP3, Video = H.264, Images = JPEG, 3D = ?]
     An effective and widely adopted codec ignites previously unimagined opportunities for a media type
  31. glTF Goals
     • Binary file format for efficient transmission for 3D assets
       - Reduce network bandwidth and minimize client processing overhead
     • Run-time neutral
       - DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
       - Can be used by any app or run-time – usually WebGL accelerated
     • Scalable to handle compression and streaming
       - Though baseline format does not include compression
     • ‘Direct load efficiency’ for WebGL
       - Little or NO processing to drop glTF data into WebGL client
     • Carry conditioned data from any authoring format
       - Prototyping and optimizing efficient handling of COLLADA assets
     [Diagram: Authoring to Playback pipeline]
     A standards-based content pipeline for rich native and Web 3D applications
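The "direct load" goal can be illustrated with a small sketch: a JSON description tells the client where each array lives inside a binary blob, so loading is a matter of slicing rather than parsing. The field names used here (`byteOffset`, `count`, `format`) and the `load_view` helper are hypothetical and chosen for illustration only; they are not taken from the glTF specification.

```python
import json
import struct

# Authoring side: pack one triangle into a single binary blob.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
indices = [0, 1, 2]

index_bytes = struct.pack("<3H", *indices)            # 3 x uint16 = 6 bytes
padding = b"\x00" * 2                                 # keep floats 4-byte aligned
position_bytes = struct.pack("<9f", *[c for p in positions for c in p])
blob = index_bytes + padding + position_bytes

# Hypothetical JSON description of where each array sits in the blob.
header = json.dumps({
    "indices": {"byteOffset": 0, "count": 3, "format": "H"},
    "positions": {"byteOffset": 8, "count": 9, "format": "f"},
})


def load_view(data, view):
    """Read one array straight out of the blob using its description."""
    fmt = f"<{view['count']}{view['format']}"
    return struct.unpack_from(fmt, data, view["byteOffset"])


# Playback side: no mesh parsing, only offset-based reads.
views = json.loads(header)
loaded_indices = load_view(blob, views["indices"])
loaded_positions = load_view(blob, views["positions"])
print(len(blob), loaded_indices, loaded_positions)
```

In a browser the same idea would presumably hand each byte range to a typed array and upload it to a GPU buffer, which is why little client-side processing is needed.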
  32. COLLADA and glTF Open Source Ecosystem
     [Diagram: open source content pipeline]
     - OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests on GitHub (Tool Interop)
       https://github.com/KhronosGroup/OpenCOLLADA
       https://github.com/KhronosGroup/COLLADA-CTS
     - COLLADA2GLTF Translator
       https://github.com/KhronosGroup/glTF
     - Other authoring formats
     - Web-based Tools
     - Three.js glTF Importer. Rest3D initiative
     - Pervasive WebGL deployment
  33. WebGL as Test-bed for 3D Asset Compression
     • Integrating and benchmarking 3D geometry compression formats with glTF
       - Baseline is GZIP
       - Open3DGC: implementation of the MPEG-SC3DMC (Scalable Complexity 3D Mesh Compression) codec
       - webgl-loader is Google's lightweight compression for WebGL content
     [Results table, flattened in extraction; cell alignment is not recoverable. Raw text follows.]
     Model COLLADA XML glTF+webgl-loader gzip raw utf8:42k • JSON:12k • 336k 106k utf8:8747k • JSON:753k 56763k 7378k gzip utf8:34k •JSON:2kb • 54k • 9500k glTF+Open3DGC ascii raw ascii:29k • JSON:11k • 36k utf8:1325k • JSON:29k • 1354k gzip ascii:19k • JSON:2k • 40k ascii:7793k • JSON:587k • 8380k glTF+Open3DGC binary raw bin:18k • JSON:11k • 21k ascii:1433k • JSON:29k • 1462k raw bin •gzip JSON • bin:18k • JSON:2k • 29k bin:3205k • JSON:589k • 3794k 20k bin:3205k • JSON:29k • 3234k
  34. Compression Example Results Overview
     • Early days – Khronos embarking on methodical analysis using glTF as test-bed
     • For mobile - need to balance file size AND decompression processing
       - Extensive processing can take more time/power than transmission
     • OpenCTM is promising but LZMA is very processor intensive
       - Work may lead to LZMA in hardware?
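The size versus decode-cost trade-off can be explored with the Python standard library, which ships both a gzip codec and an LZMA codec. The sketch below is an illustration on synthetic data only: `synthetic_mesh_bytes`, `measure` and `total_seconds` are hypothetical helpers, the grid mesh is not a real asset, the bandwidth figure is an arbitrary assumption, and the timings will vary by machine, so no particular result should be read as a benchmark.

```python
import gzip
import lzma
import struct
import time


def synthetic_mesh_bytes(n):
    """Pack n grid vertices as little-endian float32 (x, y, z) triples."""
    side = int(n ** 0.5)
    coords = []
    for i in range(n):
        coords.extend((float(i % side), float(i // side), 0.0))
    return struct.pack(f"<{len(coords)}f", *coords)


def measure(name, compress, decompress, raw):
    """Compress once, then time a single decompression of the result."""
    packed = compress(raw)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == raw  # lossless round trip
    return {"codec": name, "bytes": len(packed), "decode_seconds": elapsed}


def total_seconds(result, bandwidth_bytes_per_second):
    """Transfer time plus decode time for one asset."""
    return result["bytes"] / bandwidth_bytes_per_second + result["decode_seconds"]


raw = synthetic_mesh_bytes(10_000)
results = [
    measure("gzip", gzip.compress, gzip.decompress, raw),
    measure("lzma", lzma.compress, lzma.decompress, raw),
]
for r in results:
    # 1 MB/s is an assumed link speed, used only to combine the two costs.
    print(r["codec"], r["bytes"], round(total_seconds(r, 1_000_000), 4))
```

Comparing the combined figure rather than file size alone mirrors the slide's point that heavy decompression can cost more time or power than the transmission it saves.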
  35. Texture Compression is Key
     • Texture compression saves precious resources
       - Network bandwidth, device memory space AND device memory bandwidth
     • Developers need the same texture compression EVERYWHERE
       - Otherwise portable apps – such as WebGL – need multiple copies of same texture
     [Chart: Quality against Pervasive Deployment, by period]
     2008-2010: Platform Fragmentation
       - DXTC/S3TC (Windows) and PVRTC (iOS): NOT Royalty-free
       - ETC1: Royalty-free BUT only optional in ES. Only 4bpp | 3 channel. No alpha support. Mandated in Android Froyo (400M devices)
     2012-2013: ETC2 / EAC
       - MANDATED in OpenGL ES 3.0 and OpenGL 4.3
       - Royalty-free. Backward compatible with ETC1
       - ETC2: 4bpp | 3 channel
       - EAC: 4 (8) bpp | 1 (2) channel
       - COMBINED: RGBA 8bpp | 4 channel
       - Does not have 1-2 bit compression WITH ALPHA
     2014->: ASTC
       - OpenGL ES 3.0 and OpenGL 4.3 extensions -> Core once proven
       - Royalty-free. Best quality
       - Independent control of bit-rate and # channels: 1 to 4 channel, 1-8bpp in fine steps
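To make the memory saving concrete, the bit rates quoted on this slide can be turned into texture sizes. This is a minimal sketch: the 1024 x 1024 texture, the 24bpp uncompressed RGB baseline and the `texture_bytes` helper are assumptions for illustration, and mipmaps are ignored.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Storage for a single mip level at a fixed bit rate."""
    return width * height * bits_per_pixel // 8


# Bit rates: 24bpp is an assumed uncompressed RGB baseline,
# 4bpp and 8bpp are the ETC2 and combined ETC2 + EAC rates from the slide.
formats = {
    "uncompressed RGB": 24,
    "ETC2 RGB": 4,
    "ETC2 + EAC RGBA": 8,
}

sizes = {name: texture_bytes(1024, 1024, bpp) for name, bpp in formats.items()}
for name, size in sizes.items():
    print(f"{name}: {size / 1024:.0f} KiB")
```

The same factor applies to network transfer and to the memory bandwidth used each time the texture is sampled, which is why the slide lists all three resources.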
  36. ASTC – Universal Texture Standard
     • Adaptive Scalable Texture Compression (ASTC)
       - Quality significantly exceeds S3TC or PVRTC at same bit rate
     • Industry-leading orthogonal compression rate and format flexibility
       - 1 to 4 color components: R / RG / RGB / RGBA
       - Choice of bit rate: from 8bpp to <1bpp in fine steps
     • ASTC is royalty-free and so is available to be universally adopted
       - Shipping as OpenGL/OpenGL ES extension today for industry feedback
     [Image comparison: Original at 24bpp; ASTC compression at 8bpp, 3.56bpp and 2bpp]
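The bit rates shown in the comparison follow from how ASTC stores data: every compressed block occupies 128 bits, and the encoder chooses how many texels that block covers. The sketch below reproduces the slide's figures from that rule; the block footprints listed are taken from the ASTC format rather than from the slide, and `astc_bits_per_pixel` is a hypothetical helper name.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_pixel(block_width, block_height):
    """Bit rate for a 2D block footprint of block_width x block_height texels."""
    return ASTC_BLOCK_BITS / (block_width * block_height)


footprints = [(4, 4), (6, 6), (8, 8), (12, 12)]
rates = {
    f"{w}x{h}": round(astc_bits_per_pixel(w, h), 2)
    for w, h in footprints
}
for footprint, bpp in rates.items():
    print(f"{footprint}: {bpp} bpp")
```

The 4x4, 6x6 and 8x8 footprints give the 8bpp, 3.56bpp and 2bpp rates in the image comparison, and the 12x12 footprint falls below 1bpp, matching the "from 8bpp to <1bpp" range on the slide.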
  37. Conclusion
     • Hardware acceleration is a complex application domain and needs multiple standards across diverse domains
     • Advances in SOC silicon processing and associated APIs to access them are about to enable mobile devices to truly meet user expectations
     • Now is a good time to get involved with the standards initiatives that affect your business
     • These slides and more details at www.khronos.org
