
HSA Overview



Overview of Heterogeneous System Architecture

Published in: Technology


  2. INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)
     HSA is a purpose-designed architecture to enable the software ecosystem to combine and exploit the complementary capabilities of sequential programming elements (CPUs) and parallel processing elements (such as GPUs) to deliver new capabilities to users that go beyond the traditional usage scenarios.
     AMD is making HSA an open standard to jumpstart the ecosystem.
     Heterogeneous System Architecture | June 2012
  3. EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA
     [Diagram labels: APP Accelerated Software Applications; Accelerated Processing Unit; Graphics Workloads; Data Parallel Workloads; Serial and Task Parallel Workloads]
  4. AMD HSA FEATURE ROADMAP
     - Physical Integration
       - Integrate CPU & GPU in silicon
       - Unified Memory Controller
       - Common Manufacturing Technology
     - Optimized Platforms
       - GPU Compute C++ support
       - HSA Memory Management Unit
       - Bi-Directional Power Mgmt between CPU and GPU
     - Architectural Integration
       - Unified Address Space for CPU and GPU
       - GPU uses pageable system memory via CPU pointers
       - Fully coherent memory between CPU & GPU
     - System Integration
       - GPU compute context switch
       - GPU graphics pre-emption
       - Quality of service
  5. HSA COMPLIANT FEATURES: Optimized Platforms
     - GPU Compute C++ support: Supports OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of both CPU and GPU working together to process parallel workloads, such as Computer Vision, Video Encoding/Transcoding, etc.
     - HSA Memory Management Unit: CPU and GPU can share system memory. This means all system memory is accessible by both the CPU and the GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.
     - Bi-Directional Power Mgmt between CPU and GPU: Enables "power sloshing" where CPU and GPU are able to dynamically lower or raise their power and performance, depending on the activity and which one is more suited to the task at hand.
  6. HSA COMPLIANT FEATURES: Architectural Integration
     - Unified Address Space for CPU and GPU: The unified address space provides ease of programming for developers to create applications. For HSA platforms, a pointer is really a pointer and does not require separate memory pointers for CPU and GPU.
     - GPU uses pageable system memory via CPU pointers: The GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use.
     - Fully coherent memory between CPU & GPU: Allows for data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high speed coherent bus.
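The difference between copy-based offload and the "a pointer is really a pointer" model can be sketched in plain C++. This is only a CPU-side illustration under stated assumptions: `scale_kernel`, `offload_with_copies`, and `offload_shared` are hypothetical names, the "device" buffer is ordinary host memory, and no HSA, OpenCL, or driver API is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU kernel: doubles every element it is given.
void scale_kernel(float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) data[i] *= 2.0f;
}

// Copy-based offload (the "prior architectures" case): host data is staged
// into a separate buffer, processed there, and copied back.
// Returns the number of bytes moved by the two copies.
std::size_t offload_with_copies(std::vector<float>& host) {
    std::vector<float> device(host.begin(), host.end());  // host -> "device"
    scale_kernel(device.data(), device.size());
    host.assign(device.begin(), device.end());             // "device" -> host
    return 2 * host.size() * sizeof(float);
}

// Shared-address-space offload: the kernel receives the host pointer itself,
// so no staging buffer and no copies are needed.
std::size_t offload_shared(std::vector<float>& host) {
    scale_kernel(host.data(), host.size());
    return 0;
}
```

Calling both functions on equal inputs gives equal results; only the byte count differs, which is the cost the unified address space is meant to remove.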
  7. FULL HSA FEATURES: System Integration
     - GPU compute context switch: GPU tasks can be context switched, making the GPU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience.
     - GPU graphics pre-emption: As more applications enjoy the performance and features of the GPU, it is important that interactivity of the system is good. This means low latency access to the GPU from any process.
     - Quality of service: With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized.
  8. UNLEASHING DEVELOPER INNOVATION
     PROBLEM
     - Wide range of GPU/HW blocks hard to program
     - Not all workloads accelerate
     - Developers historically program CPUs
     SOLUTION
     - HSA + SDKs = Productivity & Performance with low Power
     [Chart: Developer Return (Differentiation in Performance, Power, Features, Time2Market) against Developer Investment (Effort, Time, New skills). Three tiers, each with an Apps figure and a Coders figure, listed here in the order extracted: CPU, "Good User Experiences" (~30+M, ~4M+); GPU, "Significant niche Value" (~100K, ~200+); HSA, "Differentiated Experiences" (Few M, Few K).]
  9. HSA SOLUTION STACK
     How we deliver the HSA value proposition
     Overall Vision:
     - Make GPU easily accessible
       - Support mainstream languages
       - Expandable to domain specific languages
     - Make compute offload efficient
       - Direct path to GPU (avoid Graphics overhead)
       - Eliminate memory copy
       - Low-latency dispatch
     - Make it ubiquitous
       - Drive HSA as a standard through HSA Foundation
       - Open Source key components
     [Diagram labels: Application SW; Domain Specific Libs (Bolt, OpenCV,…); OpenCL Runtime; DirectX Runtime; Other Runtime; HSA Runtime; Legacy User Mode Drivers; HSAIL; Finalizer; Custom Drivers; GPU ISA; CPU(s); GPU(s); Other Accelerators. Side labels: Developers; Standard SW; HW Vendors; Differentiated HW]
  10. HSA INTERMEDIATE LAYER - HSAIL
     - HSAIL is a virtual ISA for parallel programs
       - Finalized to native ISA by a JIT compiler or "Finalizer"
     - Allow rapid innovations in native GPU architectures
       - HSAIL will be constant across implementations
     - Explicitly parallel
       - Designed for data parallel programming
     - Support for exceptions, virtual functions, and other high level language features
     - Syscall methods
       - GPU code can call directly to system services, IO, printf, etc
     - Debugging support
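The idea of a virtual ISA that is "finalized" once and then run for many work-items can be illustrated with a toy model. Everything here is an assumption made for illustration: the two opcodes, `VInstr`, `finalize`, and `dispatch` are invented names and bear no relation to real HSAIL instructions or to any HSA runtime call. The "native" code is simply a list of C++ callables.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy "virtual ISA": NOT real HSAIL, just two made-up opcodes.
enum class VOp { AddImm, MulImm };
struct VInstr { VOp op; int imm; };

// In this sketch a "native" instruction is a callable on one integer.
using NativeOp = std::function<int(int)>;

// The "finalizer": translates the virtual program to native ops once,
// ahead of execution, the way a JIT would lower to a specific GPU ISA.
std::vector<NativeOp> finalize(const std::vector<VInstr>& program) {
    std::vector<NativeOp> native;
    for (const VInstr& ins : program) {
        const int imm = ins.imm;
        switch (ins.op) {
            case VOp::AddImm: native.push_back([imm](int x) { return x + imm; }); break;
            case VOp::MulImm: native.push_back([imm](int x) { return x * imm; }); break;
        }
    }
    // Every virtual instruction maps to exactly one native op in this toy.
    assert(native.size() == program.size());
    return native;
}

// Runs the finalized program once per work-item (sequentially here).
void dispatch(const std::vector<NativeOp>& native, std::vector<int>& items) {
    for (int& x : items)
        for (const NativeOp& op : native) x = op(x);
}
```

The same `VInstr` program could be handed to a different `finalize` implementation without changing it, which is the sense in which the virtual ISA stays constant while native architectures change.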
  11. C++ AMP
     - C++ AMP: a data parallel programming model initiated by Microsoft for accelerators
       - First announced at the 2011 AFDS
     - C++ based higher level programming model with advanced C++11 features
     - Single source model that integrates host and device programming
     - Implicit programming model that is "future proofed" to enable HSA features, e.g. avoiding host-to-device copies
     - A C++ AMP implementation is available from the Microsoft Visual Studio 11 suite under a beta release
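A single-source, data-parallel style can be approximated in standard C++ to show the shape of such a program. This is not the C++ AMP API: `sketch_parallel_for` is a hypothetical helper that calls the kernel lambda once per index on the CPU, one after another, where an accelerator runtime would run the invocations in parallel. `vector_add` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the C++ AMP API: invokes `kernel` once per index.
// A real accelerator runtime would execute these invocations in parallel.
template <typename Kernel>
void sketch_parallel_for(std::size_t extent, Kernel kernel) {
    for (std::size_t i = 0; i < extent; ++i) kernel(i);
}

// Single-source vector add: the host-side setup and the per-element kernel
// (the lambda) live in the same function and the same source file.
std::vector<int> vector_add(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> sum(a.size());
    const int* pa = a.data();
    const int* pb = b.data();
    int* ps = sum.data();
    // The kernel captures plain pointers by value; with shared virtual memory
    // the same pointers would be usable on the device without copies.
    sketch_parallel_for(sum.size(), [=](std::size_t i) { ps[i] = pa[i] + pb[i]; });
    return sum;
}
```

The kernel body makes no mention of where the data lives, which is what lets the runtime decide whether copies are needed.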
  12. C++ AMP AND HSA
     - Compute-focused efficient HSA implementation to replace a graphics-centric implementation for C++ AMP
       - E.g. low latency dispatch, HSAIL enabled
     - The shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs without any source changes.
     - Additional advanced C++ features on GPU, e.g.
       - More data types
       - Function calls
       - Virtual functions
       - Arbitrary control flow
       - Exception handling
       - Device and platform atomics
  13. OPENCL™ AND HSA
     - HSA is an optimized platform architecture for OpenCL™
       - Not an alternative to OpenCL™
     - OpenCL™ on HSA will benefit from
       - Avoidance of wasteful copies
       - Low latency dispatch
       - Improved memory model
       - Pointers shared between CPU and GPU
     - HSA also exposes a lower level programming interface, for those that want the ultimate in control and performance
       - Optimized libraries may choose the lower level interface
  14. HSA TAKING PLATFORM TO PROGRAMMERS
     - Balance between CPU and GPU for performance and power efficiency
     - Make GPUs accessible to wider audience of programmers
       - Programming models close to today's CPU programming models
       - Enabling more advanced language features on GPU
       - Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc) and hence more applications on GPU
       - Kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU)
         - Enabling task-graph style algorithms, Ray-Tracing, etc
     - Clearly defined HSA memory model enables effective reasoning for parallel programming
     - HSA provides a compatible architecture across a wide range of programming models and HW implementations.
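Two of the points above, pointer-containing data structures and work that enqueues further work, can be sketched together in standard C++. This is a single-threaded illustration only: `WorkQueue`, `sum_node`, and `tree_sum` are hypothetical names, and the queue stands in for a device queue without modelling any real HSA dispatch mechanism.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// A pointer-based tree, the kind of structure shared virtual memory lets a
// device walk without first flattening it into an array.
struct Node {
    int value;
    std::unique_ptr<Node> left, right;
};

// Hypothetical work queue: a running task may enqueue further tasks.
class WorkQueue {
public:
    void enqueue(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // may call enqueue() and grow the queue
        }
    }
private:
    std::deque<std::function<void()>> tasks_;
};

// Each task visits one node through an ordinary pointer, then enqueues one
// task per child, forming a task graph that mirrors the tree.
void sum_node(WorkQueue& q, const Node* node, long& total) {
    if (node == nullptr) return;
    total += node->value;
    q.enqueue([&q, node, &total] { sum_node(q, node->left.get(), total); });
    q.enqueue([&q, node, &total] { sum_node(q, node->right.get(), total); });
}

long tree_sum(const Node* root) {
    WorkQueue q;
    long total = 0;
    q.enqueue([&q, root, &total] { sum_node(q, root, total); });
    q.run();  // returns only after every enqueued task has finished
    return total;
}
```

In this sketch the host enqueues only the root task; all remaining work is generated by tasks themselves, which is the pattern the slide describes for task-graph style algorithms.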
  15. THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM
     An open standardization body to bring about broad industry support for Heterogeneous Computing via the full value chain, Silicon IP to ISV.
     - GPU computing as a first class co-processor to the CPU through architecture definition
     - Architectural support for special purpose hardware accelerators (Rasterizer, Security Processors, DSP, etc.)
     - Own and evolve the specifications and conformance suite
     - Bring to market strong development solutions to drive innovative advanced content and applications
     - Cultivate programming talent via HSA developer training and academic programs
  16. THANK YOU
  17. Disclaimer & Attribution
     The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
     The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
     NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
     ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
     AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by Khronos.
     © 2012 Advanced Micro Devices, Inc.