"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentation from the HSA Foundation

For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sept-2016-member-meeting-hsa-foundation

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Dr. John Glossner, President of the HSA Foundation and CEO of GPT-US, delivers the presentation "Enabling Efficient Heterogeneous Processing Through Coherency" at the September 2016 Embedded Vision Alliance Member Meeting. Glossner describes the organization's goals and deliverables for enabling heterogeneous programming.

Published in: Technology

  1. © Copyright 2012-2016 HSA Foundation. All Rights Reserved.
     ENABLING EFFICIENT HETEROGENEOUS PROCESSING THROUGH COHERENCY: AN HSA FOUNDATION UPDATE
     EMBEDDED VISION ALLIANCE MEMBER MEETING
     DR. JOHN GLOSSNER, PRESIDENT, HSA FOUNDATION / CEO GPT-US
     HARMONIZING THE INDUSTRY AROUND HETEROGENEOUS COMPUTING
  2. AGENDA
     • Heterogeneous Programming Problem
     • About HSA
       ‒ Founding
       ‒ Member Companies
       ‒ Open / Royalty Free Solutions
     • HSA Solution
       ‒ Hardware
       ‒ Software
       ‒ Infrastructure
       ‒ HSAIL
     • Portable Applications Programming
       ‒ C/C++, Python, OpenCL
     • Performance Results
       ‒ AMD
     • Products and Announcements
       ‒ AMD, GPT, Imagination, MediaTek
     • What's Next
  3. THE PROBLEM: HETEROGENEOUS APPLICATION DEVELOPMENT
  4. WHAT IS A HETEROGENEOUS SYSTEM?
     • A CPU+ System
       ‒ +GPU
       ‒ +Vision Processors
       ‒ +DSP
       ‒ +FPGA
       ‒ +Accelerators
     • Typically
       ‒ Different development tools
       ‒ Different memory spaces
       ‒ Communication via I/O only (data copies)
     [Diagram: CPU 1 … CPU N, GPU 1 … GPU M, DSP, and ACC blocks attached to Unified Coherent Memory]
  5. WHAT'S THE PROBLEM?
     • Heterogeneous processors are widely available
     • Huge compute capability
       ‒ Acceleration Units (GPU, DSP, FPGA)
       ‒ CPU Cluster-based computer
     • Coherency
       ‒ Established in high-end
       ‒ Migrating to mainstream mobile and consumer
     BUT…
     • Heterogeneous programming models not standardized
     • Multi-core/device applications difficult to optimize or scale
     • Non-portable application developer ecosystems
     HSAF brings compute app abstraction to heterogeneous platforms
  6. HSA TECHNOLOGY
     • Developing a new platform for heterogeneous systems
       ‒ Reducing Heterogeneous System Complexity
       ‒ Provides software ecosystem
     • Abstracts away complexities of heterogeneous systems
       ‒ Cache coherent shared virtual memory hardware
       ‒ Removes time consuming operating system calls
       ‒ Runs at user level
     • Exploiting Compute Capabilities
       ‒ Single source programming
       ‒ Control and compute code reside in the same file or project
  7. ABOUT HSA: HETEROGENEOUS SYSTEM ARCHITECTURE FOUNDATION
  8. HSA FOUNDATION
     • A Non-Profit Foundation Founded in June 2012
       ‒ Programming heterogeneous systems ("CPU +" era)
     • Industry standards body
       ‒ V1.1 Specifications released May 2016
       ‒ Backward compatibility with V1.0 hardware
     • First compatible hardware
       ‒ AMD
     • Measured Performance Improvements
  9. HSA – AN OPEN PLATFORM
     • Open Architecture, membership open to all
       ‒ HSA Programmers Reference Manual
       ‒ HSA Platform System Architecture
       ‒ HSA Runtime
       ‒ HSA Multivendor Specification
     • Royalty Free
       ‒ IP, Specifications, and APIs
     • Open Source
       ‒ Tools, Compilers, etc.
       ‒ Runtime implementations
       ‒ Tests
  10. MEMBERS DRIVING HSA
      [Member logos grouped as: Founders, Promoters, Supporters, Contributors, Academic]
  11. HSAF HARDWARE CONTRIBUTIONS
  12. JIM MCGREGOR, TIRIAS RESEARCH
      …HSAF has had a profound impact on hardware architectures … even Intel's
      • Cache Coherency
      • Unified memory (Shared Virtual Memory)
  13. THE PLATFORM PILLARS OF HSA
      • Unified memory (SVM)
      • User mode dispatch
      • Platform atomics
      • Architected Signals
      • Formal Relaxed Memory Model
      • Cache Coherency
      • Quality Of Service
      Some non-HSA platforms support a few of these platform features.
      In combination they form a well-rounded base for application programmability.
  14. HSAF SOFTWARE INFRASTRUCTURE
  15. THE VISION
      Make Heterogeneous Programming Much Easier
      1. Single source programming: Single tool chain
      2. Any programming language: C++, Python, JavaScript, …
      3. Eliminate data copies: Performance!
      4. Common address space: A pointer is a pointer
      5. Standardized command submission to Agents (GPU / DSP): A common dispatch language
      6. Eliminate software layers between application and hardware: Efficient
      7. ISA agnostic for CPU, GPU, DSP, and more: x86, ARM, MIPS, PowerVR, Mali, Adreno, GPT, …
      8. Open source software stack: Open Access!
      • High performance
      • Low power
      • Extensible to other accelerators on the SoC
  16. MOTIVATION (TODAY'S PICTURE)
      [Diagram: job flow across Application, OS, and Agent (GPU/DSP). Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  17. WITH SHARED VIRTUAL MEMORY
      [Diagram: job flow across Application, OS, and Agent (GPU/DSP). Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  18. WITH COHERENT CACHE MEMORY
      [Diagram: job flow across Application, OS, and Agent (GPU/DSP). Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  19. SIGNALS
      • HSA agents support signaling
        ‒ Creation/destruction using runtime APIs
      • Any Agent can access signals
        ‒ Wake up agents waiting upon the object
        ‒ Query/Wait for current object
        ‒ Allows conditions
      • Hardware-assisted signaling and synchronization primitives
        ‒ Memory semantics synchronizes work items processed by HSA agents
        ‒ Synchronizes execution between threads on HSA agents and host CPU
      • One-to-one and one-to-many signaling
        ‒ System Software, runtime & application SW use infrastructure to build higher-level synchronization primitives like mutexes, semaphores, …
      • Advantages
        ‒ Asynchronous events between agents
        ‒ Doesn't require CPU
        ‒ Common idiom for work offload
        ‒ Low power waiting
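The signaling idiom on the SIGNALS slide (create a signal, let an agent update it, let any waiter block on a condition) can be sketched in plain Python. This is only a software analogy built on `threading.Condition`, not the HSA runtime API: the `Signal` class, its method names, and `run_offload_demo` are hypothetical names chosen for illustration, and a Python thread stands in for the GPU/DSP agent.

```python
import threading


class Signal:
    """Software stand-in for an HSA signal: a shared integer agents can update and wait on."""

    def __init__(self, initial_value):
        self._value = initial_value
        self._cond = threading.Condition()

    def load(self):
        # Query the current value without blocking on a condition.
        with self._cond:
            return self._value

    def subtract(self, amount=1):
        # Update the value and wake every waiter (one-to-many signaling).
        with self._cond:
            self._value -= amount
            self._cond.notify_all()

    def wait_until(self, predicate, timeout=None):
        # Block until predicate(value) holds or the timeout expires.
        with self._cond:
            self._cond.wait_for(lambda: predicate(self._value), timeout=timeout)
            return self._value


def run_offload_demo():
    """Offload a job to a worker thread and wait on a completion signal."""
    completion = Signal(1)          # 1 = job outstanding, 0 = job done
    buffer = [1, 2, 3, 4]           # shared data, no copy made for the worker

    def agent():
        for i, x in enumerate(buffer):
            buffer[i] = x * x
        completion.subtract(1)      # signal completion to whoever is waiting

    worker = threading.Thread(target=agent)
    worker.start()
    final = completion.wait_until(lambda v: v == 0, timeout=5.0)
    worker.join()
    return final, buffer
```

Calling `run_offload_demo()` returns the final signal value together with the squared buffer. The host thread sleeps inside `wait_until` instead of polling, which is the software counterpart of the "low power waiting" point on the slide.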
  20. WITH SIGNALING
      [Diagram: job flow across Application, OS, and Agent (GPU/DSP). Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  21. HSA QUEUING MODEL
      • User mode queuing
        ‒ Low latency dispatch
        ‒ Application dispatches directly
        ‒ No OS or driver required
      • Architected Queuing Language (AQL)
        ‒ Single compute dispatch path for all hardware
        ‒ No driver translation, direct to hardware
        ‒ Standard across vendors!
        ‒ Guaranteed backward compatibility
      • Allows for dispatch to queue from any agent
        ‒ CPU or GPU or DSP or FPGA, etc.
      • Agent self enqueue enables
        ‒ Recursion, Tree traversal, Wavefront reforming
      • Requires coherency and shared virtual memory
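The queuing model above can be pictured as a ring buffer in shared memory that producers write into directly and a packet processor drains. The following is a minimal single-threaded sketch of that idea, under stated assumptions: `UserModeQueue`, its fields, and the `(kernel, args)` packet layout are hypothetical and much simpler than real AQL packets, and a real implementation would use atomic index updates rather than plain integers.

```python
class UserModeQueue:
    """Toy ring-buffer queue: producers write packets, a packet processor drains them."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.write_index = 0    # advanced by producers
        self.read_index = 0     # advanced by the packet processor
        self.doorbell = -1      # highest packet index published to the consumer

    def enqueue(self, packet):
        # Refuse the packet when the ring is full; nothing is overwritten.
        if self.write_index - self.read_index >= self.size:
            return False
        self.slots[self.write_index & (self.size - 1)] = packet
        self.doorbell = self.write_index    # "ring the doorbell" for this packet
        self.write_index += 1
        return True

    def process(self):
        # Drain every packet published so far, including ones enqueued while draining.
        results = []
        while self.read_index <= self.doorbell:
            kernel, args = self.slots[self.read_index & (self.size - 1)]
            results.append(kernel(*args))
            self.read_index += 1
        return results


def self_enqueue_demo():
    """A kernel that enqueues its own follow-up work (agent self enqueue)."""
    queue = UserModeQueue(4)

    def countdown(n):
        if n > 0:
            queue.enqueue((countdown, (n - 1,)))
        return n

    queue.enqueue((countdown, (3,)))
    return queue.process()
```

`self_enqueue_demo()` shows the recursion case from the slide: the running kernel pushes its next step onto the same queue without returning to the host, which only works because producer and consumer see the same memory.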
  22. WITH USER MODE QUEUING
      [Diagram: job flow across Application CPU, OS, and Agent (GPU/DSP). Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  23. FINAL PICTURE: SVM + CACHE COHERENCY + SIGNALS + USER MODE QUEUES
      [Diagram: job flow across Application, OS, and Agent (GPU/DSP). Steps shown: Queue Job, Start Job, Finish Job]
  24. HSA COMMAND AND DISPATCH FLOW
      [Diagram: Applications A, B, and C, an optional dispatch buffer, and agent hardware with one hardware queue per application (Hardware Queue A, B, C)]
      • HW view:
        ‒ HW / microcode controlled
        ‒ HW scheduling
        ‒ Architected Queuing Language (AQL)
        ‒ HW-managed protection
      • SW view:
        ‒ User-mode dispatches to HW
        ‒ No OS Driver overhead
        ‒ Low dispatch times
        ‒ Host & Kernel Agent dispatch APIs
  25. HSA INTERMEDIATE LANGUAGE (HSAIL): BYTECODE FOR HETEROGENEOUS SYSTEMS
  26. THE PORTABILITY CHALLENGE
      • CPU ISAs – Backwards Compatible
        ‒ ISA innovations added incrementally (e.g. NEON, AVX, etc.)
        ‒ ISA retains backwards-compatibility with previous generation
        ‒ HSA instruction-set architectures: ARM, GPT, MIPS, and x86
      • Kernel Agent ISAs – No Backwards Compatibility
        ‒ GPU, DSP, DNN, Image Signal Processor, Custom Accelerators, etc.
        ‒ Massive diversity of architectures in the market
        ‒ Each vendor has its own ISA, and often several in the market at the same time
        ‒ Compatibility via APIs (OpenGL, DirectX, OpenCV)
  27. HSA INTERMEDIATE LAYER: HSAIL
      • Virtual ISA for parallel programs
        ‒ Finalized to native ISA by a compiler
        ‒ Dynamic or Offline
        ‒ ISA independent by design
      • Explicitly parallel
        ‒ Designed for data parallel programming
      • Multiple HLL Support
        ‒ Exceptions, virtual functions, etc.
        ‒ Java, C++, OpenMP, Python, etc.
      Example source:
        main() {
          …
          #pragma omp parallel for
          for (int i=0;i<N; i++) { }
          …
        }
      [Diagram: compilation flow with High-Level Compiler, BRIG, Finalizer, Component ISA, Host ISA]
  28. HSAIL FEATURES
      • A Virtual Explicitly Parallel ISA
        ‒ ~135 Opcodes
        ‒ RISC Register-based Load/Store
        ‒ Arithmetic: IEEE 754 Floating Point including 16-bit; Integer (32/64-bit); DSP fixed point
        ‒ Packed / SIMD: f16x2, f16x4, f16x8, f32x2, f32x4, f64x2; signed/unsigned 8x4, 8x8, 8x16, 16x2, 16x4, 16x8, 32x2, 32x4, 64x2
        ‒ Branches & Function Calls
        ‒ Atomic Operations
      • Wavefronts
        ‒ 1, 2, 4, 8, 16, 32, or 64 SIMD lanes
        ‒ Lanes can be active or inactive
      • Memory
        ‒ Shared Virtual Memory
      • Exceptions
      Example:
        ld_global_u64 $d0, [$d6 + 120] ; $d0= load($d6+120)
        add_u64 $d1, $d0, 24 ; $d1= $d0+24
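The "lanes can be active or inactive" bullet is easiest to see with a branch that splits a wavefront. The sketch below is an assumption-laden illustration, not HSAIL semantics: it models one common way SIMD hardware handles divergence, by running both sides of the branch in turn with a per-lane mask. The function name `run_wavefront` and the odd/even kernel are made up for the example.

```python
VALID_WIDTHS = (1, 2, 4, 8, 16, 32, 64)


def run_wavefront(lanes):
    """Apply `x*3+1 if x is odd else x//2` to every lane, one branch side at a time."""
    width = len(lanes)
    if width not in VALID_WIDTHS:
        raise ValueError("wavefront width must be one of %r" % (VALID_WIDTHS,))

    result = list(lanes)
    odd_mask = [x % 2 == 1 for x in lanes]   # True = lane active in the first pass

    # Pass 1: the "then" side runs; lanes holding even values are inactive.
    for i in range(width):
        if odd_mask[i]:
            result[i] = lanes[i] * 3 + 1

    # Pass 2: the mask is inverted and the "else" side runs.
    for i in range(width):
        if not odd_mask[i]:
            result[i] = lanes[i] // 2

    active_per_pass = (sum(odd_mask), width - sum(odd_mask))
    return result, active_per_pass
```

For the four-lane input `[1, 2, 3, 4]` each pass has two active lanes, so half of the SIMD width is idle in both passes. That cost is the reason the queuing slide lists wavefront reforming as a benefit of agent self enqueue.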
  29. PORTABLE APPLICATIONS PROGRAMMING: FROM OPENCL TO C++17
  30. HSA OPEN SOURCE SOFTWARE
      • Full open source Linux stack: tools, compilers and OS support
        ‒ Allows a single shared implementation for many components
        ‒ Enables university research and industry collaboration in all areas
        ‒ Because it's the right thing to do
      • Many open source applications & frameworks
        ‒ Native Languages: HCC (C++17), LLVM, GCC, CLOC/SNACK, Python, Java, …
        ‒ Tools, APIs, Frameworks: CodeXL, POCL, Docker, OpenMP, OKRA, HIP, …
        ‒ Research: Multi2sim, HSAEmu, gem5, ViennaCL, …
        ‒ And many applications using OCL 2.x or HSA stack
        ‒ GitHub & Bitbucket repositories have much, much more…
      • gccbrig
        ‒ Any processor with a gcc machine description can finalize HSAIL
  31. ARCHITECTED PROFILING AND DEBUGGING
      Profiling
      • Common timeline across HSA accelerators & system
      • Common HSA hardware events (+ HW specific)
      • Common HSA profiling counter definitions (+ HW specific counters)
      • Consistent profiling methodology for all HSA accelerators
      Debugging
      • Breakpoints
      • Exception handling
      • Single-step
      • Tracing
      • HSAIL Disassembly
      • Emulation support
      • Libraries
      • Plugins
  32. Python on GPUs
      • Numba: NumPy-aware Python compiler
        ‒ Open source. Available on GitHub
        ‒ Sponsored by Continuum Analytics
      • Direct HSA Support
      • Automatic Parallelization
        ‒ 2x-200x speedup
  33. PERFORMANCE RESULTS
  34. Python Geographic Locality
      • What is the distance from a set of points to a target point?
        ‒ How many points are within a specified range?
      • Numba can auto-parallelize user universal functions for HSA
        ‒ Ufuncs broadcast an operation over the elements of a NumPy array
        ‒ ZERO HSA developer knowledge required
      • 1M Points
        ‒ >8X speedup
      https://github.com/ContinuumIO/Numba-HSA-Webinar
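The geographic locality problem on this slide maps naturally onto NumPy broadcasting. The sketch below is an assumption about the workload rather than the benchmark code from the linked repository: it uses the haversine great-circle formula, a mean Earth radius of 6371 km, and the hypothetical function names `haversine_km` and `count_within`. It runs on the CPU with plain NumPy; the slide's claim is that Numba can compile the same element-wise style of function for an HSA device.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption for this sketch


def haversine_km(lat, lon, target_lat, target_lon):
    """Great-circle distance in km from each (lat, lon) in degrees to one target point."""
    lat = np.radians(lat)
    lon = np.radians(lon)
    target_lat = np.radians(target_lat)
    target_lon = np.radians(target_lon)
    # Every operation below is element-wise, so it broadcasts over whole arrays.
    a = (np.sin((lat - target_lat) / 2.0) ** 2
         + np.cos(lat) * np.cos(target_lat) * np.sin((lon - target_lon) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def count_within(lats, lons, target_lat, target_lon, radius_km):
    """Count how many points lie within radius_km of the target point."""
    distances = haversine_km(np.asarray(lats, dtype=np.float64),
                             np.asarray(lons, dtype=np.float64),
                             target_lat, target_lon)
    return int(np.count_nonzero(distances <= radius_km))
```

Because each output element depends only on the matching input element, the computation has no cross-element dependencies, which is what makes it a good candidate for automatic parallelization.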
  35. GEN1: FIR & AES
      • FIR is a memory-intensive streaming workload
      • AES is a compute-intensive streaming workload
      • CL12 – cl_mem buffer
        ‒ Copy to/from the device
      • CL20 – SVM buffer – Coarse Grain Sync
        ‒ Copy to/from SVM
        ‒ Data copy cannot be avoided, since the space for SVM is limited
      • HSA – Unified Memory Space – Fine Grained Sync
        ‒ Regular pointer
        ‒ No explicit copy
      • Results
        ‒ HSA compute abstraction
        ‒ NO performance penalty
      Note: Not all algorithms run faster
      Benchmark: NUCAR HeteroMark
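For readers unfamiliar with the FIR workload named on this slide, a finite impulse response filter computes each output sample as a weighted sum of the most recent input samples, y[n] = Σ h[k]·x[n−k]. The sketch below is a plain NumPy reference under that textbook definition, not the HeteroMark kernel; `fir_filter` is a hypothetical name and samples before the start of the stream are assumed to be zero.

```python
import numpy as np


def fir_filter(samples, taps):
    """Return y[n] = sum_k taps[k] * samples[n - k], treating samples before index 0 as zero."""
    samples = np.asarray(samples, dtype=np.float64)
    taps = np.asarray(taps, dtype=np.float64)
    out = np.zeros_like(samples)
    # Taps beyond the stream length can never touch an output sample, so skip them.
    for k, h in enumerate(taps[:len(samples)]):
        # Tap k contributes the input delayed by k samples.
        out[k:] += h * samples[:len(samples) - k]
    return out
```

Each output sample needs only one multiply-add per tap but streams through the whole input, which is why the slide classes FIR as memory-intensive and why avoiding the explicit buffer copy matters for it.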
  36. HSA PRODUCT UPDATES - AMD
      FROM HSA FOUNDATION MEMBER COMPANIES
  37. Heterogeneous System Architecture Is At The Core of ROCm
      Rich Foundation for HPC and Ultrascale Computing, supporting our APUs and Discrete GPUs
      HSA drives rich capabilities into ROCm
      • Systems Architecture
        ‒ User Mode Queues
        ‒ Architected Queuing Language
        ‒ Flat Memory Addressing
        ‒ Atomic Memory Transactions
        ‒ Process Concurrency & Preemption
      • HSA Runtime enables a programming language neutral systems interface
      • Supports standardized loader and linker interface
      ROCm: Radeon Open Compute Platform
  38. ROCm Enabled Hardware 2016
      • S9150
      • W9100
      • RADEON R9 Nano
      • S9300x2
      • RADEON RX480 (Oct ROCm 1.3)
      • S9170
      • AMD Embedded R-Series SOC
      • AMD FX 98xx, A12-97xx
      AMD Proprietary and Confidential, August 2016
  39. HXGPT ANNOUNCEMENT: UNITY PROCESSOR ARCHITECTURE
      • Working silicon for Unity Architecture
      • Focus on out-of-order pipeline
        ‒ Superscalar
        ‒ Control flow
      • See our demo
        ‒ Image Processing Filter
      [Images: Before, After]
  40. HXGPT ANNOUNCEMENT: HSAIL-BASED DEEP LEARNING AND NEURAL NETWORK OPEN SOURCE PROJECT
      • Open Source Machine Learning HSAIL library
        ‒ Deep Neural Network Library
        ‒ Delivered in HSAIL
      • Any HSA-Compatible platform can execute
      • Optimized for hxGPT
        ‒ Using gccbrig
      • Development now underway
  41. IMAGINATION HSA COMPLIANT IP (COMING SOON)
      We will be rolling out:
      • HSA across all MIPS I-class and P-class CPUs
      • HSA across all PowerVR GPUs
      • HSA compliant fabric solutions
      [Block diagram: Imagination Smart Vision IP Platform. Labeled blocks: Coherent HSA-compliant SoC fabric; MIPS CPU (HSA-compliant); PowerVR GX7200 Series6XT 2 cluster PowerVR GPU (HSA-compliant); L2 cache; PowerVR Video Encode; PowerVR Video Decode; PowerVR Camera ISP; PowerVR JPEG Encode; Display Pipeline; Ensigma RPU; AFE; ROM; RAM; OTP; eFuse; DMAC; DDR3/4; NAND; Bridge; Peripheral Bus; Clock & Reset Control; JTAG & Test; PSU & Power Control; TE & Crypto; HDMI Tx & Rx; USB3; MIPI; Peripherals (GPIO, UART, I2C, I2S, SPI, SD); Customer IP & interfaces]
  42. Evolution of Heterogeneity at MediaTek
      Copyright © MediaTek Inc. All rights reserved.
      • HMP – 2013: Heterogeneous Multi-Processing
      • HC – 2015: Heterogeneous Computing
      • Tri-cluster – 2016: Hybrid Tri-cluster Multi-Processing
      • HSA Features: Heterogeneous System Architecture
      [Diagram labels: LITTLE CPUs, BIG CPUs, Min CPUs, Mid CPUs, Max CPUs, GPU, Accelerators, Coherent Memory, MMU]
  43. CONCLUSIONS
  44. THE RESULT
      Make Heterogeneous Programming Much Easier
      1. Single source programming: Single tool chain
      2. Any programming language: C++, Python, JavaScript, …
      3. Eliminate data copies: Performance!
      4. Common address space: A pointer is a pointer
      5. Standardized command submission to Agents (GPU / DSP): A common dispatch language
      6. Eliminate software layers between application and hardware: Efficient
      7. ISA agnostic for CPU, GPU, DSP, and more: x86, ARM, MIPS, PowerVR, Mali, Adreno, GPT, …
      8. Open source software stack: Open Access!
      • High performance
      • Low power
      • Extensible to other accelerators on the SoC
  45. SUMMARY
      • 2012 goal of changing chip H/W architecture achieved
        ‒ Cache coherent shared virtual memory
      • 2014-2015 S/W architecture to support H/W
        ‒ March 2015 V1.0 specs
        ‒ Programmed in any language (C++, Python, OpenCL)
      • 2015-2016
        ‒ May 2016 V1.1 specs
        ‒ Multivendor support
        ‒ Wider range of processors
      • 2016 H/W platforms arriving
        ‒ AMD's Carrizo (Dell, Asus, Lenovo)
        ‒ Licensable IP available
  46. V1.2 SPECIFICATIONS IN PROGRESS
      [Diagram labels: Improved Data Interop; Fixed function accelerators (e.g. FPGA); Local device memory; Coarse grain memory; Architected Debug; BRIG, new linking formats; Architecture; Fully formalized memory model; HSAIL; Parallel loops; Flexible API and access semantics; Programming Models]
  47. JOIN US!
      WWW.HSAFOUNDATION.COM
      THANK YOU
