HSA Overview
GREG STONER
HSA FOUNDATION’S INITIAL FOCUS
Bring accelerators forward as first-class processors
• Unified address space across all processors
• Operates in pageable system memory
• Full memory coherency between the CPU and GPU
• Fully defined relaxed consistency memory model
• User mode dispatch/scheduling
• Eliminate drivers from the dispatch path
• QoS through pre-emption and context switching
Attract Mainstream programmers
• Support a broader set of languages beyond traditional GP-GPU languages
• Support for Task Parallel Runtimes & Nested Data Parallel
• Rich debugging and performance analysis support
Create a platform architecture for all accelerators
• Focused on the APU/SOC
HSA FOUNDATION MEMBERSHIP – JUNE 2013
Founders
Promoters
Supporters
Contributors
Academic
Associates
DELIVERED VIA ROYALTY FREE STANDARDS
 Royalty-free IP, specifications and APIs
 The three primary specifications are:
 HSA Platform System Architecture Specification
 Focus on hardware requirements and low-level system software
 Supports Small Mode (32-bit) and Large Mode (64-bit)
 HSA Programmer's Reference Manual
 Definition of the HSAIL Virtual ISA
 Binary format (BRIG)
 Compiler writer's guide and library developer's guide
 HSA System Runtime Specification
AMD’S OPEN SOURCE COMMITMENT TO HSA
 We will open source our Linux execution and compilation stack
 Jump start the ecosystem
 Allow a single shared implementation where appropriate
 Enable university research in all areas
Component Name       | AMD Specific | Rationale
HSA Bolt Library     | No           | Enable understanding and debug
HSAIL Code Generator | No           | Enable research
LLVM Contributions   | No           | Industry and academic collaboration
HSA Assembler        | No           | Enable understanding and debug
HSA Runtime          | No           | Standardize on a single runtime
HSA Finalizer        | Yes          | Enable research and debug
HSA Kernel Driver    | Yes          | For inclusion in Linux distros
WHAT ARE THE PROBLEMS WE ARE TRYING TO SOLVE?
 SOCs are quickly running into the same many-CPU-core bottlenecks as the PC
 To move beyond this we need to look at the right processor(s) and/or execution device for a given workload at reasonable power
 While addressing the core issues of
 Easier to program
 Easier to optimize
 Easier to load balance
 High performance
 Lower power
HSA TAKING PLATFORM TO PROGRAMMERS
 Balance between CPU and GPU for performance and power efficiency
 Make GPUs accessible to wider audience of programmers
 Programming models close to today’s CPU programming models
 Enabling more advanced language features on GPU
 Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and
hence more applications on GPU
 Kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU)
• Enabling task-graph style algorithms, Ray-Tracing, etc
 Clearly defined HSA memory model enables effective reasoning for parallel programming
 HSA provides a compatible architecture across a wide range of programming models and HW
implementations.
Design criteria for HSA platform infrastructure
 HSA is defined through HW requirements that enforce a set of HW compliance criteria SW can depend on
 Comprehensive Memory model (well-defined visibility and ordering rules for transactions)
 Shared Virtual Memory (identical page table walk between TCU and LCU VM, HW access enforcement)
 Cache Coherency Domains
 Memory-based signaling and synchronization
 User Mode Queues, Architected Queuing Language (AQL)
 Preemption / Quality of Service
 Error Reporting
 Syscall infrastructure (TCU can dispatch operations to LCU to call general OS APIs)
 Hardware Debug (TCU)
 Architected Topology Discovery
 Thin System SW Layer enables HW features for use by an application runtime
 Mainly responsible for TCU HW init, access enforcement and resource management
 Provides a consistent, dependable feature set for application layer through SW primitives
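Of the requirements above, memory-based signaling and synchronization is what lets agents coordinate without driver round trips. A minimal sketch of the idea in C++11 atomics; the names hsa_like_signal, signal_store and wait_eq are illustrative only, not HSA runtime API:

```cpp
// A minimal sketch of memory-based signaling with C++11 atomics.
// hsa_like_signal, signal_store and wait_eq are illustrative names,
// not part of any HSA specification or runtime API.
#include <atomic>
#include <cstdint>
#include <thread>

struct hsa_like_signal {
    std::atomic<int64_t> value{0};

    // Producer (e.g. the agent completing a dispatch) publishes a new value.
    void signal_store(int64_t v) { value.store(v, std::memory_order_release); }

    // Consumer spins until the signal reaches v. A real runtime would also
    // offer blocking waits backed by interrupts rather than pure spinning.
    void wait_eq(int64_t v) {
        while (value.load(std::memory_order_acquire) != v)
            std::this_thread::yield();
    }
};
```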
HSA IS DESIGNED TO GO BEYOND THE GPU
[Slide diagram: CPU, GPU, audio processor, video hardware, DSP, security processor, image signal processing and fixed-function accelerators, all tied together by shared memory and coherency (SM&C).]
HSA COMMAND AND DISPATCH FLOW
[Slide diagram: applications A, B and C each dispatch into their own hardware queue on the GPU, optionally through a dispatch buffer.]
HW view:
 HW / microcode controlled
 HW scheduling
 Architected Queuing Language (AQL)
 HW-managed protection
SW view:
 User-mode dispatches to HW
 No KMD overhead
 Low dispatch times
 CPU & GPU dispatch APIs
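A sketch of what the SW view means in practice: the application writes an AQL-style packet into a ring it owns, bumps the write index, and rings a memory-mapped doorbell, with no kernel-mode driver in the path. The struct layout and names below are invented for illustration; the real AQL packet format is defined by the HSA Programmer's Reference Manual.

```cpp
// Illustrative-only sketch of user-mode dispatch into a hardware-visible queue.
#include <atomic>
#include <cstdint>

struct aql_like_packet {
    uint16_t header;              // packet type plus acquire/release fence bits
    uint16_t setup;               // number of grid dimensions
    uint32_t workgroup_size[3];
    uint32_t grid_size[3];
    uint64_t kernel_object;       // handle to the finalized kernel code
    uint64_t kernarg_address;     // kernel arguments in shared virtual memory
    uint64_t completion_signal;   // memory-based signal updated on completion
};

struct user_mode_queue {
    aql_like_packet* base;                      // ring buffer in process memory
    uint32_t size;                              // number of slots, power of two
    std::atomic<uint64_t> write_index{0};
    std::atomic<uint64_t>* doorbell;            // mapped doorbell register

    void dispatch(const aql_like_packet& pkt) {
        uint64_t slot = write_index.fetch_add(1, std::memory_order_relaxed);
        base[slot & (size - 1)] = pkt;                      // fill the packet slot
        doorbell->store(slot, std::memory_order_release);   // tell HW there is work
    }
};
```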
HSA MEMORY MODEL
 Defined to be compatible with C++11, Java and
.NET Memory Models
 Relaxed consistency memory model for parallel
compute performance
 Loads and stores can be re-ordered by the
finalizer
 Visibility controlled by:
 Load.Acquire
 Store.Release
 Barriers
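A small C++11 example of the visibility rules listed above; since the HSA model is defined to be compatible with C++11, the same acquire/release reasoning carries over to Load.Acquire and Store.Release:

```cpp
// Message-passing with load.acquire / store.release, written with C++11 atomics.
#include <atomic>

int payload = 0;                    // ordinary, non-atomic data
std::atomic<bool> ready{false};

void producer() {                   // runs on one agent
    payload = 42;                   // plain store; may be reordered...
    ready.store(true, std::memory_order_release);   // ...but never past this store.release
}

void consumer() {                   // runs on another agent
    while (!ready.load(std::memory_order_acquire)) { }   // load.acquire
    // Here payload is guaranteed to read 42; with relaxed ordering it would not be.
    int observed = payload;
    (void)observed;
}
```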
HSAIL
 HSAIL is the intermediate language for parallel compute in HSA
 Generated by a high level compiler (LLVM, gcc, Java VM, etc)
 Compiled down to GPU ISA or other parallel processor ISA by an IHV Finalizer
 Finalizer may execute at run time, install time or build time, depending on platform type
 HSAIL is a low level instruction set designed for parallel compute in a shared virtual
memory environment. HSAIL is SIMT in form and does not dictate hardware
microarchitecture
 HSAIL is designed for fast compile time, moving most optimizations to HL compiler
 Limited register set avoids full register allocation in finalizer
 HSAIL is at the same level as PTX: an intermediate assembly or Virtual Machine Target
 Represented as bit code in the BRIG file format, with support for late binding of libraries
HSA Security
 With HSA, GPU operates in the same security infrastructure as the CPU
 User and privileged memory
 Read, write and execute protections by page table entry
 Internally, the GPU partitions functionality by privilege level
 User mode compute queues can only run AQL packets
 User mode graphics command buffers cannot write privileged registers
 HSA supports fixed time context switching, which is resistant to Denial of Service attacks
 Today’s GPUs are vulnerable to denial of service attacks
 Long or infinite shader programs
 Full GPU reset required to restore service
 With HSA, fair scheduling and context switching ensures a responsive system
OPENCL™ AND HSA
 HSA is an optimized platform architecture, which will run OpenCL™ very well
 Not an alternative to OpenCL™
 Focused on the hardware platform more than API
 Ready to support many more languages than C/C++
 OpenCL™ on HSA will benefit from
 Avoidance of wasteful copies
 Low latency dispatch
 Improved memory model
 Virtual function calls
 Flexible control flow
 Exception generation and handling
 Device and platform atomics
 Pointers shared between CPU and GPU
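The last bullet, pointers shared between CPU and GPU, is the kind of capability OpenCL 2.0 later exposed as shared virtual memory. A hedged host-side sketch, assuming ctx, queue and kernel were created earlier on an SVM-capable OpenCL 2.0 device; error handling omitted:

```cpp
// Hedged sketch: one allocation, one pointer, addressed by both CPU and GPU,
// using OpenCL 2.0 coarse-grained shared virtual memory.
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>

void run_on_shared_buffer(cl_context ctx, cl_command_queue queue,
                          cl_kernel kernel, size_t n) {
    float* data = static_cast<float*>(
        clSVMAlloc(ctx, CL_MEM_READ_WRITE, n * sizeof(float), 0));

    // CPU writes through the pointer directly; no clEnqueueWriteBuffer copy.
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, data, n * sizeof(float),
                    0, nullptr, nullptr);
    for (size_t i = 0; i < n; ++i) data[i] = static_cast<float>(i);
    clEnqueueSVMUnmap(queue, data, 0, nullptr, nullptr);

    // The GPU kernel receives the same pointer as its argument.
    clSetKernelArgSVMPointer(kernel, 0, data);
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &n, nullptr,
                           0, nullptr, nullptr);
    clFinish(queue);

    clSVMFree(ctx, data);
}
```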
HSA BRINGS A MODERN OPEN COMPILATION
FOUNDATION
 This brings a fully competitive, rich and complete compilation stack architecture for the creation of a broader set of GPU computing tools, languages and libraries
 HSAIL supports LLVM and other compilers – GCC, Java VM
[Slide diagram: two compilation stacks side by side. CUDA: EDG or CLANG → NVVM IR → LLVM → PTX → hardware. OpenCL on HSA: EDG or CLANG → SPIR → LLVM → HSAIL → hardware.]
LOOKING BEYOND GPU BASED LANGUAGES
 Dynamic languages are now one of the biggest areas where we need a rich foundation to allow for exploration of heterogeneous parallel runtimes
 We also need a foundation that goes beyond LLVM-based compilation in this environment, since many of them have their own compilation foundations, like Java/Scala, JavaScript, Dart, etc.
 See Project Sumatra (http://openjdk.java.net/projects/sumatra/), the formal project for GPU acceleration for Java in OpenJDK
 We also see opportunities for the LLVM-based environment to embrace other standards-based languages like OpenMP, Fortran, Go, Haskell, and DSLs like Halide, Julia, and many others
TOOLS ARE AVAILABLE NOW
 Tools now on GitHub – HSA Foundation
 libHSA Assembler and Disassembler
 https://github.com/HSAFoundation/HSAIL-Tools
 HSAIL Instruction Set Simulator
 https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
 HSA ISS loader library for Java and C++ for creation and dispatch of HSAIL kernels
 https://github.com/HSAFoundation/Okra-Interface-to-HSAIL-Simulator
 Coming soon: LLVM compilation stack which outputs HSAIL and BRIG
 Will bring a C++ AMP CLANG front-end as well
GET THE PUBLIC SPEC AT
 HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming
Model, Compiler Writer’s Guide, and Object Format (BRIG)
 http://hsafoundation.com/standards/
 https://hsafoundation.box.com/s/m6mrsjv8b7r50kqeyyal
BACKUP SLIDES
HSA Foundation System Architecture Goals
 Focus of the current “HSA System Architecture” requirements is on defining a consistent and simple hardware operating model for use in application software, with little SW involvement in performance-critical paths
 E.g. dispatch processing can be issued from either GPU or CPU within the process context
 The requirements describe in detail what needs to be implemented in HW for it to work, not how it needs to be implemented
 Definition is done in a vendor-architecture neutral way -> “Big tent” approach
 Allowing differentiation through innovation while providing a reliable baseline for software to operate on
 Many differentiated HW architectures can map to the programming model with minor modifications
 But its goal is not (yet) to define a unified HW model (e.g. at register level)
 All important control and prioritization mechanisms are defined, but implementation and access may differ across vendors
and is covered in their system software layer
 It is expected that a common model will be established over time, either through architected platform mechanisms (e.g. ACPI)
or architected HW controls accessible to system software
 Strong system software representation helps drive a common model for HSA virtualization
OPPORTUNITIES WITH LLVM BASED
COMPILATION
[Slide diagram: LLVM and CLANG front ends for C99, C++ 11, C++ AMP, Objective C, OpenCL, OpenMP, KL, OSL, RenderScript, UPC, Rust, Halide, Julia, Mono, Fortran and Haskell.]
POTENTIAL ANDROID HSA STACK DIAGRAM
[Slide diagram: a potential Android HSA stack. User space: OpenGL ES, UI/2D and HSA applications over Surface Flinger, Gralloc (framebuffer), the OpenGL ES graphics driver with an EGL wrapper, a software graphics library (libGELES.android.so), an Overlay HAL, Renderscript with an HSA wrapper, and an HSA runtime library (libHSA.android.so, including the finalizer). Kernel space: frame buffer, PMEM, KGSL and KFD drivers plus the GPU kernel driver. Hardware: CPU, GPU, overlay and main memory.]
HSA WORKLOAD SUBMISSION PROCESSING
(EXAMPLE, AMD IMPLEMENTATION)
[Slide diagram: each HSA application process (X, Y) owns user-mode rings (UM Ring 1..n) with headers, PASIDs and parameters in pageable memory, plus per-process doorbell mappings with write pointers and read-pointer mappings; the HSA kernel driver performs hardware context management on completion/preempt using Memory Queue Descriptors (MQD). The MQD scheduling HW queue is operated by system software, while user-mode queue (UMQ) dispatch processing is done by hardware. A legend distinguishes pageable, non-pageable and contiguous memory.]
How is the HSA design accommodating virtualization?
 The programming model leverages a few simple paradigms (queues, syncvars, events) for its operation
 All resources that are necessary to queue and dispatch workload can be expressed as memory regions
 Privileged level SW enforces prioritization, scheduling and access control through HW mechanisms
 The same principles can be applied to virtualize the guest OS HSA kernel resources in HV
 Workload Dispatch, prioritization/scheduling and resource management are strictly separated in the programming model
 Communication between queues and between TCU & LCU occurs via negotiated memory structures (“syncvars”) within
the process address space
 The semantics are defined mostly via software, with little or no HW dependency
 System SW uses the same mechanism to communicate events with application process software
 Software can use system “event objects” that indicate a status change in HW requiring attention
 These are triggered by TCU interrupts and usually processed by system software on LCU
 But the state itself is in the “syncvar” and can be processed by all HSA peers within the process
HSA EXCEPTION HANDLING (EXAMPLE, AMD
IMPLEMENTATION)
 Non-privileged TCU code execution causes an abnormal condition that requires attention
 Non-privileged TCU “trap handler” is invoked that classifies the
condition according to policy
 If the policy requires attention of application/runtime, trap handler
writes a status to non-privileged Trap Memory Buffer
 Trap handler triggers TCU interrupt, privileged SW identifies
interrupt condition and raises a trap/system exception in the
context of the application process
 Exception triggers LCU exception handler (runtime, app or OS
default), condition is identified by evaluating Trap Memory Buffer
and processed
 Exception processing is finished and queue processing is
restarted by calling system software
 Approach is reused for Debug and Syscall events
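A speculative sketch of the application-side piece of this flow: the structured exception handler inspects the trap memory buffer once the runtime raises the exception. Every type and name here (trap_memory_buffer, TrapKind, handle_tcu_exception) is hypothetical, intended only to make the sequence above concrete.

```cpp
// Hypothetical sketch of an application/runtime handler for a TCU trap.
#include <cstdint>

enum class TrapKind : uint32_t { None = 0, Breakpoint, MathError, Syscall };

struct trap_memory_buffer {        // non-privileged, non-pageable, per queue
    volatile uint32_t trap_id;     // written by the TCU trap handler
    volatile uint32_t wave_id;     // which wavefront raised the condition
    volatile uint64_t pc;          // faulting program counter
    volatile uint64_t context[8];  // implementation-defined context data
};

// Called once privileged SW has raised the trap/system exception in the
// application process. Returns true if the condition was handled.
bool handle_tcu_exception(trap_memory_buffer* tmb) {
    switch (static_cast<TrapKind>(tmb->trap_id)) {
    case TrapKind::Breakpoint: /* hand off to the debugger */          return true;
    case TrapKind::Syscall:    /* service the request on the LCU */    return true;
    case TrapKind::MathError:  /* report to the application/runtime */ return false;
    default:                                                           return false;
    }
}
// Afterwards, queue processing is restarted by calling system software.
```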
[Slide diagram: GPU hardware executes the non-privileged ISA instruction stream and shader trap handler, which writes to a non-privileged, non-pageable trap memory buffer set up by the Trap Memory Address register (TMA). GPU interrupts flow through the kernel fusion driver, graphics KMD, GPU IRQ manager, IOMMUv2 driver and GPU debug handlers in OS kernel mode, which raise an exception (RaiseException, with TrapID and context data) into the structured exception handler of the application and runtime on the host; the memory manager reports access violations along the same path.]
HSA COMPONENT BLOCK DIAGRAMS (AMD
IMPLEMENTATION) AND DATA FLOW
[Slide diagram: AMD Windows implementation. User mode: application processes call Win32 APIs, Direct3D/DirectCompute and OpenCL through DXG thunks and the AMD WDDM UMD, or the HSA runtime directly. Kernel mode: DXGKRNL GPU scheduling infrastructure, WDDM miniport KMD, DXGMM1 video memory manager, the HSA KMD, system memory management, the process scheduler, the HSA GPU scheduler, HSA memory management and the IOMMUv2 driver. Hardware: the GPU with CP, shaders, HW dispatch and GPU exceptions.
HSA data flow: create GPU/HSA context and UM work queues; allocate resources (Win32 MemMgr, WDDM; MemMgr creates memory resources); fill a command buffer referencing data buffers; forward the command buffer to the UM work queue; dispatch.]
HSA Platform Topology - Example
[Slide diagrams: HSA platform topology examples. Simple platform: a single HSA APU node (Node 0) with CPU cores and a GPU of H-CUs behind an HSA MMU, sharing coherent (cacheable) and non-coherent (non-cacheable) system memory, with SBIOS/UEFI. Larger platform: HSA APU nodes (Node 0, Node 2) connected by a socket interconnect, each with its own system memory, plus optional add-in boards (Nodes 1, 3, 4) carrying HSA discrete GPUs on PCIe with coherent/non-coherent device-local memory and VBIOS/UEFI GOP.]
• Strict NUMA paradigm
• HSA resources are expressed through ACPI
• System software can discover detailed memory, cache, interconnect, LCU and TCU properties
THE IOMMUV2 OPERATION
Trademark Attribution
HSA Foundation, the HSA Foundation logo and combinations thereof are trademarks of HSA Foundation, Inc. in the United States and/or other
jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2013 HSA Foundation, Inc. All rights reserved.