HETEROGENEOUS SYSTEM ARCHITECTURE OVERVIEW
Vinod Tipparaju
Principal Member of Technical Staff, AMD Austin

INTRODUCTION AND OVERVIEW
Terminology, what makes HSA, origins and evolution, usage scenarios
SOME TERMINOLOGY
- HSA is Heterogeneous System Architecture, not just GPUs
- HSA Component – IP that satisfies the architecture requirements and provides the identified features
- SoC – System on Chip, a collection of various IPs
  - E.g. the AMD APU (Accelerated Processing Unit) integrates AMD/ARM CPU cores and graphics IP
  - It is possible to conceive of companies building just parts of the IP
- HSAIL – HSA Intermediate Language, a very low-level SIMT language
- HSA Agent – something that can participate in the HSA memory subsystem (i.e. respect page sizes, memory properties, atomics, etc.)
AMD Confidential - NDA Required
WHAT IS HSA?

Systems Architecture
- From a hardware point of view, the system architecture requirements that are necessary
- Specifies shared memory, cache coherence domains, the concept of clocks, context switching, memory-based signaling, and topology
- Rules governing design and agent behavior

Programmers Reference (HSAIL)
- An intermediate representation, very low level
- Vendor independence, device compiler optimizations
- Abstracts HW, or can serve as the lowest-level instruction set

Runtime
- API that wraps features like user-mode queues, clocks, signaling, etc.
- Provides execution control
- Supports tools

Tools
- Supporting profilers, debuggers and compilers
- Unique debugging support that greatly simplifies implementing debuggers
- Excellent profiling support with some user-mode access
HSA ORIGINS, EVOLUTION IN COMPUTE
- The next step from AMD in general-purpose compute
- An evolutionary step
  - A lot of experience building general-purpose CPUs
  - Exceptional graphics IP
  - Natural to utilize the graphics IP for doing compute
- The prior step was the HW integration phase
  - The GPU was pre-GCN (Graphics Core Next)
  - It did not have all the features to support HSA
  - The memory management unit was still evolving
TAKING THE HW INTEGRATION TO ITS NATURAL CONCLUSION
- Architectural and system integration
  - Extend the architecture to make the component a first-class citizen on the SoC
  - Fully evolved MMU
- Provide the same level of support for tools as the CPU
- Provide context switching, preemption, and full coherence
  - Helps simulators, migrations, checkpoints, etc.
- In the future, other HSA IP
SOCS HAVE PROLIFERATED — MAKE THEM BETTER
- SoCs have arrived and are a tremendous advance over previous platforms
- SoCs combine CPU cores, GPU cores and other accelerators, with high-bandwidth access to memory
- How do we make them even better?
  - Higher performance
  - Lower power
  - Easier to program
  - Easier to optimize
- HSA unites accelerators architecturally
- The early focus is the APU (a CPU with a GPU compute accelerator), but HSA goes well beyond the GPU
HIGH LEVEL USAGE SCENARIOS
- Bulk-synchronous-parallelism-like concurrent computation
  - Rather large parallel sections followed by synchronization
- Outstanding support for task-based parallelism
  - A wavefront is 64 threads
  - 256 threads are sufficient to fully fill the pipeline
  - Launch is quick
- Support for execution schedules – an excellent compiler target
  - Architected Queuing Language (AQL), dependencies
- Advanced language support
  - Function calls
  - Virtual functions
  - Exception handling (throw-catch)
HSA FOUNDATION
- Founded in June 2012
- Developing a new platform for heterogeneous systems
- www.hsafoundation.com
- Specifications are under development in working groups
- Our first specification, the HSA Programmers Reference Manual, is already published and available on our web site
- Additional specifications for System Architecture, Runtime Software and Tools are in process
HSA FOUNDATION MEMBERSHIP — AUGUST 2013
Member logos grouped by tier: Founders, Promoters, Supporters, Contributors, Academic, Associates
HSA — AN OPEN PLATFORM
- Open architecture, membership open to all
  - HSA System Architecture
  - HSA Programmers Reference Manual
  - HSA Runtime
- Delivered via royalty-free standards
  - Royalty-free IP, specifications and APIs
- ISA-agnostic for both CPU and GPU
- Membership from all areas of computing
  - Hardware companies
  - Operating systems
  - Tools and middleware
MEMORY AND QUEUING MODEL

HSA MEMORY MODEL
- Defines visibility ordering between all threads in the HSA system
- Designed to be compatible with the C++11, Java, OpenCL and .NET memory models
- Relaxed-consistency memory model for parallel compute performance
- Visibility controlled by:
  - Load.Acquire
  - Store.Release
  - Barriers
HSA QUEUING MODEL
- User-mode queuing for low-latency dispatch
  - The application dispatches directly
  - No OS or driver in the dispatch path
- Architected Queuing Language (AQL)
  - A single compute dispatch path for all hardware
  - No driver translation, direct to hardware
- Allows dispatch to a queue from any agent
  - CPU or GPU
- GPU self-enqueue enables lots of solutions
  - Recursion
  - Tree traversal
  - Wavefront reforming
HSAIL

HSA INTERMEDIATE LAYER — HSAIL
- HSAIL is a virtual ISA for parallel programs
  - Finalized to the target ISA by a JIT compiler or "Finalizer"
  - ISA-independent by design, for both CPU and GPU
- Explicitly parallel
  - Designed for data-parallel programming
- Support for exceptions, virtual functions, and other high-level language features
- Lower level than OpenCL SPIR
  - Fits naturally in the OpenCL compilation stack
- Suitable to support additional high-level languages and programming models:
  - Java, C++, OpenMP, Fortran, etc.
WHAT IS HSAIL?
- HSAIL is the intermediate language for parallel compute in HSA
  - Generated by a high-level compiler (LLVM, gcc, Java VM, etc.)
  - A low-level IR, close to the machine ISA level
  - Compiled down to the target ISA by an IHV "Finalizer"
  - The Finalizer may execute at run time, install time, or build time
- Example: the OpenCL™ compilation stack using HSAIL
  - High-level compiler flow (developer): OpenCL™ kernel → EDG or CLANG → SPIR → LLVM → HSAIL
  - Finalizer flow (runtime): HSAIL → Finalizer → hardware ISA
KEY HSAIL FEATURES
- Parallel
- Shared virtual memory
- Portable across vendors in the HSA Foundation
- Stable across multiple product generations
- Consistent numerical results (IEEE-754 with a defined minimum accuracy)
- Fast, robust, simple finalization step (no monthly updates)
- Good performance (little need to write in ISA)
- Supports all of OpenCL™ and C++ AMP™
- Supports Java, C++, and other languages as well
SIMT EXECUTION MODEL
- HSAIL presents a "SIMT" execution model to the programmer
  - "Single Instruction, Multiple Thread"
  - The programmer writes a program for a single thread of execution
  - Each work-item appears to have its own program counter
  - Branch instructions look natural
- Hardware implementation
  - Most hardware uses SIMD (Single Instruction, Multiple Data) vectors for efficiency
  - There is actually one program counter for the entire SIMD instruction
  - Branches are implemented with predication
- SIMT advantages
  - Easier to program (branch code in particular)
  - A natural path for mainstream programming models
  - Scales across a wide variety of hardware (the programmer doesn't see the vector width)
  - Cross-lane operations are available for those who want peak performance
COMPILATION TECHNOLOGY

OPPORTUNITIES WITH LLVM-BASED COMPILATION
Front-end languages feeding the CLANG/LLVM stack: C99, C++11, C++ AMP, Objective-C, OpenCL, OpenMP, KL, OSL, RenderScript, UPC, Halide, Rust, Julia, Mono, Fortran, Haskell
ARCHITECTURE DETAILS – WALK THROUGH OF FEATURES AND BENEFITS

HIGH LEVEL FEATURES OF HSA
- Features currently being defined in the HSA Working Groups**
  - Unified addressing across all processors
  - Operation into pageable system memory
  - Full memory coherency
  - User-mode dispatch
  - Architected queuing language
  - High-level language support for GPU compute processors
  - Preemption and context switching

** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
STATE OF GPU COMPUTING
- GPUs are fast and power efficient: high compute density per mm and per watt
- But: they can be hard to program

Today's challenges:
- Separate address spaces
- Copies
- Can't share pointers
- PCIe transfers
- A new language is required for the compute kernel (e.g. the OpenCL™ runtime API)
- The compute kernel is compiled separately from the host code

Emerging solution (HSA hardware):
- Single address space
- Coherent
- Virtual address space
- Fast access from all components
- Can share pointers
- Brings GPU computing to existing, popular programming models
- Single source, fully supported by the compiler
- HSAIL compiler IR (cross-platform!)
MOTIVATION (TODAY'S PICTURE)
Dispatch flow today, spanning application, OS and GPU: the application transfers the buffer to the GPU, the OS copies/maps memory, the job is queued and scheduled, the GPU starts and finishes the job, the OS schedules the application again, and the application gets the buffer back via another copy/map.
SHARED VIRTUAL MEMORY (TODAY)
- Multiple virtual memory address spaces: CPU0 translates VA1→PA1 in virtual memory space 1, while the GPU translates VA2→PA1 in virtual memory space 2 — two different virtual addresses for the same physical memory.

SHARED VIRTUAL MEMORY (HSA)
- A common virtual memory space for all HSA agents: CPU0 and the GPU both translate the same VA→PA.
SHARED VIRTUAL MEMORY
- Advantages
  - No mapping tricks, no copying back and forth between different PA addresses
  - Send pointers (not data) back and forth between HSA agents
- Implications
  - Common page tables (and a common interpretation of architectural semantics such as shareability, protection, etc.)
  - Common mechanisms for address translation (and for servicing address translation faults)
  - The concept of a process address space ID (PASID) to allow multiple per-process virtual address spaces within the system
GETTING THERE …
The dispatch flow from the motivation slide, repeated: with shared virtual memory, the explicit copy/map steps are no longer needed.
SHARED VIRTUAL MEMORY
- Specifics
  - The minimum supported VA width is 48 bits for 64-bit systems, and 32 bits for 32-bit systems
  - HSA agents may reserve VA ranges for internal use via system software
  - All HSA agents other than the host unit must use the lowest privilege level
  - If present, read/write access flags for the page tables must be maintained by all agents
  - Read/write permissions apply to all HSA agents equally
CACHE COHERENCY

CACHE COHERENCY DOMAINS (1/2)
- Data accesses to the global memory segment from all HSA agents shall be coherent, without the need for explicit cache maintenance.
CACHE COHERENCY DOMAINS (2/2)
- Advantages
  - Composability
  - Reduced SW complexity when communicating between agents
  - A lower barrier to entry when porting software
- Implications
  - Hardware coherency support between all HSA agents
  - Can take many forms:
    - Stand-alone snoop filters / directories
    - Combined L3/filters
    - Snoop-based systems (no filter)
    - Etc.
GETTING CLOSER …
The same dispatch flow again: full hardware coherency removes the remaining explicit buffer-maintenance work between agents.
SIGNALING

SIGNALING (1/2)
- HSA agents support the ability to use signaling objects
  - All creation/destruction of signaling objects occurs via HSA runtime APIs
  - From an HSA agent you can directly access signaling objects:
    - Signal a signal object (this will wake up HSA agents waiting upon the object)
    - Query the current object value
    - Wait on the current object (various conditions supported)
SIGNALING (2/2)
- Advantages
  - Enables asynchronous interrupts between HSA agents, without involving the kernel
  - A common idiom for work offload
  - Low-power waiting
- Implications
  - Runtime support required
  - Commonly implemented on top of the cache coherency flows
ALMOST THERE…
The same dispatch flow once more: with signaling, job completion can notify the application directly instead of going through OS scheduling.
USER MODE QUEUEING

USER MODE QUEUEING (1/3)
- User-mode queueing
  - Enables user-space applications to enqueue jobs ("dispatch packets") for HSA agents directly, without OS intervention
  - A dispatch packet is a job of work
- Support for multiple queues per PASID
- Multiple threads/agents within a PASID may enqueue packets in the same queue
- Dependency mechanisms ensure ordering between packets
USER MODE QUEUEING (2/3)
- Advantages
  - Avoids involving the kernel/driver when dispatching work for an agent
  - Lower-latency job dispatch enables a finer granularity of offload
  - Standard memory protection mechanisms may be used to protect communication with the consuming agent
- Implications
  - Packet formats/fields are architected – standard across vendors!
    - Guaranteed backward compatibility
  - Packets are enqueued/dequeued via an architected protocol (all via memory accesses and signaling)
  - More on this later…
SUCCESS!
With shared virtual memory, coherency, signaling and user-mode queueing in place, the dispatch flow collapses to: Application → Queue Job → GPU: Start Job → Finish Job. No OS copy/map or scheduling steps remain in the path.
ACCELERATING SUFFIX ARRAY CONSTRUCTION

CLOUD SERVER WORKLOAD: SUFFIX ARRAYS
- Suffix arrays are a fundamental data structure
  - Designed for efficient searching of a large text
  - Quickly locate every occurrence of a substring S in a text T
- Suffix arrays are used to accelerate in-memory cloud workloads
  - Full-text index search
  - Lossless data compression
  - Bio-informatics
ACCELERATED SUFFIX ARRAY CONSTRUCTION ON HSA
By efficiently sharing data between the CPU and GPU, HSA lets us move compute to the data without the penalty of intermediate copies.

By offloading data-parallel computations to the GPU, HSA increases performance and reduces energy for suffix array construction versus a single-threaded CPU.

Skew algorithm for computing the SA, with each stage on its best-suited device:
- Radix Sort::GPU
- Lexical Rank::CPU
- Compute SA::CPU
- Radix Sort::GPU
- Merge Sort::GPU

Result: 5.8x increased performance, 5x decreased energy.

M. Deo, "Parallel Suffix Array Construction and Least Common Prefix for the GPU", submitted to Principles and Practice of Parallel Programming (PPoPP'13), February 2013.
System: AMD A10-4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4 GB RAM.
THE HSA FUTURE
- Architected heterogeneous processing on the SoC
- Programming of accelerators becomes much easier
- Accelerated software that runs across multiple hardware vendors
- Scalability from smart phones to supercomputers on a common architecture
- GPU acceleration of parallel processing is the initial target, with DSPs and other accelerators coming to the HSA system architecture model
- The heterogeneous software ecosystem evolves at a much faster pace
- Lower-power, more capable devices in your hand, on the wall, in the cloud, or at your supercomputing center

Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAIL
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
Introduction to HSA
Introduction to HSAIntroduction to HSA
Introduction to HSA
 
E scala design platform
E scala design platformE scala design platform
E scala design platform
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
Sequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdfSequoia Spark Talk March 2015.pdf
Sequoia Spark Talk March 2015.pdf
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing Model
 
ONNC - 0.9.1 release
ONNC - 0.9.1 releaseONNC - 0.9.1 release
ONNC - 0.9.1 release
 
Module-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdfModule-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdf
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Heterogeneous System Architecture Overview

  • 1. HETEROGENEOUS SYSTEM ARCHITECTURE OVERVIEW
    Vinod Tipparaju, Principal Member of Technical Staff, AMD Austin
  • 2. INTRODUCTION AND OVERVIEW
    Terminology, what makes HSA, origins and evolution, usage scenarios
  • 3. SOME TERMINOLOGY
    - HSA is Heterogeneous System Architecture, not just GPUs
    - HSA Component – IP that satisfies the architecture requirements and provides the identified features
    - SoC – System on Chip, a collection of various IPs
      - E.g., the AMD APU (Accelerated Processing Unit) integrates AMD/ARM CPU cores and graphics IP
      - It is possible to conceive of companies building just parts of the IP
    - HSAIL – HSA Intermediate Language, a very low-level SIMT language
    - HSA Agent – something that can participate in the HSA memory subsystem (i.e., respect page sizes, memory properties, atomics, etc.)
    AMD Confidential - NDA Required
  • 4. WHAT IS HSA?
    Systems Architecture
    - Specifies the system architecture requirements from a hardware point of view
    - Specifies shared memory, cache coherence domains, the concept of clocks, context switching, memory-based signaling, topology
    - Rules governing design and agent behavior
    Programmer's Reference (HSAIL)
    - An intermediate representation, very low level
    - Vendor independence, device compiler optimizations
    - Abstracts HW, or can serve as the lowest-level instruction set
    - Supports tools
    Runtime
    - API that wraps features like user-mode queues, clocks, signaling, etc.
    - Provides execution control
    Tools
    - Supporting profilers, debuggers and compilers
    - Unique debugging support that greatly simplifies implementing debuggers
    - Excellent profiling support with some user-mode access
  • 5. HSA ORIGINS, EVOLUTION IN COMPUTE
    - The next step from AMD in general-purpose compute
    - An evolutionary step
      - A lot of experience in building general-purpose CPUs
      - Exceptional graphics IP
      - Natural to utilize the graphics IP for doing compute
    - The prior step was the HW integration phase
      - The GPU was pre-GCN (Graphics Core Next)
      - It did not have all the features to support HSA
      - The memory management unit was still evolving
  • 6. TAKING THE HW INTEGRATION TO ITS NATURAL CONCLUSION
    - Architectural and system integration
      - Extend the architecture to make the component a first-class citizen on the SoC
      - A fully evolved MMU
      - Provide the same level of tools support as the CPU
      - Provide context switching, preemption, full coherence
        - Helps simulators, migrations, checkpoints, etc.
    - In the future, other HSA IP
  • 7. SOCS HAVE PROLIFERATED — MAKE THEM BETTER
    - SoCs have arrived and are a tremendous advance over previous platforms
    - SoCs combine CPU cores, GPU cores and other accelerators, with high-bandwidth access to memory
    - How do we make them even better?
      - Easier to program
      - Easier to optimize
      - Higher performance
      - Lower power
    - HSA unites accelerators architecturally
    - The early focus is the APU (a CPU with a GPU compute accelerator), but HSA goes well beyond the GPU
  • 8. HIGH LEVEL USAGE SCENARIOS
    - Bulk-Synchronous-Parallelism-like concurrent computation
      - Rather large parallel sections followed by synchronization
    - Outstanding support for task-based parallelism
      - A wavefront is 64 threads
      - 256 threads are sufficient to fully fill the pipeline
      - Launch is quick
    - Support for execution schedules, an excellent compiler target
      - Architected Queuing Language (AQL), dependencies
    - Advanced language support
      - Function calls
      - Virtual functions
      - Exception handling (throw-catch)
  • 10. HSA FOUNDATION
    - Founded in June 2012
    - Developing a new platform for heterogeneous systems
    - www.hsafoundation.com
    - Specifications under development in working groups
    - Our first specification, the HSA Programmer's Reference Manual, is already published and available on our web site
    - Additional specifications for System Architecture, Runtime Software and Tools are in process
  • 11. HSA FOUNDATION MEMBERSHIP — AUGUST 2013
    Membership tiers (logo slide): Founders, Promoters, Supporters, Contributors, Academic, Associates
  • 12. HSA — AN OPEN PLATFORM
    - Open architecture, membership open to all
      - HSA Programmer's Reference Manual
      - HSA System Architecture
      - HSA Runtime
    - Delivered via royalty-free standards
      - Royalty-free IP, specifications and APIs
    - ISA-agnostic for both CPU and GPU
    - Membership from all areas of computing
      - Hardware companies
      - Operating systems
      - Tools and middleware
  • 14. HSA MEMORY MODEL
    - Defines visibility ordering between all threads in the HSA system
    - Designed to be compatible with the C++11, Java, OpenCL and .NET memory models
    - Relaxed-consistency memory model for parallel compute performance
    - Visibility controlled by:
      - Load.Acquire
      - Store.Release
      - Barriers
  • 15. HSA QUEUING MODEL
    - User-mode queuing for low-latency dispatch
      - The application dispatches directly
      - No OS or driver in the dispatch path
    - Architected Queuing Layer
      - A single compute dispatch path for all hardware
      - No driver translation, direct to hardware
    - Allows for dispatch to queue from any agent
      - CPU or GPU
    - GPU self-enqueue enables lots of solutions
      - Recursion
      - Tree traversal
      - Wavefront reforming
  • 16. HSAIL
  • 17. HSA INTERMEDIATE LAYER — HSAIL
    - HSAIL is a virtual ISA for parallel programs
      - Finalized to the target ISA by a JIT compiler or "Finalizer"
      - ISA-independent by design, for both CPU and GPU
    - Explicitly parallel
      - Designed for data-parallel programming
    - Support for exceptions, virtual functions, and other high-level language features
    - Lower level than OpenCL SPIR
      - Fits naturally in the OpenCL compilation stack
    - Suitable to support additional high-level languages and programming models:
      - Java, C++, OpenMP, Fortran, etc.
  • 18. WHAT IS HSAIL?
    - HSAIL is the intermediate language for parallel compute in HSA
      - Generated by a high-level compiler (LLVM, gcc, Java VM, etc.)
      - Low-level IR, close to the machine ISA level
      - Compiled down to the target ISA by an IHV "Finalizer"
      - The Finalizer may execute at run time, install time, or build time
    - Example: OpenCL™ compilation stack using HSAIL
      - High-level compiler flow (developer): OpenCL™ kernel → EDG or CLANG → SPIR → LLVM → HSAIL
      - Finalizer flow (runtime): HSAIL → Finalizer → hardware ISA
  • 19. KEY HSAIL FEATURES
    - Parallel
    - Shared virtual memory
    - Portable across vendors in the HSA Foundation
    - Stable across multiple product generations
    - Consistent numerical results (IEEE-754 with a defined minimum accuracy)
    - Fast, robust, simple finalization step (no monthly updates)
    - Good performance (little need to write in ISA)
    - Supports all of OpenCL™ and C++ AMP™
    - Supports Java, C++, and other languages as well
  • 20. SIMT EXECUTION MODEL
    - HSAIL presents a "SIMT" execution model to the programmer
      - "Single Instruction, Multiple Thread"
      - The programmer writes a program for a single thread of execution
      - Each work-item appears to have its own program counter
      - Branch instructions look natural
    - Hardware implementation
      - Most hardware uses SIMD (Single Instruction, Multiple Data) vectors for efficiency
      - There is actually one program counter for the entire SIMD instruction
      - Branches are implemented with predication
    - SIMT advantages
      - Easier to program (branch code in particular)
      - A natural path for mainstream programming models
      - Scales across a wide variety of hardware (the programmer doesn't see the vector width)
      - Cross-lane operations available for those who want peak performance
  • 22. OPPORTUNITIES WITH LLVM-BASED COMPILATION
    - Many languages can reach LLVM (several via CLANG): C99, C++11, C++ AMP, Objective-C, OpenCL, OpenMP, KL, OSL, RenderScript, UPC, Halide, Rust, Julia, Mono, Fortran, Haskell
  • 23. ARCHITECTURE DETAILS – WALK THROUGH OF FEATURES AND BENEFITS
  • 24. HIGH LEVEL FEATURES OF HSA
    - Features currently being defined in the HSA working groups**
      - Unified addressing across all processors
      - Operation into pageable system memory
      - Full memory coherency
      - User-mode dispatch
      - Architected queuing language
      - High-level language support for GPU compute processors
      - Preemption and context switching
    ** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 25. STATE OF GPU COMPUTING
    - GPUs are fast and power efficient: high compute density per mm and per watt
    - But: they can be hard to program
    Today's challenges
    - Separate address spaces
      - Copies
      - Can't share pointers
    - PCIe
    - A new language is required for the compute kernel
      - E.g., the OpenCL™ runtime API
    - The compute kernel is compiled separately from the host code
    Emerging solution: HSA hardware
    - Single address space
      - Coherent
      - Virtual address space
      - Fast access from all components
      - Can share pointers
    - Bring GPU computing to existing, popular programming models
      - Single source, fully supported by the compiler
      - HSAIL compiler IR (cross-platform!)
  • 26. MOTIVATION (TODAY'S PICTURE)
    Today's dispatch path bounces between the application, the OS and the GPU:
    Application: transfer buffer to GPU → OS: copy/map memory → queue job → schedule job → GPU: start job, finish job → OS: schedule application → Application: get buffer → copy/map memory
  • 27. SHARED VIRTUAL MEMORY (TODAY)
    - Multiple virtual memory address spaces
    - CPU0 uses VIRTUAL MEMORY1 (VA1→PA1) while the GPU uses VIRTUAL MEMORY2 (VA2→PA1): two different virtual addresses for the same physical memory
  • 28. SHARED VIRTUAL MEMORY (HSA)
  - Common virtual memory for all HSA agents
  - Diagram: CPU0 and the GPU both translate the same VA->PA, sharing one virtual address space over physical memory
  • 29. SHARED VIRTUAL MEMORY
  - Advantages
    - No mapping tricks, no copying back and forth between different PA addresses
    - Send pointers (not data) back and forth between HSA agents
  - Implications
    - Common page tables (and a common interpretation of architectural semantics such as shareability, protection, etc.)
    - Common mechanisms for address translation (and for servicing address translation faults)
    - Concept of a process address space ID (PASID) to allow multiple, per-process virtual address spaces within the system
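A toy model of the two pictures on slides 27 and 28 may help: address translation through page tables, with separate tables today versus one common table under HSA. The page size, table contents, and function name are illustrative only.

```python
# Toy model of slides 27-28: per-agent vs. common page tables.
PAGE = 4096

def translate(page_table, va):
    """Translate a virtual address via a per-address-space page table."""
    frame = page_table[va // PAGE]          # raises KeyError on a fault
    return frame * PAGE + va % PAGE

# Today: CPU and GPU each have their own table; the same buffer appears at
# two DIFFERENT virtual addresses (VA1 != VA2) mapping to one PA, so a raw
# pointer from one agent is meaningless on the other.
cpu_table = {0x10: 7}   # VA page 0x10 -> physical frame 7
gpu_table = {0x20: 7}   # VA page 0x20 -> physical frame 7
assert translate(cpu_table, 0x10 * PAGE) == translate(gpu_table, 0x20 * PAGE)

# HSA: one common table, so the same pointer value is valid on every agent
# and can be passed between them directly.
common_table = {0x10: 7}
va = 0x10 * PAGE + 42
print(hex(translate(common_table, va)))  # 0x702a
```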
  • 30. GETTING THERE …
  (Repeats the slide 26 Application/OS/GPU dispatch flow; as each HSA feature is introduced, steps start dropping out of it.)
  • 31. SHARED VIRTUAL MEMORY
  - Specifics
    - Minimum supported VA width is 48-bit for 64-bit systems, and 32-bit for 32-bit systems
    - HSA agents may reserve VA ranges for internal use via system software
    - All HSA agents other than the host unit must use the lowest privilege level
    - If present, read/write access flags for the page tables must be maintained by all agents
    - Read/write permissions apply to all HSA agents, equally
  • 33. CACHE COHERENCY DOMAINS (1/2)
  - Data accesses to the global memory segment from all HSA agents shall be coherent without the need for explicit cache maintenance
  • 34. CACHE COHERENCY DOMAINS (2/2)
  - Advantages
    - Composability
    - Reduced SW complexity when communicating between agents
    - Lower barrier to entry when porting software
  - Implications
    - Hardware coherency support between all HSA agents
    - Can take many forms:
      - Stand-alone snoop filters / directories
      - Combined L3/filters
      - Snoop-based systems (no filter)
      - Etc.
  • 35. GETTING CLOSER …
  (The slide 26 dispatch flow again, with more of the OS round trips eliminated.)
  • 37. SIGNALING (1/2)
  - HSA agents support the ability to use signaling objects
  - Object creation/destruction: all creation and destruction of signaling objects occurs via HSA runtime APIs
  - From an HSA agent, signaling objects can be accessed directly:
    - Signal a signal object (this will wake up HSA agents waiting upon the object)
    - Query the current object value
    - Wait on the current object (various conditions supported)
  • 38. SIGNALING (2/2)
  - Advantages
    - Enables asynchronous interrupts between HSA agents, without involving the kernel
    - Common idiom for work offload
    - Low-power waiting
  - Implications
    - Runtime support required
    - Commonly implemented on top of cache coherency flows
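The signal/query/wait semantics above can be sketched with ordinary threads. This is a hypothetical illustration of the idiom, not the real HSA runtime API; the class and method names are invented, and a hardware implementation would wait at low power rather than on an OS condition variable.

```python
# Sketch of HSA-style signal-object semantics between two "agents" (threads).
import threading

class Signal:
    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def signal(self, value):
        """Store a new value and wake any agents waiting on the object."""
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def query(self):
        """Read the current value without blocking."""
        with self._cond:
            return self._value

    def wait_eq(self, expected, timeout=None):
        """Block until the value equals `expected` (one of several wait conditions)."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)

done = Signal()
producer = threading.Thread(target=lambda: done.signal(1))  # producer agent
producer.start()
done.wait_eq(1)        # consumer agent sleeps until signaled, no kernel-mediated IPC
producer.join()
print(done.query())    # 1
```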
  • 39. ALMOST THERE…
  (The slide 26 dispatch flow again, with still fewer OS steps remaining.)
  • 41. USER MODE QUEUEING (1/3)
  - User mode queueing enables user-space applications to enqueue jobs ("dispatch packets") for HSA agents directly, without OS intervention
    - A dispatch packet is a job of work
  - Support for multiple queues per PASID
    - Multiple threads/agents within a PASID may enqueue packets in the same queue
  - Dependency mechanisms ensure ordering between packets
  • 42. USER MODE QUEUEING (2/3)
  - Advantages
    - Avoids involving the kernel/driver when dispatching work for an agent
      - Lower-latency job dispatch enables finer granularity of offload
    - Standard memory protection mechanisms may be used to protect communication with the consuming agent
  - Implications
    - Packet formats/fields are architected: standard across vendors!
      - Guaranteed backward compatibility
    - Packets are enqueued/dequeued via an architected protocol (all via memory accesses and signalling)
    - More on this later……
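A user-mode queue of this kind is essentially a ring buffer in shared memory, advanced purely by memory accesses. The sketch below shows that shape; the packet fields and class names are illustrative and are not the actual HSA Architected Queuing Language format.

```python
# Sketch of an architected user-mode dispatch queue: a ring buffer with
# monotonically increasing read/write indices, no kernel call on either side.
from collections import namedtuple

DispatchPacket = namedtuple("DispatchPacket", "kernel_addr grid_size kernarg")

class UserModeQueue:
    def __init__(self, size=8):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.ring = [None] * size
        self.size = size
        self.write_index = 0   # bumped by producers (application threads)
        self.read_index = 0    # bumped by the consumer (the agent)

    def enqueue(self, packet):
        """Producer side: just memory writes into the shared ring."""
        if self.write_index - self.read_index == self.size:
            return False                    # queue full: caller must retry
        self.ring[self.write_index % self.size] = packet
        self.write_index += 1               # publish; the agent sees the packet
        return True

    def dequeue(self):
        """Consumer (agent) side: pop the next packet, if any."""
        if self.read_index == self.write_index:
            return None                     # queue empty
        packet = self.ring[self.read_index % self.size]
        self.read_index += 1
        return packet

q = UserModeQueue(size=4)
q.enqueue(DispatchPacket(kernel_addr=0x1000, grid_size=256, kernarg=0x2000))
print(q.dequeue().grid_size)  # 256
```

In real hardware the index updates would use atomics with appropriate memory ordering, and a doorbell signal (see the signaling slides) would wake the consuming agent.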
  • 43. SUCCESS!
  (The slide 26 dispatch flow with the OS copy/map and scheduling steps now eliminated; the reduced flow is shown on the next slide.)
  • 44. SUCCESS!
  Diagram: with HSA, the OS drops out of the dispatch path:
  1. Application: queue job
  2. GPU: start job, finish job
  • 46. SUFFIX ARRAYS
  - Suffix arrays are a fundamental data structure
    - Designed for efficient searching of a large text
    - Quickly locate every occurrence of a substring S in a text T
  - Suffix arrays are used to accelerate in-memory cloud workloads
    - Full-text index search
    - Lossless data compression
    - Bio-informatics
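For concreteness, here is a minimal sketch of what a suffix array is and how it locates every occurrence of a substring via binary search. The naive construction below is for clarity only; the skew algorithm on the next slide (the one actually accelerated on HSA) builds the array far more efficiently. The "\xff" sentinel assumes the text contains no character that high.

```python
# Sketch: suffix array construction and substring search.
import bisect

def build_suffix_array(text):
    """Indices of all suffixes of `text`, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """All start positions of `pattern` in `text`, via two binary searches."""
    suffixes = [text[i:] for i in sa]                      # conceptual view
    lo = bisect.bisect_left(suffixes, pattern)
    hi = bisect.bisect_right(suffixes, pattern + "\xff")   # just past all matches
    return sorted(sa[lo:hi])

text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Because every occurrence of the pattern is a prefix of some suffix, all matches sit in one contiguous run of the sorted suffix array, which is what makes the two binary searches sufficient.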
  • 47. ACCELERATED SUFFIX ARRAY CONSTRUCTION ON HSA
  - By efficiently sharing data between CPU and GPU, HSA lets us move compute to data without the penalty of intermediate copies
  - By offloading data-parallel computations to the GPU, HSA increases performance and reduces energy for suffix array construction versus a single-threaded CPU
  - Skew algorithm for computing the SA, with stages split across devices: Radix Sort::GPU, Lexical Rank::CPU, Compute SA::CPU, Radix Sort::GPU, Merge Sort::GPU
  - Results: +5.8x increased performance, -5x decreased energy
  - M. Deo, "Parallel Suffix Array Construction and Least Common Prefix for the GPU", submitted to Principles and Practice of Parallel Programming (PPoPP'13), February 2013
  - Test system: AMD A10-4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4 GB RAM
  • 48. THE HSA FUTURE
  - Architected heterogeneous processing on the SoC
  - Programming of accelerators becomes much easier
  - Accelerated software that runs across multiple hardware vendors
  - Scalability from smart phones to supercomputers on a common architecture
  - GPU acceleration of parallel processing is the initial target, with DSPs and other accelerators coming to the HSA system architecture model
  - Heterogeneous software ecosystem evolves at a much faster pace
  - Lower-power, more capable devices in your hand, on the wall, in the cloud, or at your supercomputing center