SlideShare a Scribd company logo
1 of 34
HETEROGENEOUS SYSTEM
ARCHITECTURE (HSA): HSAIL VIRTUAL
PARALLEL ISA
BEN SANDER, AMD
TOPICS
 Introduction and Motivation
 HSAIL – what makes it special?
 HSAIL Execution Model
 How to program in HSAIL?
 Conclusion
© Copyright 2014 HSA Foundation. All Rights Reserved
STATE OF GPU COMPUTING
Today’s Challenges
 Separate address spaces
 Copies
 Can’t share pointers
 New language required for compute kernel
 EX: OpenCL™ runtime API
 Compute kernel compiled separately than host
code
Emerging Solution
 HSA Hardware
 Single address space
 Coherent
 Virtual
 Fast access from all components
 Can share pointers
 Bring GPU computing to existing, popular,
programming models
 Single-source, fully supported by compiler
 HSAIL compiler IR (Cross-platform!)
• GPUs are fast and power efficient : high compute density per-mm and per-watt
• But: Can be hard to program
PCIe
THE PORTABILITY CHALLENGE
 CPU ISAs
 ISA innovations added incrementally (ie NEON, AVX, etc)
 ISA retains backwards-compatibility with previous generation
 Two dominant instruction-set architectures: ARM and x86
 GPU ISAs
 Massive diversity of architectures in the market
 Each vendor has own ISA - and often several in market at same time
 No commitment (or attempt!) to provide any backwards compatibility
 Traditionally graphics APIs (OpenGL, DirectX) provide necessary abstraction
© Copyright 2014 HSA Foundation. All Rights Reserved
HSAIL :
WHAT MAKES IT SPECIAL?
WHAT IS HSAIL?
 Intermediate language for parallel compute in HSA
 Generated by a “High Level Compiler” (GCC, LLVM, Java VM, etc)
 Expresses parallel regions of code
 Binary format of HSAIL is called “BRIG”
 Goal: Bring parallel acceleration to mainstream programming languages
© Copyright 2014 HSA Foundation. All Rights Reserved
main() {
…
#pragma omp parallel for
for (int i=0;i<N; i++) {
}
…
}
High-Level
Compiler
BRIG Finalizer Component
ISA
Host ISA
KEY HSAIL FEATURES
 Parallel
 Shared virtual memory
 Portable across vendors in HSA Foundation
 Stable across multiple product generations
 Consistent numerical results (IEEE-754 with defined min accuracy)
 Fast, robust, simple finalization step (no monthly updates)
 Good performance (little need to write in ISA)
 Supports all of OpenCL™
 Supports Java, C++, and other languages as well
© Copyright 2014 HSA Foundation. All Rights Reserved
HSAIL INSTRUCTION SET - OVERVIEW
 Similar to assembly language for a RISC CPU
 Load-store architecture
 Destination register first, then source registers
 140 opcodes (Java™ bytecode has 200)
 Floating point (single, double, half (f16))
 Integer (32-bit, 64-bit)
 Some packed operations
 Branches
 Function calls
 Platform Atomic Operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas
 Synchronize host CPU and HSA Component!
 Text and Binary formats (“BRIG”)
ld_global_u64 $d0, [$d6 + 120] ; $d0= load($d6+120)
add_u64 $d1, $d0, 24 ; $d1= $d2+24
© Copyright 2014 HSA Foundation. All Rights Reserved
SEGMENTS AND MEMORY (1/2)
 7 segments of memory
 global, readonly, group, spill, private, arg, kernarg
 Memory instructions can (optionally) specify a segment
 Control data sharing properties and communicate intent
 Global Segment
 Visible to all HSA agents (including host CPU)
 Group Segment
 Provides high-performance memory shared in the work-group.
 Group memory can be read and written by any work-item in the work-group
 HSAIL provides sync operations to control visibility of group memory
ld_global_u64 $d0,[$d6]
ld_group_u64 $d0,[$d6+24]
st_spill_f32 $s1,[$d6+4]
© Copyright 2014 HSA Foundation. All Rights Reserved
SEGMENTS AND MEMORY (2/2)
 Spill, Private, Arg Segments
 Represent different regions of a per-work-item stack
 Typically generated by compiler, not specified by programmer
 Compiler can use these to convey intent – ie spills
 Kernarg Segment
 Programmer writes kernarg segment to pass arguments to a kernel
 Read-Only Segment
 Remains constant during execution of kernel
© Copyright 2014 HSA Foundation. All Rights Reserved
FLAT ADDRESSING
 Each segment mapped into virtual address space
 Flat addresses can map to segments based on virtual address
 Instructions with no explicit segment use flat addressing
 Very useful for high-level language support (ie classes, libraries)
 Aligns well with OpenCL 2.0 “generic” addressing feature
ld_global_u64 $d6, [%_arg0] ; global
ld_u64 $d0,[$d6+24] ; flat
© Copyright 2014 HSA Foundation. All Rights Reserved
REGISTERS
 Four classes of registers:
 S: 32-bit, Single-precision FP or Int
 D: 64-bit, Double-precision FP or Long Int
 Q: 128-bit, Packed data.
 C: 1-bit, Control Registers (Compares)
 Fixed number of registers
 S, D, Q share a single pool of resources
 S + 2*D + 4*Q <= 128
 Up to 128 S or 64 D or 32 Q (or a blend)
 Register allocation done in high-level compiler
 Finalizer doesn’t perform expensive register allocation
c0
c1
c2
c3
c4
c5
c6
c7
s0
d0
q0
s1
s2
d1
s3
s4
d2
q1
s5
s6
d3
s7
s8
d4
q2
s9
s10
d5
s11
…
s120
d60
q30
s121
s122
d61
s123
s124
d62
q31
s125
s126
d63
s127
© Copyright 2014 HSA Foundation. All Rights Reserved
SIMT EXECUTION MODEL
 HSAIL Presents a “SIMT” execution model to the programmer
 “Single Instruction, Multiple Thread”
 Programmer writes program for a single thread of execution
 Each work-item appears to have its own program counter
 Branch instructions look natural
 Hardware Implementation
 Most hardware uses SIMD (Single-Instruction Multiple Data) vectors for efficiency
 Actually one program counter for the entire SIMD instruction
 Branches implemented with predication
 SIMT Advantages
 Easier to program (branch code in particular)
 Natural path for mainstream programming models and existing compilers
 Scales across a wide variety of hardware (programmer doesn’t see vector width)
 Cross-lane operations available for those who want peak performance
© Copyright 2014 HSA Foundation. All Rights Reserved
WAVEFRONTS
 Hardware SIMD vector, composed of 1, 2, 4, 8, 16, 32, 64, 128, or 256 “lanes”
 Lanes in wavefront can be “active” or “inactive”
 Inactive lanes consume hardware resources but don’t do useful work
 Tradeoffs
 “Wavefront-aware” programming can be useful for peak performance
 But results in less portable code (since wavefront width is encoded in algorithm)
if (cond) {
operationA; // cond=True lanes active here
} else {
operationB; // cond=False lanes active here
}
© Copyright 2014 HSA Foundation. All Rights Reserved
CROSS-LANE OPERATIONS
 Example HSAIL cross-lane operation: “activelaneid”
 Dest set to count of earlier work-items that are active for this instruction
 Useful for compaction algorithms
 Example HSAIL cross-lane operation: “activelaneshuffle”
 Each workitem reads value from another lane in the wavefront
 Supports selection of “identity” element for inactive lanes
 Useful for wavefront-level reductionsactivelaneshuffle_b32 $s0, $s1, $s2, 0, 0
// s0 = dest, s1= source, s2=lane select, no identity
activelaneid_u32 $s0
© Copyright 2014 HSA Foundation. All Rights Reserved
HSAIL MODES
 Working group strived to limit optional modes and features in HSAIL
 Minimize differences between HSA target machines
 Better for compiler vendors and application developers
 Two modes survived
 Machine Models
 Small: 32-bit pointers, 32-bit data
 Large: 64-bit pointers, 32-bit or 64-bit data
 Vendors can support one or both models
 “Base” and “Full” Profiles
 Two sets of requirements for FP accuracy, rounding, exception reporting, hard
pre-emption
© Copyright 2014 HSA Foundation. All Rights Reserved
HSA PROFILES
Feature Base Full
Addressing Modes Small, Large Small, Large
All 32-bit HSAIL operations according to the declared
profile Yes Yes
F16 support (IEEE 754 or better) Yes Yes
F64 support No Yes
Precision for add/sub/mul 1/2 ULP 1/2 ULP
Precision for div 2.5 ULP 1/2 ULP
Precision for sqrt 1 ULP 1/2 ULP
HSAIL Rounding: Near Yes Yes
HSAIL Rounding: Up / Down / Zero No Yes
Subnormal floating-point Flush-to-zero Supported
Propagate NaN Payloads No Yes
FMA Yes Yes
Arithmetic Exception reporting None DETECT or BREAK
Debug trap Yes Yes
Hard Preemption No Yes
© Copyright 2014 HSA Foundation. All Rights Reserved
HSA PARALLEL EXECUTION
MODEL
© Copyright 2014 HSA Foundation. All Rights Reserved
HSA PARALLEL EXECUTION MODEL
Basic Idea:
Programmer supplies an HSAIL
“kernel” that is run on each work-item.
Kernel is written as a single thread of
execution.
Programmer specifies grid dimensions
(scope of problem) when launching
the kernel.
Each work-item has a unique
coordinate in the grid.
Programmer optionally specifies work-
group dimensions (for optimized
communication).
© Copyright 2014 HSA Foundation. All Rights Reserved
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
© Copyright 2014 HSA Foundation. All Rights Reserved
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
2D grid
workitem
kernel
© Copyright 2014 HSA Foundation. All Rights Reserved
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
2D work-group
2D grid
workitem
kernel
© Copyright 2014 HSA Foundation. All Rights Reserved
HOW TO PROGRAM HSA?
WHAT DO I TYPE?
© Copyright 2014 HSA Foundation. All Rights Reserved
HSA PROGRAMMING MODELS : CORE PRINCIPLES
 Single source
 Host and device code side-by-side in same source file
 Written in same programming language
 Single unified coherent address space
 Freely share pointers between host and device
 Similar memory model as multi-core CPU
 Parallel regions identified with existing language syntax
 Typically same syntax used for multi-core CPU
 HSAIL is the compiler IR that supports these programming models
© Copyright 2014 HSA Foundation. All Rights Reserved
GCC OPENMP : COMPILATION FLOW
 SUSE GCC Project
 Adding HSAIL code generator to GCC compiler infrastructure
 Supports OpenMP 3.1 syntax
 No data movement directives required !main() {
…
// Host code.
#pragma omp parallel for
for (int i=0;i<N; i++) {
C[i] = A[i] + B[i];
}
…
}
GCC OpenMP
Compiler
BRIG Finalizer Component
ISA
Host ISA
© Copyright 2014 HSA Foundation. All Rights Reserved
GCC OpenMP flow
C/C++/Fortran OpenMP application
e.g., #pragma omp for
for( j = 0; j<n;j++) { b[j] = a[j]; }
GNU Compiler(GCC)
Compiles host code + Emits runtime
calls with kernel name, parameters,
launch attributes
Lowers OpenMP directives,
converts GIMPLE to BRIG.
Embeds BRIG into host code
Dispatch kernel to GPU
Pragmas map to calls into
HSA Runtime
Application
Compiler
Run time
Finalize kernel from BRIG->ISA
Kernels finalized once and cached.
Compile time
© Copyright 2014 HSA Foundation. All Rights Reserved
MCW C++AMP : COMPILATION FLOW
 C++AMP : Single-source C++ template parallel programming model
 MCW compiler based on CLANG/LLVM
 Open-source and runs on Linux
 Leverage open-source LLVM->HSAIL code generator
main() {
…
parallel_for_each(grid<1>(ext
entent<256>(…)
…
}
C++AMP
Compiler
BRIG Finalizer Component
ISA
Host ISA
© Copyright 2014 HSA Foundation. All Rights Reserved
JAVA: RUNTIME FLOW
© Copyright 2014 HSA Foundation. All Rights Reserved
JAVA 8 – HSA ENABLED APARAPI
 Java 8 brings Stream + Lambda API.
‒ More natural way of expressing data parallel algorithms
‒ Initially targeted at multi-core.
 APARAPI will :
‒ Support Java 8 Lambdas
‒ Dispatch code to HSA enabled devices at runtime via
HSAIL
JVM
Java Application
HSA Finalizer & Runtime
APARAPI + Lambda API
GPUCPU
Future Java – HSA ENABLED JAVA (SUMATRA)
 Adds native GPU acceleration to Java Virtual Machine
(JVM)
 Developer uses JDK Lambda, Stream API
 JVM uses GRAAL compiler to generate HSAIL
JVM
Java Application
HSA Finalizer & Runtime
Java JDK Stream + Lambda API
Java GRAAL JIT
backend
GPUCPU
AN EXAMPLE (IN JAVA 8)
© Copyright 2014 HSA Foundation. All Rights Reserved
//Example computes the percentage of total scores achieved by each player on a team.
class Player {
private Team team; // Note: Reference to the parent Team.
private int scores;
private float pctOfTeamScores;
public Team getTeam() {return team;}
public int getScores() {return scores;}
public void setPctOfTeamScores(int pct) { pctOfTeamScores = pct; }
};
// “Team” class not shown
// Assume “allPlayers’ is an initialized array of Players..
Arrays.stream(allPlayers). // wrap the array in a stream
parallel(). // developer indication that lambda is thread-safe
forEach(p -> {
int teamScores = p.getTeam().getScores();
float pctOfTeamScores = (float)p.getScores()/(float) teamScores;
p.setPctOfTeamScores(pctOfTeamScores);
});
HSAIL CODE EXAMPLE
© Copyright 2014 HSA Foundation. All Rights Reserved
01: version 0:95: $full : $large;
02: // static method HotSpotMethod<Main.lambda$2(Player)>
03: kernel &run (
04: kernarg_u64 %_arg0 // Kernel signature for lambda method
05: ) {
06: ld_kernarg_u64 $d6, [%_arg0]; // Move arg to an HSAIL register
07: workitemabsid_u32 $s2, 0; // Read the work-item global “X” coord
08:
09: cvt_u64_s32 $d2, $s2; // Convert X gid to long
10: mul_u64 $d2, $d2, 8; // Adjust index for sizeof ref
11: add_u64 $d2, $d2, 24; // Adjust for actual elements start
12: add_u64 $d2, $d2, $d6; // Add to array ref ptr
13: ld_global_u64 $d6, [$d2]; // Load from array element into reg
14: @L0:
15: ld_global_u64 $d0, [$d6 + 120]; // p.getTeam()
16: mov_b64 $d3, $d0;
17: ld_global_s32 $s3, [$d6 + 40]; // p.getScores ()
18: cvt_f32_s32 $s16, $s3;
19: ld_global_s32 $s0, [$d0 + 24]; // Team getScores()
20: cvt_f32_s32 $s17, $s0;
21: div_f32 $s16, $s16, $s17; // p.getScores()/teamScores
22: st_global_f32 $s16, [$d6 + 100]; // p.setPctOfTeamScores()
23: ret;
24: };
HOW TO PROGRAM HSA?
OTHER PROGRAMMING TOOLS
© Copyright 2014 HSA Foundation. All Rights Reserved
HSAIL ASSEMBLER
kernel &run (kernarg_u64 %_arg0)
{
ld_kernarg_u64 $d6, [%_arg0];
workitemabsid_u32 $s2, 0;
cvt_u64_s32 $d2, $s2;
mul_u64 $d2, $d2, 8;
add_u64 $d2, $d2, 24;
add_u64 $d2, $d2, $d6;
ld_global_u64 $d6, [$d2];
. . .
HSAIL
Assembler BRIG Finalizer
Machine
ISA
• HSAIL has a text format and an assembler
© Copyright 2014 HSA Foundation. All Rights Reserved
OPENCL™ OFFLINE COMPILER (CLOC)
__kernel void vec_add(
__global const float *a,
__global const float *b,
__global float *c,
const unsigned int n)
{
int id = get_global_id(0);
// Bounds check
if (id < n)
c[id] = a[id] + b[id];
}
CLOC BRIG Finalizer
Machine
ISA
•OpenCL split-source model cleanly isolates kernel
•Can express many HSAIL features in OpenCL Kernel Language
•Higher productivity than writing in HSAIL assembly
•Can dispatch kernel directly with HSAIL Runtime (lower-level access to hardware)
•Or use CLOC+OKRA Runtime for approachable “fits-on-a-slide” GPU programming model
© Copyright 2014 HSA Foundation. All Rights Reserved
KEY TAKEAWAYS
 HSAIL
 Thin, robust, fast finalizer
 Portable (multiple HW vendors and parallel architectures)
 Supports shared virtual memory and platform atomics
 HSA brings GPU computing to mainstream programming models
 Shared and coherent memory bridges “faraway accelerator” gap
 HSAIL provides the common IL for high-level languages to benefit from
parallel computing
 Languages and Compilers
 HSAIL support in GCC, LLVM, Java JVM
 Leverage same language syntax designed for multi-core CPUs
 Can use pointer-containing data structures
© Copyright 2014 HSA Foundation. All Rights Reserved

More Related Content

What's hot

HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSA Foundation
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Foundation
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...HSA Foundation
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...HSA Foundation
 
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...AMD Developer Central
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”HSA Foundation
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation AMD
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderAMD Developer Central
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computingRashid Ansari
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...AMD Developer Central
 
C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?Mikael Rosbacke
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)dibyendu.das
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovAMD Developer Central
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Modern Data Stack France
 

What's hot (20)

HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
HSA Introduction
HSA IntroductionHSA Introduction
HSA Introduction
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
 
Hsa10 whitepaper
Hsa10 whitepaperHsa10 whitepaper
Hsa10 whitepaper
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
 
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
 
C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 

Similar to ISCA Final Presentation - HSAIL

"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 
LCU13: HSA Architecture Presentation
LCU13: HSA Architecture PresentationLCU13: HSA Architecture Presentation
LCU13: HSA Architecture PresentationLinaro
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overviewinside-BigData.com
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Jen Aman
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasAMD Developer Central
 
E scala design platform
E scala design platformE scala design platform
E scala design platformesenciatech
 
Ceph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitCeph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitTakehiro Kudou
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability ManagementXavierDevroey
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattAMD Developer Central
 
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic..."Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...Edge AI and Vision Alliance
 
"Big Data" Bioinformatics
"Big Data" Bioinformatics"Big Data" Bioinformatics
"Big Data" BioinformaticsBrian Repko
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)DataWorks Summit
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
 

Similar to ISCA Final Presentation - HSAIL (20)

"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
LCU13: HSA Architecture Presentation
LCU13: HSA Architecture PresentationLCU13: HSA Architecture Presentation
LCU13: HSA Architecture Presentation
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
E scala design platform
E scala design platformE scala design platform
E scala design platform
 
Ceph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitCeph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona Summit
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability Management
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian Bratt
 
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic..."Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
 
"Big Data" Bioinformatics
"Big Data" Bioinformatics"Big Data" Bioinformatics
"Big Data" Bioinformatics
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
 
KIRANKUMAR_MV
KIRANKUMAR_MVKIRANKUMAR_MV
KIRANKUMAR_MV
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
A Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGAA Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGA
 

More from HSA Foundation

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 ProvisionalHSA Foundation
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)HSA Foundation
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed HSA Foundation
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareHSA Foundation
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Foundation
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...HSA Foundation
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012HSA Foundation
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMDHSA Foundation
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.HSA Foundation
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAHSA Foundation
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is InvaluableHSA Foundation
 

More from HSA Foundation (13)

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 Provisional
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshare
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSA
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is Invaluable
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 

ISCA Final Presentation - HSAIL

  • 1. HETEROGENEOUS SYSTEM ARCHITECTURE (HSA): HSAIL VIRTUAL PARALLEL ISA BEN SANDER, AMD
  • 2. TOPICS  Introduction and Motivation  HSAIL – what makes it special?  HSAIL Execution Model  How to program in HSAIL?  Conclusion © Copyright 2014 HSA Foundation. All Rights Reserved
  • 3. STATE OF GPU COMPUTING Today’s Challenges  Separate address spaces  Copies  Can’t share pointers  New language required for compute kernel  EX: OpenCL™ runtime API  Compute kernel compiled separately than host code Emerging Solution  HSA Hardware  Single address space  Coherent  Virtual  Fast access from all components  Can share pointers  Bring GPU computing to existing, popular, programming models  Single-source, fully supported by compiler  HSAIL compiler IR (Cross-platform!) • GPUs are fast and power efficient : high compute density per-mm and per-watt • But: Can be hard to program PCIe
  • 4. THE PORTABILITY CHALLENGE  CPU ISAs  ISA innovations added incrementally (ie NEON, AVX, etc)  ISA retains backwards-compatibility with previous generation  Two dominant instruction-set architectures: ARM and x86  GPU ISAs  Massive diversity of architectures in the market  Each vendor has own ISA - and often several in market at same time  No commitment (or attempt!) to provide any backwards compatibility  Traditionally graphics APIs (OpenGL, DirectX) provide necessary abstraction © Copyright 2014 HSA Foundation. All Rights Reserved
  • 5. HSAIL : WHAT MAKES IT SPECIAL?
  • 6. WHAT IS HSAIL?  Intermediate language for parallel compute in HSA  Generated by a “High Level Compiler” (GCC, LLVM, Java VM, etc)  Expresses parallel regions of code  Binary format of HSAIL is called “BRIG”  Goal: Bring parallel acceleration to mainstream programming languages © Copyright 2014 HSA Foundation. All Rights Reserved main() { … #pragma omp parallel for for (int i=0;i<N; i++) { } … } High-Level Compiler BRIG Finalizer Component ISA Host ISA
  • 7. KEY HSAIL FEATURES  Parallel  Shared virtual memory  Portable across vendors in HSA Foundation  Stable across multiple product generations  Consistent numerical results (IEEE-754 with defined min accuracy)  Fast, robust, simple finalization step (no monthly updates)  Good performance (little need to write in ISA)  Supports all of OpenCL™  Supports Java, C++, and other languages as well © Copyright 2014 HSA Foundation. All Rights Reserved
  • 8. HSAIL INSTRUCTION SET - OVERVIEW  Similar to assembly language for a RISC CPU  Load-store architecture  Destination register first, then source registers  140 opcodes (Java™ bytecode has 200)  Floating point (single, double, half (f16))  Integer (32-bit, 64-bit)  Some packed operations  Branches  Function calls  Platform Atomic Operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas  Synchronize host CPU and HSA Component!  Text and Binary formats (“BRIG”) ld_global_u64 $d0, [$d6 + 120] ; $d0= load($d6+120) add_u64 $d1, $d0, 24 ; $d1= $d2+24 © Copyright 2014 HSA Foundation. All Rights Reserved
  • 9. SEGMENTS AND MEMORY (1/2)  7 segments of memory  global, readonly, group, spill, private, arg, kernarg  Memory instructions can (optionally) specify a segment  Control data sharing properties and communicate intent  Global Segment  Visible to all HSA agents (including host CPU)  Group Segment  Provides high-performance memory shared in the work-group.  Group memory can be read and written by any work-item in the work-group  HSAIL provides sync operations to control visibility of group memory ld_global_u64 $d0,[$d6] ld_group_u64 $d0,[$d6+24] st_spill_f32 $s1,[$d6+4] © Copyright 2014 HSA Foundation. All Rights Reserved
  • 10. SEGMENTS AND MEMORY (2/2)  Spill, Private, Arg Segments  Represent different regions of a per-work-item stack  Typically generated by compiler, not specified by programmer  Compiler can use these to convey intent – ie spills  Kernarg Segment  Programmer writes kernarg segment to pass arguments to a kernel  Read-Only Segment  Remains constant during execution of kernel © Copyright 2014 HSA Foundation. All Rights Reserved
  • 11. FLAT ADDRESSING  Each segment mapped into virtual address space  Flat addresses can map to segments based on virtual address  Instructions with no explicit segment use flat addressing  Very useful for high-level language support (ie classes, libraries)  Aligns well with OpenCL 2.0 “generic” addressing feature ld_global_u64 $d6, [%_arg0] ; global ld_u64 $d0,[$d6+24] ; flat © Copyright 2014 HSA Foundation. All Rights Reserved
  • 12. REGISTERS  Four classes of registers:  S: 32-bit, Single-precision FP or Int  D: 64-bit, Double-precision FP or Long Int  Q: 128-bit, Packed data.  C: 1-bit, Control Registers (Compares)  Fixed number of registers  S, D, Q share a single pool of resources  S + 2*D + 4*Q <= 128  Up to 128 S or 64 D or 32 Q (or a blend)  Register allocation done in high-level compiler  Finalizer doesn’t perform expensive register allocation c0 c1 c2 c3 c4 c5 c6 c7 s0 d0 q0 s1 s2 d1 s3 s4 d2 q1 s5 s6 d3 s7 s8 d4 q2 s9 s10 d5 s11 … s120 d60 q30 s121 s122 d61 s123 s124 d62 q31 s125 s126 d63 s127 © Copyright 2014 HSA Foundation. All Rights Reserved
  • 13. SIMT EXECUTION MODEL  HSAIL Presents a “SIMT” execution model to the programmer  “Single Instruction, Multiple Thread”  Programmer writes program for a single thread of execution  Each work-item appears to have its own program counter  Branch instructions look natural  Hardware Implementation  Most hardware uses SIMD (Single-Instruction Multiple Data) vectors for efficiency  Actually one program counter for the entire SIMD instruction  Branches implemented with predication  SIMT Advantages  Easier to program (branch code in particular)  Natural path for mainstream programming models and existing compilers  Scales across a wide variety of hardware (programmer doesn’t see vector width)  Cross-lane operations available for those who want peak performance © Copyright 2014 HSA Foundation. All Rights Reserved
  • 14. WAVEFRONTS  Hardware SIMD vector, composed of 1, 2, 4, 8, 16, 32, 64, 128, or 256 “lanes”  Lanes in wavefront can be “active” or “inactive”  Inactive lanes consume hardware resources but don’t do useful work  Tradeoffs  “Wavefront-aware” programming can be useful for peak performance  But results in less portable code (since wavefront width is encoded in algorithm) if (cond) { operationA; // cond=True lanes active here } else { operationB; // cond=False lanes active here } © Copyright 2014 HSA Foundation. All Rights Reserved
  • 15. CROSS-LANE OPERATIONS  Example HSAIL cross-lane operation: “activelaneid”  Dest set to count of earlier work-items that are active for this instruction  Useful for compaction algorithms  Example HSAIL cross-lane operation: “activelaneshuffle”  Each workitem reads value from another lane in the wavefront  Supports selection of “identity” element for inactive lanes  Useful for wavefront-level reductionsactivelaneshuffle_b32 $s0, $s1, $s2, 0, 0 // s0 = dest, s1= source, s2=lane select, no identity activelaneid_u32 $s0 © Copyright 2014 HSA Foundation. All Rights Reserved
  • 16. HSAIL MODES  Working group strived to limit optional modes and features in HSAIL  Minimize differences between HSA target machines  Better for compiler vendors and application developers  Two modes survived  Machine Models  Small: 32-bit pointers, 32-bit data  Large: 64-bit pointers, 32-bit or 64-bit data  Vendors can support one or both models  “Base” and “Full” Profiles  Two sets of requirements for FP accuracy, rounding, exception reporting, hard pre-emption © Copyright 2014 HSA Foundation. All Rights Reserved
  • 17. HSA PROFILES Feature Base Full Addressing Modes Small, Large Small, Large All 32-bit HSAIL operations according to the declared profile Yes Yes F16 support (IEEE 754 or better) Yes Yes F64 support No Yes Precision for add/sub/mul 1/2 ULP 1/2 ULP Precision for div 2.5 ULP 1/2 ULP Precision for sqrt 1 ULP 1/2 ULP HSAIL Rounding: Near Yes Yes HSAIL Rounding: Up / Down / Zero No Yes Subnormal floating-point Flush-to-zero Supported Propagate NaN Payloads No Yes FMA Yes Yes Arithmetic Exception reporting None DETECT or BREAK Debug trap Yes Yes Hard Preemption No Yes © Copyright 2014 HSA Foundation. All Rights Reserved
  • 18. HSA PARALLEL EXECUTION MODEL © Copyright 2014 HSA Foundation. All Rights Reserved
  • 19. HSA PARALLEL EXECUTION MODEL Basic Idea: Programmer supplies an HSAIL “kernel” that is run on each work-item. Kernel is written as a single thread of execution. Programmer specifies grid dimensions (scope of problem) when launching the kernel. Each work-item has a unique coordinate in the grid. Programmer optionally specifies work- group dimensions (for optimized communication). © Copyright 2014 HSA Foundation. All Rights Reserved
  • 20. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2) © Copyright 2014 HSA Foundation. All Rights Reserved
  • 21. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2) 2D grid workitem kernel © Copyright 2014 HSA Foundation. All Rights Reserved
  • 22. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2) 2D work-group 2D grid workitem kernel © Copyright 2014 HSA Foundation. All Rights Reserved
  • 23. HOW TO PROGRAM HSA? WHAT DO I TYPE? © Copyright 2014 HSA Foundation. All Rights Reserved
  • 24. HSA PROGRAMMING MODELS : CORE PRINCIPLES  Single source  Host and device code side-by-side in same source file  Written in same programming language  Single unified coherent address space  Freely share pointers between host and device  Similar memory model as multi-core CPU  Parallel regions identified with existing language syntax  Typically same syntax used for multi-core CPU  HSAIL is the compiler IR that supports these programming models © Copyright 2014 HSA Foundation. All Rights Reserved
  • 25. GCC OPENMP : COMPILATION FLOW  SUSE GCC Project  Adding HSAIL code generator to GCC compiler infrastructure  Supports OpenMP 3.1 syntax  No data movement directives required !main() { … // Host code. #pragma omp parallel for for (int i=0;i<N; i++) { C[i] = A[i] + B[i]; } … } GCC OpenMP Compiler BRIG Finalizer Component ISA Host ISA © Copyright 2014 HSA Foundation. All Rights Reserved
  • 26. GCC OpenMP flow C/C++/Fortran OpenMP application e.g., #pragma omp for for( j = 0; j<n;j++) { b[j] = a[j]; } GNU Compiler(GCC) Compiles host code + Emits runtime calls with kernel name, parameters, launch attributes Lowers OpenMP directives, converts GIMPLE to BRIG. Embeds BRIG into host code Dispatch kernel to GPU Pragmas map to calls into HSA Runtime Application Compiler Run time Finalize kernel from BRIG->ISA Kernels finalized once and cached. Compile time © Copyright 2014 HSA Foundation. All Rights Reserved
  • 27. MCW C++AMP : COMPILATION FLOW  C++AMP : Single-source C++ template parallel programming model  MCW compiler based on CLANG/LLVM  Open-source and runs on Linux  Leverage open-source LLVM->HSAIL code generator main() { … parallel_for_each(grid<1>(ext entent<256>(…) … } C++AMP Compiler BRIG Finalizer Component ISA Host ISA © Copyright 2014 HSA Foundation. All Rights Reserved
  • 28. JAVA: RUNTIME FLOW © Copyright 2014 HSA Foundation. All Rights Reserved JAVA 8 – HSA ENABLED APARAPI  Java 8 brings Stream + Lambda API. ‒ More natural way of expressing data parallel algorithms ‒ Initially targeted at multi-core.  APARAPI will : ‒ Support Java 8 Lambdas ‒ Dispatch code to HSA enabled devices at runtime via HSAIL JVM Java Application HSA Finalizer & Runtime APARAPI + Lambda API GPUCPU Future Java – HSA ENABLED JAVA (SUMATRA)  Adds native GPU acceleration to Java Virtual Machine (JVM)  Developer uses JDK Lambda, Stream API  JVM uses GRAAL compiler to generate HSAIL JVM Java Application HSA Finalizer & Runtime Java JDK Stream + Lambda API Java GRAAL JIT backend GPUCPU
  • 29. AN EXAMPLE (IN JAVA 8) © Copyright 2014 HSA Foundation. All Rights Reserved //Example computes the percentage of total scores achieved by each player on a team. class Player { private Team team; // Note: Reference to the parent Team. private int scores; private float pctOfTeamScores; public Team getTeam() {return team;} public int getScores() {return scores;} public void setPctOfTeamScores(int pct) { pctOfTeamScores = pct; } }; // “Team” class not shown // Assume “allPlayers’ is an initialized array of Players.. Arrays.stream(allPlayers). // wrap the array in a stream parallel(). // developer indication that lambda is thread-safe forEach(p -> { int teamScores = p.getTeam().getScores(); float pctOfTeamScores = (float)p.getScores()/(float) teamScores; p.setPctOfTeamScores(pctOfTeamScores); });
  • 30. HSAIL CODE EXAMPLE © Copyright 2014 HSA Foundation. All Rights Reserved 01: version 0:95: $full : $large; 02: // static method HotSpotMethod<Main.lambda$2(Player)> 03: kernel &run ( 04: kernarg_u64 %_arg0 // Kernel signature for lambda method 05: ) { 06: ld_kernarg_u64 $d6, [%_arg0]; // Move arg to an HSAIL register 07: workitemabsid_u32 $s2, 0; // Read the work-item global “X” coord 08: 09: cvt_u64_s32 $d2, $s2; // Convert X gid to long 10: mul_u64 $d2, $d2, 8; // Adjust index for sizeof ref 11: add_u64 $d2, $d2, 24; // Adjust for actual elements start 12: add_u64 $d2, $d2, $d6; // Add to array ref ptr 13: ld_global_u64 $d6, [$d2]; // Load from array element into reg 14: @L0: 15: ld_global_u64 $d0, [$d6 + 120]; // p.getTeam() 16: mov_b64 $d3, $d0; 17: ld_global_s32 $s3, [$d6 + 40]; // p.getScores () 18: cvt_f32_s32 $s16, $s3; 19: ld_global_s32 $s0, [$d0 + 24]; // Team getScores() 20: cvt_f32_s32 $s17, $s0; 21: div_f32 $s16, $s16, $s17; // p.getScores()/teamScores 22: st_global_f32 $s16, [$d6 + 100]; // p.setPctOfTeamScores() 23: ret; 24: };
  • 31. HOW TO PROGRAM HSA? OTHER PROGRAMMING TOOLS © Copyright 2014 HSA Foundation. All Rights Reserved
  • 32. HSAIL ASSEMBLER kernel &run (kernarg_u64 %_arg0) { ld_kernarg_u64 $d6, [%_arg0]; workitemabsid_u32 $s2, 0; cvt_u64_s32 $d2, $s2; mul_u64 $d2, $d2, 8; add_u64 $d2, $d2, 24; add_u64 $d2, $d2, $d6; ld_global_u64 $d6, [$d2]; . . . HSAIL Assembler BRIG Finalizer Machine ISA • HSAIL has a text format and an assembler © Copyright 2014 HSA Foundation. All Rights Reserved
  • 33. OPENCL™ OFFLINE COMPILER (CLOC) __kernel void vec_add( __global const float *a, __global const float *b, __global float *c, const unsigned int n) { int id = get_global_id(0); // Bounds check if (id < n) c[id] = a[id] + b[id]; } CLOC BRIG Finalizer Machine ISA •OpenCL split-source model cleanly isolates kernel •Can express many HSAIL features in OpenCL Kernel Language •Higher productivity than writing in HSAIL assembly •Can dispatch kernel directly with HSAIL Runtime (lower-level access to hardware) •Or use CLOC+OKRA Runtime for approachable “fits-on-a-slide” GPU programming model © Copyright 2014 HSA Foundation. All Rights Reserved
  • 34. KEY TAKEAWAYS  HSAIL  Thin, robust, fast finalizer  Portable (multiple HW vendors and parallel architectures)  Supports shared virtual memory and platform atomics  HSA brings GPU computing to mainstream programming models  Shared and coherent memory bridges “faraway accelerator” gap  HSAIL provides the common IL for high-level languages to benefit from parallel computing  Languages and Compilers  HSAIL support in GCC, LLVM, Java JVM  Leverage same language syntax designed for multi-core CPUs  Can use pointer-containing data structures © Copyright 2014 HSA Foundation. All Rights Reserved

Editor's Notes

  1. Proofpoints : Green500, top end is now heterogeneous computing solutions. Components = CPU Cores, 3rd Party IP, GPU cores.
  2. CPU
  3. “Spill” = register spill.
  4. Some tension between programmability and HW implementation.
  5. Explain how SIMD hardware uses masks to run both halves of the if-then-else statements.
  6. Virtual ISA is SIMT (one thread), but can use cross-lane operations. Non-portable code. Other cross-lane operations: countlane, countuplane, masklane, sendlane, receivelane
  7. HSA supports IEEE-defined precisions. F16 supported in both profiles F64 supported only in Full profile. Base relaxes some precision and rounding requirements. FMA operation not required.
  8. Familiar to folks used to GPUs. Grid contains work-groups; work-group contains workitems. Programmer specifies the global dims and the work-group ; Work-group can often be determined automatically but also can be tuned for peak performance. Wavefront – hardware-specific concept. Similar to the “vector width” of a machine. For example SSE=128, AVX=256.
  9. We want to run a Sobel edge detect filter on Peaches the dog. The equation represents the kernel that is run on each pixel in the image. Note the equation examines surrounding pixels to determine the rate of change, ie an edge.
  10. To use the HSA Parallel Execution Model, we map the image to a 2D grid. Each pixel is mapped to a work-item, and the grid specifies all the pixels in the image. The same kernel is run for each work-item in the grid, but each work-item gets it’s unique (x,y) coordinate. By using the coordinate, work-items can read the pixel values of the surrounding pixels. Key observation is that the programmer writes the code for single pixel in what looks single thread, and the execution model exposes a huge amount of parallelism.
  11. Work-groups provide an additional optional optimization opportunity. HSA supports special “group” memory that is shared by all work-items in the work-group, and is typically very high-performance. In this case, we have read-sharing between neighboring pixels, so we have picked a relatively large square grid so that each work-item can pull from neighboring pixels. This
  12. Syntax used to specify parallel region Host code contained in binary file as well. (ie fat binary)
  13. Parallel_for_each program. Runtime similar to OpenMP.
  14. Java8 introduced in Mar-2014
  15. Key points : Code computes percentage of total scores achieved by each player. This is in Java. Standard Java8. Uses Java Stream class, and a lambda function. Player class has a pointer (object reference) to a parent Team. Stream supports parallel exectuion, including both multi-core and GPU accelerators. This is goal of HSA – make GPU programming as easy as multi-core CPU programming, and easier than multi-core + vector ISA programming. Note all functions have been inlined.
  16. This is the code for the Java ForEach loop – not translating the entire code into HSAIL. Line4 – kernarg signature. Line5 – returns the coordinate of this work-item, so we know which player to operate on. Line10 – multiple, destination first. Line19 – dereference of team variable.
  17. “Faraway accelerator” -> Separate address spaces, no pointers, perf/power cost of copies Shared virtual mem and platform atomics – HSA systems looks more like CPUs with fast vector units than discrete accelerators.