SlideShare a Scribd company logo
HSA QUEUEING
APU13 - NOVEMBER 2013
IAN BRATT,
PRINCIPAL ENGINEER
ARM
HSA QUEUEING, MOTIVATION
MOTIVATION (TODAY’S PICTURE)
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

3
HSA QUEUEING: REQUIREMENTS
REQUIREMENTS
! 

Requires four mechanisms to enable lower overhead job dispatch.
!  Shared Virtual Memory
!  System Coherency
!  Signaling
!  User mode queueing

© Copyright 2012 HSA Foundation. All Rights Reserved.

5
SHARED VIRTUAL MEMORY
SHARED VIRTUAL MEMORY (TODAY)
! 

Multiple Virtual memory address spaces

PHYSICAL MEMORY
PHYSICAL MEMORY

VIRTUAL MEMORY1

CPU0
VA1->PA1
© Copyright 2012 HSA Foundation. All Rights Reserved.

VIRTUAL MEMORY2

GPU
VA2->PA1
7
SHARED VIRTUAL MEMORY (HSA)
! 

Common Virtual Memory for all HSA agents

PHYSICAL MEMORY
PHYSICAL MEMORY

VIRTUAL MEMORY

CPU0
VA->PA
© Copyright 2012 HSA Foundation. All Rights Reserved.

GPU
VA->PA
8
SHARED VIRTUAL MEMORY
! 

! 

Advantages
!  No mapping tricks, no copying back-and-forth between different PA
addresses
!  Send pointers (not data) back and forth between HSA agents.
Implications
!  Common Page Tables (and common interpretation of architectural
semantics such as shareability, protection, etc).
!  Common mechanisms for address translation (and servicing
address translation faults)
!  Concept of a process address space (PASID) to allow multiple, per
process virtual address spaces within the system.

© Copyright 2012 HSA Foundation. All Rights Reserved.

9
GETTING THERE …
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

10
SHARED VIRTUAL MEMORY
! 

Specifics
!  Minimum supported VA width is 48b for 64b systems, and 32b for
32b systems.
!  HSA agents may reserve VA ranges for internal use via system
software.
!  All HSA agents other than the host unit must use the lowest
privelege level
!  If present, read/write access flags for page tables must be
maintained by all agents.
!  Read/write permissions apply to all HSA agents, equally.

© Copyright 2012 HSA Foundation. All Rights Reserved.

11
CACHE COHERENCY
CACHE COHERENCY DOMAINS (1/3)

! 

Data accesses to global memory segment from all HSA Agents shall be
coherent without the need for explicit cache maintenance.

© Copyright 2012 HSA Foundation. All Rights Reserved.

13
CACHE COHERENCY DOMAINS (2/3)
! 

! 

Advantages
!  Composability
!  Reduced SW complexity when communicating between agents
!  Lower barrier to entry when porting software
Implications
!  Hardware coherency support between all HSA agents
! 

Can take many forms
!  Stand alone Snoop Filters / Directories
!  Combined L3/Filters
!  Snoop-based systems (no filter)
!  Etc …

© Copyright 2012 HSA Foundation. All Rights Reserved.

14
GETTING CLOSER …
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

15
CACHE COHERENCY DOMAINS (3/3)
! 

Specifics
!  No requirement for instruction memory accesses to be
coherent
!  Only applies to the Primary memory type.
!  No requirement for HSA agents to maintain coherency
to any memory location where the HSA agents do not
specify the same memory attributes
!  Read-only image data is required to remain static during
the execution of an HSA kernel.
!  No double mapping (via different attributes) in order
to modify. Must remain static

© Copyright 2012 HSA Foundation. All Rights Reserved.

16
SIGNALING
SIGNALING (1/3)
! 

HSA agents support the ability to use signaling objects
!  All creation/destruction signaling objects occurs via HSA
runtime APIs
!  From an HSA Agent you can directly access signaling
objects.
!  Signaling a signal object (this will wake up HSA
agents waiting upon the object)
!  Query current object
!  Wait on the current object (various conditions
supported).

© Copyright 2012 HSA Foundation. All Rights Reserved.

18
SIGNALING (2/3)
! 

! 

Advantages
!  Enables asynchronous interrupts between HSA agents,
without involving the kernel
!  Common idiom for work offload
!  Low power waiting
Implications
!  Runtime support required
!  Commonly implemented on top of cache coherency
flows

© Copyright 2012 HSA Foundation. All Rights Reserved.

19
ALMOST THERE…
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

20
SIGNALING (3/3)
! 

Specifics
!  Only supported within a PASID
!  Supported wait conditions are =, !=, < and >=
!  Wait operations may return sporadically (no guarantee
against false positives)
!  Programmer must test.
!  Wait operations have a maximum duration before
returning.
!  The HSAIL atomic operations are supported on signal
objects.
!  Signal objects are opaque

© Copyright 2012 HSA Foundation. All Rights Reserved.

21
USER MODE QUEUEING
ONE BLOCK LEFT
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

23
USER MODE QUEUEING (1/3)
! 

User mode Queueing
! 

Enables user space applications to directly, without OS intervention, enqueue jobs
(“Dispatch Packets”) for HSA agents.

! 

Queues are created/destroyed via calls to the HSA runtime.

! 

One (or many) agents enqueue packets, a single agent dequeues packets.

! 

Requires coherency and shared virtual memory.

© Copyright 2012 HSA Foundation. All Rights Reserved.

24
USER MODE QUEUEING (2/3)
! 

Advantages
! 

Avoid involving the kernel/driver when dispatching work for an Agent.

! 

Lower latency job dispatch enables finer granularity of offload

! 

! 

Standard memory protection mechanisms may be used to protect communication
with the consuming agent.

Implications
! 

Packet formats/fields are Architected – standard across vendors!
! 

! 

! 

Guaranteed backward compatibility

Packets are enqueued/dequeued via an Architected protocol (all via memory
accesses and signalling)
More on this later……

© Copyright 2012 HSA Foundation. All Rights Reserved.

25
SUCCESS!
Application
Transfer
buffer to GPU

OS

GPU

Copy/Map
Memory

Queue Job
Schedule Job
Start Job

Finish Job
Schedule
Application
Get Buffer
Copy/Map
Memory

© Copyright 2012 HSA Foundation. All Rights Reserved.

26
SUCCESS!
Application

OS

GPU

Queue Job

Start Job

Finish Job

© Copyright 2012 HSA Foundation. All Rights Reserved.

27
ARCHITECTED QUEUEING
LANGUAGE, QUEUES
ARCHITECTED QUEUEING LANGUAGE
! 

HSA Queues look just like standard
shared memory queues, supporting
multi-producer, single-consumer
! 

! 

Producer

Producer

Support is allowed for single-producer,
single-consumer

Queues consist of storage, read/write
indices, ID, etc.

Write Index
Packets
Read Index

! 

! 

! 

Queues are created/destroyed via calls
to the HSA runtime
“Packets” are placed in queues directly
from user mode, via an architected
protocol

Storage in
coherent, shared
memory
Consumer

Packet format is architected

© Copyright 2012 HSA Foundation. All Rights Reserved.

29
ARCHITECTED QUEUEING LANGUAGE
! 

Packets are read and dispatched for execution from the queue in order, but
may complete in any order.
! 

! 

There is no guarantee that more than one packet will be processed in parallel at a
time

There may be many queues. A single agent may also consume from several
queues.

! 

A packet processing agent may also enqueue packets.

! 

Once a packet is enqueued, the producer signals the doorbell
! 

! 

! 

Consumers are not required to wait on the doorbell – the consumer could instead be
polling.
The doorbell is not the synchronization mechanism (the shared memory updates to
the read/write indices ensure the synchronization).
Modifications to the read/write indices occur via HSA intrinsics, to support portability
across different platforms.

© Copyright 2012 HSA Foundation. All Rights Reserved.

30
POTENTIAL MULTI-PRODUCER ALGORITHM
// Read the current queue write offset
tmp_WriteOffset = hsa_WriteOffset();
// wait until the queue is no longer full.
while(tmp_WriteOffset == hsa_ReadOffset() + Size) {}
// Atomically bump the WriteOffset
if (hsa_WriteOffset.compare_exchange(tmp_WriteOffset, tmp_WriteOffset + 1){
// calculate index
uint32_t index = tmp_WriteOffset & (Size -1);
// copy over the packet, the format field is INVALID
BaseAddress[index] = pkt;
// Update format field with release semantics
BaseAddress[index].hdr.format.store(DISPATCH, std::memory_order_release);
// ring doorbell, with release semantics (could also amortize over multiple packets)
hsa_ring_doorbell(tmp_WriteOffset+1);
}
© Copyright 2012 HSA Foundation. All Rights Reserved.

31
POTENTIAL CONSUMER ALGORITHM
// spin while empty (could also perform low-power wait on doorbell)
while (BaseAddress[hsa_ReadOffset() & (Size - 1)].hdr.format == INVALID) { }
// calculate the index
uint32_t index = hsa_ReadOffset() & (Size - 1);
// copy over the packet
pkt = BaseAddress[index];
// set the format field to invalid
BaseAddress[index].hdr.format.store(INVALID, std::memory_order_relaxed);
// Update the readoffset using HSA intrinsic
hsa_ReadOffset.store(hsa_ReadOffset());

© Copyright 2012 HSA Foundation. All Rights Reserved.

32
ARCHITECTED QUEUEING
LANGUAGE, PACKETS
DISPATCH PACKET
! 

Packets come in two main types (Dispatch and Barrier), with architected
layouts

! 

Dispatch packet is the most common type of packet

! 

Contains
! 
! 

Pointer to the arguments

! 

WorkGroupSize (x,y,z)

! 

gridSize(x,y,z)

! 

And more……

! 

! 

Pointer to the kernel

dispatch_agent packet more general

Packets contain an additional “barrier” flag. When the barrier flag is set, no
other packets will be launched until all previously launched packets from this
queue have completed.

© Copyright 2012 HSA Foundation. All Rights Reserved.

34
DISPATCH PACKET
Start
Offset
(Bytes)

Format

Field Name

Description

0

uint16_t header

2

uint16_t dimensions

4

uint16_t workgroupSize.x

Packet header
Number of dimensions specified in gridSize. Valid values are
1, 2, or 3.
x dimension of work-group (measured in work-items).

6

uint16_t workgroupSize.y

y dimension of work-group (measured in work-items).

8

12

uint16_t workgroupSize.z
uint16_t reserved2
uint32_t gridSize.x

z dimension of work-group (measured in work-items).
Must be 0.
x dimension of grid (measured in work-items).

16

uint32_t gridSize.y

y dimension of grid (measured in work-items).

20

uint32_t gridSize.z
privateSegmentSizeB
uint32_t
ytes
groupSegmentSizeBy
uint32_t
tes

z dimension of grid (measured in work-items).
Total size in bytes of private memory allocation request (per
work-item).
Total size in bytes of group memory allocation request (per
work-group).
Address of an object in memory that includes an
implementation-defined executable ISA image for the kernel.
Address of memory containing kernel arguments.
Must be 0.
Address of HSA signaling object used to indicate completion of
the job.

10

24

28

32

uint64_t kernelObjectAddress

40
48

uint64_t kernargAddress
uint64_t reserved3

56

uint64_t completionSignal

© Copyright 2012 HSA Foundation. All Rights Reserved.

35
AGENT_DISPATCH PACKET
Start
Offset
(Bytes)

Format

0

uint16_t

header

2

uint16_t

type

4

48

uint32_t
uint64_t
uint64_t
uint64_t
uint64_t
uint64_t
uint64_t

reserved2
returnLocation
arg[0]
arg[1]
arg[2]
arg[3]
reserved3

56

uint64_t

8
16
24
32
40

Field Name

Description

Packet header
The function to be performed by the destination
Agent. The type value is split into the following
ranges:
•  0x0000:0x3FFF – Vendor specific
•  0x4000:0x7FFF – HSA runtime
•  0x8000:0xFFFF – User registered function
Must be 0.
Pointer to location to store the function return value in.
64-bit direct or indirect arguments.

Must be 0.
Address of HSA signaling object used to indicate
completionSignal
completion of the job.

© Copyright 2012 HSA Foundation. All Rights Reserved.

36
BARRIER PACKET
! 

! 

! 

Used for specifying dependences between packets
HAS agent will not launch any further packets from this queue until the barrier
packet signal conditions are met
Used for specifying dependences on packets dispatched from any queue.
! 

! 

Execution phase completes only when all of the dependent signals (up to five) have
been signaled (with the value of 0).
Or if an error has occurred in one of the packets upon which we have a dependence.

© Copyright 2012 HSA Foundation. All Rights Reserved.

37
BARRIER PACKET
Start Offset
(Bytes)

Format

0

uint16_t

header

Packet header, see 2.8.1 Packet header (p. 16).

2

uint16_t

reserved2

Must be 0.

4

uint32_t

reserved3

Must be 0.

8

uint64_t

depSignal0

16

uint64_t

depSignal1

24

uint64_t

depSignal2

32

uint64_t

depSignal3

40

uint64_t

depSignal4

48

uint64_t

reserved4

Must be 0.

56

uint64_t

completionSignal

Address of HSA signaling object used to indicate completion of
the job.

Field Name

© Copyright 2012 HSA Foundation. All Rights Reserved.

Description

Address of dependent signaling objects to be evaluated by the
packet processor.

38
DEPENDENCES
! 

! 

A user may never assume more than one packet is being executed by an HSA
agent at a time.
Implications:
! 

! 

! 

Packets can’t poll on shared memory values which will be set by packets issued
from other queues, unless the user has ensured the proper ordering.
To ensure all previous packets from a queue have been completed, use the Barrier
bit.
To ensure specific packets from any queue have completed, use the Barrier packet.

© Copyright 2012 HSA Foundation. All Rights Reserved.

39
HSA QUEUEING, PACKET EXECUTION
PACKET EXECUTION
! 

Launch phase
! 

Initiated when launch conditions are met
! 
! 

! 

All preceeding packets in the queue must have exited launch phase
If the barrier bit in the packet header is set, then all preceding packets in the
queue must have exited completion phase

Active phase
! 
! 

! 

Execute the packet
Barrier packets remain in Active phase until conditions are met.

Completion phase
! 

First step is memory release fence – make results visible.

! 

CompletionSignal field is then signaled with a decrementing atomic.

© Copyright 2012 HSA Foundation. All Rights Reserved.

41
PACKET EXECUTION

Pkt1
Launch

Pkt1
Execute
Pkt2
Launch

Pkt3 stays in Launch
phase until all packets
have completed.

Pkt1
Complete
Pkt2
Execute

Pkt2
Complete

Pkt3
Launch (barrier=1)

Pkt3
Execute

Time

© Copyright 2012 HSA Foundation. All Rights Reserved.

42
PUTTING IT ALL TOGETHER (FFT)
Packet 1

Packet 3

Packet 5

Packet 2

Packet 4

Packet 6

X[0]
X[1]
X[2]
X[3]
X[4]
X[5]
X[6]
X[7]
Barrier
© Copyright 2012 HSA Foundation. All Rights Reserved.

Time

Barrier
43
PUTTING IT ALL TOGETHER
AQL Pseudo Code
// Send the packets to do the first stage.
aql_dispatch(pkt1);
aql_dispatch(pkt2);
// Send the next two packets, setting the barrier bit so we //
know packets 1 &2 will be complete before 3 and 4 are //
launched.
aql_dispatch_with _barrier_bit(pkt3);
aql_dispatch(pkt4);
// Same as above (make sure 3 & 4 are done before issuing
5 //& 6)
aql_dispatch_with_barrier_bit(pkt5);
aql_dispatch(pkt6);
// This packet will notify us when 5 & 6 are complete)
aql_dispatch_with_barrier_bit(finish_pkt);
© Copyright 2012 HSA Foundation. All Rights Reserved.

44
SUMMARY

© Copyright 2012 HSA Foundation. All Rights Reserved.

45
REQUIREMENTS
! 

HAS Requires four mechanisms to enable lower overhead job
dispatch.
!  Shared Virtual Memory
!  System Coherency
!  Signaling
!  User mode queueing

© Copyright 2012 HSA Foundation. All Rights Reserved.

46
QUESTIONS?

© Copyright 2012 HSA Foundation. All Rights Reserved.

47

More Related Content

What's hot

PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovAMD Developer Central
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
AMD Developer Central
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
AMD Developer Central
 
HSA Features
HSA FeaturesHSA Features
HSA Features
Hen-Jung Wu
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
AMD Developer Central
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
AMD
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
AMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
AMD Developer Central
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
inside-BigData.com
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
AMD Developer Central
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
AMD Developer Central
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
AMD Developer Central
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
Linaro
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
AMD Developer Central
 
Open compute technology
Open compute technologyOpen compute technology
Open compute technology
AMD
 
AMD It's Time to ROC
AMD It's Time to ROCAMD It's Time to ROC
AMD It's Time to ROC
inside-BigData.com
 
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...AMD Developer Central
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
Rashid Ansari
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
HSA Foundation
 
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben SanderPT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
AMD Developer Central
 

What's hot (20)

PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
HSA Features
HSA FeaturesHSA Features
HSA Features
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
 
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation Heterogeneous Systems Architecture: The Next Area of Computing Innovation
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
Open compute technology
Open compute technologyOpen compute technology
Open compute technology
 
AMD It's Time to ROC
AMD It's Time to ROCAMD It's Time to ROC
AMD It's Time to ROC
 
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
 
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben SanderPT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
PT-4059, Bolt: A C++ Template Library for Heterogeneous Computing, by Ben Sander
 

Similar to HSA-4122, "HSA Queuing Mode," by Ian Bratt

HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
HSA Foundation
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelHSA Foundation
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective
HSA Foundation
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
Edge AI and Vision Alliance
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
HSA Foundation
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
DataWorks Summit
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - IntroHSA Foundation
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILHSA Foundation
 
Introducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFireIntroducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFire
John Blum
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
HSA Foundation
 
Big data overview by Edgars
Big data overview by EdgarsBig data overview by Edgars
Big data overview by Edgars
Andrejs Vorobjovs
 
Leveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN PerformanceLeveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN Performance
brettallison
 
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
ITM University
 
Right time Vs real time
Right time Vs real timeRight time Vs real time
Right time Vs real time
Murphy Choy
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory ModelHSA Foundation
 
Get to know the browser better and write faster web apps
Get to know the browser better   and write faster web appsGet to know the browser better   and write faster web apps
Get to know the browser better and write faster web appsLior Bar-On
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
HSA Foundation
 
Streaming solutions for real time problems
Streaming solutions for real time problems Streaming solutions for real time problems
Streaming solutions for real time problems
Aparna Gaonkar
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA
HSA Foundation
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
Paul Hofmann
 

Similar to HSA-4122, "HSA Queuing Mode," by Ian Bratt (20)

HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing Model
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - Intro
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAIL
 
Introducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFireIntroducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFire
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
Big data overview by Edgars
Big data overview by EdgarsBig data overview by Edgars
Big data overview by Edgars
 
Leveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN PerformanceLeveraging Open Source to Manage SAN Performance
Leveraging Open Source to Manage SAN Performance
 
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
Virtual Memory ,Direct memory addressing and indirect memory addressing prese...
 
Right time Vs real time
Right time Vs real timeRight time Vs real time
Right time Vs real time
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory Model
 
Get to know the browser better and write faster web apps
Get to know the browser better   and write faster web appsGet to know the browser better   and write faster web apps
Get to know the browser better and write faster web apps
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
Streaming solutions for real time problems
Streaming solutions for real time problems Streaming solutions for real time problems
Streaming solutions for real time problems
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
AMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
AMD Developer Central
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
AMD Developer Central
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
AMD Developer Central
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
AMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
AMD Developer Central
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
AMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
AMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
AMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
AMD Developer Central
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14
AMD Developer Central
 

More from AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

HSA-4122, "HSA Queuing Mode," by Ian Bratt

  • 1. HSA QUEUEING APU13 - NOVEMBER 2013 IAN BRATT, PRINCIPAL ENGINEER ARM
  • 3. MOTIVATION (TODAY’S PICTURE) Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 3
  • 5. REQUIREMENTS !  Requires four mechanisms to enable lower overhead job dispatch. !  Shared Virtual Memory !  System Coherency !  Signaling !  User mode queueing © Copyright 2012 HSA Foundation. All Rights Reserved. 5
  • 7. SHARED VIRTUAL MEMORY (TODAY) !  Multiple Virtual memory address spaces PHYSICAL MEMORY PHYSICAL MEMORY VIRTUAL MEMORY1 CPU0 VA1->PA1 © Copyright 2012 HSA Foundation. All Rights Reserved. VIRTUAL MEMORY2 GPU VA2->PA1 7
  • 8. SHARED VIRTUAL MEMORY (HSA) !  Common Virtual Memory for all HSA agents PHYSICAL MEMORY PHYSICAL MEMORY VIRTUAL MEMORY CPU0 VA->PA © Copyright 2012 HSA Foundation. All Rights Reserved. GPU VA->PA 8
  • 9. SHARED VIRTUAL MEMORY !  !  Advantages !  No mapping tricks, no copying back-and-forth between different PA addresses !  Send pointers (not data) back and forth between HSA agents. Implications !  Common Page Tables (and common interpretation of architectural semantics such as shareability, protection, etc). !  Common mechanisms for address translation (and servicing address translation faults) !  Concept of a process address space (PASID) to allow multiple, per process virtual address spaces within the system. © Copyright 2012 HSA Foundation. All Rights Reserved. 9
  • 10. GETTING THERE … Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 10
  • 11. SHARED VIRTUAL MEMORY !  Specifics !  Minimum supported VA width is 48b for 64b systems, and 32b for 32b systems. !  HSA agents may reserve VA ranges for internal use via system software. !  All HSA agents other than the host unit must use the lowest privelege level !  If present, read/write access flags for page tables must be maintained by all agents. !  Read/write permissions apply to all HSA agents, equally. © Copyright 2012 HSA Foundation. All Rights Reserved. 11
  • 13. CACHE COHERENCY DOMAINS (1/3) !  Data accesses to global memory segment from all HSA Agents shall be coherent without the need for explicit cache maintenance. © Copyright 2012 HSA Foundation. All Rights Reserved. 13
  • 14. CACHE COHERENCY DOMAINS (2/3) !  !  Advantages !  Composability !  Reduced SW complexity when communicating between agents !  Lower barrier to entry when porting software Implications !  Hardware coherency support between all HSA agents !  Can take many forms !  Stand alone Snoop Filters / Directories !  Combined L3/Filters !  Snoop-based systems (no filter) !  Etc … © Copyright 2012 HSA Foundation. All Rights Reserved. 14
  • 15. GETTING CLOSER … Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 15
  • 16. CACHE COHERENCY DOMAINS (3/3) !  Specifics !  No requirement for instruction memory accesses to be coherent !  Only applies to the Primary memory type. !  No requirement for HSA agents to maintain coherency to any memory location where the HSA agents do not specify the same memory attributes !  Read-only image data is required to remain static during the execution of an HSA kernel. !  No double mapping (via different attributes) in order to modify. Must remain static © Copyright 2012 HSA Foundation. All Rights Reserved. 16
  • 18. SIGNALING (1/3) !  HSA agents support the ability to use signaling objects !  All creation/destruction signaling objects occurs via HSA runtime APIs !  From an HSA Agent you can directly access signaling objects. !  Signaling a signal object (this will wake up HSA agents waiting upon the object) !  Query current object !  Wait on the current object (various conditions supported). © Copyright 2012 HSA Foundation. All Rights Reserved. 18
  • 19. SIGNALING (2/3) !  !  Advantages !  Enables asynchronous interrupts between HSA agents, without involving the kernel !  Common idiom for work offload !  Low power waiting Implications !  Runtime support required !  Commonly implemented on top of cache coherency flows © Copyright 2012 HSA Foundation. All Rights Reserved. 19
  • 20. ALMOST THERE… Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 20
  • 21. SIGNALING (3/3) !  Specifics !  Only supported within a PASID !  Supported wait conditions are =, !=, < and >= !  Wait operations may return sporadically (no guarantee against false positives) !  Programmer must test. !  Wait operations have a maximum duration before returning. !  The HSAIL atomic operations are supported on signal objects. !  Signal objects are opaque © Copyright 2012 HSA Foundation. All Rights Reserved. 21
  • 23. ONE BLOCK LEFT Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 23
  • 24. USER MODE QUEUEING (1/3) !  User mode Queueing !  Enables user space applications to directly, without OS intervention, enqueue jobs (“Dispatch Packets”) for HSA agents. !  Queues are created/destroyed via calls to the HSA runtime. !  One (or many) agents enqueue packets, a single agent dequeues packets. !  Requires coherency and shared virtual memory. © Copyright 2012 HSA Foundation. All Rights Reserved. 24
  • 25. USER MODE QUEUEING (2/3) !  Advantages !  Avoid involving the kernel/driver when dispatching work for an Agent. !  Lower latency job dispatch enables finer granularity of offload !  !  Standard memory protection mechanisms may be used to protect communication with the consuming agent. Implications !  Packet formats/fields are Architected – standard across vendors! !  !  !  Guaranteed backward compatibility Packets are enqueued/dequeued via an Architected protocol (all via memory accesses and signalling) More on this later…… © Copyright 2012 HSA Foundation. All Rights Reserved. 25
  • 26. SUCCESS! Application Transfer buffer to GPU OS GPU Copy/Map Memory Queue Job Schedule Job Start Job Finish Job Schedule Application Get Buffer Copy/Map Memory © Copyright 2012 HSA Foundation. All Rights Reserved. 26
  • 27. SUCCESS! Application OS GPU Queue Job Start Job Finish Job © Copyright 2012 HSA Foundation. All Rights Reserved. 27
  • 29. ARCHITECTED QUEUEING LANGUAGE !  HSA Queues look just like standard shared memory queues, supporting multi-producer, single-consumer !  !  Producer Producer Support is allowed for single-producer, single-consumer Queues consist of storage, read/write indices, ID, etc. Write Index Packets Read Index !  !  !  Queues are created/destroyed via calls to the HSA runtime “Packets” are placed in queues directly from user mode, via an architected protocol Storage in coherent, shared memory Consumer Packet format is architected © Copyright 2012 HSA Foundation. All Rights Reserved. 29
  • 30. ARCHITECTED QUEUEING LANGUAGE !  Packets are read and dispatched for execution from the queue in order, but may complete in any order. !  !  There is no guarantee that more than one packet will be processed in parallel at a time There may be many queues. A single agent may also consume from several queues. !  A packet processing agent may also enqueue packets. !  Once a packet is enqueued, the producer signals the doorbell !  !  !  Consumers are not required to wait on the doorbell – the consumer could instead be polling. The doorbell is not the synchronization mechanism (the shared memory updates to the read/write indices ensure the synchronization). Modifications to the read/write indices occur via HSA intrinsics, to support portability across different platforms. © Copyright 2012 HSA Foundation. All Rights Reserved. 30
  • 31. POTENTIAL MULTI-PRODUCER ALGORITHM // Read the current queue write offset tmp_WriteOffset = hsa_WriteOffset(); // wait until the queue is no longer full. while(tmp_WriteOffset == hsa_ReadOffset() + Size) {} // Atomically bump the WriteOffset if (hsa_WriteOffset.compare_exchange(tmp_WriteOffset, tmp_WriteOffset + 1){ // calculate index uint32_t index = tmp_WriteOffset & (Size -1); // copy over the packet, the format field is INVALID BaseAddress[index] = pkt; // Update format field with release semantics BaseAddress[index].hdr.format.store(DISPATCH, std::memory_order_release); // ring doorbell, with release semantics (could also amortize over multiple packets) hsa_ring_doorbell(tmp_WriteOffset+1); } © Copyright 2012 HSA Foundation. All Rights Reserved. 31
  • 32. POTENTIAL CONSUMER ALGORITHM // spin while empty (could also perform low-power wait on doorbell) while (BaseAddress[hsa_ReadOffset() & (Size - 1)].hdr.format == INVALID) { } // calculate the index uint32_t index = hsa_ReadOffset() & (Size - 1); // copy over the packet pkt = BaseAddress[index]; // set the format field to invalid BaseAddress[index].hdr.format.store(INVALID, std::memory_order_relaxed); // Update the readoffset using HSA intrinsic hsa_ReadOffset.store(hsa_ReadOffset()); © Copyright 2012 HSA Foundation. All Rights Reserved. 32
  • 34. DISPATCH PACKET !  Packets come in two main types (Dispatch and Barrier), with architected layouts !  Dispatch packet is the most common type of packet !  Contains !  !  Pointer to the arguments !  WorkGroupSize (x,y,z) !  gridSize(x,y,z) !  And more…… !  !  Pointer to the kernel dispatch_agent packet more general Packets contain an additional “barrier” flag. When the barrier flag is set, no other packets will be launched until all previously launched packets from this queue have completed. © Copyright 2012 HSA Foundation. All Rights Reserved. 34
  • 35. DISPATCH PACKET Start Offset (Bytes) Format Field Name Description 0 uint16_t header 2 uint16_t dimensions 4 uint16_t workgroupSize.x Packet header Number of dimensions specified in gridSize. Valid values are 1, 2, or 3. x dimension of work-group (measured in work-items). 6 uint16_t workgroupSize.y y dimension of work-group (measured in work-items). 8 12 uint16_t workgroupSize.z uint16_t reserved2 uint32_t gridSize.x z dimension of work-group (measured in work-items). Must be 0. x dimension of grid (measured in work-items). 16 uint32_t gridSize.y y dimension of grid (measured in work-items). 20 uint32_t gridSize.z privateSegmentSizeB uint32_t ytes groupSegmentSizeBy uint32_t tes z dimension of grid (measured in work-items). Total size in bytes of private memory allocation request (per work-item). Total size in bytes of group memory allocation request (per work-group). Address of an object in memory that includes an implementation-defined executable ISA image for the kernel. Address of memory containing kernel arguments. Must be 0. Address of HSA signaling object used to indicate completion of the job. 10 24 28 32 uint64_t kernelObjectAddress 40 48 uint64_t kernargAddress uint64_t reserved3 56 uint64_t completionSignal © Copyright 2012 HSA Foundation. All Rights Reserved. 35
  • 36. AGENT_DISPATCH PACKET Start Offset (Bytes) Format 0 uint16_t header 2 uint16_t type 4 48 uint32_t uint64_t uint64_t uint64_t uint64_t uint64_t uint64_t reserved2 returnLocation arg[0] arg[1] arg[2] arg[3] reserved3 56 uint64_t 8 16 24 32 40 Field Name Description Packet header The function to be performed by the destination Agent. The type value is split into the following ranges: •  0x0000:0x3FFF – Vendor specific •  0x4000:0x7FFF – HSA runtime •  0x8000:0xFFFF – User registered function Must be 0. Pointer to location to store the function return value in. 64-bit direct or indirect arguments. Must be 0. Address of HSA signaling object used to indicate completionSignal completion of the job. © Copyright 2012 HSA Foundation. All Rights Reserved. 36
  • 37. BARRIER PACKET !  !  !  Used for specifying dependences between packets HAS agent will not launch any further packets from this queue until the barrier packet signal conditions are met Used for specifying dependences on packets dispatched from any queue. !  !  Execution phase completes only when all of the dependent signals (up to five) have been signaled (with the value of 0). Or if an error has occurred in one of the packets upon which we have a dependence. © Copyright 2012 HSA Foundation. All Rights Reserved. 37
  • 38. BARRIER PACKET Start Offset (Bytes) Format 0 uint16_t header Packet header, see 2.8.1 Packet header (p. 16). 2 uint16_t reserved2 Must be 0. 4 uint32_t reserved3 Must be 0. 8 uint64_t depSignal0 16 uint64_t depSignal1 24 uint64_t depSignal2 32 uint64_t depSignal3 40 uint64_t depSignal4 48 uint64_t reserved4 Must be 0. 56 uint64_t completionSignal Address of HSA signaling object used to indicate completion of the job. Field Name © Copyright 2012 HSA Foundation. All Rights Reserved. Description Address of dependent signaling objects to be evaluated by the packet processor. 38
  • 39. DEPENDENCES !  !  A user may never assume more than one packet is being executed by an HSA agent at a time. Implications: !  !  !  Packets can’t poll on shared memory values which will be set by packets issued from other queues, unless the user has ensured the proper ordering. To ensure all previous packets from a queue have been completed, use the Barrier bit. To ensure specific packets from any queue have completed, use the Barrier packet. © Copyright 2012 HSA Foundation. All Rights Reserved. 39
  • 40. HSA QUEUEING, PACKET EXECUTION
  • 41. PACKET EXECUTION !  Launch phase !  Initiated when launch conditions are met !  !  !  All preceeding packets in the queue must have exited launch phase If the barrier bit in the packet header is set, then all preceding packets in the queue must have exited completion phase Active phase !  !  !  Execute the packet Barrier packets remain in Active phase until conditions are met. Completion phase !  First step is memory release fence – make results visible. !  CompletionSignal field is then signaled with a decrementing atomic. © Copyright 2012 HSA Foundation. All Rights Reserved. 41
  • 42. PACKET EXECUTION Pkt1 Launch Pkt1 Execute Pkt2 Launch Pkt3 stays in Launch phase until all packets have completed. Pkt1 Complete Pkt2 Execute Pkt2 Complete Pkt3 Launch (barrier=1) Pkt3 Execute Time © Copyright 2012 HSA Foundation. All Rights Reserved. 42
  • 43. PUTTING IT ALL TOGETHER (FFT) Packet 1 Packet 3 Packet 5 Packet 2 Packet 4 Packet 6 X[0] X[1] X[2] X[3] X[4] X[5] X[6] X[7] Barrier © Copyright 2012 HSA Foundation. All Rights Reserved. Time Barrier 43
  • 44. PUTTING IT ALL TOGETHER AQL Pseudo Code // Send the packets to do the first stage. aql_dispatch(pkt1); aql_dispatch(pkt2); // Send the next two packets, setting the barrier bit so we // know packets 1 &2 will be complete before 3 and 4 are // launched. aql_dispatch_with _barrier_bit(pkt3); aql_dispatch(pkt4); // Same as above (make sure 3 & 4 are done before issuing 5 //& 6) aql_dispatch_with_barrier_bit(pkt5); aql_dispatch(pkt6); // This packet will notify us when 5 & 6 are complete) aql_dispatch_with_barrier_bit(finish_pkt); © Copyright 2012 HSA Foundation. All Rights Reserved. 44
  • 45. SUMMARY © Copyright 2012 HSA Foundation. All Rights Reserved. 45
  • 46. REQUIREMENTS !  HAS Requires four mechanisms to enable lower overhead job dispatch. !  Shared Virtual Memory !  System Coherency !  Signaling !  User mode queueing © Copyright 2012 HSA Foundation. All Rights Reserved. 46
  • 47. QUESTIONS? © Copyright 2012 HSA Foundation. All Rights Reserved. 47