HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer

THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION

THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13

AGENDA
• What is the HSA Foundation?
• The System Architecture Workgroup and its goals
• What defines HSA platforms and components?
• The Shared Virtual Memory requirements
• The HSA Memory Model requirements
• The HSA Queuing Architecture
• Some other requirements set by the System Architecture specification
• Where to find further information
• Q & A


WHAT IS THE HSA FOUNDATION?
This is the short version…
• The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
• It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) to work collaboratively within the same HSA system architecture
• It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high level languages and APIs
• It’s not a replacement for e.g. OpenCL but complementary to it, defining the system level properties “below the API”, leveraged by application and system software
• The System Architecture specification defines the required component and platform features for HSA compliant components
• This presentation is an overview of the current System Architecture definitions and does not represent a complete or “final” state; that one is the specification itself when available ☺
[Figure: HSA specification set; box labels include “Conformance Tools”]


[Figure: HSA specification documents shown as boxes: System Runtime Specification; Programmer’s Reference Manual; Platform (Software) System Architecture Specification]

THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
Who participates and what are the goals?
• The workgroup membership spans a wide variety of IP and platform architecture owners
  ‒ Several host platform architectures are targeted
• The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
• The goal is to eliminate “weak points” in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operations of data parallel workloads
• The main deliverables are:
  ‒ Well-defined, consistent and dependable memory model all HSA agents operate in
  ‒ Shared access to process virtual memory between HSA agents (“ptr-is-ptr”)
  ‒ Low-latency workload dispatch contained in user-mode queues
  ‒ Scalability across a wide range of platforms
  ‒ These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications


WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
• In short, an HSA compatible platform consists of “HSA agents” (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
• Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
  ‒ Defined mainly in the “(Software) System Architecture” specification
  ‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
  ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes for a lot of scenarios
  ‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
  ‒ Fundamental data types are naturally aligned


WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
  ‒ There are two different machine models (“small” and “large”) that target different functionality levels
  ‒ This takes into account different feature requirements for different platform environments
  ‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
  ‒ Applications written to target HSA small model machines will generally work on large model machines, too
    ‒ If the large model platform and host Operating System provide a 32bit process environment

| Properties          | Small Machine Model                                                | Large Machine Model                                                   |
| Platform targets    | Embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads  |
| Native pointer size | 32bit                                                              | 64bit (+ 32bit ptr if 32bit processes are supported)                  |
| Floating point size | Half (FP16*), Single (FP32) precision                              | Half (FP16*), Single (FP32), Double (FP64) precision                  |
| Atomic ops size     | 32bit                                                              | 32bit, 64bit                                                          |

* min. load and store on memory
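The machine model table can be read as a compatibility check. The following is only a sketch of that reading: the class name `MachineModel`, the function `can_run` and the encoding of sizes as sets of bit widths are illustrative assumptions, not part of any HSA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Hypothetical summary of one column of the machine model table."""
    name: str
    pointer_bits: frozenset  # native pointer sizes available to a process
    float_bits: frozenset    # floating point sizes (FP16 at least as load/store)
    atomic_bits: frozenset   # atomic operation sizes


SMALL = MachineModel("small", frozenset({32}),
                     frozenset({16, 32}), frozenset({32}))

# Large model without 32-bit process support on the platform/OS.
LARGE = MachineModel("large", frozenset({64}),
                     frozenset({16, 32, 64}), frozenset({32, 64}))

# Large model where the platform and OS also provide 32-bit processes.
LARGE_WITH_32BIT = MachineModel("large+32", frozenset({32, 64}),
                                frozenset({16, 32, 64}), frozenset({32, 64}))


def can_run(app: MachineModel, platform: MachineModel) -> bool:
    """True if every size the application relies on is offered by the platform."""
    return (app.pointer_bits <= platform.pointer_bits
            and app.float_bits <= platform.float_bits
            and app.atomic_bits <= platform.atomic_bits)
```

Under these assumptions, `can_run(SMALL, LARGE_WITH_32BIT)` is true while `can_run(SMALL, LARGE)` is false, which mirrors the condition that small model applications work on large model machines when a 32bit process environment is provided.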


THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
The basis of “ptr-is-ptr”
• Each HSA agent adheres to the same user process address space view as the host CPU
• The process address view is established by the hardware’s page table mappings
• HSA operates in a “flat” virtual address space, using 64bit & 32bit pointers depending on application/machine model
  ‒ A pointer value references the same memory for every HSA agent
  ‒ An HSA agent can “walk” or update linked data structures directly without any assistance from a host CPU
  ‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
  ‒ HSA agents always operate at “user privilege” of the host CPU, policy enforced by system
  ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes of the host CPU, policy enforced by system
• HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
  ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from backup store and restarts execution like it does for any access from host CPU threads
  ‒ There is no tedious device buffer copy, explicit page lock or similar needed to access data in allocated memory by an HSA agent directly!
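The “ptr-is-ptr” idea can be illustrated on an ordinary host without any HSA hardware. In this sketch a second Python thread stands in for an HSA agent (an assumption made purely for illustration): it receives only a raw address, walks a linked list in place and updates it, with no copy into a separate buffer. The names `Node`, `build_list` and `agent_walk` are hypothetical.

```python
import ctypes
import threading


class Node(ctypes.Structure):
    """Singly linked list node laid out like a C struct."""


Node._fields_ = [("value", ctypes.c_int32),
                 ("next", ctypes.POINTER(Node))]


def build_list(values):
    """Build the list on the 'host'; the returned list keeps the nodes alive."""
    nodes = [Node(v) for v in values]
    for current, following in zip(nodes, nodes[1:]):
        current.next = ctypes.pointer(following)
    return nodes


def agent_walk(head_address, out):
    """Stand-in for an agent: gets a plain address, walks and updates in place."""
    node_ptr = ctypes.cast(head_address, ctypes.POINTER(Node))
    total = 0
    while node_ptr:  # a NULL pointer is falsy
        node = node_ptr.contents
        node.value *= 2          # update the shared node directly
        total += node.value
        node_ptr = node.next
    out.append(total)
```

A caller would pass `ctypes.addressof(nodes[0])` to the worker thread; because both sides interpret the same address in the same address space, the doubled values are visible in the original nodes after the worker finishes.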


THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
The basis of “ptr-is-ptr”
• On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
  ‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
  ‒ Different host platform architectures may use different detail mechanisms here
• The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)
  ‒ As long as it follows the HSA Sysarch requirements, it is ok
  ‒ Implementation of shared virtual address space by other vendors on other host platforms may be different
• The HSA MMU functionality is provided in addition to IOMMU functionality used in device virtualization
  ‒ Separate translation levels are used (see block diagram)
[Figure: HSA MMU data structures. Labels: HSA MMU (IOMMUv2 device) with Device Table base register, Command Buffer base register, Page Req Log base register, Event Log base register and Event Counter registers; System memory with Device Table, Command Buffer, Event Log, Page Service Request Log, Peripheral Page Requests (PPR) Service, Interrupt Remapping Table, I/O page tables, HSA MMU Translation Tables (per Process, PASID), Host translation, Guest & host translation, Perf Counters & RAS Info (opt.)]

THE HSA MEMORY MODEL REQUIREMENTS
What are its key properties?
• A memory model defines how writes by one work item or agent become visible to other work items and agents, rules that need to be adhered to by compilers and application threads
  ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
  ‒ Important to define scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
• At its base, the HSA memory model is based on a “relaxed” load acquire/store release model
  ‒ Inherently maps to many CPU and device architectures very easily
  ‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
• A consistent, full set of atomic operations is available
  ‒ Naturally aligned on size, small machine model supports 32bit, large machine model supports 32bit and 64bit
• Cache Coherency between HSA agents (& host CPU) is maintained by default
  ‒ Key feature of the HSA system & platform environment

THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
The basis of the workload dispatch on HSA
• The queue dispatch occurs through architected queue packets (“Architected Queuing Language”, AQL) that reference the work items & parameters
  ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
  ‒ Two architected packet types exist at the moment, dispatch and barrier packets
  ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
  ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
• Applications and runtime can build different queuing models on top of the infrastructure
  ‒ Single-producer, multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture’s queue definition to fit the use model
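The queue parameters listed above (base address, size, read index, write index) suggest a ring buffer. The sketch below is a minimal single-producer model of that idea, not the architected AQL layout: the class `UserModeQueue`, the power-of-two size restriction and the dictionary packet format are assumptions made for illustration.

```python
class UserModeQueue:
    """Ring buffer described by storage, size, read index and write index."""

    def __init__(self, size):
        # Assumption for this sketch: size is a power of two.
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size  # stands in for memory at the base address
        self.read_index = 0         # advanced by the packet processor
        self.write_index = 0        # advanced by the producer

    def enqueue(self, packet):
        """Producer side: write a packet, then publish it via the write index."""
        if self.write_index - self.read_index == self.size:
            return False            # queue is full
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1
        return True

    def process_next(self):
        """Consumer side: take packets strictly in the order they were queued."""
        if self.read_index == self.write_index:
            return None             # queue is empty
        packet = self.slots[self.read_index % self.size]
        self.read_index += 1
        return packet


def dispatch_packet(kernel_name, args):
    """Hypothetical packet shape; real AQL packets are fixed binary layouts."""
    return {"type": "dispatch", "kernel": kernel_name, "args": args}
```

Both indices only ever grow, and the slot is derived with a modulo, so "full" and "empty" can be told apart without a separate counter. A multi-producer or lock-free variant would need atomic updates of the write index, which this single-threaded sketch leaves out.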


THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
The basis of the workload dispatch on HSA
• The HSA System Architecture defines a user mode queue based dispatch mechanism
  ‒ Each queue is only valid within that process context and represents a virtual entity that is scheduled to hardware
  ‒ The job execution occurs at “user privilege” like the rest of the application code, enforced by system architecture
• Each HSA agent allows for multiple queues per application process
  ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
  ‒ HW may execute dispatch packets “out-of-order”, if no dependencies exist and in-order semantics are followed externally
  ‒ “Out of order” execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
• It is “cheap” to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
  ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only


OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important to make it work well…
• HSA memory based signaling and synchronization primitives
  ‒ Defines memory based semantics to synchronize with work items processed by HSA agents
  ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
  ‒ Allows one-to-one and one-to-many signaling
  ‒ The signaling semantics follow atomicity requirements defined in the memory model
  ‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
  ‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
• HSA Cache Coherency Domains
  ‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
  ‒ Associated with the memory model requirements
  ‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)
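The signaling primitive described on this slide (a value in memory that can be updated and waited on, one-to-one or one-to-many) can be approximated on a host with ordinary threads. This sketch uses `threading.Condition` as a stand-in for the hardware-assisted mechanism; the class `Signal`, its methods and the helper `run_jobs` are hypothetical names, not an HSA runtime API.

```python
import threading


class Signal:
    """A shared value that threads can update and wait on."""

    def __init__(self, initial):
        self._value = initial
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def store(self, value):
        with self._cond:
            self._value = value
            self._cond.notify_all()  # one-to-many: wake every waiter

    def subtract(self, amount=1):
        with self._cond:
            self._value -= amount    # update is atomic under the lock
            self._cond.notify_all()

    def wait_until(self, predicate, timeout=None):
        """Block until predicate(value) holds; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: predicate(self._value), timeout)


def run_jobs(job_count):
    """Each worker signals completion by decrementing a shared signal."""
    done = Signal(job_count)
    workers = [threading.Thread(target=done.subtract) for _ in range(job_count)]
    for worker in workers:
        worker.start()
    completed = done.wait_until(lambda value: value == 0, timeout=5.0)
    for worker in workers:
        worker.join()
    return completed, done.load()
```

A completion counter that reaches zero is only one possible convention; a mutex or semaphore could be layered on the same `store` and `wait_until` operations, which is the kind of reuse the slide attributes to runtime and application software.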


OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important
[Figure: HSA Platform - Simple. Labels: HSA APU, CPU (cores), GPU (H-CUs, Mem), HSA MMU, System Memory]
• HSA system timestamp requirements
  ‒ Defines a low-overhead mechanism to “determine the passing of time” on an HSA platform
  ‒ The value can be queried by HSAIL or HSA runtime
  ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
  ‒ Applications and tools are able to build a consistent timeline across all HSA agents
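Some arithmetic shows why a 64bit counter at a constant rate is enough in practice and how a shared timeline could be assembled from it. The function names are hypothetical, and the 1 GHz rate used in the note below is an assumption; the slide only states that the rate is constant.

```python
TIMESTAMP_BITS = 64
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ticks_to_seconds(start_ticks, end_ticks, frequency_hz):
    """Elapsed time between two timestamp reads at a constant tick rate."""
    return (end_ticks - start_ticks) / frequency_hz


def years_until_rollover(frequency_hz):
    """How long a 64-bit counter runs before wrapping at the given rate."""
    return (2 ** TIMESTAMP_BITS) / frequency_hz / SECONDS_PER_YEAR


def merge_timeline(events_by_agent):
    """Order (tick, label) events from several agents on one shared timeline."""
    merged = [(tick, agent, label)
              for agent, events in events_by_agent.items()
              for tick, label in events]
    return sorted(merged)
```

At an assumed 1 GHz, `years_until_rollover` gives roughly 584 years, so "does not roll over" holds for any realistic uptime. Because every agent reads the same counter, `merge_timeline` needs no per-agent clock correction.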

[Figure: HSA Platform with Add-In GPU (optional). Labels: GPU, HSA APU]
• HSA Topology requirements
  ‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
  ‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
  ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage

[Figure labels (HSA platform block diagram): Device Local Memory, HSA GPU, H-CU, CPU cores, System Memory, GPU, HSA MMU, System Firmware, IOBUS, Firmware, Mem]

WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
• HSA Foundation Website: http://www.hsafoundation.com
  ‒ The main location for specs, developer info, tools, publications and many things more
  ‒ HSA Programmer’s Reference Manual v 0.95 has been published
  ‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
  ‒ Will be published after ratification by the HSA Foundation Board of Directors
  ‒ Stay tuned

ANY QUESTIONS?
Of course there are, so go ahead ☺
