HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer

Presentation HC-4015 by Paul Blinzer at the AMD Developer Summit (APU13) November 11-13, 2013.


Transcript

  • 1. THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
    Paul Blinzer, Fellow, HSA System Software, AMD
    System Architecture Workgroup Chair, HSA Foundation
    THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13
  • 2. AGENDA
    - What is the HSA Foundation?
    - The System Architecture Workgroup and its goals
    - What defines HSA platforms and components?
    - The shared virtual memory requirements
    - The HSA memory model requirements
    - The HSA queuing architecture
    - Some other requirements set by the System Architecture specification
    - Where to find further information
    - Q & A
  • 3. WHAT IS THE HSA FOUNDATION?
    This is the short version…
    - The HSA Foundation is a not-for-profit consortium of SoC and SoC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
    - It spans multiple host platform architectures and programmable data-parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) that work collaboratively within the same HSA system architecture
    - It defines a set of specifications that set HW & SW platform requirements, enabling applications to target the feature set from high-level languages and APIs
    - It is not a replacement for e.g. OpenCL but complementary to it, defining the system-level properties "below the API" that application and system software leverage
    - The System Architecture specification defines the required component and platform features for HSA-compliant components
    - This presentation is an overview of the current System Architecture definitions and does not represent a complete or "final" state; that one is the specification itself, when available ☺
    [Figure: the HSA specification set: Platform (Software) System Architecture Specification, System Runtime Specification, Programmer's Reference Manual, plus Conformance and Tools]
  • 4. THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
    Who participates and what are the goals?
    - The workgroup membership spans a wide variety of IP and platform architecture owners
      ‒ Several host platform architectures are targeted
    - The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
    - The goal is to eliminate "weak points" in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operation of data-parallel workloads
    - The main deliverables are:
      ‒ A well-defined, consistent and dependable memory model that all HSA agents operate in
      ‒ Shared access to process virtual memory between HSA agents ("ptr-is-ptr")
      ‒ Low-latency workload dispatch contained in user-mode queues
      ‒ Scalability across a wide range of platforms
      ‒ These properties are leveraged in the "HSA Programmer's Reference", HSAIL and HSA Runtime specifications
  • 5. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
    - In short, an HSA-compatible platform consists of "HSA agents" (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
    - Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, and memory coherence and data visibility (memory model) requirements
      ‒ Defined mainly in the "(Software) System Architecture" specification
      ‒ The HSAIL and "Programmer's Reference Manual" specifications define the software execution model
      ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
      ‒ Architected infrastructure allows exchanging data with non-HSA-compliant components in a platform
      ‒ Fundamental data types are naturally aligned
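"Naturally aligned" means an object of N bytes starts at an address that is a multiple of N. The C sketch below only illustrates that rule; `is_naturally_aligned` is a hypothetical helper, not part of any HSA API, and it assumes `size` is the object's size in bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an HSA API): an object of `size` bytes is
 * naturally aligned when its address is a multiple of `size`. */
static bool is_naturally_aligned(const void *p, size_t size) {
    if (size == 0)
        return false; /* zero-sized objects have no meaningful alignment */
    return ((uintptr_t)p % size) == 0;
}
```

For example, a 64bit value stored 4 bytes past an 8-byte boundary fails the check for size 8, while a 32bit value at the same address passes it.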
  • 6. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
    ‒ There are two different machine models ("small" and "large") that target different functionality levels
    ‒ They take into account different feature requirements for different platform environments
    ‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
    ‒ Applications written to target HSA small-model machines will generally work on large-model machines, too, if the large-model platform and host operating system provide a 32bit process environment

    Properties          | Small Machine Model                                                  | Large Machine Model
    Platform targets    | Embedded or personal device space (controllers, smartphones, etc.)  | PC, workstation, cloud server, etc. running more demanding workloads
    Native pointer size | 32bit                                                                | 64bit (+ 32bit ptr if 32bit processes are supported)
    Floating point size | Half (FP16*), single (FP32) precision                                | Half (FP16*), single (FP32), double (FP64) precision
    Atomic ops size     | 32bit                                                                | 32bit, 64bit
    * FP16: at minimum, load and store on memory
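The table and the compatibility note above can be encoded as a small decision function. This is a minimal sketch under stated assumptions: the enum, struct and function names are hypothetical, not taken from any HSA header, and treating "large-model application on a small-model platform" as unsupported is an assumption, since the slide only discusses the other direction.

```c
#include <stdbool.h>

/* Hypothetical encoding of the two machine models in the table above. */
typedef enum { HSA_MODEL_SMALL, HSA_MODEL_LARGE } machine_model;

typedef struct {
    machine_model model;
    bool has_32bit_process_env; /* platform + OS offer 32bit processes */
} platform_desc;

/* Native pointer size in bits, per the table. */
static int native_pointer_bits(machine_model m) {
    return m == HSA_MODEL_SMALL ? 32 : 64;
}

/* Widest required atomic operation in bits, per the table. */
static int max_atomic_bits(machine_model m) {
    return m == HSA_MODEL_SMALL ? 32 : 64;
}

/* FP64 is listed only for the large model. */
static bool has_fp64(machine_model m) {
    return m == HSA_MODEL_LARGE;
}

/* Rough rule from the slide: small-model applications generally run on a
 * large-model machine only if a 32bit process environment exists.
 * Large-model applications on small-model machines are assumed not to run. */
static bool can_run(machine_model app, platform_desc platform) {
    if (app == platform.model)
        return true;
    if (app == HSA_MODEL_SMALL && platform.model == HSA_MODEL_LARGE)
        return platform.has_32bit_process_env;
    return false;
}
```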
  • 7. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
    The basis of "ptr-is-ptr"
    - Each HSA agent adheres to the same user process address space view as the host CPU
      ‒ The process address view is established by the hardware's page table mappings
      ‒ The HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
      ‒ HSA agents always operate at the "user privilege" of the host CPU; the policy is enforced by the system
      ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes as the host CPU; the policy is enforced by the system
    - HSA operates in a "flat" virtual address space, using 64bit & 32bit pointers depending on the application/machine model
      ‒ A pointer value references the same memory for every HSA agent
      ‒ An HSA agent can "walk" or update linked data structures directly, without any assistance from a host CPU
    - HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the operating system environment
      ‒ For allocated pageable memory, system software takes page faults, commits memory, loads contents from the backing store and restarts execution, just as it does for any access from host CPU threads
      ‒ No tedious device buffer copy, explicit page lock or similar is needed for an HSA agent to access data in allocated memory directly!
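To make "ptr-is-ptr" concrete, the C sketch below contrasts walking a host-built linked list directly with the flattening step a system without a shared address space would need. It is plain C running on the CPU; the idea that `walk_sum` stands in for an agent kernel is an illustrative assumption, and all names (`node`, `walk_sum`, `flat_node`, `flatten`) are hypothetical.

```c
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* With ptr-is-ptr, an agent kernel could run this loop on the host-built
 * list as-is: `head` and every `next` pointer are used unchanged. */
static long walk_sum(const node *head) {
    long sum = 0;
    for (const node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

/* Without a shared address space, the host would first flatten the list
 * into a contiguous buffer and rewrite pointers as indices (-1 = end)
 * before copying that buffer into device memory. */
typedef struct {
    int value;
    ptrdiff_t next;
} flat_node;

static size_t flatten(const node *head, flat_node *out, size_t cap) {
    size_t i = 0;
    for (const node *n = head; n != NULL && i < cap; n = n->next, i++) {
        out[i].value = n->value;
        out[i].next = (n->next != NULL && i + 1 < cap) ? (ptrdiff_t)(i + 1) : -1;
    }
    return i;
}
```

The flattening pass, and the copy that would follow it, is exactly the overhead the shared virtual address space is meant to remove.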
  • 8. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
    The basis of "ptr-is-ptr"
    - On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services to HSA-compliant hardware via the standard PCI Express™ ATS/PRI protocols when the HSA agent accesses memory
      ‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
      ‒ Different host platform architectures may use different detailed mechanisms here
    - The HSA MMU functionality is provided in addition to the IOMMU functionality used in device virtualization
      ‒ Separate translation levels are used for guest and host translation (see block diagram)
    - The implementation detail is not relevant to the application and is dealt with within the system software (e.g. OS)
      ‒ As long as it follows the HSA System Architecture requirements, it is OK
    - Implementations of the shared virtual address space by other vendors on other host platforms may be different
    [Block diagram: HSA MMU (IOMMUv2 device) with Device Table, Command Buffer, Event Log and Page Request Log base registers and Event Counter registers; HSA MMU data structures in system memory: Device Table, Interrupt Remapping Table, Command Buffer, Event Log, Page Service Request Log, I/O page tables, HSA MMU translation tables (per process, PASID), host translation, guest & host translation, Peripheral Page Request (PPR) service, perf counters & RAS info (optional)]
  • 9. THE HSA MEMORY MODEL REQUIREMENTS
    What are its key properties?
    - A memory model defines how writes by one work item or agent become visible to other work items and agents, i.e. the rules that compilers and application threads need to adhere to
      ‒ It defines visibility and ordering rules for write and read events across work items, HSA agents and interactions with non-HSA components in the system
      ‒ It is important to define the scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
    - At its base, the HSA memory model is a "relaxed" load-acquire/store-release model
      ‒ It inherently maps to many CPU and device architectures very easily
      ‒ Efficient sequential consistency mechanisms are supported to fit high-level language programming models
    - A consistent, full set of atomic operations is available
      ‒ Naturally aligned on size; the small machine model supports 32bit, the large machine model supports 32bit and 64bit
    - Cache coherency between HSA agents (& the host CPU) is maintained by default
      ‒ A key feature of the HSA system & platform environment
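The load-acquire/store-release pairing is easiest to see in the classic message-passing pattern. The sketch below uses C11 atomics and POSIX threads on the host as an analogy only; it is not HSAIL, and it does not model HSA-specific aspects such as memory scopes. The `mailbox` type and function names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int payload;       /* ordinary (non-atomic) data */
    atomic_bool ready; /* synchronization flag */
} mailbox;

/* Producer: plain write first, then a store-release publishes it. */
static void *producer(void *arg) {
    mailbox *mb = arg;
    mb->payload = 42;
    atomic_store_explicit(&mb->ready, true, memory_order_release);
    return NULL;
}

/* Consumer: the load-acquire that reads `true` synchronizes with the
 * store-release, so the earlier payload write is guaranteed visible. */
static void *consumer(void *arg) {
    mailbox *mb = arg;
    while (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        ; /* busy-wait for brevity */
    return (void *)(intptr_t)mb->payload;
}

static int run_message_passing(void) {
    mailbox mb;
    mb.payload = 0;
    atomic_init(&mb.ready, false);

    pthread_t prod, cons;
    if (pthread_create(&cons, NULL, consumer, &mb) != 0)
        abort();
    if (pthread_create(&prod, NULL, producer, &mb) != 0)
        abort();

    void *result = NULL;
    pthread_join(prod, NULL);
    pthread_join(cons, &result);
    return (int)(intptr_t)result;
}
```

Only the flag needs ordering; the payload itself is ordinary memory, which is what makes the relaxed acquire/release model cheap compared with making every access sequentially consistent.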
  • 10. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
    The basis of the workload dispatch on HSA
    - Queue dispatch occurs through architected queue packets ("Architected Queuing Language", AQL) that reference the work items & parameters
      ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
      ‒ Two architected packet types exist at the moment: dispatch and barrier packets
      ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
      ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
    - Applications and runtimes can build different queuing models on top of this infrastructure
      ‒ Single-producer and multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture's queue definition to fit the use model
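The queue parameters listed above (base address, size, read index, write index) are enough to build a simple ring buffer. The C sketch below shows a single-producer/single-consumer variant under loudly stated assumptions: the `packet` struct is a simplified, hypothetical stand-in and does not reproduce the real AQL packet layout, and `dequeue` merely stands in for the agent's packet processor.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet; real AQL packets have a defined
 * binary layout with many more fields. */
typedef enum { PKT_INVALID = 0, PKT_DISPATCH, PKT_BARRIER } packet_type;

typedef struct {
    packet_type type;
    uint32_t grid_size;     /* illustrative dispatch parameter */
    uint64_t kernel_object; /* illustrative handle to the code to run */
} packet;

#define QUEUE_SIZE 8u /* power of two, so index & (size - 1) picks the slot */

typedef struct {
    packet ring[QUEUE_SIZE];      /* base address + size */
    _Atomic uint64_t write_index; /* next slot the producer fills */
    _Atomic uint64_t read_index;  /* next slot the consumer processes */
} user_queue;

/* Single-producer enqueue: indices only grow; slot = index modulo size. */
static bool enqueue(user_queue *q, packet pkt) {
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
    if (w - r >= QUEUE_SIZE)
        return false; /* queue full */
    q->ring[w & (QUEUE_SIZE - 1)] = pkt;
    /* Release: packet contents become visible before the new index. */
    atomic_store_explicit(&q->write_index, w + 1, memory_order_release);
    return true;
}

/* Single-consumer dequeue, standing in for the agent's packet processor. */
static bool dequeue(user_queue *q, packet *out) {
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_acquire);
    if (r == w)
        return false; /* queue empty */
    *out = q->ring[r & (QUEUE_SIZE - 1)];
    atomic_store_explicit(&q->read_index, r + 1, memory_order_release);
    return true;
}
```

Because all state lives in ordinary shared memory and is updated with atomics, the producer never needs a system call to submit work; a multi-producer variant would reserve slots with an atomic fetch-and-add on the write index instead of a plain store.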
  • 11. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
    The basis of the workload dispatch on HSA
    - The HSA System Architecture defines a user-mode, queue-based dispatch mechanism
      ‒ Each queue is only valid within its process context and represents a virtual entity that is scheduled to hardware
      ‒ Job execution occurs at "user privilege", like the rest of the application code, enforced by the system architecture
    - Each HSA agent allows for multiple queues per application process
      ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
      ‒ HW may execute dispatch packets "out of order" if no dependencies exist and in-order semantics are preserved externally
      ‒ "Out-of-order" execution applies between queues, with explicit, memory-based synchronization mechanisms between them as needed
    - It is "cheap" to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
      ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem, instead of trying to fit the problem to work with only one or a few queues
  • 12. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
    Miscellaneous mention, but nevertheless important to make it work well…
    - HSA memory-based signaling and synchronization primitives
      ‒ Defines memory-based semantics to synchronize with work items processed by HSA agents
      ‒ e.g. a 32bit or 64bit value, content update, and wait on value by HSA agents and AQL packets
      ‒ Allows one-to-one and one-to-many signaling
      ‒ The signaling semantics follow the atomicity requirements defined in the memory model
      ‒ A hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
      ‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
    - HSA cache coherency domains
      ‒ Define the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
      ‒ Associated with the memory model requirements
      ‒ An architected way to interact with non-HSA platform infrastructure (e.g. graphics)
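A memory-based signal is, at its core, an atomic value that producers update and waiters observe. The sketch below builds a completion countdown on that idea with C11 atomics and POSIX threads: several workers decrement one 64bit signal, and the caller waits for it to reach zero. The `mem_signal` type and its functions are hypothetical illustrations, not the HSA runtime's signal API, and the spin-wait replaces the power-efficient, hardware-assisted wait the slide describes.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memory-based signal: just a 64bit atomic value. */
typedef struct {
    _Atomic int64_t value;
} mem_signal;

static void signal_init(mem_signal *s, int64_t v) {
    atomic_init(&s->value, v);
}

/* Atomic read-modify-write; acq_rel gives it release semantics. */
static void signal_subtract(mem_signal *s, int64_t d) {
    atomic_fetch_sub_explicit(&s->value, d, memory_order_acq_rel);
}

/* Wait until the value equals `expected`. A real implementation would
 * block power-efficiently; this sketch simply spins. */
static void signal_wait_eq(mem_signal *s, int64_t expected) {
    while (atomic_load_explicit(&s->value, memory_order_acquire) != expected)
        ;
}

#define WORKERS 4

typedef struct {
    mem_signal *done;
    int *results;
    int id;
} job;

static void *worker(void *arg) {
    job *j = arg;
    j->results[j->id] = j->id * j->id; /* produce a result */
    signal_subtract(j->done, 1);       /* publish it and count down */
    return NULL;
}

/* Several workers count down one signal that the caller waits on. */
static int run_countdown(void) {
    mem_signal done;
    int results[WORKERS] = {0};
    pthread_t threads[WORKERS];
    job jobs[WORKERS];

    signal_init(&done, WORKERS);
    for (int i = 0; i < WORKERS; i++) {
        jobs[i] = (job){ &done, results, i };
        if (pthread_create(&threads[i], NULL, worker, &jobs[i]) != 0)
            abort();
    }
    signal_wait_eq(&done, 0); /* every worker's result is visible now */

    int sum = 0;
    for (int i = 0; i < WORKERS; i++)
        sum += results[i];
    for (int i = 0; i < WORKERS; i++)
        pthread_join(threads[i], NULL);
    return sum;
}
```

Because every decrement is an atomic read-modify-write, the acquire load that finally reads zero synchronizes with all earlier decrements, so the waiter sees every result. Additional threads could wait on the same signal, which is the one-to-many case.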
  • 13. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
    Miscellaneous mention, but nevertheless important
    - HSA system timestamp requirements
      ‒ Defines a low-overhead mechanism to "determine the passing of time" on an HSA platform
      ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
      ‒ The value can be queried by HSAIL or the HSA runtime
      ‒ Applications and tools are able to build a consistent timeline across all HSA agents
    - HSA topology requirements
      ‒ Defines the system topology and properties of HSA agents, discoverable on an HSA platform by an application to take advantage of platform properties
      ‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
      ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties, and consistently report the system topology for applications to leverage
    [Figures: "HSA Platform - Simple": an HSA APU whose CPU cores and GPU H-CUs share system memory through the HSA MMU. "HSA Platform": an HSA APU plus an optional add-in HSA GPU with its own H-CUs and device-local memory, attached via IOBUS, with system firmware]
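Because the system timestamp is 64bit, increments at a constant rate and never rolls over, measuring an interval reduces to a subtraction and a scale by the counter frequency. In the C sketch below, `ticks_to_ns` is a hypothetical helper, and it assumes the frequency in Hz is already known; how it is obtained (e.g. from topology or runtime data) is outside this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: converts the difference between two readings of a
 * 64bit, constant-rate, non-wrapping timestamp into nanoseconds, given the
 * counter frequency in Hz. */
static uint64_t ticks_to_ns(uint64_t start, uint64_t end, uint64_t freq_hz) {
    const uint64_t ns_per_s = 1000000000u;
    uint64_t delta = end - start; /* no rollover, so a plain difference works */
    /* Split into whole seconds and a remainder so delta * 1e9 cannot
     * overflow; rem * 1e9 stays in range for frequencies below ~18 GHz. */
    uint64_t secs = delta / freq_hz;
    uint64_t rem = delta % freq_hz;
    return secs * ns_per_s + rem * ns_per_s / freq_hz;
}
```

For example, 25,000,000 ticks of a 100 MHz counter correspond to 250,000,000 ns (0.25 s). Since all agents read the same constant-rate counter, such deltas are directly comparable across agents, which is what makes a consistent timeline possible.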
  • 14. WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
    - HSA Foundation website: http://www.hsafoundation.com
      ‒ The main location for specs, developer info, tools, publications and much more
      ‒ HSA Programmer's Reference Manual v0.95 has been published
      ‒ The HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
      ‒ It will be published after ratification by the HSA Foundation Board of Directors
      ‒ Stay tuned
  • 15. ANY QUESTIONS?
    Of course there are, so go ahead ☺