Bitfusion Nimbix Dev Summit Heterogeneous Architectures

Published in: Technology

  1. HETEROGENEOUS ARCHITECTURES: A SURVEY AND OVERVIEW FOR DEVELOPERS
     MAZHAR MEMON, CTO, BITFUSION.IO
  2. The problem in computing
     Delivering performance and efficiency to today's applications is becoming more difficult.
     [Figure: axes labeled "abstract and slow →", "← complex and fast", and "Time →"]
  3. The software world is increasingly abstract
  4. Transistor scaling is ending
  5. Moore's law slowing -> complexity
     • Era of frequency
     • Era of multi-core
     • Era of many-core
  6. The problem in computing
     Help!
     [Figure: axes labeled "abstract and slow →", "← complex and fast", and "Time →"]
  7. The solution(s)
     • Hardware
       • Specialized hardware required to keep up with accelerated performance curve
       • Encourage accessibility: low hourly pricing
     • Software
       • Abstractions: libraries, APIs, tool chain up to compiler IR, use translations where possible
       • Ecosystem: learning materials, user groups, university engagement
     • What makes this happen: developers
     The remainder of this talk is about the hardware out there and how to develop for it.
  8. Current State of Developer Experience for Accelerators
     - Update to the right operating system
     - Install vendor tool-flows which only work on specific operating systems
     - Setting up the environment and licenses
     - Installing the board
     - Setting up the board
     - Numerous pages of documentation
     Unhappy developer experience: in many cases developers give up before even starting real work due to this poor developer experience.
  9. Overview of available compute devices
     …from easiest to hardest
  10. Integrated GPUs
      • Architecture: SIMD, shared resource architecture
      • Targeted workloads: Medium-sized offloads, latency-sensitive, cost-sensitive, media
      • Programming models: OpenCL, DirectCompute, C++ AMP, SPIR, HSAIL
      • Ecosystem maturity: High
      • Links:
        • https://software.intel.com/en-us/articles/intel-graphics-developers-guides
  11. Discrete GPUs
      • Architecture: SIMD, discrete coprocessor configuration
      • Targeted workloads: Large-sized offloads, throughput-sensitive, parallel structured
      • Programming models: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL, SPIR, HSA
      • Ecosystem maturity: High
      • Links:
        • http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux
  12. MICs
      • Architecture: Many GP cores, (co)processor configuration
      • Targeted workloads: Large-sized offloads, throughput-sensitive, generic HPC
      • Programming models: OpenCL, OMP, MPI, general x86
      • Ecosystem maturity: High
      • Links:
        • https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-developers-quick-start-guide
  13. FPGAs
      • Architecture: LUTs+HPs+Fabric, coprocessor configuration
      • Targeted workloads: extreme pipelining or fanout, systolic, fast configuration(?)
      • Programming models: VHDL, Verilog, HLS, OpenCL
      • Ecosystem maturity: Medium
      • Links:
        • https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.highResolutionDisplay.html
        • http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html
  14. Automata
      • Architecture: NFA with programmable fabric
      • Targeted workloads: MISD, pattern matching, parallel unstructured
      • Programming models: API, ANML, regexp
      • Ecosystem maturity: Low
      • Links: http://micronautomata.com/
  15. Enabling developers
      Accessibility: still a problem
  16. Vision
      To bring supercomputing to the masses by:
      ◦ building software to automatically realize the benefits of heterogeneous hardware
  17. Enabling scaling automatically
      • Horizontal scaling with BF Boost remoting technology
      • Vertical scaling with BF Boost splitting technology
      • Heterogeneous scaling with BF Boost interception technology
      [Diagram labels: cpu system, gpu system]
      • 3X: Machine learning with Caffe, Torch: 2 local vs. 8 remote GPUs
      • 3.5X: Rendering with Blender: 1 local vs. 4 remote GPUs
      • 20X: Rendering with Blender: 4 remote GPUs
      • 8X: Image processing with ImageMagick: 1 vs. 12 local GPUs
      • 10X: Computer vision (face detect) with OpenCV: 12 CPU cores vs. 4 GPUs
      • 7X: Computational science with NAMD: 2 remote GPUs
  18. Bitfusion Tech: Remote Virtualization
      Remote virtualization enables varied virtual configurations by combining or sharing the resources of local and remote servers.
      Features
      • Scale-out: connect one server to many accelerators to boost performance
      • Scale-in: connect many servers to few accelerators to pool resources and lower cost
      • Service discovery: local and remote machines can discover each other on demand without complex or time-consuming configuration
      • Virtual pools: segment resources by class of users or hardware
      System view (application, local server, remote servers)
      • Binary-level API interception
      • Distribute work across local and remote machines
      • Advanced performance features including synchronization elision and data pipelining
      Application view (application, virtual server with combined resources)
      • Software sees all new hardware as if it were directly connected
      • No change to software required
      [Diagram labels: data and compute pipelining; advanced caching and data directories; auto service discovery, metering; function redirection for advanced coprocessors]
  19. Helping to solve accessibility
      [Diagram labels: scale-out, pooling, inexpensive micro-client, shared heterogeneous server]
  20. Heterogeneous cloud and developer machine offer most affordable high performance developer instances
      • Binary-level API interception
      • Distribute work across local and remote machines
      • Advanced performance features including synchronization elision and data pipelining
      [Diagram labels: application, local server, remote servers; data and compute pipelining; advanced caching and data directories; auto service discovery, metering; function redirection for advanced coprocessors]
  21. SUPERCOMPUTING TO THE MASSES
  22. Quantum computers
      • Architecture:
      • Targeted workloads:
      • Programming models:
      • Ecosystem maturity:
  23. Application specific processors
      • Architecture: Varied
      • Targeted workloads: App specific: molecular simulations, DNN
      • Programming models: API
      • Ecosystem maturity: Zero-ish
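
The Automata slide describes an NFA-based architecture aimed at pattern matching. As a rough software illustration of why an NFA suits that workload, the sketch below tracks the whole set of active states and advances all of them on each input symbol, which is the step automata hardware would perform for every state at once. The transition table, the state names, the `ANY` wildcard and the `nfa_matches` helper are all made up for this example; this is not ANML and not any vendor API.

```python
from typing import Dict, Optional, Set, Tuple

# Wildcard label used only in this sketch: a transition taken on any symbol.
ANY: Optional[str] = None

# Hypothetical NFA accepting any string that contains "ab" and, later, a "c".
# Keys are (state, symbol); values are the sets of possible next states.
TRANSITIONS: Dict[Tuple[str, Optional[str]], Set[str]] = {
    ("start", ANY): {"start"},      # keep scanning for the pattern start
    ("start", "a"): {"saw_a"},
    ("saw_a", "b"): {"saw_ab"},
    ("saw_ab", ANY): {"saw_ab"},    # skip anything between "ab" and "c"
    ("saw_ab", "c"): {"accept"},
    ("accept", ANY): {"accept"},    # once matched, stay matched
}


def step(active: Set[str], symbol: str) -> Set[str]:
    """Advance every active state on one input symbol."""
    next_active: Set[str] = set()
    for state in active:
        next_active |= TRANSITIONS.get((state, symbol), set())
        next_active |= TRANSITIONS.get((state, ANY), set())
    return next_active


def nfa_matches(text: str) -> bool:
    """Return True if the NFA is in the accept state after reading text."""
    active = {"start"}
    for ch in text:
        active = step(active, ch)
    return "accept" in active
```

On a CPU the loop over `active` is sequential, so cost grows with the number of live states; the appeal of dedicated automata hardware is that this inner loop is effectively free.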
