Keynote snir sc

presentation Cray award SC13


Transcript

  • 1. Supercomputing: The Next 10 Years
      Marc Snir, Argonne National Laboratory & University of Illinois at Urbana-Champaign
  • 2. Past
      "Those who cannot remember the past are condemned to repeat it" (Santayana)
      MCS -- Marc Snir, November 13
  • 3. The Last Great Extinction
      The attack of the killer micros
      Shift from bipolar vector processors to clusters of MOS microprocessors
      [Chart: core count of the leading Top500 system; log-scale axis from 1 to 10,000,000]
  • 4. 1990: The Attack of the Killer Micros (Eugene Brooks, 1990)
      § Bipolar technology had hit a power wall (nitrogen cooling)
      § Alternative materials were too expensive / not ready (gallium arsenide)
      § An alternative "good enough" technology was ready
        – MOS microprocessors had been around for 20 years and were a fast-growing market
        – MOS had a clear evolution path ("Moore's Law")
      § MOS was no better than bipolar (in 1991)
        – Cray C90: 244 MHz; vector; vector registers; 16 shared-memory nodes
        – CM5: 32 MHz; scalar; cache; 1024 message-passing nodes
      § New paradigm took a while to establish itself (CM1, CM2, KSR…)
      § Change in technology led to change in vendors and business model
      § Technology shift required a long and painful process of code rewrite
  • 5. Present
      "The past no longer is and the future is not yet" (St. Augustine)
  • 6. 20 Years of (Near) Stability
      § One dominant programming model: Message-Passing (MPI)
      § One major shift: from single core to multicore
        – Easy, since one can treat each core as a node
      [Chart: log-scale axis from 1 to 10,000,000, annotated "multicore"]
  • 7. Increasing Instability
      § Heterogeneous memory: NUMA, noncoherent shared memory, scratchpads…
      § Heterogeneous processing: GPUs, accelerators, big-small cores (NVIDIA, Xeon Phi, ARM big.LITTLE)
      § Hybrid Memory Cube & near-memory processing
      § No standard programming model
      [Chart: log-scale axis from 1 to 10,000,000, annotated "multicore" and "accelerators"]
  • 8. On Our Way to the Next Extinction?
      § History repeats itself:
        – CMOS technology has hit a power wall
          • Clock speed is not rising
        – Alternative materials are (too) expensive / not ready (gallium arsenide and other III-V materials; nanowires, nanotubes)
      "While power consumption is an urgent challenge, its leakage or static component will become a major industry crisis in the long term, threatening the survival of CMOS technology itself, just as bipolar technology was threatened and eventually disposed of decades ago" (ITRS 2011)
      § History does not repeat itself:
        – There is a much larger industrial base
        – An alternative "good enough" technology IS NOT ready
        – There is much more code that needs to be rewritten if a new model is needed (>200 MLOCs)
  • 9. Future
      "It is difficult to make predictions, especially about the future" (Yogi Berra)
  • 10. The End of Moore's Law is Coming
      § Moore's Law: The number of transistors per chip doubles every two/three years
      § Stein's Law: If something cannot go on forever, it will stop
      § The question is not whether but when Moore's Law will stop
  • 11. The 7nm Wall (courtesy J. Aldun)
      ANL-LBNL-ORNL-PNNL, 19 November 2013
  • 12. The End of the Road (?)
      § Quantum tunneling becomes a major obstacle as devices shrink
        – 7-5nm feature size has long been predicted to be the lower limit for CMOS devices
          • ITRS predicts 7.5nm will be reached in 2024
      § 7.5nm ~ 30 atoms of silicon
        – Not much room for further miniaturization, independent of technology!
        – Room for clock increase (new materials, quantum effect gates, cryogenic devices…)
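The "7.5nm ~ 30 atoms of silicon" estimate on slide 12 can be checked with a line of arithmetic. The sketch below is not from the talk: it assumes atoms are counted along a line at the nearest-neighbour Si-Si distance in crystalline silicon (about 0.235 nm), and the function name `atoms_across` is made up for illustration.

```python
# Rough check of the "7.5 nm ~ 30 atoms of silicon" estimate.
# Assumption (not from the slides): atoms are counted along a straight line,
# spaced at the nearest-neighbour Si-Si distance of crystalline silicon.
SI_BOND_LENGTH_NM = 0.235       # approximate Si-Si nearest-neighbour distance
SI_LATTICE_CONSTANT_NM = 0.543  # approximate cubic unit-cell edge of silicon


def atoms_across(feature_nm, spacing_nm=SI_BOND_LENGTH_NM):
    """Approximate number of atomic spacings that fit across a feature."""
    return feature_nm / spacing_nm


if __name__ == "__main__":
    for feature_nm in (22.0, 7.5, 5.0):
        atoms = atoms_across(feature_nm)
        cells = atoms_across(feature_nm, SI_LATTICE_CONSTANT_NM)
        print(f"{feature_nm:5.1f} nm: about {atoms:.0f} atoms, {cells:.0f} unit cells")
```

Under this counting convention, 7.5 nm comes to roughly 32 atomic spacings, consistent with the slide's "~30"; a different convention (for example, counting unit cells) gives a smaller number.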
  • 13. The Last Mile is the Most Expensive Mile
      § New technologies are needed
        – New materials (e.g., III-V, germanium thin channels, nanowires, nanotubes or graphene)
        – New structures (e.g., 3D transistor structures)
        – New packages (e.g., HMC, photonics)
        – New lithography
        – Control or tolerance of large variances (safety margins, resilience, aging)
      § New technologies are expensive
        – NRE increases faster than profits, which forces consolidation
        – Only two companies can sustain the investments needed to go below 22nm (Intel and Samsung) [Heck, Kaza, Pinner]
      § Less competition & larger investments = slower progress
  • 14. The Future Is Not What It Was (courtesy J. Aldun)
  • 15. The Path of Least Resistance – Other than Moore
      § Industry goal is not increased performance; it is increased ROI. Industry will increasingly invest in alternatives as increasing performance becomes more expensive
        – Low power, low cost
        – New markets: MEMS, sensors
        – System on a chip (smartphone, tablet)
      ✗ Fewer good commodity building blocks for HPC
        – No low-power/high-flops/high-resilience CPU
      ✔ More opportunities for semi-custom and integration of multiple vendor IP on a chip
      § New business model for supercomputing?
        – Semi-custom & system on a chip integrator
  • 16. Exascale
  • 17. Identified Issues
      § Scale (billion threads)
      § Power (tens of megawatts)
        – Communication: > 99% of power is consumed by moving operands across the memory hierarchy and across nodes
        – Reduced memory size (communication in time)
      § Resilience: Something fails every hour; the machine is never "whole"
        – Trade-off between power and resilience
      § Asynchrony: Equal work ≠ equal time
        – Power management
        – Error recovery
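The resilience point on slide 17 ("something fails every hour") follows from scale alone. Here is a minimal sketch, assuming independent component failures at a constant rate (an exponential model). The node count and per-node MTBF are illustrative assumptions rather than figures from the talk, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(node_mtbf_years, node_count):
    """Mean time between failures of the whole machine.

    Assumes independent nodes with constant failure rates, so the rates add
    and the system MTBF is the node MTBF divided by the node count.
    """
    return node_mtbf_years * HOURS_PER_YEAR / node_count


def prob_no_failure(run_hours, mtbf_hours):
    """Probability that a run of the given length sees no failure at all."""
    return math.exp(-run_hours / mtbf_hours)


if __name__ == "__main__":
    # Illustrative assumption: 100,000 nodes, each failing once per 10 years.
    mtbf = system_mtbf_hours(node_mtbf_years=10, node_count=100_000)
    print(f"system MTBF: {mtbf:.3f} hours")
    print(f"P(24-hour run without failure): {prob_no_failure(24, mtbf):.2e}")
```

Even with nodes that individually fail only once a decade, the assumed machine sees a failure more often than once an hour, which is why the slide says it is never "whole".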
  • 18. My Main Concerns
      § Uncertainty about underlying HW architecture
        – Slower progress of IC will necessitate faster progress of architecture
        – May not converge to a new, stable model
        – It is not about porting applications to a new programming model; it is about designing applications for portability
      § Increased software complexity
        – Simulations of complex systems + uncertainty quantification + optimization…
        – Support of complex workflows (e.g., in situ analysis)
        – Software management of power and failures
        – Heterogeneity
        – Scale and tight coupling (tail of distribution matters!)
        – Hypothesis: software will continue to be the dominant cause of failures
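Slide 18's remark that the "tail of distribution matters" can be made concrete with a simple model. The sketch assumes a tightly coupled step that finishes only when its slowest task finishes, with each task independently hitting a rare delay with probability `p`. Both the model and the numbers are illustrative assumptions, not taken from the talk.

```python
import math


def prob_step_delayed(p, tasks):
    """Probability that at least one of `tasks` independent tasks is delayed.

    Computes 1 - (1 - p)**tasks via log1p/expm1 so the result stays accurate
    for very small p and very large task counts.
    """
    return -math.expm1(tasks * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6  # assumed chance that a single task hits a rare slow event
    for tasks in (1, 1_000, 1_000_000, 1_000_000_000):
        print(f"{tasks:>13,} tasks: P(step delayed) = {prob_step_delayed(p, tasks):.6f}")
```

In this model, an event that is negligible for one task is almost certain to delay every step at a billion threads, so the tail of the per-task time distribution, not its mean, sets the pace of the machine.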
  • 19. Conclusion
      § Moore's Law is slowing down; the slow-down has many fundamental consequences, only a few of them explored in this talk
      § HPC is the "canary in the mine":
        – Issues appear earlier because of size and tight coupling
      § Optimistic view of the next decades: no stasis.
        – A frenzy of innovation to continue pushing the current ecosystem, followed by a frenzy of innovation to use totally different compute technologies
      § Pessimistic view: The end is coming
  • 21. Backup
  • 22. Do We Care?
      § "It's all about Big Data now; simulations are passé."
      § B***t
      § "All science is either physics or stamp collecting." (Ernest Rutherford)
        – In the physical sciences, experiments and observations exist to validate/refute/motivate theory. "Data mining" not driven by a scientific hypothesis is "stamp collecting".
      § Simulation is needed to go from a mathematical model to predictions on observations.
        – If the system is complex (e.g., climate), then simulation is expensive
        – Often, models are stochastic and predictions are statistical, complicating both simulation and data analysis
  • 23. Observation Meets Data: Cosmology
      Record-breaking application: 3.6 trillion particles, 14 Pflop/s
      Computation Meets Data: The Argonne View (courtesy Salman Habib)
      [Figure labels: Supercomputer Simulation Campaign; Mapping the Sky with Survey Instruments; LSST; "Cosmic Calibration"; HACC+CCF (domain science + CS + math + stats + machine learning); LSST Weak Lensing (w = -1, w = -0.9); "Precision Oracle"; Emulator based on Gaussian Process Interpolation in High-Dimensional Spaces; Markov chain Monte Carlo]
      HACC = Hardware/Hybrid Accelerated Cosmology Code(s)
      CCF = Cosmic Calibration Framework
      Observations: Statistical error bars will "disappear" soon!