GridEngine Summit Keynote about Uber Cloud Experiment

Wolfgang Gentzsch Keynote at Univa GridEngine Summit 2012 about the Uber-Cloud Experiment (aka HPC Experiment). More info at …


  • 1. Status September 15 2012 Keynote: Half-Time Results
  • 2. Why this Experiment?
      • Access to remote computing resources
      • Focus on: simulation applications in digital manufacturing and High Performance Technical Computing (HPTC)
      • Focus on: remote resources in HPTC Centers & in HPTC Clouds
      • Observation: while business clouds are becoming widely used, the acceptance of simulation clouds in industry is still in an early adopter stage (CAE, Bio, Finance, Oil & Gas, DCC)
      • Reason: big difference between business and simulation clouds
      • Barriers: complexity, IP, data transfer, software licenses, parallel communications, specific system requirements, data security, interoperability, cost, etc.
  • 3. Our Goal for the Experiment
      • Current Experiment: August – October 2012
      • Form a community around the benefits of HPTC in the cloud
      • Hands-on exploration and understanding of the challenges with digital manufacturing in the Cloud
      • Study each end-to-end process and find ways to overcome these challenges
      • Document our findings
  • 4. Participants: Some of our Resource Providers
      • Some of our Providers want to be anonymous
      • Media Sponsor
  • 5. Participants: Some of our Software Providers
      • Some of our ISVs want to be anonymous
  • 6. Participants: Some of our HPC Experts
      • Some of our HPC Experts want to be anonymous
  • 7. Participants: Some of our End-Users
      • Many of our industry end-users want to be anonymous
  • 8. Where are we with the experiment?
      • We currently have over 170 participating organizations and individuals
      • The experiment reaches every corner of the globe; participants come from 22 countries
      • Participants sign up through and
      • 25 teams have been formed and are active
  • 9. Participants by geography
      Country      % of Site Traffic
      US           36 %
      Germany      12 %
      Italy         6 %
      Australia     6 %
      Spain         5 %
      UK            5 %
      Russia        3 %
      France        3 %
      Other        24 %
  • 10. Teams, it’s all about teams
      Anchor Bolt, Cement Flow, Wind Turbines, Resonance, Sprinkler, Combustion, Radiofrequency, Space Capsule, Blood Flow, Supersonic, Car Acoustics, ChinaCFD, Liquid-Gas, Dosimetry, Gas Bubbles, Wing-Flow, Weathermen, Side impact, Ship-Hull, ColombiaBio
  • 11. Building the teams
      • An end-user joins the experiment
      • Organizers (Burak, Wolfgang) identify the perfect team expert
      • Organizers contact the ISV and ask it to join the experiment
      • End-user and team expert analyze the resource requirements and send them to the organizers
      • Organizers suggest one or two computational resource providers
      • After all four team members agree on the right resource provider, the team is ready to go
  • 12. Bumps on the road – the top 4
      • Delays because of vacation times in August and other projects (internal, customer) of our participants
      • Getting HPC participants was quite easy; getting CAE participants was a challenge
      • Participants can spend only a small portion of their time
      • Learning the access and usage processes of our software and compute resource providers can take many days
      • Process automation capabilities of providers vary greatly. Some have focused on enrollment and registration automation, while others haven’t.
      • Experiment organizers’ lack of automation: currently the whole end-to-end process is manual (intentionally)
      • Getting regular updates from Team Experts is a challenge because this is not their day job
      • Consider: the sample size is still small
  • 13. Are we discovering hurdles?
      • Reaching end-users who are ready and willing to engage in HPTC, and especially HPTC in the Cloud
      • About half of our participants want to remain anonymous, for different reasons (failure, policies, internal processes, …)
      • HPC is complex. At times it requires multiple experts.
      • Matching end-user projects with the appropriate resource providers is tricky and critical to the team’s success
      • Resource providers (e.g. HPC Centers) often face internal policy and legal hurdles
      • Sometimes, the 1000 CPU-core hours are a limit
  • 14. Let’s hear from Team Experts!
      • Chris Dagdigian, Co-founder and Principal Consultant, BioTeam Inc
      • Ingo Seipp, Science + Computing
  • 15. Team 2 Short Status Report
      • HPC Expert:
      • End User: Anonymous
  • 16. Team 2 Overview
      OUR END USER
      • Individual & organization has requested anonymity
      • Goal: hybrid model in which local and cloud resources are leveraged simultaneously
      • We can say this:
          – It’s a medical device
          – Simulating new probe design for a particular device
          – Tests involve running at simulation size & resolution that cannot be performed internally*
          – *Using fake data at this time
      OUR HPC SOFTWARE
      • CST Studio Suite: “Electromagnetic Simulation”
      • Diverse Architecture Options
          1. Local Windows Workstation
          2. CST Distributed Computing
          3. CST MPI
      • OS Diversity: various combinations of Windows and Linux based machines
  • 17. The first Design
  • 18. First Design Failed Miserably
      The Good
      • Looked pretty on paper!
      • Total isolation of systems via Amazon VPC and custom subnets
      • VPC allows for “elastic” NIC devices w/ persistent MAC addresses
          – Awesome for license servers
      • VPN Server allowed end-user remote resources to directly join our cloud environment
      The Bad
      • Can’t launch GPU nodes from inside a VPC
      • … so we ran them on “regular” EC2
      • … and this did not work well
      • NAT translation between EC2 and VPC private IP addresses wreaked havoc with CST Distributed Computing Master
  • 19. Current Design
  • 20. Second Design – Good So Far
      The Good
      • It works; we are running tasks across multiple GPU solver nodes right now
      • Security surprisingly good despite losing VPC isolation
          – EC2 Security Groups block us from the rest of Internet & AWS
      • CST License server now running on much cheaper Linux instance
      • Clear path to elasticity and high-scale simulation runs
      The Bad
      • Lost the persistent MAC address when we left the VPC; need to treat our license server very carefully
      • Unclear today how we will attempt to incorporate remote solver & workstation resources at end-user site
      • We know from attempt #1 that CST and NAT translation don’t work well together …
  • 21. Next Steps
      • Run at large scale
      • Refine our architecture
          – We might move the License Server back into a VPC in order to leverage the significant benefit of persistent MAC addresses and elastic NIC devices
          – Figure out how to bring in the remote resources sitting at the end-user site
  • 22. FrontEnd + 2 GPU Solvers In Action
  • 23. Team 8, Short Status Report
      Multiphase flows within the cement and mineral industry
      Ingo Seipp and Team, Science + Computing
  • 24. Multiphase flows within the cement and mineral industry
      • HPC Expert: science + computing
      • End user: FLSmidth
      • Applications: Ansys CFX, EDEM
      • Resource provider: Bull extreme factory
      • Goals:
          – Reduce runtime of jobs
          – Increase mesh sizes
      © 2012 science + computing ag   HPC Experiment webinar | 14.9.2012
  • 25. Flash dryer and SAG mill
      • Challenge: Scalability of flash dryer problem
  • 26. extreme factory
      • 150 Tflops with Intel Xeon E5 and X5600 family CPUs
      • GPUs
      • Over 30 installed applications
  • 27. extreme factory
      • Simple user license management
  • 28. extreme factory
      • User access by customized web-interface or ssh
  • 29. Project status and experiences
      • XF has installed recent versions of Ansys software
      • Ansys CFX license provided by Ansys, installed at XF
      • Non-disclosure agreements signed for end-user
      • Access to XF for end-user
      • Establish batch execution process of customer case, i.e. setting up the environment and the command to run the user’s job
      • Working process changes: the GUI-oriented working process needs to be adapted to integrate or allow cloud-based execution of computing tasks
  • 30. Team 14, Short Status Report
      Simulation of electromagnetic radiation and dosimetry in humans in cars induced by mobile phone technology
      Ingo Seipp and Team, Science + Computing
  • 31. Team 14, Short Status Report
      Simulation of electromagnetic radiation and dosimetry in humans in cars induced by mobile phone technology.
      • End user: University Wuppertal, Institute of
      • Application: CST Software MWS, CS, DS, EM (CST)
      • Resource provider: University Rapperswil MICTC
      • Goals:
          – Reduce runtime of current jobs
          – Increase model resolution sizes
  • 32. EMC dosimetry with high res models
  • 33. EMC dosimetry with high res models
      • Challenge: Problem size and parallel execution
  • 34. Project status and experiences
      • CST software has been installed and the license provided and installed
      • Access for end-user via VPN has been enabled by the resource provider. Currently, there are still some problems connecting from the high-security end-user site. Local IT is investigating.
      • The CST software installation must still be tested for a parallel job
      • It requires some effort to set up and test new applications
  • 35. Announcing: Uber-Cloud Experiment Round 2
      • Why ‘Uber-Cloud’: HPC/CAE/BIO in the Cloud is only one part of the Experiment; in addition we provide access to HPC Centers and other resources
      • Round 1 is proof of concept => YES, remote access to HPC resources works, and there is real interest and need!
      • Round 2 will be more professional => more tools instead of hands-on, more teams, more applications beyond CAE, a list of professional services, measuring the effort, how much would it cost, etc.
      • Existing Round 1 Teams are encouraged in Round 2 to use other resources or can participate in forming new Teams
      • Oct 15: Call for Participation; Nov 15: Start of Experiment Round 2
  • 36. What is next?
      • 07/24/2012: Publish updated kick-off document
      • 07/24/2012: Request for detailed participant profiles
      • 08/10/2012: End-user projects submitted
      • 08/17/2012: Resources are assigned, end-user projects start
      • 09/14/2012: Half Time meeting webinar
      • 10/15/2012: End-user projects are completed
      • 10/31/2012: Experiment is completed
      • 11/15/2012: Experiment findings are published
      • 11/15/2012: Start of Experiment Round 2, Kick-off at SC in Salt Lake City
  • 37. Conclusion
      • Response to the Uber-Cloud Experiment is overwhelming
      • Everybody is learning and working along their very specific business interest
      • At least 20 of the 25 teams will finish successfully and in time
      • 97% of current participants will continue in Round 2
      • Univa Grid Engine Community: forming teams which explore bursting into a public HPC Cloud from their Univa GE cluster
      • The experiment could help Grid Engine customers become more flexible in business, profitable, cost effective, customer friendly…
  • 38. Thank You
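
Slide 13 notes that the 1000 CPU-core hours are sometimes a limit. A rough way to see why is to convert that budget into wall-clock time for different job sizes. The sketch below is only an illustration: it assumes core hours are charged as cores multiplied by wall-clock hours, and the function name and the job sizes are hypothetical, not taken from the slides.

```python
def wall_clock_hours(budget_core_hours: float, cores: int) -> float:
    """Return how many wall-clock hours a job on `cores` cores can run
    before the core-hour budget is used up.

    Assumption: usage is charged as cores * wall-clock hours.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return budget_core_hours / cores


BUDGET_CORE_HOURS = 1000  # figure quoted on slide 13

# Hypothetical job sizes, chosen only for illustration.
for cores in (16, 64, 256):
    hours = wall_clock_hours(BUDGET_CORE_HOURS, cores)
    print(f"{cores:4d} cores -> {hours:.2f} wall-clock hours")
```

Under this assumption the available run time shrinks in proportion to the core count, so a few large parallel test runs could use up the allowance quickly.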