GridEngine Summit Keynote about Uber Cloud Experiment



Wolfgang Gentzsch's keynote at the Univa GridEngine Summit 2012 about the Uber-Cloud Experiment (aka HPC Experiment). More info at



  1. Status, September 15, 2012: Keynote, Half-Time Results
  2. Why this Experiment?
     • Access to remote computing resources
     • Focus on: simulation applications in digital manufacturing and High Performance Technical Computing (HPTC)
     • Focus on: remote resources in HPTC Centers & in HPTC Clouds
     • Observation: while business clouds are becoming widely used, the acceptance of simulation clouds in industry is still at an early-adopter stage (CAE, Bio, Finance, Oil & Gas, DCC)
     • Reason: big difference between business and simulation clouds
     • Barriers: complexity, IP, data transfer, software licenses, parallel communications, specific system requirements, data security, interoperability, cost, etc.
  3. Our Goal for the Experiment
     • Current Experiment: August to October 2012
     • Form a community around the benefits of HPTC in the cloud
     • Explore hands-on, and understand, the challenges of digital manufacturing in the Cloud
     • Study each end-to-end process and find ways to overcome these challenges
     • Document our findings
  4. Participants
     • Some of our Providers
     • Some of our Resource Providers want to be anonymous
     • Media Sponsor
  5. Participants
     • Some of our ISVs
     • Some of our Software Providers want to be anonymous
  6. Participants
     • Some of our HPC Experts
     • Some of our HPC Experts want to be anonymous
  7. Participants
     • Many of our industry end-users
     • Some of our End-Users want to be anonymous
  8. Where are we with the experiment?
     • We currently have over 170 participating organizations and individuals
     • The experiment reaches every corner of the globe; participants come from 22 countries
     • Participants sign up through …
     • 25 teams have been formed and are active
  9. Participants by geography (% of site traffic)
     Country      Share
     US           36%
     Germany      12%
     Italy         6%
     Australia     6%
     Spain         5%
     UK            5%
     Russia        3%
     France        3%
     Other        24%
  10. Teams, it's all about teams
     Anchor Bolt, Cement Flow, Wind Turbines, Resonance, Sprinkler, Combustion, Radiofrequency, Space Capsule, Blood Flow, Supersonic, Car Acoustics, ChinaCFD, Liquid-Gas, Dosimetry, Gas Bubbles, Wing-Flow, Weathermen, Side impact, Ship-Hull, ColombiaBio
  11. Building the teams
     • An end-user joins the experiment
     • The organizers (Burak, Wolfgang) identify the right team expert
     • The organizers contact the ISV and ask it to join the experiment
     • The end-user and the team expert analyze the resource requirements and send them to the organizers
     • The organizers suggest one or two computational resource providers
     • After all four team members agree on the right resource provider, the team is ready to go
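The team-building steps above form a small sequential workflow that ends in a four-way agreement. The sketch below is a minimal Python model of it. It assumes a team has three core roles (end-user, team expert, ISV) plus the chosen resource provider as its fourth member, and that the only gate is unanimous agreement on one provider. The class, role labels and provider names are hypothetical and do not belong to any tool used in the experiment.

```python
from dataclasses import dataclass, field

# Roles filled before a resource provider is chosen (labels are our own).
CORE_ROLES = ("end_user", "team_expert", "isv")


@dataclass
class Team:
    """Hypothetical model of one experiment team being assembled."""
    members: dict = field(default_factory=dict)     # role -> participant name
    candidates: list = field(default_factory=list)  # providers suggested by organizers
    approvals: dict = field(default_factory=dict)   # provider -> set of approving roles

    def add_member(self, role: str, name: str) -> None:
        if role not in CORE_ROLES:
            raise ValueError(f"unknown role: {role}")
        self.members[role] = name

    def suggest_providers(self, *providers: str) -> None:
        # Organizers suggest one or two computational resource providers.
        if not 1 <= len(providers) <= 2:
            raise ValueError("organizers suggest one or two providers")
        self.candidates = list(providers)
        self.approvals = {p: set() for p in providers}

    def approve(self, role: str, provider: str) -> None:
        # A core member, or the provider itself, signs off on a candidate.
        if provider not in self.approvals:
            raise ValueError(f"{provider} was not suggested")
        if role not in CORE_ROLES and role != provider:
            raise ValueError(f"{role} cannot approve {provider}")
        self.approvals[provider].add(role)

    def chosen_provider(self):
        # Ready once all four members (three core roles plus the provider
        # itself) agree on the same provider; otherwise return None.
        if set(self.members) != set(CORE_ROLES):
            return None
        for provider, roles in self.approvals.items():
            if roles >= set(CORE_ROLES) | {provider}:
                return provider
        return None
```

In this model, agreement by the end-user, expert and ISV is not enough on its own. `chosen_provider()` returns a name only after the provider has also accepted, which reflects the slide's "all four team members agree".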
  12. Bumps on the road: the top 4
     • Delays because of vacation time in August & other projects (internal, customer) of our participants
     • Getting HPC participants was quite easy; getting CAE participants was a challenge
     • Participants can spend only a small portion of their time
     • Learning the access and usage processes of our software and compute resource providers can take many days
     • Process automation capabilities of providers vary greatly. Some have focused on enrollment and registration automation, while others haven't.
     • Experiment organizers' lack of automation: currently the whole end-to-end process is manual (intentionally)
     • Getting regular updates from Team Experts is a challenge because this is not their day job
     • Consider: the sample size is still small
  13. Are we discovering hurdles?
     • Reaching end-users who are ready and willing to engage in HPTC, and especially HPTC in the Cloud
     • About half of our participants want to remain anonymous, for different reasons (failure, policies, internal processes, …)
     • HPC is complex. At times it requires multiple experts.
     • Matching end-user projects with the appropriate resource providers is tricky and critical to the team's success.
     • Resource providers (e.g. HPC Centers) often face internal policy and legal hurdles
     • Sometimes the 1000 CPU-core hours are a limiting factor
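A fixed allotment can become the bottleneck because core-hours are divided by the number of cores in use. At 64 cores, 1000 core-hours last about 15.6 wall-clock hours. The Python sketch below makes that arithmetic explicit. It assumes the 1000 core-hours are a single budget per team and that every run keeps all of its cores busy for its whole duration. The run sizes are illustrative, not experiment data.

```python
def wall_hours(budget_core_hours: float, cores: int) -> float:
    """Wall-clock hours a budget lasts if `cores` cores stay busy throughout."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return budget_core_hours / cores


def runs_affordable(budget_core_hours: float, cores: int, hours_per_run: float) -> int:
    """Number of complete runs of a given size that fit into the budget."""
    cost = cores * hours_per_run  # core-hours consumed by one run
    return int(budget_core_hours // cost)


if __name__ == "__main__":
    for cores in (16, 64, 256):
        print(f"{cores:4d} cores: {wall_hours(1000, cores):6.2f} wall-clock hours")
```

For example, a single 8-hour job on 128 cores costs 1024 core-hours, so under this assumption it would not fit into the allotment at all.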
  14. Let's hear from Team Experts!
     • Chris Dagdigian, Co-founder and Principal Consultant, BioTeam Inc.
     • Ingo Seipp, Science + Computing
  15. Team 2 Short Status Report
     • HPC Expert:
     • End User: Anonymous
  16. Team 2 Overview
     Our End User
     • The individual & organization have requested anonymity
     • Goal: a hybrid model in which local and cloud resources are leveraged simultaneously
     • We can say this:
       - It's a medical device
       - Simulating a new probe design for a particular device
       - Tests involve running at a simulation size & resolution that cannot be performed internally*
       - *Using fake data at this time
     Our HPC Software
     • CST Studio Suite, "Electromagnetic Simulation"
     • Diverse architecture options:
       1. Local Windows workstation
       2. CST Distributed Computing
       3. CST MPI
     • OS diversity: various combinations of Windows and Linux based machines
  17. The First Design
  18. First Design Failed Miserably
     The Good
     • Looked pretty on paper!
     • Total isolation of systems via Amazon VPC and custom subnets
     • VPC allows for "elastic" NIC devices with persistent MAC addresses
       - Awesome for license servers
     • A VPN server allowed the end-user's remote resources to directly join our cloud environment
     The Bad
     • Can't launch GPU nodes from inside a VPC
     • … so we ran them on "regular" EC2
     • … and this did not work well
     • NAT translation between EC2 and VPC private IP addresses wreaked havoc with the CST Distributed Computing Master
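Persistent MAC addresses matter because many license servers are node-locked to a host ID, and that host ID is commonly the MAC address of a network interface. If the server comes back on a different NIC, the license no longer matches. The Python sketch below illustrates that failure mode. It is a hypothetical check and does not describe how CST's license manager actually works. `uuid.getnode()` is the standard-library call that returns a 48-bit hardware address as an integer, or a random number if none can be found.

```python
import uuid


def format_mac(node: int) -> str:
    """Render a 48-bit integer as a colon-separated MAC address."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def license_matches(licensed_mac: str, current_node: int) -> bool:
    """Hypothetical node-locked check: the license is valid only on the
    machine whose MAC address was recorded when the license was issued."""
    return licensed_mac.lower() == format_mac(current_node)


if __name__ == "__main__":
    here = uuid.getnode()  # MAC of one local interface, as an int
    print("this host:", format_mac(here))
    # A license issued for a different (e.g. replaced) NIC will not validate here.
    print("valid here?", license_matches("02:00:00:00:00:01", here))
```

An elastic network interface keeps its MAC address when it is attached to a replacement instance. That is why the slide calls it "awesome for license servers".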
  19. Current Design
  20. Second Design: Good So Far
     The Good
     • It works; we are running tasks across multiple GPU solver nodes right now
     • Security is surprisingly good despite losing VPC isolation
       - EC2 Security Groups block us from the rest of the Internet & AWS
     • The CST license server is now running on a much cheaper Linux instance
     • Clear path to elasticity and high-scale simulation runs
     The Bad
     • We lost the persistent MAC address when we left the VPC; we need to treat our license server very carefully
     • It is unclear today how we will incorporate the remote solver & workstation resources at the end-user site
     • We know from attempt #1 that CST and NAT translation don't work well together …
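EC2 security groups deny inbound traffic by default. A packet is admitted only if some allow rule matches it, for example a rule that references the group itself (so cluster nodes can reach each other) or one that names a specific CIDR range. The Python sketch below models that evaluation to show why the solver nodes stay unreachable from the rest of the Internet and AWS. The rule set, group ID and addresses are hypothetical. Real security groups also have features this model leaves out, such as per-protocol rules and stateful handling of return traffic.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One inbound allow rule: a TCP port range plus a source, given either
    as a CIDR block or as a security-group ID."""
    port_from: int
    port_to: int
    cidr: Optional[str] = None
    source_group: Optional[str] = None


def allowed(rules, src_ip: str, src_groups: set, port: int) -> bool:
    """Default deny: admit the packet only if at least one rule matches."""
    addr = ipaddress.ip_address(src_ip)
    for r in rules:
        if not r.port_from <= port <= r.port_to:
            continue
        if r.cidr and addr in ipaddress.ip_network(r.cidr):
            return True
        if r.source_group and r.source_group in src_groups:
            return True
    return False


# Hypothetical cluster group: members of "sg-cluster" may talk on any port,
# and SSH is open only to one (documentation-range) office network.
CLUSTER_RULES = [
    Rule(0, 65535, source_group="sg-cluster"),
    Rule(22, 22, cidr="203.0.113.0/24"),
]
```

Because no rule admits arbitrary sources, an address outside the office range cannot reach any port unless it belongs to the cluster group.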
  21. Next Steps
     • Run at large scale
     • Refine our architecture
       - We might move the license server back into a VPC in order to leverage the significant benefit of persistent MAC addresses and elastic NIC devices
       - Figure out how to bring in the remote resources sitting at the end-user site
  22. FrontEnd + 2 GPU Solvers In Action
  23. Team 8, Short Status Report
     Multiphase flows within the cement and mineral industry
     Ingo Seipp and Team, Science + Computing
  24. Multiphase flows within the cement and mineral industry
     • HPC Expert: science + computing
     • End user: FLSmidth
     • Applications: Ansys CFX, EDEM
     • Resource provider: Bull extreme factory
     • Goals:
       - Reduce runtime of jobs
       - Increase mesh sizes
     © 2012 science + computing ag | HPC Experiment webinar | 14.9.2012
  25. Flash dryer and SAG mill
     • Challenge: scalability of the flash dryer problem
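A common first model for the scalability question is Amdahl's law. If a fraction p of the runtime parallelizes perfectly, the speedup on n cores is S(n) = 1 / ((1 - p) + p/n), and it can never exceed 1/(1 - p). The Python sketch below evaluates that formula along with the parallel efficiency S(n)/n. The parallel fractions are illustrative assumptions, not measurements of the flash dryer case. Real CFD/DEM codes also lose efficiency to communication, which this model ignores.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n)


def efficiency(p: float, n: int) -> float:
    """Parallel efficiency: achieved speedup divided by core count."""
    return amdahl_speedup(p, n) / n


if __name__ == "__main__":
    for p in (0.90, 0.99):
        for n in (16, 128, 1024):
            print(f"p={p:.2f} n={n:5d} speedup={amdahl_speedup(p, n):7.1f} "
                  f"efficiency={efficiency(p, n):.2f}")
```

With p = 0.99, even 1024 cores give a speedup below 100. This is one reason why growing the mesh, which adds parallel work per core, often scales better than shrinking the runtime of a fixed-size problem.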
  26. extreme factory
     • 150 Tflops with Intel Xeon E5 and X5600 family CPUs
     • GPUs
     • Over 30 installed applications
  27. extreme factory
     • Simple user license management
  28. extreme factory
     • User access by customized web interface or ssh
  29. Project status and experiences
     • XF has installed recent versions of Ansys software.
     • The Ansys CFX license was provided by Ansys and installed at XF.
     • Non-disclosure agreements signed for the end-user
     • Access to XF for the end-user
     • Establish a batch execution process for the customer case, i.e. set up the environment and the command to run the user's job.
     • Working process changes: the GUI-oriented working process needs to be adapted to integrate, or allow, cloud-based execution of computing tasks.
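Establishing a batch execution process mostly means capturing, once, the environment and command line that a GUI user would otherwise set up interactively. The Python sketch below assumes a CFX definition file and the `cfx5solve` launcher with its `-def`, `-par-local` and `-partition` options, plus the `ANSYSLMD_LICENSE_FILE` variable for locating the license server. Treat all of these as assumptions and check them against the documentation of the installed Ansys release. The file name, license-server value and partition count are hypothetical placeholders.

```python
import os
import shlex


def build_cfx_job(def_file: str, partitions: int, license_server: str):
    """Return (environment, argv) for a local-parallel CFX solver run.

    Assumed, not taken from the slides: the cfx5solve options below and
    the ANSYSLMD_LICENSE_FILE variable pointing at the license server.
    """
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    env = dict(os.environ)
    env["ANSYSLMD_LICENSE_FILE"] = license_server  # "port@host" form
    argv = ["cfx5solve", "-def", def_file]
    if partitions > 1:
        argv += ["-par-local", "-partition", str(partitions)]
    return env, argv


if __name__ == "__main__":
    env, argv = build_cfx_job("flash_dryer.def", 16, "1055@license.example.com")
    # Print the command a batch script would run; on a host with CFX
    # installed, launch it with subprocess.run(argv, env=env, check=True).
    print(shlex.join(argv))
```

Keeping the command as an argument list, rather than a single shell string, avoids quoting problems when case file names contain spaces.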
  30. Team 14, Short Status Report
     Simulation of electromagnetic radiation and dosimetry in humans in cars induced by mobile phone technology
     Ingo Seipp and Team, Science + Computing
  31. Team 14, Short Status Report
     Simulation of electromagnetic radiation and dosimetry in humans in cars induced by mobile phone technology
     • End user: University Wuppertal, Institute of …
     • Application: CST Software MWS, CS, DS, EM (CST)
     • Resource provider: University Rapperswil MICTC
     • Goals:
       - Reduce runtime of current jobs
       - Increase model resolution sizes
  32. EMC dosimetry with high-res models
  33. EMC dosimetry with high-res models
     • Challenge: problem size and parallel execution
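A back-of-the-envelope estimate shows why higher model resolution quickly turns problem size into the main challenge. Take an explicit time-domain solver on a uniform 3D grid and refine the cell size by a factor r. The cell count, and roughly the memory, grows by r^3. The stability (CFL) limit also shrinks the time step by r, so total work grows by about r^4. The Python sketch below assumes exactly this idealized scaling. Actual CST meshes are non-uniform and the behavior is solver-dependent, so treat the numbers as orders of magnitude only.

```python
from dataclasses import dataclass


@dataclass
class Scaling:
    cells: float       # factor increase in grid cells (roughly memory)
    time_steps: float  # factor increase in time steps (CFL limit)
    work: float        # factor increase in total compute work


def refine(r: float, dims: int = 3) -> Scaling:
    """Idealized cost of refining a uniform grid by factor r with an explicit
    time-domain solver: cells grow as r**dims, time steps as r."""
    if r <= 0:
        raise ValueError("r must be positive")
    cells = r ** dims
    steps = r  # time step shrinks in proportion to the smallest cell
    return Scaling(cells=cells, time_steps=steps, work=cells * steps)


if __name__ == "__main__":
    for r in (1.5, 2, 4):
        s = refine(r)
        print(f"r={r}: cells x{s.cells:g}, steps x{s.time_steps:g}, work x{s.work:g}")
```

Under these assumptions, doubling the resolution needs about 8 times the memory and 16 times the compute. That is the kind of growth that pushes such models off a single workstation and onto parallel execution.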
  34. Project status and experiences
     • CST software has been installed, and the license has been provided and installed.
     • Access for the end-user via VPN has been enabled by the resource provider. Currently, there are still some problems connecting from the high-security end-user site; local IT is investigating.
     • The CST software installation must still be tested with a parallel job.
     • Setting up and testing new applications requires some effort.
  35. Announcing: Uber-Cloud Experiment Round 2
     • Why "Uber-Cloud": HPC/CAE/BIO in the Cloud is only one part of the Experiment; in addition, we provide access to HPC Centers and other resources.
     • Round 1 is a proof of concept => YES, remote access to HPC resources works, and there is real interest and need!
     • Round 2 will be more professional => more tools instead of hands-on work, more teams, more applications beyond CAE, a list of professional services, measuring the effort, estimating how much it would cost, etc.
     • Existing Round 1 teams are encouraged to use other resources in Round 2, or can participate in forming new teams.
     • Oct 15: Call for Participation; Nov 15: Start of Experiment Round 2
  36. What is next?
     • 07/24/2012 Publish updated kick-off document
     • 07/24/2012 Request for detailed participant profiles
     • 08/10/2012 End-user projects submitted
     • 08/17/2012 Resources are assigned, end-user projects start
     • 09/14/2012 Half-time meeting webinar
     • 10/15/2012 End-user projects are completed
     • 10/31/2012 Experiment is completed
     • 11/15/2012 Experiment findings are published
     • 11/15/2012 Start of Experiment Round 2, kick-off at SC in Salt Lake City
  37. Conclusion
     • Response to the Uber-Cloud Experiment is overwhelming
     • Everybody is learning and working along their very specific business interests
     • At least 20 of the 25 teams will finish successfully and on time
     • 97% of current participants will continue in Round 2
     • Univa Grid Engine community: forming teams that explore bursting into a public HPC Cloud from their Univa GE cluster
     • The experiment could help Grid Engine customers be more business-flexible, profitable, cost-effective, customer-friendly…
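Bursting from a Grid Engine cluster into a public HPC cloud comes down to a policy that decides when queued work justifies renting extra nodes. The Python sketch below shows one simple, hypothetical policy. It launches just enough whole cloud nodes to cover the pending slots that local capacity cannot absorb, and it respects a cap on cloud nodes. In practice the inputs would come from the scheduler, for example pending jobs as reported by `qstat`. The function name, parameters and cap are our own assumptions, not a Univa Grid Engine feature.

```python
import math


def cloud_nodes_to_launch(pending_slots: int, free_local_slots: int,
                          slots_per_cloud_node: int, max_cloud_nodes: int,
                          running_cloud_nodes: int = 0) -> int:
    """Hypothetical burst policy: cover the slot shortfall with whole cloud
    nodes, never exceeding the configured cap on cloud nodes."""
    if slots_per_cloud_node < 1:
        raise ValueError("slots_per_cloud_node must be >= 1")
    shortfall = max(0, pending_slots - free_local_slots)
    wanted = math.ceil(shortfall / slots_per_cloud_node)
    headroom = max(0, max_cloud_nodes - running_cloud_nodes)
    return min(wanted, headroom)
```

For example, 100 pending slots, 20 free local slots and 16-slot cloud nodes give a shortfall of 80 slots and therefore 5 nodes. Rounding up with `math.ceil` ensures that a partial shortfall still gets a node.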
  38. Thank You