HA HPC with OpenNebula
Eliot Eshelman - Microway
2015-06-29
HPC - what is it good for?
Lattice Quantum Chromodynamics (QCD)
RBC/UKQCD collaboration; Research Team: Dirk Broemmel, Thomas
Rae, Ben Samways, Investigators: Jonathan Flynn
Physics
HPC - what is it good for?
Tech-X VORPAL for the DOE and NNSA
Physics
HPC - what is it good for?
Astrophysics
Simulation of a supernova
Courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy
HPC - what is it good for?
Planetary Science
WRF 0.5km simulation of Hurricane Sandy
NCAR CISL VAPOR visualizations
HPC - what is it good for?
Life Science
NAMD & GROMACS; visualized with VMD
HPC - what is it good for?
https://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Documents/NERSCWorkloadAnalysisFeb2013.pdf
All Science!
HPC - what is it good for?
Altair AcuSolve
Engineering:
FEA
CFD
Multi-Physics
HPC - what is it good for?
Machine Learning
NVIDIA DIGITS with Caffe from UC Berkeley
HPC - what is it good for?
Big Data
First: a discussion of scale
What types of HPC systems do we design?
● up to ~512 nodes
● budgets of $50K to $3M
Most leadership-class HPC sites use similar
designs, but source from the big vendors.
10,000-foot view of an HPC cluster
HPC clusters ready to ship
Microway's Test Drive cluster
● Owned and maintained by Microway
● Used by customers for benchmarking
● Used by employees for testing, replicating
customer issues & software development
● Not actually mission-critical, but designed to
emulate those that are...
The Hardware
● (3) OpenNebula hosts
● (4) Parallel storage servers
● (6) Bare-metal CPU + GPU
compute nodes
● Gigabit Ethernet
● 56Gbps FDR InfiniBand
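As a sketch of how those three OpenNebula hosts look to the front-end, the host pool can be listed over OpenNebula's XML-RPC API. This is illustrative only, not from the original deck: the endpoint name, port default, and credentials below are placeholder assumptions.

    # Sketch (assumed endpoint/credentials): list registered OpenNebula
    # hosts through the front-end's XML-RPC API.
    import xmlrpc.client
    import xml.etree.ElementTree as ET

    # Placeholder session string; the real one lives in ~/.one/one_auth.
    session = "oneadmin:changeme"
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")

    # OpenNebula XML-RPC calls return [success, body-or-error, errno, ...].
    resp = server.one.hostpool.info(session)
    if not resp[0]:
        raise RuntimeError(resp[1])

    # The host pool comes back as XML; print each host's name and state.
    for host in ET.fromstring(resp[1]).findall("HOST"):
        print(host.findtext("NAME"), "state:", host.findtext("STATE"))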
Physical Network Topology
Logical Infrastructure
HPC Cluster Services
Compute Nodes
● Remaining bare metal for now
○ Virtualizing GPUs has caveats (see the passthrough sketch after this slide)
● Virtualizing the nodes gives admins and users far more
flexibility
○ HPC users have very specific software needs
○ VMs can enable reproducibility
○ Some sites are trying out containers (Docker)
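One of those GPU caveats is device passthrough. In OpenNebula releases that support PCI passthrough, a PCI section in the VM template requests a matching free device on the host and pins the VM there. A minimal sketch, assuming an NVIDIA GPU (vendor ID 10de) and made-up image/network names:

    # Sketch only: allocate a compute-node template that requests PCI
    # passthrough of an NVIDIA GPU. All names here are illustrative.
    import xmlrpc.client

    session = "oneadmin:changeme"   # placeholder credentials
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")

    template = """
    NAME   = "gpu-compute-node"
    CPU    = 8
    MEMORY = 65536
    DISK   = [ IMAGE = "centos-hpc" ]
    NIC    = [ NETWORK = "cluster-net" ]
    PCI    = [ VENDOR = "10de" ]    # any free NVIDIA device on the host
    """

    resp = server.one.template.allocate(session, template)
    if not resp[0]:
        raise RuntimeError(resp[1])
    print("template id:", resp[1])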
End Goal
● Each employee/customer can be assigned
their own private HPC cluster
● Multiple cluster instances for:
○ Development
○ QA
○ Production
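A minimal sketch of that end goal, assuming a shared cluster head-node template (template ID 42 and the VM names are hypothetical): instantiate the same template once per environment.

    # Sketch: spin up one head node per environment from a shared
    # template, using the 4.x-era instantiate argument list
    # (newer releases append a trailing persistent flag).
    import xmlrpc.client

    session = "oneadmin:changeme"
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")

    TEMPLATE_ID = 42                # hypothetical cluster template
    for env in ("dev", "qa", "prod"):
        resp = server.one.template.instantiate(
            session, TEMPLATE_ID, f"hpc-head-{env}", False, "")
        if not resp[0]:
            raise RuntimeError(resp[1])
        print(env, "-> VM id", resp[1])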
What we gain
Flexibility:
● Easy backups
● Easy restores
● Easy upgrades
● Easy rollbacks
● Faster software
development
Customer sees:
● Better uptime
● Quicker upgrades
● Fewer bugs
● Better performance
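For instance, "easy rollbacks" can start with a snapshot taken right before an upgrade, so a bad update is reverted instead of rebuilt. A sketch with a placeholder VM ID:

    # Sketch: snapshot a service VM before upgrading it; the snapshot
    # can later be restored with one.vm.snapshotrevert. VM id 7 is a
    # placeholder, not from the original deck.
    import xmlrpc.client

    session = "oneadmin:changeme"
    server = xmlrpc.client.ServerProxy("http://frontend:2633/RPC2")

    VM_ID = 7                       # hypothetical head-node VM
    resp = server.one.vm.snapshotcreate(session, VM_ID, "pre-upgrade")
    if not resp[0]:
        raise RuntimeError(resp[1])
    print("snapshot id:", resp[1])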
What we lose
Not much!
● A little bit of performance
(~1% on CPU; up to 10% on I/O)
● No more direct access to InfiniBand
(HPC folks like having access to bare metal)
Other tools to investigate...
What's next?
● Got a project in mind?
● Inspired to speak at our next meetup?
Get in touch!
eliot@microway.com
