
Deploying OpenNebula in an HPC environment


HPCNow! outlines its dynamic provisioning of hybrid nodes, used primarily for HPC. OpenNebula is a fundamental component, offering the desired flexibility and ease of use.

Published in: Software


  1. Deploying OpenNebula in an HPC environment. Alfred Gil, Chief Computational Scientist & Co-founder. OpenNebula Cloud TechDay, Barcelona, May 2019
  2. • HPCNow! company overview • Motivation • Architecture • Implementation • Conclusions
  3. Quick introduction to HPCNow! ● Global HPC consulting company ● IT + scientific background ● HPC services and solutions ● User-oriented company ● Hardware agnostic. Company overview
  4. System Administrators and User Support. Top500 Supercomputer Users. Company overview
  5. IISW (image slide)
  6. Batch scheduler: Slurm, LSF, PBS, Torque, SGE. Cluster manager: sNow!, xCat, Rocks, Bright. Monitoring & alert tools: Ganglia, Nagios, Icinga, Grafana, Elasticsearch. Parallel file system: BeeGFS, Lustre, GPFS, HDFS, Ceph. Company overview
  7. User environment: user libraries, Modules, EasyBuild, Spack. Development tools: compilers (GNU, Intel, PGI, IBM XL); debuggers and profilers (VTune, DDT, GDB). Scientific and engineering applications: more than 100 references. Contact us to know more. Company overview
  8. Virtualization: OpenNebula, OpenStack, VMware, XenSource. Containers: Singularity, Docker, Docker Swarm, LXD. Remote visualization: TurboVNC, VirtualGL, WebSocket, DCV, X2Go. HPC Portal: EnginFrame. Company overview
  9. Contributions to the HPC Community. Company overview
  10. Public sector. Private companies. Company overview
  11. Partners: HW, SW. Company overview
  12. • HPCNow! company overview • Motivation • Architecture • Implementation • Conclusions
  13. What is High Performance Computing? Many tasks and/or threads working together to solve different parts of a single larger problem. This is achieved with parallel programming, which usually requires large shared-memory systems or a low-latency, high-bandwidth network. Motivation
  14. HPC users need more than just a compute solution ❅ Workflow: pre-processing and post-processing, workflow frameworks,... ❅ Web services: RStudio, Galaxy, Jupyter Notebook, JMS,... ❅ Software managers: Anaconda, EasyBuild, Spack,... ❅ Prebuilt software: Docker, Singularity, VM images (NeuroDebian,...),... Motivation
  15. Convergence Solution: HPC cluster, Singularity, Docker Swarm, OpenNebula. Allows the HPC solution to be dynamically re-architected / re-purposed to accommodate different roles / user needs. Motivation
  16. Dynamic Provisioning: hybrid nodes. Spare nodes are moved between Slurm, OpenNebula and Docker Swarm. 1) Use resource: scontrol update node=X state=RESUME / onehost enable X / docker node update --availability active X. 2) Release resource: scontrol update node=X state=DOWN / onehost offline X / docker node update --availability drain X. Motivation
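The two provisioning steps on this slide can be sketched as a small helper that returns the per-manager command for each resource manager. The command strings are the ones shown on the slide; the helper name `commands_for` and the node names are hypothetical:

```python
def commands_for(node: str, action: str) -> list[str]:
    """Commands that hand a spare node over ('use') or take it back ('release')
    for each resource manager: Slurm, OpenNebula, Docker Swarm."""
    if action == "use":
        return [
            f"scontrol update node={node} state=RESUME",         # Slurm
            f"onehost enable {node}",                            # OpenNebula
            f"docker node update --availability active {node}",  # Docker Swarm
        ]
    if action == "release":
        return [
            f"scontrol update node={node} state=DOWN",           # Slurm
            f"onehost offline {node}",                           # OpenNebula
            f"docker node update --availability drain {node}",   # Docker Swarm
        ]
    raise ValueError(f"unknown action: {action}")
```

A wrapper script would run these in sequence (e.g. via `subprocess.run`) when a node switches roles.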
  17. • HPCNow! company overview • Motivation • Architecture • Implementation • Conclusions
  18. Use case (diagram): management node, compute nodes, hybrid nodes and storage. Architecture
  19. Management node (mgmnt) ● VMs (Xen): ○ slurm01 (slurmctld) ○ slurmdb01 (slurmdbd) ○ ceph01 (ceph-deploy) ○ oneceph01 (oned, sunstone, oneflow, onegate) ○ login01 ○ ldap01 ● Exports /home via NFS. Architecture
  20. Global configuration ● OpenNebula v5.6.0 ● Ceph v13.2.1 Mimic ● Datastores: standard Ceph configuration ■ cephds (type Image) ■ ceph_system (type System) ● Nodes with the KVM hypervisor ● NICs with the virtio model. Architecture
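A VM template matching this configuration might look like the fragment below. This is a sketch in OpenNebula template syntax; the image name, network name and sizes are hypothetical, while `MODEL = "virtio"` corresponds to the NIC setting on the slide and the image is assumed to live in the Ceph-backed `cephds` Image datastore:

```
CPU    = 1
MEMORY = 2048
# Image registered in the "cephds" (Ceph/RBD) Image datastore
DISK   = [ IMAGE = "centos7", IMAGE_UNAME = "oneadmin" ]
# virtio NIC model, as configured on the KVM nodes
NIC    = [ NETWORK = "private", MODEL = "virtio" ]
```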
  21. • HPCNow! company overview • Motivation • Architecture • Implementation • Conclusions
  22. Stumbling blocks along the way ● Snapshots ○ the datastore for images is configured as raw ■ recommended for Ceph using RBD ○ images are stored as raw, even when created as qcow2 ○ snapshots of the system disk can be listed and recovered from Ceph ■ rbd ls -l -p one ● Bridge destroyed when no virtual NIC is linked ○ switch keep_empty_bridge to true in /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf ■ a bug prevents the config from being transferred to the hypervisors at /var/tmp/one/etc/vnm/OpenNebulaNetwork.conf ○ create the virtual network with PHYDEV unset: one-2-103-0 one-2-103-0@0 one-2-104-0. Implementation
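The bridge fix above is a one-line change in the network driver configuration on the frontend; a sketch of the relevant fragment (the key name follows the OpenNebulaNetwork.conf convention, verify it against your installed version):

```
# /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf (frontend)
# Keep the bridge alive even after the last virtual NIC is removed.
:keep_empty_bridge: true
```

Because of the bug mentioned on the slide, this file may need to be copied manually to /var/tmp/one/etc/vnm/OpenNebulaNetwork.conf on each hypervisor rather than relying on the automatic sync.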
  23. Stumbling blocks along the way ● VMs could not communicate with each other ○ switch the net.bridge.bridge-nf-call-iptables parameter to 0 ○ tried to make it persistent in /etc/sysctl.d/bridge-nf-call.conf and /usr/lib/sysctl.d/00-system.conf ■ a bug prevents this from working: when sysctl runs, the bridge kernel module is not yet loaded ○ fixed by modifying /usr/lib/systemd/system/libvirtd.service: Type=notify EnvironmentFile=-/etc/sysconfig/libvirtd ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS +ExecStartPost=/usr/bin/sleep 30s +ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0 +ExecStartPost=/usr/sbin/sysctl -p ExecReload=/bin/kill -HUP $MAINPID KillMode=process Restart=on-failure. Implementation
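An alternative to editing the packaged unit file is a systemd drop-in, which survives package updates; a sketch of the same ExecStartPost additions as an override (the drop-in file name is hypothetical):

```
# /etc/systemd/system/libvirtd.service.d/bridge-nf.conf
[Service]
# Wait for the bridge module to load, then disable iptables on bridges
ExecStartPost=/usr/bin/sleep 30s
ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0
```

Apply with `systemctl daemon-reload` followed by `systemctl restart libvirtd`.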
  24. Stumbling blocks along the way ● VM creation from Sunstone ended in FAILED status ○ error: Cannot check QEMU binary /usr/bin/qemu-system-x86_64: No such file or directory ■ fixed with: ln -s /usr/libexec/qemu-kvm /usr/bin/qemu-system-x86_64. Implementation
  25. • HPCNow! company overview • Motivation • Architecture • Implementation • Conclusions
  26. Conclusions ● We architected and implemented a solution that deploys nodes with a hybrid role. ● This solution allows the cluster to be dynamically re-purposed to accommodate user needs. ● OpenNebula proved to be a really easy tool to install, deploy and manage. ● Useful tips and collaboration in the forum helped troubleshoot issues. Conclusions
  27. Barcelona: Marie Curie, 8 - 08042 Barcelona (Spain). Auckland: 34 Fernly Rise, 2019 Auckland (New Zealand).