Streamlining HPC Workloads with Containers

One might find it ironic that some of the world's fastest supercomputers, vast clusters capable of trillions of floating point operations per second, can take upwards of half an hour to reboot between jobs. While we often talk about the density advantages of containers, the High Performance Computing world takes the opposite approach: exactly one system container per node, with unrestricted access to all of the host's CPU, memory, disk, IO, and network. We can still leverage the management characteristics of containers (security, snapshots, live migration, and instant deployment) to recycle each node between jobs. In this talk, we'll examine a reference architecture and some best practices for containers in HPC environments.
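The recycle-between-jobs idea can be sketched as a short command sequence: launch one system container on the node, snapshot it while it is still pristine, and restore that snapshot after every job instead of rebooting the hardware. The following Python sketch only builds the `lxc` command lines. The image alias `ubuntu:16.04`, the container name `hpc-node`, and the snapshot name `clean` are assumptions chosen for illustration, not details from the talk.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def prepare_commands(image: str, container: str, snapshot: str = "clean") -> List[Command]:
    """Commands to create the node's single container and snapshot it while pristine."""
    return [
        ["lxc", "launch", image, container],
        ["lxc", "snapshot", container, snapshot],
    ]


def recycle_commands(container: str, snapshot: str = "clean") -> List[Command]:
    """Commands to roll the container back to its pristine snapshot after a job."""
    return [["lxc", "restore", container, snapshot]]


def run_all(commands: List[Command],
            runner: Callable[[Command], object] = subprocess.check_call) -> None:
    """Run each command in order; pass a different runner to dry-run or log instead."""
    for command in commands:
        runner(command)


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    # "ubuntu:16.04" and "hpc-node" are illustrative names only.
    plan = prepare_commands("ubuntu:16.04", "hpc-node") + recycle_commands("hpc-node")
    for command in plan:
        print(" ".join(command))
```

Keeping command construction separate from execution means the plan can be reviewed or logged on a cluster before anything is actually run.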

Published in: Technology

  1. Streamlining HPC Workloads with Containers @DustinKirkland
  2. what does high-performance computing look like?
  3. Wikipedia says...
  4. Or perhaps in China...
  5. Google image search shows...
  6. The university student learns...
  7. HackerNews suggests...
  8. Your DevOps engineer launches... x1.32xlarge
  9. But then there is your real, actual data center...
  10. what do all of these have in common?
  11. a lot, actually
  12. they’re all running Linux
  13. directly on the bare metal itself
  14. performance is maximized
  15. overhead is minimized
  16. big problems are distributed across a cluster
  17. everyone prefers a clean environment
  18. virtual machines always involve overhead
  19. [Diagram labels: VM Monitor, VMXON, VMXOFF, Guest, VM Entry, VM Exit]
  20. oh, and let’s reboot a datacenter
  21. BIOS is checking memory for problems… Scanning 1,199,511,627,776 bytes… This may take several minutes… Running test 1 of 8: 1.0% complete. Overall test status: 0.1% complete. Time elapsed: 17m23s. Status: No problems have been found yet.
  22. so let’s have a look at containers
  23. first, process containers
  24. awesome for HPC functions
  25. LXD
  26. second, machine containers
  27. ➢ Ultra fast “vm-lite” guests (bare metal speed) ➢ Any distribution of Linux, e.g. Ubuntu, CentOS ➢ Starts in less than 1 second ➢ 15x density of KVM or ESX for idle workloads [Diagram: nova-lxd, the lxc CLI, and other RESTful apps call the LXD REST API; hosts A, B, C, D, ... each run lxd on the kernel and host lxc machine containers]
  28. ➢ Ultra fast “vm-lite” guests (bare metal speed) ➢ Any distribution of Linux, e.g. Ubuntu, CentOS ➢ Starts in less than 1 second ➢ 15x density of KVM or ESX for idle workloads [Diagram: nova-lxd, the lxc CLI, and other RESTful apps call the LXD REST API; hosts A, B, C, D, ... each run lxd on the kernel and host lxc machine containers]
  29. One LXD container, with 100% of the system (“alloy” mode): CPU cores, CPU cycles, memory, disk space, disk IO, network IO
  30. exclusive access to system resources
  31. but secured from the underlying hardware and OS
  32. cgroups, user namespaces, apparmor, seccomp
  33. instant startup
  34. looks like a machine, Linux on Linux
  35. zero latency
  36. zero overhead
  37. identical performance
  38. snapshot restore
  39. live migration
  40. demo
  41. ubuntu.com/lxd, github.com/lxc, linuxcontainers.org
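Slides 29 to 39 describe a single container that owns the whole node and can still be live-migrated. A minimal Python sketch of two helpers around that idea follows: one flags any `limits.*` keys in a container's configuration (LXD resource caps such as `limits.cpu` and `limits.memory` use this prefix), and one builds an `lxc move` command between two remotes. The remote names `node-a` and `node-b` and the container name are hypothetical. The check is only an approximation, since disk and network caps are set on devices rather than under `limits.*`, and live migration of a running container also depends on CRIU support on both hosts.

```python
from typing import Dict, List


def limit_keys(config: Dict[str, str]) -> List[str]:
    """Return the resource-limit keys set in a container's config, sorted.

    An empty result suggests the container is not capped by limits.* keys.
    Device-level caps (disk size, network rates) are not covered here.
    """
    return sorted(key for key in config if key.startswith("limits."))


def move_command(container: str, source_remote: str, target_remote: str) -> List[str]:
    """Build the lxc command that moves a container from one remote to another."""
    return [
        "lxc", "move",
        f"{source_remote}:{container}",
        f"{target_remote}:{container}",
    ]


if __name__ == "__main__":
    # Hypothetical config, shaped like the key/value pairs LXD stores per container.
    config = {"limits.cpu": "4", "security.privileged": "false"}
    print(limit_keys(config))
    # Hypothetical remotes "node-a" and "node-b".
    print(" ".join(move_command("hpc-node", "node-a", "node-b")))
```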