Streamlining HPC Workloads with Containers

One might find it ironic that some of the world's fastest supercomputers -- vast clusters capable of trillions of floating point operations per second -- can take upwards of half an hour to reboot between jobs. While we often talk about the density advantages of containers, the High Performance Computing world takes the opposite approach: exactly one system container per node, with unlimited access to all of the host's CPU, memory, disk, IO, and network. We can still leverage the management characteristics of containers -- security, snapshots, live migration, and instant deployment -- to recycle each node between jobs. In this talk, we'll examine a reference architecture and some best practices around containers in HPC environments.

Streamlining HPC Workloads with Containers

1. Streamlining HPC Workloads with Containers @DustinKirkland

2. what does high-performance computing look like?

3. Wikipedia says...

4. Or perhaps in China...

5. Google image search shows...

6. The university student learns...

7. HackerNews suggests...

8. Your DevOps engineer launches... x1.32xlarge

9. But then there is your real, actual data center...

10. what do all of these have in common?

11. a lot, actually

12. they’re all running Linux

13. directly on the bare metal itself

14. performance is maximized

15. overhead is minimized

16. big problems are distributed across a cluster

17. everyone prefers a clean environment

18. virtual machines always involve overhead

19. [Diagram: VM Monitor (VMXON / VMXOFF) and Guest, connected by VM Entry and VM Exit]

20. oh, and let’s reboot a datacenter

21.

22. BIOS is checking memory for problems…
    Scanning 1,199,511,627,776 bytes…
    This may take several minutes…
    Running test 1 of 8: 1.0% complete
    Overall test status: 0.1% complete
    Time elapsed: 17m23s
    Status: No problems have been found yet.
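To put the figures on the BIOS screen above in perspective, here is a small arithmetic sketch. It only converts the numbers shown on the slide; the linear extrapolation is an assumption made for illustration, not something the slide claims.

```python
# Figures copied from the BIOS screen on slide 22.
SCANNED_BYTES = 1_199_511_627_776
ELAPSED_SECONDS = 17 * 60 + 23      # "Time elapsed: 17m23s"
OVERALL_PER_MILLE = 1               # "0.1% complete" is 1 part in 1000


def to_tib(n_bytes):
    """Convert a byte count to tebibytes (2**40 bytes)."""
    return n_bytes / 2**40


def projected_total_seconds(elapsed_seconds, per_mille_done):
    """Extrapolate the full run time.

    Assumption: progress is linear over the whole test, which the
    slide does not state.
    """
    return elapsed_seconds * 1000 // per_mille_done
```

Under that linear assumption, `projected_total_seconds(ELAPSED_SECONDS, OVERALL_PER_MILLE)` comes to 1,043,000 seconds, roughly 12 days, for about 1.09 TiB of memory.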

23.

24. so let’s have a look at containers

25. first, process containers

26. awesome for HPC functions

27.

28.

29.

30. LXD

31. second, machine containers

32.

33. ➢ Ultra fast “vm-lite” guests (bare metal speed)
    ➢ Any distribution of Linux - e.g. Ubuntu, CentOS
    ➢ Starts in less than 1 second
    ➢ 15x density of KVM or ESX for idle workloads
    [Diagram: hosts A, B, C, D, ..., each running the kernel and lxd and hosting lxc machine containers; nova-lxd, the lxc cli, and other RESTful apps reach the hosts through the LXD REST API]

34. (Repeats the bullets of slide 33.)
    [Diagram: same components as slide 33, with the five lxc machine containers grouped together instead of listed per host]
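The diagram on slides 33 and 34 puts every client behind the LXD REST API. A minimal sketch of talking to that API from Python with only the standard library might look like the following. Assumptions not taken from the slides: the socket path `/var/lib/lxd/unix.socket` (the default for a non-snap LXD install of that era), the `/1.0/containers` endpoint as used by LXD 2.x, and the helper names `UnixHTTPConnection`, `create_request`, and `list_containers`, which are hypothetical.

```python
import http.client
import json
import socket

# Assumption: default socket path for a non-snap LXD install.
LXD_SOCKET = "/var/lib/lxd/unix.socket"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a local Unix socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def create_request(name, alias):
    """Build the JSON body for POST /1.0/containers from a local image alias."""
    return {"name": name, "source": {"type": "image", "alias": alias}}


def list_containers(socket_path=LXD_SOCKET):
    """GET /1.0/containers and return the list of container URLs."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/1.0/containers")
        reply = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    return reply["metadata"]
```

Calling `list_containers()` needs a running LXD daemon and permission to open its socket; `create_request("hpc-node", "my-image")` only builds the payload, where both names are placeholders.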

35. One LXD container, with 100% of the system (“alloy” mode): CPU cores, CPU cycles, memory, disk space, disk IO, network IO

36. exclusive access to system resources
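One way to approximate the "one container, 100% of the system" idea is to launch a single container and make sure no resource limits are set on it. The sketch below only builds the `lxc` command lines. The container name `hpc0`, the `ubuntu:16.04` image, and the choice of clearing `limits.cpu` and `limits.memory` are illustrative assumptions; the slides do not say what "alloy" mode configures.

```python
import shlex


def launch_command(name, image="ubuntu:16.04"):
    """Command line that starts one machine container from an image."""
    return ["lxc", "launch", image, name]


def unlimit_commands(name, keys=("limits.cpu", "limits.memory")):
    """Command lines that clear per-container limits.

    Assumption: with these keys unset, the container is not capped
    below what the host offers.
    """
    return [["lxc", "config", "unset", name, key] for key in keys]


def as_shell(cmd):
    """Render a command list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `as_shell(launch_command("hpc0"))` yields `lxc launch ubuntu:16.04 hpc0`, which can be run by hand or passed to `subprocess.run`.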

37. but secured from the underlying hardware and OS

38. cgroups, user namespaces, apparmor, seccomp
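The four mechanisms named on slide 38 can be observed from inside a running container through `/proc`. The following is a rough sketch, assuming a Linux system with AppArmor as the active security module; the function names are hypothetical, and the sample values in the usage note are illustrative rather than taken from the talk.

```python
def parse_id_map(text):
    """Parse /proc/<pid>/uid_map rows of 'inside outside count'."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            rows.append(tuple(int(f) for f in fields))
    return rows


def is_user_namespaced(id_map):
    """True unless the map is the identity map over the full 32-bit range."""
    return id_map != [(0, 0, 4294967295)]


SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Read the 'Seccomp:' field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split()[1]), "unknown")
    return "unknown"


def read_text(path):
    """Return a file's contents, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


def report():
    """Collect the confinement details of the current process."""
    uid_map = read_text("/proc/self/uid_map") or ""
    status = read_text("/proc/self/status") or ""
    cgroup = read_text("/proc/self/cgroup") or ""
    apparmor = read_text("/proc/self/attr/current") or ""
    return {
        "uid_map": parse_id_map(uid_map),
        "seccomp": seccomp_mode(status),
        "cgroup": cgroup.splitlines(),
        "apparmor": apparmor.strip(),
    }
```

Inside an unprivileged container, `report()` would typically show a shifted map such as `(0, 100000, 65536)` and a seccomp mode of `filter`; on the host, root usually sees the identity map.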

39. instant startup

40. looks like a machine, Linux on Linux

41. zero latency

42. zero overhead

43. identical performance

44. snapshot restore

45. live migration
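Snapshot restore and live migration are what let a node be recycled between jobs without a reboot. A minimal sketch of how that could be scripted around the `lxc` client follows. The container name, the snapshot name `clean`, and the remote names are placeholders; live migration with `lxc move` additionally depends on CRIU being available on both hosts.

```python
import subprocess


def snapshot_cmd(container, snap="clean"):
    """Record the container's current state under a snapshot name."""
    return ["lxc", "snapshot", container, snap]


def restore_cmd(container, snap="clean"):
    """Roll the container back to a previously taken snapshot."""
    return ["lxc", "restore", container, snap]


def move_cmd(container, src_remote, dst_remote):
    """Move a container from one LXD remote to another."""
    return ["lxc", "move", f"{src_remote}:{container}", f"{dst_remote}:{container}"]


def recycle_plan(container, jobs, snap="clean"):
    """Snapshot once, then restore the clean state after every job.

    Returns a list of steps: lxc command lists, interleaved with
    ("job", name) markers where the workload itself would run.
    """
    steps = [snapshot_cmd(container, snap)]
    for job in jobs:
        steps.append(("job", job))
        steps.append(restore_cmd(container, snap))
    return steps


def run(cmd):
    """Execute one lxc command, raising if it fails."""
    return subprocess.run(cmd, check=True)
```

Restoring a snapshot in place of a reboot avoids the long hardware checks shown earlier in the deck, since the host itself never restarts.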

46. demo