Future Science on Future OpenStack
Developing next generation infrastructure at CERN and SKA
Belmiro Moreira
Cloud Architect, CERN
Stig Telfer
CTO, StackHPC Ltd
SKA Performance Prototype Platform
Co-chair, OpenStack Scientific SIG
CERN - Large Hadron Collider (LHC)
CERN - Large Hadron Collider (LHC)
CERN: Compact Muon Solenoid (CMS)
CERN: Cloud Infrastructure by Numbers
CERN Cloud Architecture
[Architecture diagram: top-level OpenStack services, API servers and Nova API cell controllers fronting compute cells (Cell A, Cell B, Cell C), backed by Ceph and Database on Demand (DBoD); the sites are connected over a ~22ms latency link.]
What is SKA?
Image courtesy of CSIRO
Science Data Processor
ALaSKA - à la SKA:
SKA Performance Prototyping
SKA - Performance Prototype Platform
Bare Metal Hardware Lifecycle
CERN - Hardware Lifecycle
● ~ 2000 new servers per year
○ Two rounds of procurement - bulk purchases
■ Continuous delivery model cannot be used at CERN
○ Hardware location is defined per procurement round according to rack space, cooling and electrical availability
○ Annual capacity planning
● When new servers are added others need to be retired
○ The process to empty machines depends on the workloads running on the servers
■ Batch workloads usually require a couple of weeks to free up the servers
■ Services/Personal workloads are migrated to the new hardware
CERN - Hardware Lifecycle
● Hardware is highly heterogeneous
○ 2-3 vendors per annual procurement cycle, each one with their own optimisations
● Advantages
○ A problem with a vendor doesn’t affect the entire capacity for a procurement cycle
■ When there are issues in one delivery, such as disk firmware, BMC controllers, … others are usually not affected
● Disadvantages
○ 15-20 different hardware configurations in the data centres
○ CERN tooling (bootstrap, monitoring, ...) needs to support different configurations
○ Challenge in defining the VM flavors exposed to users (illustrated in the sketch below)
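To make the flavor problem concrete, a minimal sketch assuming a recent openstacksdk and a clouds.yaml entry named "cern-example"; the flavor name, sizes, procurement tag and host-aggregate extra spec are illustrative, not CERN's actual scheme:

```python
# A minimal sketch, assuming a recent openstacksdk and a clouds.yaml
# entry named "cern-example"; flavor name, sizes, procurement tag and
# aggregate extra spec are illustrative, not CERN's actual scheme.
import openstack

conn = openstack.connect(cloud="cern-example")

# With 15-20 hardware configurations, one flavor per configuration and
# size quickly turns into a large matrix to maintain.
flavor = conn.compute.create_flavor(
    name="m2.2xlarge-2018-round1",  # hypothetical naming convention
    ram=32768,
    vcpus=16,
    disk=160,
)

# Extra specs plus the AggregateInstanceExtraSpecsFilter can pin the
# flavor to the host aggregate holding that procurement round.
conn.compute.create_flavor_extra_specs(
    flavor, {"aggregate_instance_extra_specs:procurement": "2018-round1"}
)
```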
CERN - Hardware Lifecycle
● OpenStack Ironic
○ Provision physical servers using OpenStack Nova API
■ VMs don’t fit all use cases
● Disk servers; DB servers; ...
○ All resources are managed using OpenStack (VMs; Containers; Bare Metal)
■ Same accounting and traceability for all resources
● Can Ironic be used to manage the entire Hardware Lifecycle?
■ Replace all the specific tooling built over the years to manage the Hardware Lifecycle workflow (a minimal query example follows below)
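As a minimal illustration of driving the bare metal fleet through the same OpenStack APIs, a hedged openstacksdk sketch (the clouds.yaml entry name is a placeholder):

```python
# A minimal sketch, assuming the openstacksdk and a clouds.yaml entry
# named "cern-example", of querying the Ironic-managed fleet.
import openstack

conn = openstack.connect(cloud="cern-example")

# Every physical server is an Ironic node with a provision state
# (enroll, manageable, available, active, ...) and the properties
# collected at introspection time.
for node in conn.baremetal.nodes(details=True):
    print(node.name, node.provision_state, node.properties)

# Once a node is "available", it can be claimed through the Nova API
# with a bare metal flavor, so VMs, containers and physical servers
# share the same quota, accounting and traceability path.
```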
CERN - Hardware Lifecycle
● Requirements to manage the Hardware Lifecycle
○ A database to store all hardware attributes
■ Manufacturer, product revision, firmware version, …
○ Flexible and complete Hardware introspection
○ Flexible API to add/query server attributes (see the sketch below)
○ Burn-in and acceptance process
○ Define when resources are available to users
■ State workflow
○ Policy needs to allow segregating access between the different teams
○ Clear retirement procedure
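One way such attributes could be attached to Ironic nodes is through the node's free-form "extra" field; a hedged openstacksdk sketch, with the node name, keys and values purely illustrative:

```python
# A hedged sketch, assuming the openstacksdk: using the free-form
# "extra" field of an Ironic node as the attribute store. The node
# name, keys and values are purely illustrative.
import openstack

conn = openstack.connect(cloud="cern-example")

node = conn.baremetal.find_node("example-node-0001")  # hypothetical node

# Record lifecycle information alongside the introspected properties.
conn.baremetal.update_node(
    node,
    extra={
        "delivery": "2018-round1",    # hypothetical procurement tag
        "burn_in": "passed",          # outcome of acceptance testing
        "retire_after": "2023-06",    # planned end of warranty
    },
)

# Query it back, e.g. when planning the annual retirement round.
for n in conn.baremetal.nodes(details=True):
    if n.extra.get("retire_after", "9999-99") <= "2023-12":
        print(n.name, "is due for retirement")
```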
CERN - Hardware Lifecycle
CERN - Hardware Lifecycle
● Current CERN model
○ Automated but complex
○ Set of tools/DBs developed in house
○ Difficult to track and account for resource utilization
○ CERN specific
● What we envision with Ironic
○ Capable of managing the entire Hardware Lifecycle
■ Automated and Generic
■ Pluggable
■ Track resources
ALaSKA - Hardware Life Cycle
Auto-discovery and inspection
Hardware anomaly detection with Cardiff
Enrollment with Inspector rules
Ansible-driven BIOS and RAID configuration
Ansible-driven network switch configuration
Kayobe: Kolla-on-Bifrost
http://kayobe.readthedocs.io/
Application Cluster Storage
Requirements
● SKA Science Data Processor consumes a data feed of 1.5 TB/s (a sizing note follows below)
● This data must be stored for 6 hours
● Processed datasets must be stored for 6 months
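For scale, buffering the incoming feed alone implies roughly 1.5 TB/s × 6 h × 3600 s/h ≈ 32 PB of hot storage, before accounting for the 6-month retention of processed datasets.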
Solutions
● High performance filesystems
● High performance object stores
● High performance message queues
Supporting Scientific Applications
Preemptible Instances
● Scientific Clouds use project quotas
○ Projects have different funding models
■ They expect a predefined amount of resources to be available
■ But these resources are not always used full time
○ Other projects can use these free resources
■ Opportunistic workloads
● Public clouds use a spot market for free resources
○ Based on different pricing/SLAs depending on resource availability
○ Private clouds usually don’t charge users directly
● How can the available resources be used more efficiently?
Preemptible Instances
● Building a prototype
○ Minimise changes to OpenStack nova
● Approach
○ Preemptible instances are identified using metadata
○ Project quotas are not considered for preemptible instances
○ A “NoValidHost” failure for a non-preemptible instance triggers the “Reaper” service
○ The “Reaper” service is responsible for deleting preemptible instances
■ Needs some intelligence to free up the resources necessary for the new instance
○ The original request is then retried (a sketch of this flow follows the links below)
● Follow/Participate in the discussion
○ https://etherpad.openstack.org/p/nova-preemptible-servers-discussion
○ https://review.openstack.org/#/c/438640/2
○ https://gitlab.cern.ch/ttsiouts/reaper/
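A hedged sketch of the reaper flow described above, assuming the openstacksdk; the selection policy and function names are illustrative and are not the CERN prototype linked above:

```python
# A hedged sketch of the reaper flow described above, assuming the
# openstacksdk; the selection policy and function names are
# illustrative and are not the CERN prototype linked above.
import openstack

conn = openstack.connect(cloud="cern-example")


def preemptible_servers():
    """Preemptible instances are identified purely by metadata."""
    for server in conn.compute.servers(all_projects=True):
        if server.metadata.get("preemptible") == "true":
            yield server


def free_capacity():
    """Invoked when a non-preemptible boot fails with NoValidHost.

    A real reaper needs more intelligence (pick victims on the right
    host, free just enough vCPU/RAM/disk); this sketch simply deletes
    the oldest preemptible instance and lets the caller retry.
    """
    victims = sorted(preemptible_servers(), key=lambda s: s.created_at)
    if victims:
        conn.compute.delete_server(victims[0])
```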
Magnum on Bare Metal
Better Ironic support in Magnum templates
File and Block Storage within Magnum environments
OpenStack Magnum - Manages clusters defined by cluster templates
Supports Docker swarm mode & Kubernetes
Remote access to clusters using native tooling (Docker client, kubectl, etc.)
Automated scaling up/down
Bare metal support not always current
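A hedged openstacksdk sketch of driving Magnum onto an Ironic-backed flavor; every name below (cloud entry, image, flavors, keypair, network) is a placeholder for site-specific values:

```python
# A hedged sketch, assuming the openstacksdk Magnum support; every
# name below (cloud entry, image, flavors, keypair, network) is a
# placeholder for site-specific values.
import openstack

conn = openstack.connect(cloud="alaska-example")

template = conn.container_infrastructure_management.create_cluster_template(
    name="k8s-baremetal-example",
    coe="kubernetes",
    image_id="fedora-atomic-example",       # placeholder Glance image
    flavor_id="baremetal.general",          # placeholder Ironic-backed flavor
    master_flavor_id="baremetal.general",
    keypair_id="example-key",
    external_network_id="provider-example",
)

cluster = conn.container_infrastructure_management.create_cluster(
    name="sdp-k8s-example",
    cluster_template_id=template.id,
    master_count=1,
    node_count=4,
)

# Once ACTIVE, the cluster is used with native tooling (kubectl, etc.).
```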
Sahara on Bare Metal
HiBD - Hadoop and Spark with InfiniBand and RDMA acceleration
OpenStack Sahara - Manages clusters defined by cluster templates
Supports Hadoop and Spark
Automated scaling up/down
Extended with HiBD from OSU
RDMA-enabled analytics
OpenHPC on OpenStack
Cluster infrastructure deployed using Heat templates
Configuration and “personalisation” in Ansible
Slurm-as-a-Service deployment and configuration in Ansible
InfiniBand and MPI
Home directories in CephFS
Keys managed in Barbican
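A hedged sketch, assuming the openstacksdk and a clouds.yaml entry named "alaska-example", of launching cluster infrastructure from a Heat template; the template URL and parameter names are placeholders, not the actual OpenHPC/Slurm templates:

```python
# A hedged sketch, assuming the openstacksdk and a clouds.yaml entry
# named "alaska-example"; the template URL and parameter names are
# placeholders, not the actual OpenHPC/Slurm templates.
import openstack

conn = openstack.connect(cloud="alaska-example")

stack = conn.orchestration.create_stack(
    name="openhpc-demo",
    template_url="https://example.com/slurm-cluster.yaml",  # placeholder
    parameters={"compute_count": 8, "key_name": "example-key"},
)

# Block until Heat reports the infrastructure as built; Ansible then
# configures OpenHPC, Slurm, MPI and the CephFS home directories.
conn.orchestration.wait_for_status(stack, status="CREATE_COMPLETE")
```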
OpenStack Scientific SIG
● Written with help from the OpenStack Scientific SIG
● Current best practice for OpenStack and HPC
● Six subject overviews with case studies contributed by WG members
https://www.openstack.org/science/
What will OpenStack ‘Z’ look like?
● Due for release 2H 2022
● …?
● …?
● …?
● Remaining details TBD