Integrating Bare-metal Provisioning
into CERN’s Private Cloud Service
Arne Wiebalck
Belmiro Moreira
Daniel Abad
Mateusz Kowalski
OpenStack Summit, Vancouver 2018
CERN: Understand the mysteries of the universe!
Large Hadron Collider
• Largest machine ever built by mankind!
• 100m underground
• 27km circumference
• Protons do 11’000 turns/sec!
Four main detectors
• Positioned at interaction points
• ~10’000 tons heavy
• Handling ~1’000’000 collisions/sec
• Selecting ~200 events (few GB/sec)
Note the physicist
for scale!
home.cern
Distributed analysis
- 170 data centers worldwide
- 800k cores, ~1EB on disk/tape
About the CERN IT Department
Enable the laboratory to fulfill its mission
- Main data centre on Meyrin/Geneva site
- Wigner data centre in Budapest (since 2013)
- Connected via three dedicated 100 Gb/s links
- Where possible, resources at both sites
(plus disaster recovery)
Drone footage of the CERN CC
The Agile Infrastructure Project
2012, a turning point for CERN IT:
- LHC computing and data requirements were
increasing … Moore’s law would help, but not enough
- EU-funded projects for the fabric management
tool chain had ended
- Staff numbers would not grow with the managed resources
- LS1 (2013) was ahead, the next window only in 2019!
- Other deployments had surpassed CERN’s
Three core areas:
- Centralized Monitoring
- Config’ management
- IaaS based on OpenStack
“All servers shall be virtual!”
[Chart: projected computing needs (GRID, ATLAS, CMS, LHCb, ALICE) for Run 1 through Run 4, with markers for “we are here” and “what we can afford”]
The OpenStack Service at CERN
• Production since July 2013
- Several rolling upgrades since,
now mostly on Queens
- Many sub-services deployed
• Spans two data centers
- One region, one API entry point
• CellsV2 with 70 cells
- Separate h/w, use case, power, location, …
• Deployed using RDO + Puppet
- Mostly upstream, patched where needed
• Many sub-services run on VMs!
- Bootstrapping
Why Bare-metal provisioning?
Why Bare-Metal Provisioning? (1)
• VMs not sensible/suitable for all of our use cases
- Storage and database nodes, HPC clusters, bootstrapping,
critical network equipment or specialised network setups,
precise/repeatable benchmarking for s/w frameworks, …
• Complete our service offerings
- Physical nodes (in addition to VMs and containers)
- OpenStack UI as the single pane of glass
• Simplify hardware provisioning workflows
- For users: openstack server create/delete
- For procurement & h/w provisioning teams:
initial on-boarding, server re-assignments
Why Bare-Metal Provisioning? (2)
• Consolidate accounting & bookkeeping
- Resource accounting input will come from fewer sources
- Machine re-assignments will be easier to track
• Enable new use cases
- Containers on bare metal
Note: this does not change the overall policy. The reasons why we
introduced virtual machines have not gone away!
Ironic Overview
• Bare Metal Project in OpenStack
- Provision ‘physical’ instances, as part
of an OpenStack cloud (or independently)
- Allows Compute service to manage and provide
physical servers as if they were virtual machines
- Users interface with Nova (which also provides
quotas, scheduling, …)
• Hardware management possible
via common interfaces (& vendor-specific ones)
- PXE, IPMI
- Allows for a unified interface to manage a
heterogeneous machine park (~50 h/w types at CERN just for hypervisors!)
[Diagram: the user requests a physical instance via the Nova API + Scheduler; Nova Compute picks the Ironic driver, which talks to the Ironic API + Conductor to deploy the physical servers enrolled by the admin; Glance and Neutron provide images and networking.]
Ironic Components
[Diagram: Nova and the admin talk to ironic-api; ironic-api, ironic-conductor, and
ironic-inspector communicate via the message queue and their databases; the
ironic-python-agent (IPA) runs on the physical servers and exposes a REST API.]
- ironic-api: receives, authenticates, and handles requests
(by RPC’ing the ironic-conductor)
- ironic-conductor: orchestrates node tasks:
add, edit, delete, provision, deploy, clean, power, …
- ironic-inspector: can be used for in-band inspection
(boot the node into a RAM disk, collect data, and update the DB)
- ironic-python-agent (IPA): runs in a temporary RAM disk and provides
remote access for the conductor and inspector (inspect, configure, clean, deploy)
- Databases hold the service data (e.g. nodes/ports, conductors) and node inspection states
- Message queue for inter-component communication
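For illustration, enrolling a node through this API with the classic PXE/IPMI driver of that era (since replaced by the ipmi hardware type) could look roughly like the following; all names, addresses, and credentials are placeholders, not CERN’s actual settings:
$ openstack baremetal node create --driver pxe_ipmitool \
    --name example-node \
    --driver-info ipmi_address=bmc.example.org \
    --driver-info ipmi_username=admin \
    --driver-info ipmi_password=secret \
    --driver-info deploy_kernel=<glance-kernel-uuid> \
    --driver-info deploy_ramdisk=<glance-ramdisk-uuid>
$ openstack baremetal port create aa:bb:cc:dd:ee:ff --node <node-uuid>
$ openstack baremetal node manage <node-uuid>
$ openstack baremetal node provide <node-uuid>   # node becomes available to Nova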
Ironic Service Setup and Status
• Service runs on three controller nodes
- Symmetric: each controller runs api & conductor (& inspector)
- Automatic conductor affinity allows for easy add/remove
• Queens, (C)CentOS7, RDO, Puppet
[Diagram: three symmetric controllers, each running api, conductor, and inspector]
~1’300 nodes enrolled
~90% active (~1’150)
~30 different h/w types
Users:
- OpenStack :)
- HPC, Windows, CMS, DB, …
[Charts: nodes per state, nodes per conductor]
Integration with Nova CellsV1
• We’re a heavy user of cells
- Introduced for scalability reasons
- Also: resource management, location,
user and h/w separation, AVZs, …
• We carry some nova patches
- Project-to-cell mapping via project metadata
• We introduced a bare-metal cell
- All projects with physical flavors are mapped
- No mixing of VMs and BMs in the same project
[Diagram: the Nova top-cell API and scheduler map projects to cells; the bare-metal cell has its own cell scheduler and Nova compute services.]
Our QA Ironic Controllers
• Environment needed to validate Ironic changes
- Close to production setup
- Different set of controllers
• Leverage cell-mapping
- QA project mapped to the Bare-Metal QA cell
- Endpoint filtering ensures the project
gets the QA controllers from the catalog
$ source admin.sh
$ openstack catalog list
…
| ironic | cern |
| | public: http://openstack.cern.ch:6385 |
| | admin: http://openstack.cern.ch:6385 |
| | internal: http://openstack.cern.ch:6385 |
…
$
$ grep OS_PROJECT_NAME admin.sh
export OS_PROJECT_NAME="Bare-Metal"
$
$ source admin-qa.sh
$ openstack catalog list
…
| ironic | cern |
| | public: http://ironic-qa-01.cern.ch:6385 |
| | admin: http://ironic-qa-01.cern.ch:6385 |
| | internal: http://ironic-qa-01.cern.ch:6385 |
…
$
$ grep OS_PROJECT_NAME admin-qa.sh
export OS_PROJECT_NAME="Bare-Metal QA"
Integration with ... Networking
• CERN network structure is simple, no segmentation*
- Storage: no management network
- OpenStack: no management network
- Ironic: no provisioning or cleaning network
• CERN networking team manages IP addresses
- Our OpenStack deployment interfaces with their service
• We’re in transition from nova-network to Neutron
• Patched Nova compute to not request a new IP, but only to
update the existing entries in the network DB
* Not fully true
Integration with … Config’ mgmt / PXE
• Foreman, our host inventory
- no entries are created automatically for physical instances, same behaviour as for VMs
- we have a wrapper that creates entries upon instantiation
(the wrapper also triggers on the flavor property cern:physical=true to allow for subsequent installation)
- IPMI creds were stored there as well (openstack console url show)
• Using the CERN central PXE/TFTP server
- Mostly because it’s there :)
- (nothing set up on the conductor)
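As a minimal illustration of that trigger (the flavor name is a placeholder; the property key cern:physical is the one mentioned above):
$ openstack flavor set --property cern:physical=true p1.example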
Modifications and Additions
Driver: CERNPXEAndIPMIToolDriver
• Subclass of PXEAndIPMIToolDriver
- power on/off handling and image deployment
• Overriding …
- management: always boot from the network first
- console: return the IPMI URL, user name, and password
(we will revisit the web console setup with shellinabox and also the
recently proposed VNC graphical console)

Module: AIMS
• Register with the CERN installation infrastructure
- Used in the pxe and inspector modules

Inspector plugin: CERNDeviceDetection
• Processing hook to detect CERN custom hardware properties
$ grep -a1 CERN /etc/ironic-inspector/inspector.conf
[CERN]
inspected_capabilities = disk_enclosures, infiniband_adapters,
boot_mode, disk_label, cpu_name, cpu_family, cpu_model,
cpu_stepping

Overriding ironic_url (Ironic → IPA)
• Mask the endpoints from the catalog
- Use the local service (to not mix production and QA)
Modifications: CernHardwareManager
• Hardware manager: support h/w in the IPA
- Customised hardware inspection, including specific tools, e.g. for the BIOS
- Subclass of the default GenericHardwareManager (see the sketch below)
• The CERN Hardware Manager …
- counts the attached disk enclosures
- counts the available Infiniband adapters
- checks the IPMI users with admin rights
(too many? password changed? → raise errors.CleaningError)
- deregisters the node from the central PXE server
- WIP: software RAID configuration
As h/w managers are run by the IPA
(which is part of the deploy image), the
image usually needs to be rebuilt every
time the h/w manager is changed …
To avoid this, the CERN Hardware
Manager is downloaded (git cloned)
every time the IPA is started, which
allows us to re-use the same image.
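A minimal sketch of what such a subclass can look like; the class name, step name, and check are illustrative rather than CERN’s actual code, only the IPA hardware-manager interface itself (GenericHardwareManager, get_clean_steps, HardwareSupport) comes from the library:

# Illustrative sketch of a custom IPA hardware manager, not CERN's actual code.
from ironic_python_agent import errors
from ironic_python_agent import hardware


class CernHardwareManager(hardware.GenericHardwareManager):
    HARDWARE_MANAGER_NAME = 'CernHardwareManager'
    HARDWARE_MANAGER_VERSION = '1.0'

    def evaluate_hardware_support(self):
        # Outrank the generic manager so our steps are preferred.
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_clean_steps(self, node, ports):
        # Run our check during automated cleaning, ahead of the generic steps.
        return [{'step': 'check_ipmi_users',
                 'priority': 100,
                 'interface': 'deploy',
                 'reboot_requested': False,
                 'abortable': True}]

    def check_ipmi_users(self, node, ports):
        # Placeholder: a real step would parse `ipmitool user list`
        # and compare the admin accounts against the expected ones.
        unexpected_admins = []
        if unexpected_admins:
            raise errors.CleaningError('Unexpected IPMI admin users found')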
Some issues (well, challenges) …
Unexpected node shutdowns
• A user complains that his physical instance keeps getting powered off!
- « … after we boot it up, it powers down! »
$ nova instance-action list $UUID | grep -c stop

Nova Compute logs:
2018-03-08 14:38:16.141 8245 INFO nova.compute.manager [-] … During
_sync_instance_power_state the DB power_state (4) does not match the vm_power_state
from the hypervisor (1). Updating power_state in the DB to match the hypervisor.
2018-03-08 14:38:16.944 8245 WARNING nova.compute.manager [-] … Instance is not
stopped. Calling the stop API. …
2018-03-08 14:38:58.524 8245 INFO nova.virt.ironic.driver … Successfully soft powered off
Ironic node 953f1886-e68c-4129-abcd-xyz.

Ironic API logs:
2018-03-08 14:38:18.407 23376 INFO eventlet.wsgi.server … "PUT
/v1/nodes/953f1886-e68c-4129-abcd-xyz/states/power HTTP/1.1" …

Ironic Conductor logs:
2018-03-08 14:38:57.593 23408 INFO ironic.conductor.utils …
Successfully set node 953f1886-e68c-4129-abcd-xyz power state to
power off by soft power off.
What happened:
- nova syncs its DB upon instance shut-down
(virtual or physical, via the API or from within)
- nova syncs the instance state when an instance
comes up bypassing the API
- the owner of the physical instance had started the
node via IPMI
nova config issue! Options (the first one is sketched below):
- disable the sync
- make the instance the source of truth
- remove IPMI access from users :)
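A sketch of the first option, assuming the standard nova-compute setting for the periodic power-state sync (verify against the nova.conf reference of your release before relying on it):
[DEFAULT]
# A negative value disables the periodic power-state sync, so nova-compute
# no longer calls the stop API when its DB and the node disagree.
sync_power_state_interval = -1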
“Scrubbing the bathtub”
• Conductor affinity can leave nodes behind …
- How does the conductor affinity work anyway?
‘Location dependent’ scheduling
• A delivery may be installed in two buildings
• We started with one flavor per delivery / hardware type
(Q: How do you name your bare-metal flavors?)
• Control where the current instance goes?
• In Pike with flavor capabilities and node properties:
capabilities:cern_delivery='dl4636624',
capabilities:cern_ip_service='S912_C_IP21',
…
• In Queens with resource classes (we don’t use traits atm):
resources:CUSTOM_BAREMETAL_P1_DL4636624_S912_C_IP21
$ openstack flavor show p1.dl4636624.912_C_IP21
…
| properties | resources:CUSTOM_BAREMETAL_P1_DL4636624_S912_C_IP21='1' |
…
$ openstack baremetal node show i7865234753
…
| resource_class | BAREMETAL_P1_DL4636624_S912_C_IP21 |
…
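For illustration, the node and flavor shown above are typically wired together like this: the node is tagged with a resource class, and the flavor requests exactly one unit of it while zeroing out the standard resources (names taken from the slide; the exact properties in our deployment may differ):
$ openstack baremetal node set i7865234753 \
    --resource-class BAREMETAL_P1_DL4636624_S912_C_IP21
$ openstack flavor set p1.dl4636624.912_C_IP21 \
    --property resources:CUSTOM_BAREMETAL_P1_DL4636624_S912_C_IP21=1 \
    --property resources:VCPU=0 \
    --property resources:MEMORY_MB=0 \
    --property resources:DISK_GB=0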
Ping me if you have a good
flavor naming scheme!
The Nova Upgrade to Queens
• Multi-cell deployment
- Ocata → Queens, cells v1 → v2
(plus backports to support cell scheduling)
• Known and “could have known” caveats
- properties/capabilities → resource classes
- Nova Queens requires BM REST API v1.37 → Ironic Queens needed;
slipped our testing … ‘emergency’ upgrade!
(explicitly mentioned in the Ironic upgrade guide)
• Surprises: “No valid host”
- request_filter requires aggregates (which need manual updates)
- compute node fail-over creates “new” resource providers (which
need to be mapped to the aggregates again!) … reduced CNs to 1*
- Resource provider aggregate updates took too long (looping sequentially
over all physical nodes) → removed, as not needed.
[Diagram: the bare-metal cell, with one Nova cell controller and several nova-compute services, each managing a set of physical servers]
* Nova Bug 1771806
Support for deploy-time s/w RAID
• The vast majority of the (compute) servers in the CERN DCs use software RAID
- compute: RAID-0, e.g. batch farm, Elasticsearch
- services: RAID-1 & RAID-10, e.g. hypervisors
• Ironic server instantiation is a 2-step process for now
- Via OpenStack (to get hold of a node and have it registered)
- Via PXE to apply the custom RAID config
• WIP: CERNHardwareManager (see the sketch below)
- Basic support for s/w RAID
- Detect all devices and create one array of the desired type
- No partition support, no disk subsets, single level
Upstream support for this would be very welcome!
Almost works :)
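For illustration, the kind of mdadm call such a step boils down to; device names and RAID level are placeholders, not the actual CERN configuration:
# One RAID-1 array spanning two whole disks, no partitions, single level:
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
# The resulting /dev/md0 is then the target device for the image deployment.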
Procurement Workflows & Ironic …
[Diagram: procurement workflows: allocation, re-allocation, Foreman registration, and the recently added burn-in step]
Hardware Burn-in in the CERN Data Centre (1)
• h/w purchases: formal procedure compliant with public procurement rules
- Market survey identifies potential bidders
- Tender spec is sent out to ask for offers
- Larger deliveries 1-2 times / year
• “Burn-in” before acceptance
- Compliance with the technical spec (e.g. performance)
- Find failed components (e.g. broken RAM)
- Find systematic errors (e.g. bad firmware)
- Provoke early failures through stress (the “bathtub curve”)
The whole process can take weeks!
Hardware Burn-in in the CERN Data Centre (2)
• Initial checks: Serial Asset Tag and BIOS settings
- Purchase order ID and unique serial no. to be set in the BMC (node name!)
• “Burn-in” tests
- CPU: burnK7, burnP6, burnMMX (cooling)
- RAM: memtest, Disk: badblocks
- Network: iperf(3) between pairs of nodes
- automatic node pairing
- Benchmarking: HEPSpec06 (& fio)
- derivative of SPEC06
- we buy total compute capacity (not newest processors)
$ ipmitool fru print 0 | tail -2
Product Serial : 245410-1
Product Asset Tag : CD5792984
$ openstack baremetal node show CD5792984-245410-1
[Chart: HEPSpec06 results show a “double peak” structure due to slower hardware threads; see the OpenAccess paper]
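A simplified sketch of what such burn-in runs look like on a node; the commands and durations are illustrative, not CERN’s production scripts:
# CPU / cooling: one burn process per core for a few hours
$ for i in $(seq $(nproc)); do burnP6 & done; sleep 4h; killall burnP6
# RAM and disk
$ memtester 1024M 1
$ badblocks -wsv /dev/sdb          # destructive write test
# Network: iperf3 between a pair of nodes
$ iperf3 -s                        # on the first node of the pair
$ iperf3 -c <peer-node> -t 600     # on the second node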
Procurement Workflows: Plans
• Converge on a single deploy image across teams
- WIP: Burn-in tests
… need to look into configuring time limits (automated vs. manual cleaning?)
- Initial node registration with CERN networking
- Retirement and sanitization/donation
… ‘extreme cleaning’, e.g. no CERN data in the BMC
• Remove additional hardware DBs
- Firmware versions (upgrades?), IPMI pw backup, …
- Interfaces to various tools (ticketing, security, …)
… ironic_observer role?
• Mistral workflows
- Boot into this image, flash the f/w, reboot, …
One more thing …
A new use case: Containers on Bare-Metal
• Our cloud provides Magnum
- 250+ clusters, mostly Kubernetes
- Nodes are virtual machines
• Batch farm runs in VMs as well
- 3% performance overhead, 0% with containers
• General service offer: managed clusters
- Users get only K8s credentials
- Cloud team manages the cluster and the underlying
infra
Integration: seamless!
(based on a specific cluster template)
Monitoring (metrics/logs)?
→ Pod in the cluster
→ Logs: fluentd + ES
→ Metrics: cAdvisor + InfluxDB
Summary and Outlook
• Ironic moved to Production in the CERN cloud!
- Minor modifications, no major issues
- 1’300 nodes enrolled, >1’000 instances!
- All physical hardware is handed out via Ironic now
• Existing workflows and new use cases
- First steps done, much more to come!
• About to start: Enrollment of the remaining servers*
*Only 10’000 servers left to do :)