Operational War Stories from 5 Years of Running OpenStack in Production
Arne Wiebalck, CERN
OpenStack Summit, Vancouver, May 2018
(R)Evolution in CERN IT:
CERN: Understand the mysteries of the universe!
Arne Wiebalck: Operational War Stories from 5 Years of Running OpenStack, OpenStack Summit, Vancouver 2018
Large Hadron Collider
• Largest machine ever built by mankind!
• 100m underground
• 27km circumference
• Protons do 11’000 turns/sec!
CERN: Understand the mysteries of the universe!
Four main detectors
• Positioned at interaction points
• ~10’000 tons heavy
• Handling ~1’000’000 collisions/sec
• Selecting ~200 events (a few GB/sec)
Note the physicist for scale!
CERN: Understand the mysteries of the universe!
home.cern
Distributed analysis:
- 170 data centers worldwide
- 800k cores, ~0.9 EB on disk/tape
90% of CERN’s compute resources are delivered on top of OpenStack
Running OpenStack at CERN in production during the past 5 years …
Mysteries
Challenges
Setbacks
Achievements
Mysteries?
• Innocent instances being killed …?
- init script sending SIGTERM to the wrong PID
• Host shut down shortly after boot …?
- IPMI bypass upset Nova’s power state synchronization
• Deletions grinding Cinder to a halt …?
- “In-place upgrades are evil!”
• Bare metal database losing entries …?
- everyone has mixed up dev and prod once
• Upgrades before the upgrade …?
- some have even mixed up dev and prod twice
• No monitoring when doing Manila tests …?
- reduce logging when launching 10k pods!
• Volume data loss on reboot …?
After a reboot, a user reported that all data had disappeared from a mounted volume.
The volume was attached and mounted.
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
…
/dev/vdb        3.5G     0  3.5G   0% /mnt
…
$ cat /etc/fstab
#
# /etc/fstab
#
# Accessible filesystems are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8)
#
…
/dev/vdb /mnt/myvolume tmpfs defaults 1 2
…
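The fstab line points to the cause: it declares /dev/vdb with the filesystem type tmpfs. For tmpfs the first field is only a label, so at boot an empty, RAM-backed filesystem gets mounted under that name instead of the volume's own filesystem, which is why df can still list /dev/vdb as the source of an empty filesystem. A hypothetical lint along the following lines (not something shown in the talk; the function name and the set of RAM-backed types are assumptions) would flag such entries before the next reboot.

```python
from __future__ import annotations

# Assumption: filesystem types that never read data from the fstab "device"
# field. For these, whatever is written there is only a name, so a real block
# device listed there is never actually mounted.
RAM_BACKED_FS_TYPES = {"tmpfs", "ramfs"}


def suspicious_fstab_entries(fstab_text: str) -> list[tuple[str, str, str]]:
    """Return (device, mount point, fs type) for block devices given a RAM-backed type."""
    findings = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line: too few columns to judge
        device, mount_point, fs_type = fields[:3]
        if device.startswith("/dev/") and fs_type in RAM_BACKED_FS_TYPES:
            findings.append((device, mount_point, fs_type))
    return findings


if __name__ == "__main__":
    sample = """
# /etc/fstab
/dev/vda1 / xfs defaults 0 0
/dev/vdb /mnt/myvolume tmpfs defaults 1 2
"""
    for dev, mnt, fs in suspicious_fstab_entries(sample):
        print(f"{dev} on {mnt} is mounted as {fs}: volume data will not be visible")
```

A regular "tmpfs /tmp tmpfs defaults 0 0" entry is not flagged, because its first field is not a /dev/ path.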
Challenges!
• Scaling an ever-growing service
- To 300k cores and ~38’000 instances, cells v1/v2
- Automate and outsource: workflow engines
• Continuously upgrading
- 10+ projects times 2 releases/year
- Operating systems upgrades
• Deal with team turnover
- Around 15 members, but 50+ contributors in total!
• Reboot 9k hosts w/ 35’000 VMs!
- Triggered by Spectre/Meltdown
• Fix virtual-to-physical performance gap
Up to 20% loss on large VMs!
“Tuning”: KSM*, EPT, pinning, … 10%
Compare with Hyper-V: no issue ... NUMA!
NUMA awareness & node pinning ... 3%!
Rolled out to our batch farm ...
*KSM on/off: beware of the memory reclaim!
Performance loss vs. physical:
VM layout   Before   After
4x 8        7.8%
2x 16       16%
1x 24       20%      5%
1x 32       20%      3%
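The NUMA-aware, pinned layout above can be expressed in Nova through flavor extra specs. The sketch below assumes the standard hw:cpu_policy and hw:numa_nodes keys; the slide does not say which exact settings CERN used, and the helper names and the flavor name m2.numa-example are hypothetical. It also encodes an assumption about Nova's default behaviour: without explicit per-node assignments, vCPUs and RAM are split evenly across the guest NUMA nodes, so both must divide evenly.

```python
from __future__ import annotations


def numa_pinned_extra_specs(vcpus: int, ram_mb: int, numa_nodes: int) -> dict[str, str]:
    """Build Nova flavor extra specs for a pinned, NUMA-aware flavor (sketch)."""
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    # Assumption: without explicit hw:numa_cpus.N / hw:numa_mem.N, Nova splits
    # vCPUs and RAM evenly across guest NUMA nodes, so both must divide evenly.
    if vcpus % numa_nodes or ram_mb % numa_nodes:
        raise ValueError("vCPUs and RAM must divide evenly across NUMA nodes")
    return {
        "hw:cpu_policy": "dedicated",      # pin each vCPU to a dedicated host pCPU
        "hw:numa_nodes": str(numa_nodes),  # expose this many NUMA nodes to the guest
    }


def flavor_set_command(flavor: str, specs: dict[str, str]) -> list[str]:
    """Return an 'openstack flavor set' argument list applying the specs."""
    cmd = ["openstack", "flavor", "set"]
    for key, value in sorted(specs.items()):
        cmd += ["--property", f"{key}={value}"]
    cmd.append(flavor)
    return cmd


if __name__ == "__main__":
    # Hypothetical 32-vCPU, 64 GiB flavor spread over two guest NUMA nodes.
    specs = numa_pinned_extra_specs(vcpus=32, ram_mb=65536, numa_nodes=2)
    print(" ".join(flavor_set_command("m2.numa-example", specs)))
```

Returning an argument list rather than a single shell string keeps the command usable with subprocess.run without quoting issues.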
A Setback … Performance regression
After the roll-out and the re-creation of thousands of instances, we received reports of (a few) extremely slow instances!
VMs looked OK ...
Hosts: 30-50% load, 100k IRQs/sec (TLB)!
“EPT off” side-effect!
2MB huge pages
Slipped through testing, required a new campaign
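The slide names 2MB huge pages as part of the remedy. A hypothetical host-side check, not taken from the talk, could parse /proc/meminfo to confirm that a 2 MiB pool is reserved on a hypervisor. Note that the HugePages_* counters describe only the pool of the default huge page size reported by Hugepagesize; other sizes are listed under /sys/kernel/mm/hugepages/. The function names are illustrative.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo key to its leading integer value (kB or page count)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def hugepage_summary(meminfo: dict[str, int]) -> str:
    """Describe the default-size huge page pool, expecting 2 MiB pages."""
    size_kb = meminfo.get("Hugepagesize", 0)
    total = meminfo.get("HugePages_Total", 0)
    free = meminfo.get("HugePages_Free", 0)
    if size_kb != 2048:
        return f"default huge page size is {size_kb} kB, not 2 MiB"
    if total == 0:
        return "no 2 MiB huge pages reserved"
    used = total - free  # pages currently handed out from the pool
    return f"{used}/{total} 2 MiB huge pages in use ({total * size_kb // 1024} MiB reserved)"
```

On a Linux hypervisor, passing the contents of /proc/meminfo through parse_meminfo and then hugepage_summary prints the current state of the pool.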
(Some) Achievements!
• 300k cores provisioned to experiments and services
• 9000 hosts with 40 h/w types integrated into the service
• Deployment across two data centers, 23ms apart
• Migration of 1000s of VMs due to hardware retirement
• 2M requests handled by a Magnum K8s cluster
• Sustained VM creation/deletion rate: one every 10 secs
• Support use cases from number crunching to hotel bookings
• Service runs on virtual machines itself
• 5’000 volumes with >0.5 PB of data
• Many contributions upstream
• Close to 10 million instances created in total
• Running Nova Queens with multiple cells (cells v2)
• …
Decided to move to an Agile Infrastructure in 2012
OpenStack as one of the cornerstones
“All services shall be virtual!”
Managed the increasing demand for resources with a constant team size!
Reduced the provisioning cycle from months to minutes!
Established a single interface for compute resource provisioning!
Enabled the (R)Evolution in CERN IT!