
Unveiling CERN Cloud Architecture - October, 2015

Unveiling CERN Cloud Architecture
OpenStack Design Summit, Tokyo - October, 2015

  1. Unveiling CERN Cloud Architecture
     OpenStack Design Summit – Tokyo, 2015
     Belmiro Moreira
     belmiro.moreira@cern.ch
     @belmiromoreira
  2. What is CERN?
     •  European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
     •  Founded in 1954
     •  21 member states; other countries contribute to the experiments
     •  Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
     •  CERN's mission is fundamental research
  3. LHC - Large Hadron Collider
  4. LHC and Experiments
     (Image: CMS detector; https://www.google.com/maps/streetview/#cern)
  5. LHC and Experiments
     (Image: proton-lead collisions at the ALICE detector)
  6. CERN Data Centres
  7. OpenStack at CERN by numbers
     •  ~5000 compute nodes (~130k cores): ~4800 KVM, ~200 Hyper-V
     •  ~2400 images (~30 TB in use)
     •  ~1800 volumes (~800 TB allocated)
     •  ~2000 users
     •  ~2300 projects
     •  ~16000 VMs running
     (Chart: number of VMs created (green) and VMs deleted (red) every 30 minutes)
  8. OpenStack timeline at CERN
     (Timeline: upstream releases Essex (5 Apr 2012), Folsom (27 Sep 2012), Grizzly (4 Apr 2013), Havana (17 Oct 2013), Icehouse (17 Apr 2014), Juno (16 Oct 2014), Kilo (30 Apr 2015), Liberty (Oct 2015); CERN production infrastructure: "Guppy" (Jun 2012), "Ibex" (Mar 2013), Grizzly (Jul 2013), "Hamster" (Oct 2013), Havana (Feb 2014), Icehouse (Oct 2014), Juno (Apr 2015), Kilo (Oct 2015))
  9. OpenStack timeline at CERN
     •  Evolution of the number of VMs created since July 2013
     (Chart: number of VMs running; number of VMs created, cumulative)
  10. Infrastructure Overview
     •  One region, two data centres, 26 cells
     •  HA architecture only at the Top Cell
     •  Child cell control planes usually run as VMs in the shared infrastructure
     •  nova-network with a custom CERN driver
     •  2 hypervisor types (KVM, Hyper-V)
     •  Scientific Linux CERN 6, CERN CentOS 7, Windows Server 2012 R2
     •  2 Ceph instances
     •  Keystone integrated with the CERN account/lifecycle system
     •  Nova, Keystone, Glance, Cinder, Heat, Horizon, Ceilometer, Rally
     •  Deployment using the OpenStack Puppet modules and RDO
  11. Architecture Overview
     (Diagram: a load balancer in front of the shared control plane (Keystone, Glance, Cinder, Heat, Ceilometer, Horizon, Nova Top Cell, DB infrastructure), with Nova compute cells, Ceph and DB infrastructure in both the Geneva and Budapest data centres)
  12. Why Cells?
     •  Single endpoint for users
     •  Scale transparently between data centres
     •  Availability and resilience
     •  Isolate different use cases
  13. CellsV1 Limitations
     •  Functionality limitations:
        •  Security groups
        •  Managing aggregates from the Top Cell
        •  Availability zone support
        •  Limited cell scheduler functionality
        •  Ceilometer integration
  14. Nova Deployment at CERN
     (Diagram: load-balanced nova-api nodes and a Top Cell controller (nova-cells, RabbitMQ, DB); each child cell controller runs nova-cells, nova-api, nova-scheduler, nova-conductor and nova-network with its own RabbitMQ and DB; compute nodes run nova-compute)
  15. Nova - Cells Control Plane
     Top Cell controller:
     •  Controller nodes run only on physical nodes
     •  Clustered RabbitMQ with mirrored queues
     •  "nova-api" nodes are VMs, deployed in the "common" (user-shared) infrastructure
     Child cell controllers:
     •  Only ONE controller node per cell; NO HA at the child cell level
     •  Most are VMs running in other cells
     •  Child cell controller fails? It is replaced by another VM; user VMs are still available
     •  ~200 compute nodes per cell
  16. Nova - Cells Scheduling
     •  Different cells have different use cases: hardware, location, network configuration, hypervisor type, ...
     •  Cell capabilities: "datacentre", "hypervisor", "avzs"
        •  Example: capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva
     •  Scheduler filters use these capabilities (a sketch of such a filter follows below)
     •  CERN cell filters available at: https://github.com/cernops/nova/tree/cern-2014.2.2-1/nova/cells/filters
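The filters in the repository above are regular CellsV1 scheduler filters. Below is a minimal, hypothetical sketch of what a capability-based filter can look like, loosely modeled on the upstream CellsV1 filter interface (nova 2014.2.x); the filter name, the "datacentre" scheduler hint and the exact shape of `cell.capabilities` are assumptions for illustration, not CERN's actual code.

```python
# Hypothetical capability-based cell filter, loosely modeled on the
# CellsV1 filter interface (nova 2014.2.x).  The filter name, the
# "datacentre" scheduler hint and the exact shape of cell.capabilities
# are assumptions, not CERN's actual code.
from nova.cells import filters


class DatacentreCapabilityFilter(filters.BaseCellFilter):
    """Keep only the cells whose 'datacentre' capability matches the
    'datacentre' scheduler hint passed at boot time."""

    def filter_all(self, cells, filter_properties):
        hints = filter_properties.get('scheduler_hints') or {}
        wanted = hints.get('datacentre')
        if not wanted:
            # No preference expressed: keep every candidate cell.
            return cells

        def _matches(cell):
            # Assumption: each cell exposes its configured capabilities
            # as a dict of sets, e.g. {'datacentre': {'geneva'}}.
            return wanted in cell.capabilities.get('datacentre', set())

        return [cell for cell in cells if _matches(cell)]
```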
  17. Nova - Cells Scheduling - Project Mapping
     How do we map projects to cells?
     https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/cells/filters/target_cell_project.py
     •  Default cells; dedicated cells
     •  The target cell is selected considering the following "nova.conf" configuration:
        cells_default=cellA,cellB,cellC,cellD
        cells_projects=cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>
     •  "Disabling" a cell is simply removing it from the list... (see the sketch below)
     http://openstack-in-production.blogspot.fr/2015/10/scheduling-and-disabling-cells.html
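As a plain-Python illustration of that mapping (the real logic lives in target_cell_project.py, linked above), here is a sketch assuming only the two option formats shown on the slide; everything else is illustrative.

```python
# Plain-Python sketch of the project-to-cell mapping described above;
# option names and formats come from the slide, the rest is invented.


def parse_cells_projects(raw):
    """Parse 'cellE:<uuid1>;<uuid2>,cellF:<uuid3>' into
    {'<uuid1>': 'cellE', '<uuid2>': 'cellE', '<uuid3>': 'cellF'}."""
    mapping = {}
    for chunk in raw.split(','):
        cell, _, projects = chunk.partition(':')
        for project_id in projects.split(';'):
            if project_id:
                mapping[project_id] = cell
    return mapping


def candidate_cells(project_id, cells_default, cells_projects):
    """Return the cells a project may land on: its dedicated cell if it
    has one, otherwise the default cells."""
    dedicated = parse_cells_projects(cells_projects).get(project_id)
    if dedicated:
        return [dedicated]
    return [c for c in cells_default.split(',') if c]


# "Disabling" a cell is removing it from cells_default:
print(candidate_cells('<project_uuid1>',
                      cells_default='cellA,cellC,cellD',
                      cells_projects='cellE:<project_uuid1>;<project_uuid2>'))
```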
  18. Nova - Cells Scheduling - AVZs
     •  The CellsV1 implementation is not aware of aggregates
     •  How to have AVZs with cells?
        •  Create the aggregate/availability zone in the Top Cell
        •  Create "fake" nova-compute services to add nodes into the AVZ aggregates
        •  The cell scheduler uses "capabilities" to identify AVZs
        •  NO aggregates in the child cells
  19. Nova - Legacy Child Cell Configuration at CERN
     •  Our first cell (2013)
     •  A cell with >1000 compute nodes
        •  Any problem in the cell control plane had a huge impact
     •  All availability zones behind this cell, using aggregates
     •  Aggregates dedicated to specific projects
     •  Multiple hardware types
     •  KVM and Hyper-V
  20. Nova - Cell Division (from 1 to 9)
     How to divide an existing cell? (a rough sketch of the DB step follows below)
     •  Set up the new child cell controllers
     •  Copy the existing DB to all new cells and delete all instance records that do not belong to the new cell
     •  Move compute nodes to the new cells
     •  Change the instances' "cells path" in the Top Cell DB
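A very rough sketch of the database part of that procedure, assuming the upstream Nova 2014.x schema as we understand it (instances.host in the child DB, instances.cell_name in the Top Cell DB) and hypothetical host and cell names; a real run also has to handle related tables and soft-deleted rows.

```python
# Rough sketch of the DB surgery, using pymysql.  Table and column names
# follow the upstream Nova 2014.x schema as we understand it; host names
# and the cell path are hypothetical.  Related tables
# (instance_info_caches, instance_system_metadata, ...) and soft deletes
# are ignored here.
import pymysql

NEW_CELL_PATH = 'top!cell03'                          # illustrative cells path
HOSTS_IN_NEW_CELL = ['p05151000123', 'p05151000124']  # hypothetical hosts


def trim_child_db(conn):
    """In the copied child DB, drop every instance whose compute host
    did not move to the new cell."""
    placeholders = ','.join(['%s'] * len(HOSTS_IN_NEW_CELL))
    with conn.cursor() as cur:
        cur.execute(
            "DELETE FROM instances WHERE host NOT IN (%s)" % placeholders,
            HOSTS_IN_NEW_CELL)
    conn.commit()


def repoint_top_cell(conn):
    """In the Top Cell DB, point the moved instances at the new cell."""
    placeholders = ','.join(['%s'] * len(HOSTS_IN_NEW_CELL))
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE instances SET cell_name = %%s WHERE host IN (%s)"
            % placeholders,
            [NEW_CELL_PATH] + HOSTS_IN_NEW_CELL)
    conn.commit()


# Usage (hypothetical credentials):
#   child = pymysql.connect(host='child-db', user='nova', password='...', db='nova')
#   top = pymysql.connect(host='top-db', user='nova', password='...', db='nova')
#   trim_child_db(child); repoint_top_cell(top)
```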
  21. Nova - Live Migration
     •  Block live migration
        •  Compute nodes don't have shared storage
     •  Not used for daily operations...
        •  Resource availability and network cluster constraints
        •  Only considered for pets
     •  Planned for the SLC6 to CC7 migration
     •  Planned for hardware end of life
     •  How to orchestrate a large live-migration campaign? (see the sketch below)
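One possible skeleton for such a campaign, draining one hypervisor at a time with python-novaclient as it looked in the Kilo era; the credentials, host names and the pacing policy are illustrative assumptions, not CERN's tooling.

```python
# Hypothetical skeleton for draining one hypervisor with block live
# migration using python-novaclient (Kilo era).  Credentials, host names
# and pacing are assumptions for the example.
import time

from novaclient import client as nova_client


def drain_hypervisor(nova, source_host, target_host=None):
    """Block-live-migrate every instance off `source_host`.

    target_host=None lets the scheduler pick a destination, which at
    CERN would still be constrained to the same network cluster.
    """
    servers = nova.servers.list(search_opts={'host': source_host,
                                             'all_tenants': 1})
    for server in servers:
        print('migrating %s (%s)' % (server.name, server.id))
        server.live_migrate(host=target_host,
                            block_migration=True,
                            disk_over_commit=False)
        # Naive pacing: wait until the instance leaves MIGRATING before
        # touching the next one, so the campaign is easy to throttle.
        while nova.servers.get(server.id).status == 'MIGRATING':
            time.sleep(10)


if __name__ == '__main__':
    nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                              'http://keystone.example.org:5000/v2.0')
    drain_hypervisor(nova, source_host='p05151000123')
```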
  22. Nova - Live Migration
     •  Block live migration with volumes attached is problematic...
        •  Attached Cinder volumes are block-migrated along with the instance
        •  They are copied, over the network, from themselves to themselves
        •  This can cause data corruption
     •  https://bugs.launchpad.net/nova/+bug/1376615
     •  https://bugzilla.redhat.com/show_bug.cgi?id=1203032
     •  https://review.openstack.org/#/c/176768/
  23. Nova - Kilo with SLC6
     •  Kilo dropped support for Python 2.6
     •  We still have ~800 compute nodes running SLC6
     •  We needed to build a Nova RPM for SLC6
        •  Original recipe from GoDaddy!
        •  Create a venv using Python 2.7 from SCL
        •  Build the venv with Anvil
        •  Package the venv in an RPM
  24. Nova - Network
     CERN network configuration:
     •  The network is divided into several "network clusters" (L3 networks), each with several "IP services" (L2 subnets)
     •  Each compute node is associated with a "network cluster"
     •  VMs running on a compute node can only get an IP from the "network cluster" associated with that compute node
     •  https://etherpad.openstack.org/p/Network_Segmentation_Usecases
  25. Nova - Network
     •  Developed a CERN network driver (a sketch of the allocation flow follows below)
     •  Creating a new VM:
        1.  Select the network cluster based on the compute node chosen to boot the instance
        2.  Select an address from that network cluster
        3.  Update the CERN network database
        4.  Wait for the central DNS refresh
     •  The "fixed_ips" table contains IPv4, IPv6, MAC and network cluster
     •  A new table maps "host" -> network cluster
     •  Network constraints apply to some Nova operations: resize, live migration
     •  https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/network/manager.py
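A toy, self-contained illustration of that allocation flow; this is not the actual driver (which subclasses the nova-network manager in the repository linked above), and all data and helper names are invented for the example.

```python
# Toy illustration of the four allocation steps above; NOT the actual
# CERN driver.  All data and helper names are invented.

HOST_TO_CLUSTER = {            # "new table": compute host -> network cluster
    'p05151000123': 'cluster-513-a',
}
FREE_IPS = {                   # per-cluster pool of unused addresses
    'cluster-513-a': ['192.0.2.21', '192.0.2.22'],
}


def update_cern_network_database(record):
    """Placeholder for the call into the CERN network database."""
    pass


def wait_for_dns_refresh(address):
    """Placeholder for waiting on the central DNS refresh."""
    pass


def allocate_fixed_ip(host, instance_uuid, mac):
    """Return the record that would go into the 'fixed_ips' table for a
    VM booted on `host`, following steps 1-4 above."""
    cluster = HOST_TO_CLUSTER[host]        # 1. cluster of the chosen node
    address = FREE_IPS[cluster].pop(0)     # 2. address from that cluster
    record = {'instance_uuid': instance_uuid,
              'address': address,
              'mac': mac,
              'network_cluster': cluster}
    update_cern_network_database(record)   # 3. external network database
    wait_for_dns_refresh(address)          # 4. central DNS refresh
    return record


print(allocate_fixed_ip('p05151000123', 'uuid-1234', 'fa:16:3e:00:00:01'))
```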
  26. Neutron is coming...
     •  NOT in production; a testing/development instance
     •  What we use/don't use from Neutron:
        •  No SDN or tunneling
        •  Only provider networks, no private/tenant networks
        •  Flat networking: VMs bridged directly to the real network
        •  No DHCP or DNS from Neutron; we already have our own infrastructure
        •  We don't use floating IPs
        •  Neutron API not exposed to users
     •  Implemented API extensions and a mechanism driver for our use case (a rough skeleton below)
        •  https://github.com/cernops/neutron/commit/63f4e19c7423dcdc2b5a7573d0898ec9e799663b
     •  How to migrate from nova-network to Neutron?
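For illustration only, a bare-bones ML2 mechanism driver skeleton showing where an external network database could be hooked in; this is not CERN's driver (see the commit linked above), the import path is the Kilo-era neutron.plugins.ml2.driver_api, and the class and attribute names are invented. Such a driver would be registered through a neutron.ml2.mechanism_drivers entry point and listed under mechanism_drivers in the ML2 configuration.

```python
# Bare-bones ML2 mechanism driver skeleton; NOT CERN's driver.  Kilo-era
# import path assumed; class and attribute names are invented.
from neutron.plugins.ml2 import driver_api as api


class ExampleNetworkDBDriver(api.MechanismDriver):
    """Mirror port create/delete events into an external inventory."""

    def initialize(self):
        # A real driver would open its connection to the external
        # system here; an in-memory dict stands in for the sketch.
        self.inventory = {}

    def create_port_postcommit(self, context):
        # context.current is the port dict that was just committed.
        port = context.current
        self.inventory[port['id']] = port.get('mac_address')

    def delete_port_postcommit(self, context):
        self.inventory.pop(context.current['id'], None)
```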
  27. Keystone Deployment at CERN
     (Diagram: a load balancer in front of two Keystone/service catalogue deployments, one exposed to users and one dedicated to Ceilometer, each with its own DB and backed by Active Directory)
  28. Keystone
     •  Keystone nodes are VMs
     •  Integrated with CERN's Active Directory infrastructure
     •  Project life cycle (sketched below):
        •  ~200 arrivals/departures per month
        •  A CERN user subscribes to the "cloud service"
        •  A "Personal Project" is created with a limited quota
        •  "Shared Projects" are created by request
        •  The "Personal Project" is disabled when the user leaves the Organization
        •  After 3 months resources are stopped; after 6 months they are deleted (VMs, volumes, images, ...)
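A hedged sketch of the "disable on departure" step using the Kilo-era python-keystoneclient v3 API; the departure list, the project naming convention and the credentials are assumptions for the example, not CERN's actual tooling.

```python
# Sketch of disabling the personal project of departed users, using the
# Kilo-era python-keystoneclient v3 API.  The departure feed, the
# "Personal <username>" naming and the credentials are assumptions.
from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client as keystone_client


def disable_personal_projects(keystone, departed_usernames):
    """Disable the personal project of every departed user; a later pass
    (3/6 months) would stop and then delete the remaining resources."""
    for name in departed_usernames:
        for project in keystone.projects.list(name='Personal %s' % name):
            keystone.projects.update(project, enabled=False)
            print('disabled project %s' % project.name)


if __name__ == '__main__':
    auth = v3.Password(auth_url='https://keystone.example.org:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default', project_domain_id='default')
    keystone = keystone_client.Client(session=session.Session(auth=auth))
    disable_personal_projects(keystone, ['jdoe'])
```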
  29. Glance Deployment at CERN
     (Diagram: a load balancer in front of two sets of Glance nodes (glance-api + glance-registry), one exposed to users and one only used for Ceilometer calls, sharing the DB and the Ceph cluster in Geneva)
  30. Glance
     •  Uses the Ceph backend in Geneva
     •  Glance nodes are VMs
     •  NO Glance image cache
     •  Glance API and Glance Registry run on the same node
        •  Glance API only talks to the local Glance Registry
     •  Two sets of nodes (API exposed to users, and Ceilometer)
     •  When will there be Glance quotas per project?
        •  Problematic in private clouds where users are not "charged" for storage
  31. Cinder Deployment at CERN
     (Diagram: a load balancer in front of Cinder nodes running cinder-api, cinder-scheduler and cinder-volume, with RabbitMQ and a DB, backed by Ceph in Geneva, Ceph in Budapest and NetApp)
  32. Cinder
     •  Ceph and NetApp backends
     •  Extended list of available volume types (QoS, backend, location); see the sketch below
     •  Cinder nodes are VMs
     •  Active/Active?
        •  When a volume is created, a "cinder-volume" node is associated with it and is responsible for its volume operations
        •  Not easy to replace cinder controller nodes: DB entries need to be changed manually
     •  More about the CERN storage infrastructure for OpenStack:
        •  https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
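As an illustration of how such backend-pinned volume types with QoS are typically defined, a sketch using the Kilo-era python-cinderclient; the type name, backend name, IOPS value and credentials are invented, not CERN's actual configuration.

```python
# Illustrative creation of a backend-pinned volume type with QoS using
# python-cinderclient (Kilo era).  Names, values and credentials are
# invented for the example.
from cinderclient import client as cinder_client

cinder = cinder_client.Client('2', 'admin', 'secret', 'admin',
                              'http://keystone.example.org:5000/v2.0')

# Volume type pinned to one backend through its extra specs.
vtype = cinder.volume_types.create('standard-geneva')
vtype.set_keys({'volume_backend_name': 'ceph-geneva'})

# Optional QoS limits attached to the same type.
qos = cinder.qos_specs.create('standard-io', {'total_iops_sec': '400'})
cinder.qos_specs.associate(qos, vtype.id)
```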
  33. Ceilometer Deployment at CERN
     (Diagram: compute nodes run nova-compute and ceilometer-compute; together with the ceilometer-central-agent and the cell RabbitMQ notifications they feed a dedicated Ceilometer RabbitMQ; notification, polling and UDP collectors store the RPC and UDP samples in HBase, MongoDB and MySQL; the Ceilometer API serves Heat and the alarm evaluator & notifier)
  34. Ceilometer
     •  "ceilometer-compute-agent" queries "nova-api" for the instances hosted on the compute node
        •  This can be very demanding for "nova-api"
     •  When using the default "instance_name_template", the "instance_name" in the Top Cell is different from the one in the Child Cell
        •  We need a "nova-api" per cell
     (Chart: number of Nova API calls made by ceilometer-compute-agent per hour)
  35. Ceilometer
     •  Using a dedicated RabbitMQ cluster for Ceilometer
        •  Initially we used the child cells' RabbitMQ. Not a good idea!
     •  Any failure/slowdown in the backend storage system can create a big queue...
     (Chart: size of the "metering.sample" queue)
  36. Rally
     •  Probing/benchmarking the infrastructure every hour
  37. Challenges
     •  Capacity increase to 200k cores by summer 2016
     •  Live-migrate thousands of VMs
     •  Upgrade ~800 compute nodes from SLC6 to CC7
     •  Retire old servers
     •  Move to Neutron
     •  Identity federation with different scientific sites
     •  Magnum and container possibilities
  38. belmiro.moreira@cern.ch
     @belmiromoreira
     http://openstack-in-production.blogspot.com
