Tips, Tricks and Tactics with Cells and Scaling OpenStack
OpenStack Summit - Paris 2014
Multi-Cell Openstack: How to Evolve your Cloud to Scale
https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/multi-cell-openstack-how-to-evolve-your-cloud-to-scale
Sam Morrison
sam.morrison@unimelb.edu.au
Australian Research Cloud
● Started in 2011
● Funded by the Australian Government
● 8 institutions around the country
● In production since early 2012 - OpenStack Diablo
● Now running a mix of Juno and Icehouse
● Use Ubuntu 14.04 and KVM
● 100Gbps network connecting most sites (AARNET)
Reasons for using cells
Single API endpoint, compute cells dispersed around Australia.
Simpler from the user's perspective.
Single set of security groups, keypairs etc.
Less OpenStack expertise needed, as there is only one version of some core OpenStack services.
Size
● 8 sites
● 14 cells
● ~6000 users registered
● ~700 hypervisors
● 30,000+ cores
People
● Core team of 3 devops
● 1-2 operators per site
http://status.rc.nectar.org.au/growth/infrastructure/
Interaction with other services
Each cell also has one or more:
● cinder-volume host, using Ceph, LVM and NetApp backends
● A globally replicated Swift region per site
● glance-api pointing to the local Swift proxy for images
● Ceilometer collectors at each cell push up to a central MongoDB
● Private L3 network spanning all cells - will be useful for Neutron
Cells Infrastructure
Each compute cell has:
● MariaDB Galera cluster
● RabbitMQ cluster
● nova scheduler/cells/vnc/compute/network/api-metadata
● glance-api
● swift region - proxy and storage nodes
● ceilometer-collectors to forward up to a global collector
● cinder-volume
API cell:
● nova-api, nova-cells
● keystone
● glance-registry
● cinder-api, scheduler
● heat, designate, ceilometer-api
Scheduling
● Have some “private” cells only available to certain tenants. This is usually determined by funding source.
● Global flavours and cell-local flavours
● Cell-level aggregates for intra-cell scheduling
● Some sites have GPUs and fast I/O for their own use.
○ Introduced compute- and RAM-optimised flavours
○ Not all cells support all flavours
● Each cell advertises one or more availability zones to use in scheduling.
○ Ties in with Cinder availability zones
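The scheduling constraints above can be sketched as a simple eligibility check. This is only an illustration, not the Research Cloud's actual filter code: the `Cell` class, its fields and `eligible_cells` are hypothetical names, and the cell, tenant and flavour names in the usage are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical description of what a cell advertises to the scheduler."""
    name: str
    flavours: set
    availability_zones: set
    # Empty set means the cell is public; otherwise only these tenants may use it.
    allowed_tenants: set = field(default_factory=set)


def eligible_cells(cells, tenant, flavour, availability_zone=None):
    """Return the cells that could host a request, in input order."""
    result = []
    for cell in cells:
        if cell.allowed_tenants and tenant not in cell.allowed_tenants:
            continue  # "private" cell and this tenant is not entitled to it
        if flavour not in cell.flavours:
            continue  # not all cells support all flavours
        if (availability_zone is not None
                and availability_zone not in cell.availability_zones):
            continue  # the requested AZ is not advertised by this cell
        result.append(cell)
    return result


# Made-up example data.
cells = [
    Cell("cell-a", {"m1.small", "m1.large"}, {"az-a"}),
    Cell("cell-b", {"m1.small", "gpu.large"}, {"az-b"}, allowed_tenants={"physics"}),
    Cell("cell-c", {"m1.small"}, {"az-c1", "az-c2"}),
]

public_small = [c.name for c in eligible_cells(cells, "biology", "m1.small")]
physics_gpu = [c.name for c in eligible_cells(cells, "physics", "gpu.large")]
```

Here a tenant without special entitlements only ever sees the public cells, while a GPU flavour narrows the choice to the one cell that offers it.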
Bringing on new cells
Want to test in production before opening to the public
Don’t want to flood brand new cells
Scheduler filters
● Role-based access to cells. Cells advertise which roles can schedule to them
● Direct only - the public can select that cell directly, but the global scheduler doesn’t consider it
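A minimal sketch of how the two filters just described might behave. The real filters are not shown in the slides, so everything here is assumed for illustration: the `CellInfo` class, the `allowed_roles` and `direct_only` fields, and the rule that a user needs at least one of the advertised roles.

```python
from dataclasses import dataclass


@dataclass
class CellInfo:
    """Hypothetical view of what a cell advertises to the global scheduler."""
    name: str
    # Roles allowed to schedule here; empty means no role restriction.
    allowed_roles: frozenset = frozenset()
    # If True, the cell is only used when the user names it explicitly.
    direct_only: bool = False


def filter_cells(cells, user_roles, requested_cell=None):
    """Apply role-based and direct-only filtering to a list of cells."""
    user_roles = set(user_roles)
    passed = []
    for cell in cells:
        if requested_cell is not None and cell.name != requested_cell:
            continue  # the user targeted a different cell
        if cell.allowed_roles and not (cell.allowed_roles & user_roles):
            continue  # the user holds none of the advertised roles
        if cell.direct_only and requested_cell != cell.name:
            continue  # hidden from untargeted (global) scheduling
        passed.append(cell)
    return passed


# Made-up example data.
cells = [
    CellInfo("production"),
    CellInfo("new-cell", allowed_roles=frozenset({"cell-tester"})),
    CellInfo("soft-launch", direct_only=True),
]

general = [c.name for c in filter_cells(cells, {"member"})]
tester = [c.name for c in filter_cells(cells, {"member", "cell-tester"})]
direct = [c.name for c in filter_cells(cells, {"member"}, requested_cell="soft-launch")]
```

With this shape, a new cell can be exercised by staff holding a tester role, then opened in direct-only mode so that early adopters can opt in without the global scheduler flooding it.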
Operating cells
Have a small OpenStack cluster to manage all global infrastructure
Standard environment - use Puppet
Upgrade cells one at a time - live upgrades
● upgrade compute conductors
● upgrade API cell
● upgrade compute cells
● upgrade compute nodes
Read access to each compute cell's RabbitMQ for troubleshooting and monitoring. It is the only real interface into each of the cells.
Console-log is a good test of cells functionality - have one in each cell and monitor it
Future plans
Move to Neutron - in planning and testing stage
● Currently have a single public network per cell; want to provide tenant networks and higher-level services
Start off with a global Neutron and simple shared flat provider networks per cell.
All hypervisors talking to the same RabbitMQ - scale issues?
Also looking at other higher-level OpenStack services (of which there are many!)
Belmiro Moreira
belmiro.moreira@cern.ch
@belmiromoreira
CERN - Large Hadron Collider
CERN - Cloud Infrastructure
CERN - Prune Nova DBs
CERN - Cells scheduling
CERN - Flavors management
CERN - Testing Cells
Matt Van Winkle
mvanwink@rackspace.com
@mvanwink
www.rackspace.com
Cells at Rackspace
Rackspace
• Managed Cloud company offering a suite of dedicated and cloud hosting products
• Founded in 1998 in San Antonio, TX
• Home of Fanatical Support
• More than 200,000 customers in 120 countries
Rackspace – Cloud Infrastructure
• In production since August 2012
– Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder
• Regular upgrades from trunk
– A package built from a mid-March trunk pull is in testing now
• Compute nodes are Debian based
– Run as VMs on the hypervisors and manage them via XAPI
• 6 geographic regions around the globe
– DFW; ORD; IAD; LON; SYD; HKG
• Numbers
– Tens of thousands of hypervisors, all XenServer (over 340,000 cores, just over 1.2 petabytes of RAM)
– Over 170,000 virtual machines
– One API per region, each with multiple compute cells (3 to 35+)
Rackspace – Cloud Infrastructure - Cells
• Cells Infrastructure
– Size between ~100 and ~600 hosts per cell
– Different flavor types (General Purpose, High I/O, Compute Optimized, etc.)
– Working on exposing maintenance zones or near/far scheduling (host, shared IP space, network aggregation)
– Separate DB cluster for each cell
• Run our Cells infrastructure in cells
– Control plane exists as instances in a small OpenStack deployment
– Multiple hardware types
– Separate tenants keep control plane instances apart from other internal users
Cell Scheduling
• Multiple cells within each flavor class
– Hardware profile; additionally, we group by vendor, and live migration needs matching CPUs
– Range of flavor sizes within each cell (e.g. General Purpose 1, 2, 4 and 8 GB)
• Tenant Scheduling
– Custom filter schedules by flavor class first (all General Purpose cells, for example)
– Scheduled by available RAM afterwards, with enhancements for spreading out tenant load and max IOPS per host
– In some cases, filters can bind a cell to specific tenants (testing and internal use)
• Work in Cells V2 to enhance scheduling
– https://review.openstack.org/#/c/141486/ as one example
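The two-stage selection described on this slide (filter by flavor class and tenant binding, then prefer available RAM) can be sketched roughly as below. It is an assumption-laden illustration rather than Rackspace's filter: the dictionary keys `flavor_class`, `free_ram_mb` and `bound_tenants`, the function name `pick_cell`, and all example values are hypothetical.

```python
def pick_cell(cells, flavor_class, tenant=None):
    """Pick a cell for a request: filter first, then prefer the most free RAM.

    Returns the chosen cell dict, or None if no cell qualifies.
    """
    candidates = []
    for cell in cells:
        if cell["flavor_class"] != flavor_class:
            continue  # stage 1: only cells of the requested flavor class
        bound = cell.get("bound_tenants")
        if bound and tenant not in bound:
            continue  # cell is reserved for specific tenants
        candidates.append(cell)
    if not candidates:
        return None
    # Stage 2: among the survivors, choose the cell with the most available RAM.
    return max(candidates, key=lambda c: c["free_ram_mb"])


# Made-up example data.
cells = [
    {"name": "general-0001", "flavor_class": "general", "free_ram_mb": 40000},
    {"name": "general-0002", "flavor_class": "general", "free_ram_mb": 90000},
    {"name": "general-test", "flavor_class": "general", "free_ram_mb": 500000,
     "bound_tenants": {"internal-qa"}},
    {"name": "io-0001", "flavor_class": "io", "free_ram_mb": 70000},
]

customer_choice = pick_cell(cells, "general", tenant="customer-42")["name"]
qa_choice = pick_cell(cells, "general", tenant="internal-qa")["name"]
```

Filtering before weighing means an almost empty test cell never attracts customer instances, however much free RAM it reports.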
Deploying a Cell
• Common control plane nodes deployed by Ansible playbook
– DB pair
– Cells service
– Scheduler
– Rabbit
• Playbook populates flavor info based on hardware type
• Hypervisors bootstrapped once the control plane (CP) exists
– Create compute node VM
– Deploy code and configure
– Update routes, etc.
• Provision IP blocks
• Test
• Link via playbook
Rackspace – Purge Nova DBs
• A larger region has a run rate of around 50,000 VMs
• Thousands of VMs created/deleted per hour in the busiest regions
• Downstream BI and revenue assurance teams require deleted instance records be kept for 90 days
• Current deleted instance counts range between 132,000 and 900,000
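The slide does not show how the purge itself is done. As a hedged sketch of the 90-day retention rule only, the example below deletes soft-deleted rows older than the window from a toy SQLite table. The table layout, the `purge_deleted_instances` function and the sample rows are assumptions for illustration; this is not the Nova schema, and it ignores the related tables a real purge has to handle.

```python
import sqlite3
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # fixed width, so string order == time order


def purge_deleted_instances(conn, now, retention_days=90):
    """Remove soft-deleted rows whose deleted_at is older than the retention window.

    Returns the number of rows removed.
    """
    cutoff = (now - timedelta(days=retention_days)).strftime(TIMESTAMP_FORMAT)
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "DELETE FROM instances WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff,),
        )
    return cursor.rowcount


# Toy data: one live instance, one recently deleted, one deleted long ago.
now = datetime(2015, 5, 1, 12, 0, 0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted_at TEXT)")
rows = [
    ("running", None),
    ("deleted-recently", (now - timedelta(days=10)).strftime(TIMESTAMP_FORMAT)),
    ("deleted-long-ago", (now - timedelta(days=120)).strftime(TIMESTAMP_FORMAT)),
]
conn.executemany("INSERT INTO instances VALUES (?, ?)", rows)
conn.commit()

purged = purge_deleted_instances(conn, now)
remaining = [r[0] for r in conn.execute("SELECT uuid FROM instances ORDER BY uuid")]
```

Only the record deleted 120 days earlier is removed. The live instance and the recently deleted one stay available to downstream reporting.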
Testing Cells
• Bypass URL prior to linking a cell up
– Test API endpoint: http://nova-admin-api01.memory1-0002.XXXX.XXXXXX.XXXX:8774/v2
• Full set of tests
– Instance creates, deletes, resizes
– Overlay network creation
– Volume provisioning
– Integration with other RS products
• Trickier to test hosts being added to an existing cell
– Hosts are either enabled or disabled
– Targeting helps
• --hint target_cell='<cellname>'
• --hint 0z0ne_target_host=<host_name>
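The `--hint` flags above are scheduler hints sent along with the boot request. As a rough sketch, assuming the Compute v2 convention of carrying hints in an `os:scheduler_hints` object next to `server`, a targeted test request body could be built like this. The helper name `boot_request_body` and every value (`IMAGE_UUID`, `FLAVOR_ID`, `CELL_NAME`) are placeholders, and nothing is sent anywhere.

```python
import json


def boot_request_body(name, image_ref, flavor_ref, hints):
    """Build a server-create request body carrying scheduler hints."""
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
        },
        # Hints travel beside "server", not inside it.
        "os:scheduler_hints": dict(hints),
    }


# Placeholder values only; substitute real identifiers for an actual test.
body = boot_request_body(
    "cell-smoke-test",
    "IMAGE_UUID",
    "FLAVOR_ID",
    {"target_cell": "CELL_NAME"},
)
payload = json.dumps(body, indent=2, sort_keys=True)
```

The same helper could carry the host-targeting hint from the slide instead, which is what makes it possible to exercise newly added hosts inside an existing cell.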
Managing Cells – “Disable”
• No formal way of disabling a cell
• Weighting helps – but is not absolute
– A down-weighted cell can still “win” the scheduler calculation based on available RAM
• Solution: custom filter uses a specific weight offset value (-42) to avoid scheduling
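To see why weighting alone is not absolute, consider the toy model below. It is deliberately simplified and assumed for illustration: it just adds free RAM and the weight offset, whereas the real Nova weighers normalise and scale their inputs. The function names and example cells are hypothetical.

```python
DISABLED_WEIGHT_OFFSET = -42  # the marker value named on the slide


def toy_weight(cell):
    """Toy weighing function: free RAM plus the operator-set offset."""
    return cell["free_ram_mb"] + cell["weight_offset"]


def choose_by_weight(cells):
    """Pick the highest-weighted cell, as a weigher-only scheduler would."""
    return max(cells, key=toy_weight)


def drop_disabled(cells):
    """Filter step: remove cells carrying the 'disabled' marker offset."""
    return [c for c in cells if c["weight_offset"] != DISABLED_WEIGHT_OFFSET]


# Made-up example: the cell being drained has far more free RAM.
cells = [
    {"name": "busy-cell", "free_ram_mb": 2048, "weight_offset": 0},
    {"name": "draining-cell", "free_ram_mb": 65536,
     "weight_offset": DISABLED_WEIGHT_OFFSET},
]

weight_only = choose_by_weight(cells)["name"]
with_filter = choose_by_weight(drop_disabled(cells))["name"]
```

In this model the negative offset is swamped by the RAM term, so the draining cell still wins on weight alone. Only the filter step keeps it out of consideration.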
# Imports are not shown on the slide; the ones below are assumed.
import logging

from nova.cells import filters

LOG = logging.getLogger(__name__)


class DisableCellFilter(filters.BaseCellFilter):
    """Disable cell filter. Drop cell if weight is -42."""

    def filter_all(self, cells, filter_properties):
        """Override filter_all() which operates on the full list
        of cells...
        """
        output_cells = []
        for cell in cells:
            # A weight_offset of -42 marks the cell as disabled.
            if cell.db_info.get('weight_offset', 0) == -42:
                LOG.debug("cell disabled: %s", cell)
            else:
                output_cells.append(cell)
        return output_cells
Neutron and Cells
• Rackspace uses Quark Plugin
– https://github.com/rackerlabs/quark
• Borrowed old idea from Quantum/Melange days
– Default tenant for each cell
– Each cell is a segment
– Provider subnets are scoped to a segment
– Nova requests ports on the provider network for the segment: public, private, and MAC addresses too
?
●
●
●

More Related Content

What's hot

CERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8sCERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8sBelmiro Moreira
 
Moving from CellsV1 to CellsV2 at CERN
Moving from CellsV1 to CellsV2 at CERNMoving from CellsV1 to CellsV2 at CERN
Moving from CellsV1 to CellsV2 at CERNBelmiro Moreira
 
Future Science on Future OpenStack
Future Science on Future OpenStackFuture Science on Future OpenStack
Future Science on Future OpenStackBelmiro Moreira
 
CERN User Story
CERN User StoryCERN User Story
CERN User StoryTim Bell
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4Tim Bell
 
Evolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNEvolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNBelmiro Moreira
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionJohn Garbutt
 
Euro ht condor_alahiff
Euro ht condor_alahiffEuro ht condor_alahiff
Euro ht condor_alahiffvandersantiago
 
Integrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudIntegrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudArne Wiebalck
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgradesFrédéric Lepied
 
Operational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionOperational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionArne Wiebalck
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Dave Holland
 
Enhancing OpenStack FWaaS for real world application
Enhancing OpenStack FWaaS for real world applicationEnhancing OpenStack FWaaS for real world application
Enhancing OpenStack FWaaS for real world applicationopenstackindia
 
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016Cloud Native Day Tel Aviv
 
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairRENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairJohn Constable
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014Sergey Lukjanov
 

What's hot (20)

CERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8sCERN OpenStack Cloud Control Plane - From VMs to K8s
CERN OpenStack Cloud Control Plane - From VMs to K8s
 
Moving from CellsV1 to CellsV2 at CERN
Moving from CellsV1 to CellsV2 at CERNMoving from CellsV1 to CellsV2 at CERN
Moving from CellsV1 to CellsV2 at CERN
 
Future Science on Future OpenStack
Future Science on Future OpenStackFuture Science on Future OpenStack
Future Science on Future OpenStack
 
CERN User Story
CERN User StoryCERN User Story
CERN User Story
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
 
Evolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNEvolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERN
 
How to Develop OpenStack
How to Develop OpenStackHow to Develop OpenStack
How to Develop OpenStack
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
Euro ht condor_alahiff
Euro ht condor_alahiffEuro ht condor_alahiff
Euro ht condor_alahiff
 
Integrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudIntegrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private Cloud
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgrades
 
TripleO
 TripleO TripleO
TripleO
 
Operational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionOperational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in Production
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Enhancing OpenStack FWaaS for real world application
Enhancing OpenStack FWaaS for real world applicationEnhancing OpenStack FWaaS for real world application
Enhancing OpenStack FWaaS for real world application
 
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016
How OpenStack is Built - Anton Weiss - OpenStack Day Israel 2016
 
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairRENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
 

Viewers also liked

Divide and conquer: resource segregation in the OpenStack cloud
Divide and conquer: resource segregation in the OpenStack cloudDivide and conquer: resource segregation in the OpenStack cloud
Divide and conquer: resource segregation in the OpenStack cloudStephen Gordon
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02Jinho Shin
 
Openstack Study Nova 1
Openstack Study Nova 1Openstack Study Nova 1
Openstack Study Nova 1Jinho Shin
 
Deep Dive into Openstack Storage, Sean Cohen, Red Hat
Deep Dive into Openstack Storage, Sean Cohen, Red HatDeep Dive into Openstack Storage, Sean Cohen, Red Hat
Deep Dive into Openstack Storage, Sean Cohen, Red HatCloud Native Day Tel Aviv
 
Hacking on OpenStack\'s Nova source code
Hacking on OpenStack\'s Nova source codeHacking on OpenStack\'s Nova source code
Hacking on OpenStack\'s Nova source codeZhongyue Luo
 
OpenStack Cloud Request Flow
OpenStack Cloud Request FlowOpenStack Cloud Request Flow
OpenStack Cloud Request FlowMirantis
 
Deep Dive: OpenStack Summit (Red Hat Summit 2014)
Deep Dive: OpenStack Summit (Red Hat Summit 2014)Deep Dive: OpenStack Summit (Red Hat Summit 2014)
Deep Dive: OpenStack Summit (Red Hat Summit 2014)Stephen Gordon
 

Viewers also liked (8)

Divide and conquer: resource segregation in the OpenStack cloud
Divide and conquer: resource segregation in the OpenStack cloudDivide and conquer: resource segregation in the OpenStack cloud
Divide and conquer: resource segregation in the OpenStack cloud
 
Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02
 
Openstack Study Nova 1
Openstack Study Nova 1Openstack Study Nova 1
Openstack Study Nova 1
 
Deep Dive into Openstack Storage, Sean Cohen, Red Hat
Deep Dive into Openstack Storage, Sean Cohen, Red HatDeep Dive into Openstack Storage, Sean Cohen, Red Hat
Deep Dive into Openstack Storage, Sean Cohen, Red Hat
 
Hacking on OpenStack\'s Nova source code
Hacking on OpenStack\'s Nova source codeHacking on OpenStack\'s Nova source code
Hacking on OpenStack\'s Nova source code
 
OpenStack Cloud Request Flow
OpenStack Cloud Request FlowOpenStack Cloud Request Flow
OpenStack Cloud Request Flow
 
Deep Dive: OpenStack Summit (Red Hat Summit 2014)
Deep Dive: OpenStack Summit (Red Hat Summit 2014)Deep Dive: OpenStack Summit (Red Hat Summit 2014)
Deep Dive: OpenStack Summit (Red Hat Summit 2014)
 

Similar to Tips, Tactics and Tricks with Scaling OpenStack Cells

Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
Lessons Learned Running The Largest OpenStack Clouds
Lessons Learned Running The Largest OpenStack CloudsLessons Learned Running The Largest OpenStack Clouds
Lessons Learned Running The Largest OpenStack CloudsKenneth Hui
 
CloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestCloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestke4qqq
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209mffiedler
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStackJoe Brockmeier
 
Cloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackCloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackMicrosoft
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
OpenNaaS Overview Complete
OpenNaaS Overview CompleteOpenNaaS Overview Complete
OpenNaaS Overview CompleteJoan Garcia
 
Deploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIDeploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIJoe Brockmeier
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...NETWAYS
 
Apache CloudStack: API to UI (STLLUG)
Apache CloudStack: API to UI (STLLUG)Apache CloudStack: API to UI (STLLUG)
Apache CloudStack: API to UI (STLLUG)Joe Brockmeier
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to KubernetesVishal Biyani
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...Rahul Krishna Upadhyaya
 

Similar to Tips, Tactics and Tricks with Scaling OpenStack Cells (20)

Txlf2012
Txlf2012Txlf2012
Txlf2012
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
Lessons Learned Running The Largest OpenStack Clouds
Lessons Learned Running The Largest OpenStack CloudsLessons Learned Running The Largest OpenStack Clouds
Lessons Learned Running The Largest OpenStack Clouds
 
CloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestCloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWest
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStack
 
Cloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackCloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: Openstack
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Apache CloudStack from API to UI
Apache CloudStack from API to UIApache CloudStack from API to UI
Apache CloudStack from API to UI
 
Neutron scaling
Neutron scalingNeutron scaling
Neutron scaling
 
OpenNaaS Overview Complete
OpenNaaS Overview CompleteOpenNaaS Overview Complete
OpenNaaS Overview Complete
 
Deploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIDeploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UI
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
Apache CloudStack: API to UI (STLLUG)
Apache CloudStack: API to UI (STLLUG)Apache CloudStack: API to UI (STLLUG)
Apache CloudStack: API to UI (STLLUG)
 
Openstack nova
Openstack novaOpenstack nova
Openstack nova
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application

Tips, Tactics and Tricks with Scaling OpenStack Cells

  • 1. Tips, Tricks and Tactics with Cells and Scaling OpenStack
  • 2. OpenStack Summit - Paris 2014 Multi-Cell OpenStack: How to Evolve your Cloud to Scale https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/multi-cell-openstack-how-to-evolve-your-cloud-to-scale
  • 4. Australian Research Cloud ● Started in 2011 ● Funded by the Australian Government ● 8 institutions around the country ● Production early 2012 - OpenStack Diablo ● Now running a mix of Juno and Icehouse ● Use Ubuntu 14.04 and KVM ● 100Gbps network connecting most sites (AARNet)
  • 5. Reasons for using cells Single API endpoint, with compute cells dispersed around Australia. Simpler from the user's perspective. Single set of security groups, keypairs etc. Less OpenStack expertise needed, as there is only one version of some core OpenStack services.
  • 6. Size ● 8 sites ● 14 cells ● ~6000 users registered ● ~700 hypervisors ● 30,000+ cores People ● Core team of 3 devops ● 1-2 operators per site http://status.rc.nectar.org.au/growth/infrastructure/
  • 7. Interaction with other services Each cell also has one or more: ● cinder-volume host, using Ceph, LVM and NetApp backends ● A globally replicated swift region per site ● glance-api pointing to local swift proxy for images ● Ceilometer collectors at each cell push up to a central MongoDB ● Private L3 network spanning all cells - will be useful for neutron
  • 8. Cells Infrastructure Each compute cell has: ● MariaDB Galera cluster ● RabbitMQ cluster ● nova scheduler/cells/vnc/compute/network/api-metadata ● glance-api ● swift region - proxy and storage nodes ● ceilometer-collectors to forward up to a global collector ● cinder-volume API cell: ● nova-api, nova-cells ● keystone ● glance-registry ● cinder-api, scheduler ● heat, designate, ceilometer-api
  • 9. Scheduling ● Have some “private” cells only available to certain tenants. This is usually determined by funding source. ● Global flavours and cell local flavours ● Cell level aggregates for intra cell scheduling ● Some sites have GPUs, fast IO for their own use. ○ Introduced compute and ram optimised flavours ○ Not all cells support all flavours ● Each cell advertises 1 or more availability zones to use in scheduling. ○ Ties in with cinder availability zones
  • 10. Bringing on new cells ● Want to test in production before opening to the public ● Don’t want to flood brand new cells ● Scheduler filters ○ Role-based access to cell: cells advertise which roles can schedule to them ○ Direct only: allows the public to select that cell directly, but the global scheduler doesn’t count it
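The two scheduler filters on slide 10 can be sketched in plain Python. This is only an illustration of the idea, not the Research Cloud's filter code and not the real nova cells filter API: the `Cell` dataclass, the `allowed_roles` and `direct_only` fields, the role names and the `eligible_cells` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cell:
    """Hypothetical stand-in for what a cell advertises to the scheduler."""
    name: str
    # Roles allowed to schedule here; an empty set means open to everyone.
    allowed_roles: Set[str] = field(default_factory=set)
    # Direct-only cells are skipped unless the user names them explicitly.
    direct_only: bool = False


def eligible_cells(cells: List[Cell], user_roles: Set[str],
                   requested_cell: Optional[str] = None) -> List[Cell]:
    """Return the cells a request may be scheduled to."""
    result = []
    for cell in cells:
        # Role-based access: the user needs at least one advertised role.
        if cell.allowed_roles and not (cell.allowed_roles & user_roles):
            continue
        if requested_cell is not None:
            # The user picked a cell directly: keep only that cell.
            if cell.name != requested_cell:
                continue
        elif cell.direct_only:
            # Global scheduling does not count direct-only cells.
            continue
        result.append(cell)
    return result


cells = [
    Cell("established"),
    Cell("new-testing", allowed_roles={"cell-tester"}),
    Cell("new-soft-launch", direct_only=True),
]

print([c.name for c in eligible_cells(cells, {"member"})])
print([c.name for c in eligible_cells(cells, {"member", "cell-tester"})])
print([c.name for c in eligible_cells(cells, {"member"}, "new-soft-launch")])
```

In this sketch a brand new cell can first be restricted to a tester role, then switched to direct-only so early adopters can opt in without the global scheduler sending general load to it.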
  • 11. Operating cells ● Have a small OpenStack cluster to manage all global infrastructure ● Standard environment - use Puppet ● Upgrade cells one at a time - live upgrades ○ upgrade compute conductors ○ upgrade API cell ○ upgrade compute cells ○ upgrade compute nodes ● Read access to compute cells' RabbitMQs for troubleshooting and monitoring. Only real interface into each of the cells. ● Console-log is a good test of cells functionality - have one in each cell and monitor
  • 12. Future plans ● Move to Neutron - in planning and testing stage ○ Currently have a single public network per cell, want to provide tenant networks and higher level services ● Start off with a global neutron and simple shared flat provider networks per cell. ● All hypervisors talking to the same rabbit - scale issues? ● Also looking at other higher level OpenStack services (of which there are many!)
  • 14. CERN - Large Hadron Collider
  • 15. CERN - Cloud Infrastructure
  • 16. CERN - Cloud Infrastructure
  • 17. CERN - Prune Nova DBs
  • 18. CERN - Cells scheduling
  • 19. CERN - Flavors management
  • 20. CERN - Testing Cells
  • 21. CERN - Testing Cells
  • 22. CERN - Testing Cells
  • 24. Rackspace • Managed Cloud company offering a suite of dedicated and cloud hosting products • Founded in 1998 in San Antonio, TX • Home of Fanatical Support • More than 200,000 customers in 120 countries www.rackspace.com
  • 25. Rackspace – Cloud Infrastructure • In production since August 2012 – Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder • Regular upgrades from trunk – Package built on trunk pull from mid March in testing now • Compute nodes are Debian based – Run as VMs on hypervisors and managed via XAPI • 6 geographic regions around the globe – DFW; ORD; IAD; LON; SYD; HKG • Numbers – Tens of thousands of hypervisors (over 340,000 cores, just over 1.2 petabytes of RAM), all XenServer – Over 170,000 virtual machines – API per region with multiple compute cells (3 – 35+) each
  • 26. Rackspace – Cloud Infrastructure - Cells • Cells Infrastructure – Size between ~100 and ~600 hosts per cell – Different Flavor Types (General Purpose, High I/O, Compute Optimized, etc) – Working on exposing maintenance zones or near/far scheduling (host, shared IP space, network aggregation) – Separate DB cluster for each cell • Run our Cells infrastructure in cells – Control Plane exists as instances in small OpenStack deployment – Multiple Hardware types – Separate tenants – Control plane instances from other internal users
  • 27. Cell Scheduling • Multiple cells within each flavor class – Hardware Profile • Additionally, we group by vendor • Live migration needs matching CPUs – Range of flavor size within each cell (e.g. General Purpose 1, 2, 4 and 8 Gig) • Tenant Scheduling – Custom filter schedules by Flavor class first • All General Purpose cells, for example – Scheduled by available RAM afterwards • Enhancements for spreading out tenant load and max IOPS per host – In some cases, filters can bind a cell to specific tenants (testing and internal use) • Work in Cells V2 to enhance scheduling – https://review.openstack.org/#/c/141486/ as one example
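The two-stage selection on slide 27 (filter by flavor class first, then choose by available RAM) can be sketched as follows. This is a simplified illustration under stated assumptions, not Rackspace's custom filter: `CellState`, `pick_cell`, the cell names and the RAM figures are hypothetical, and real nova cells scheduling applies filters and weights through its own classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CellState:
    """Hypothetical snapshot of what a cell reports to the scheduler."""
    name: str
    flavor_class: str  # e.g. "general_purpose", "high_io"
    free_ram_mb: int


def pick_cell(cells: List[CellState], flavor_class: str) -> Optional[CellState]:
    """Filter by flavor class, then prefer the cell with the most free RAM."""
    # Step 1: keep only the cells that serve the requested flavor class.
    candidates = [c for c in cells if c.flavor_class == flavor_class]
    if not candidates:
        return None
    # Step 2: among those, choose the cell with the most available RAM.
    return max(candidates, key=lambda c: c.free_ram_mb)


cells = [
    CellState("gp-0001", "general_purpose", 512_000),
    CellState("gp-0002", "general_purpose", 900_000),
    CellState("io-0001", "high_io", 2_000_000),
]

chosen = pick_cell(cells, "general_purpose")
print(chosen.name if chosen else "no cell available")
```

Note that the high I/O cell is never considered for a General Purpose request even though it has the most free RAM, because the flavor class filter runs before any RAM comparison.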
  • 28. Deploying a Cell • Common control plane nodes deployed by Ansible playbook – DB Pair – Cells service – Scheduler – Rabbit • Playbook populates flavor info based on hardware type • Hypervisors bootstrapped once CP exists – Create Compute Node VM – Deploy Code and configure – Update routes, etc • Provision IP blocks • Test • Link via playbook
  • 29. Rackspace – Purge Nova DBs • Larger region has run rate around 50,000 VMs • Thousands of VMs created/deleted per hour in busiest regions • Downstream BI and Revenue assurance teams require deleted instance records be kept for 90 days • Current deleted instance counts range between 132,000 and 900,000
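The 90-day retention requirement on slide 29 implies a cutoff test when choosing which deleted instance records may be purged. A minimal sketch of that test, assuming each record carries a `deleted_at` timestamp that is `None` for live instances; the record layout, the `purgeable_uuids` helper and the sample dates are hypothetical, not Rackspace's purge tooling:

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Retention window taken from the slide: deleted records are kept 90 days.
RETENTION = timedelta(days=90)


def purgeable_uuids(records: List[Dict], now: datetime) -> List[str]:
    """Return UUIDs of records deleted longer ago than the retention window."""
    cutoff = now - RETENTION
    return [
        r["uuid"] for r in records
        # Live instances have no deletion timestamp and are never purged.
        if r["deleted_at"] is not None and r["deleted_at"] < cutoff
    ]


now = datetime(2015, 5, 1)
records = [
    {"uuid": "a", "deleted_at": None},                  # still active
    {"uuid": "b", "deleted_at": datetime(2015, 4, 1)},  # deleted 30 days ago
    {"uuid": "c", "deleted_at": datetime(2015, 1, 1)},  # deleted 120 days ago
]

print(purgeable_uuids(records, now))
```

Only record `c` falls outside the window here; `b` must be kept for the downstream teams even though the instance itself is gone.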
  • 30. Rackspace – Purge Nova DBs
  • 31. Testing Cells • Bypass URL prior to linking a cell up – Test API endpoint: http://nova-admin-api01.memory1-0002.XXXX.XXXXXX.XXXX:8774/v2 • Full set of tests – Instance creates, deletes, resizes – Overlay network creation – Volume provisioning – Integration with other RS products • Trickier to test hosts being added to an existing cell – Hosts are either enabled or disabled – Targeting helps • --hint target_cell='<cellname>' • --hint 0z0ne_target_host=<host_name>
  • 32. Managing Cells – “Disable” • No formal way of disabling a cell • Weighting helps – but is not absolute – A down-weighted cell can still “win” the scheduler calculation based on available RAM • Solution: custom filter uses specific weight offset value to avoid scheduling (-42)

        class DisableCellFilter(filters.BaseCellFilter):
            """Disable cell filter. Drop cell if weight is -42."""

            def filter_all(self, cells, filter_properties):
                """Override filter_all() which operates on the full list
                of cells...
                """
                output_cells = []
                for cell in cells:
                    if cell.db_info.get('weight_offset', 0) == -42:
                        LOG.debug("cell disabled: %s" % cell)
                    else:
                        output_cells.append(cell)
                return output_cells
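The logic of the `DisableCellFilter` on slide 32 can be tried outside nova with stand-in objects. The sketch below keeps the same `weight_offset` check but replaces nova's cell state with a hypothetical `FakeCell` class and a plain `drop_disabled` function, so it runs on its own; it is not a drop-in nova filter.

```python
import logging

LOG = logging.getLogger(__name__)

# Sentinel weight offset from the slide that marks a cell as "disabled".
DISABLED_WEIGHT_OFFSET = -42


class FakeCell:
    """Hypothetical stand-in for nova's cell state object."""

    def __init__(self, name, weight_offset=None):
        self.name = name
        # Mimic db_info: the key is absent when no offset has been set.
        self.db_info = {} if weight_offset is None else {
            "weight_offset": weight_offset}

    def __repr__(self):
        return "FakeCell(%r)" % self.name


def drop_disabled(cells):
    """Return only the cells whose weight offset is not the sentinel."""
    output_cells = []
    for cell in cells:
        if cell.db_info.get("weight_offset", 0) == DISABLED_WEIGHT_OFFSET:
            LOG.debug("cell disabled: %s", cell)
        else:
            output_cells.append(cell)
    return output_cells


cells = [
    FakeCell("cell-a"),
    FakeCell("cell-b", weight_offset=-42),
    FakeCell("cell-c", weight_offset=-10),
]

print(drop_disabled(cells))
```

A cell with an ordinary negative offset such as `cell-c` stays schedulable; only the exact sentinel value removes a cell, which is what makes the filter absolute where weighting alone is not.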
  • 33. Managing Cells – “Disable”
  • 34. Neutron and Cells • Rackspace uses Quark Plugin – https://github.com/rackerlabs/quark • Borrowed old idea from Quantum/Melange days – Default tenant for each cell – Each cell is a segment – Provider subnets are scoped to a segment – Nova requests ports on provider network for the segment • Public • Private • MAC addresses too