SlideShare a Scribd company logo
1 of 17
Capacity Management 
and Provisioning 
(Cloud's full, can't build here) 
Matt Van Winkle, Manager Cloud Engineering @mvanwink 
Andy Hill, Systems Engineer @andyhky 
Joel Preas, Systems Engineer @joelintheory
Public Cloud Capacity at Rackspace 
• Rackspace Public Cloud has deployed 100+ cells in ~2 
years 
• New cells used to take engineer assembly and 3-5w 
after bare OS install 
• 1 year later done by on-shift operators ~1w (as low as 
1d) 
• Usually constrained by networking
Control Plane Sizing 
• Data plane operations impacting both cell and top level 
control plane 
– Image downloads/uploads 
• How large should Nova DB be? 
– Breaking point of ‘standard’ cell control plane 
buildout - particularly database
Cell Sizing Considerations 
• Efficient use of Private IP address space 
– Used for connections to services like Swift and 
dedicated environment 
• Broadcast domains 
• Attempt to have minimal control plane for 
overhead/complexity
Hypervisor Sizing Considerations 
• Enough spare drive space for COW images 
– XS VHD size can easily be 2x space given to guest during normal 
operation! 
– Errors in cleaning up “snapshots” exacerbated by tight disk overhead 
constraints 
• Drive space for pre-cached images 
– cache_images=some # nova 
– use_cow_images=True # nova 
– cache_in_nova=True # glance
Other Sizing Notes 
• Need reserve space for emergencies (host evac) 
• Reserve space is cell-bound, due to instances being 
unable to move between cells 
– https://review.openstack.org/#/c/125607/ 
– cells.host_reserve_percent 
• VM overhead 
– https://wiki.openstack.org/wiki/XenServer/Overhead 
– https://review.openstack.org/#/c/60087/
Problems 
• Load Balancers 
• Glance and Swift 
• Fraud / Non Payment 
• Routes 
• Road Testing
Load Balancers 
• Alternate Routes needed for high BW operations 
– Generally Glance 
• Load Balancer can become bottleneck 
• Database queries returning lots of rows (cell sizing)
Swift and Glance Bandwidth 
Problems: 
• Creates single bottleneck 
• Imaging speeds monitored, exceeding thresholds 
triggers investigation / scale out 
• Cache not shared between glance-api nodes
Swift and Glance Bandwidth 
Monitoring / Solutions: 
• Need to get downloads out of path of control plane (compute direct to 
image store) 
• Cache base images 
– Pre-seed when possible 
– Can cache images to HV ahead of time for fast-cloning 
https://wiki.openstack.org/wiki/FastCloningForXenServer 
• Glance and Swift having shared request IDs would be nice 
• Shared cache might elevate hit-rate, save bandwidth 
What about when scaling out doesn’t work? Rearchitecture.
Fraud and Non-Payment 
Fraud 
• Mark instance as 
suspended 
• Still takes capacity 
• What do? 
• Account Actioneer 
Non-Payment 
• Similar to fraud but worse for capacity! 
• Try to give customer as much time as 
possible to return to the fold 
• Same overall strategy as fraud but 
instances kept longer
Road Testing nodes before enabling 
• New Cell 
– Bypass URLs (cell-specific API nodes) 
• Different nova.conf not using cells 
– compute_api_class=nova.compute.api.API # before 
• Cell tenant restrictions 
• Existing Cell/Rekick - Not as easy :( 
– How to ensure customer builds don’t land on box 
that isn’t road tested?
Managing the Capacity Management 
● Supply Chain/Resource Pipeline 
● Impact from Product Development 
● Gaps/Challenges from upstream
Capacity Pipeline 
• Large Customer Requests 
• Triggers 
– % Used 
– # Largest Slots per flavor 
• IPv4 Addresses 
– Cells and scheduler unaware :( 
– Auditor + Resolver 
• Control Plane (runs on OpenStack too)
Product Implications 
• Keep up with code deploys (hotpatches) 
• Adjusting provisioning playbooks to: 
– new flavor types 
– new configurations/applications (quantum- 
>neutron, nova-conductor) 
– control plane changes (10g glance) 
– new hardware manufacturers (OCP) 
• Non production environments
Upstream Challenges 
• Disabled flag for cells 
– Blueprint: http://bit.do/CellDisableBP 
– Bug: http://bit.do/CellDisableBug 
• Build to “disabled” host 
– Testing after a re-provision 
– Testing for adding new capacity to existing cell 
• Scheduling based on IP capacity 
– New scheduler service? 
– Currently handled by outside service “Resolver”, similar to Entropy 
• General “Cells as first class citizen” effort led by alaski
Questions? 
THANK YOU 
RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM 
© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

More Related Content

What's hot

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程HBaseCon
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18BIWUG
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 DistilledGrig Gheorghiu
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesPGConf APAC
 

What's hot (10)

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Planning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera ClusterPlanning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera Cluster
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst Practices
 
Performance out
Performance outPerformance out
Performance out
 
Weblogic - clustering failover, and load balancing
Weblogic - clustering failover, and load balancingWeblogic - clustering failover, and load balancing
Weblogic - clustering failover, and load balancing
 

Viewers also liked

Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...CA Technologies
 
Virtualization and how it leads to cloud
Virtualization and how it leads to cloudVirtualization and how it leads to cloud
Virtualization and how it leads to cloudHuzefa Husain
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Clouddannyq
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02Francisco Gonçalves
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureSean Cohen
 
Capacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldCapacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldDavid Linthicum
 
Build Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectBuild Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectJeff Douglas
 
Capturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsCapturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsShehzad Lakdawala
 
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsTraditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsJitscale
 
Capacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingCapacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingAdrian Cockcroft
 
Who's Who in Container Land
Who's Who in Container LandWho's Who in Container Land
Who's Who in Container LandMike Kavis
 
Handling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectHandling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectKen Howard
 
Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Miguel Scotter
 
Adressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesAdressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesMario Cardinal
 
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Dmitry Lazarenko
 
Non functional requirements framework
Non functional requirements frameworkNon functional requirements framework
Non functional requirements frameworkwweinmeyer79
 
Cross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureCross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureDerrick Bowen
 
Design for non functional requirements
Design for non functional requirementsDesign for non functional requirements
Design for non functional requirementsHabeeb Mahaboob
 
Non functional requirements. do we really care…?
Non functional requirements. do we really care…?Non functional requirements. do we really care…?
Non functional requirements. do we really care…?OSSCube
 

Viewers also liked (20)

Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
 
Virtualization and how it leads to cloud
Virtualization and how it leads to cloudVirtualization and how it leads to cloud
Virtualization and how it leads to cloud
 
Capacity model
Capacity modelCapacity model
Capacity model
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Cloud
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
 
Capacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldCapacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing World
 
Build Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectBuild Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku Connect
 
Capturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsCapturing Measurable Non Functional Requirements
Capturing Measurable Non Functional Requirements
 
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsTraditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
 
Capacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingCapacity Planning for Cloud Computing
Capacity Planning for Cloud Computing
 
Who's Who in Container Land
Who's Who in Container LandWho's Who in Container Land
Who's Who in Container Land
 
Handling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectHandling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile Project
 
Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Case study: integrating azure with google app engine
Case study: integrating azure with google app engine
 
Adressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesAdressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practices
 
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
 
Non functional requirements framework
Non functional requirements frameworkNon functional requirements framework
Non functional requirements framework
 
Cross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureCross Platform Mobile Application Architecture
Cross Platform Mobile Application Architecture
 
Design for non functional requirements
Design for non functional requirementsDesign for non functional requirements
Design for non functional requirements
 
Non functional requirements. do we really care…?
Non functional requirements. do we really care…?Non functional requirements. do we really care…?
Non functional requirements. do we really care…?
 

Similar to Capacity Management/Provisioning (Cloud's full, Can't build here)

Managing Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous FleetManaging Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous Fleetandyhky
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Yankai Liu
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Community
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Ilya Ganelin
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red_Hat_Storage
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...NetApp
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowAndrew Miller
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5UniFabric
 
Lc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaLc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaSahdev Zala
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 

Similar to Capacity Management/Provisioning (Cloud's full, Can't build here) (20)

Managing Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous FleetManaging Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous Fleet
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Neutron scaling
Neutron scalingNeutron scaling
Neutron scaling
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - Varrow
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
 
Lc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaLc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangya
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 

Recently uploaded

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Recently uploaded (20)

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Capacity Management/Provisioning (Cloud's full, Can't build here)

  • 1. Capacity Management and Provisioning (Cloud's full, can't build here) Matt Van Winkle, Manager Cloud Engineering @mvanwink Andy Hill, Systems Engineer @andyhky Joel Preas, Systems Engineer @joelintheory
  • 2. Public Cloud Capacity at Rackspace • Rackspace Public Cloud has deployed 100+ cells in ~2 years • New cells used to take engineer assembly and 3-5w after bare OS install • 1 year later done by on-shift operators ~1w (as low as 1d) • Usually constrained by networking
  • 3. Control Plane Sizing • Data plane operations impacting both cell and top level control plane – Image downloads/uploads • How large should Nova DB be? – Breaking point of ‘standard’ cell control plane buildout - particularly database
  • 4. Cell Sizing Considerations • Efficient use of Private IP address space – Used for connections to services like Swift and dedicated environment • Broadcast domains • Attempt to have minimal control plane for overhead/complexity
  • 5. Hypervisor Sizing Considerations • Enough spare drive space for COW images – XS VHD size can easily be 2x space given to guest during normal operation! – Errors in cleaning up “snapshots” exacerbated by tight disk overhead constraints • Drive space for pre-cached images – cache_images=some # nova – use_cow_images=True # nova – cache_in_nova=True # glance
  • 6. Other Sizing Notes • Need reserve space for emergencies (host evac) • Reserve space is cell-bound, due to instances being unable to move between cells – https://review.openstack.org/#/c/125607/ – cells.host_reserve_percent • VM overhead – https://wiki.openstack.org/wiki/XenServer/Overhead – https://review.openstack.org/#/c/60087/
  • 7. Problems • Load Balancers • Glance and Swift • Fraud / Non Payment • Routes • Road Testing
  • 8. Load Balancers • Alternate Routes needed for high BW operations – Generally Glance • Load Balancer can become bottleneck • Database queries returning lots of rows (cell sizing)
  • 9. Swift and Glance Bandwidth Problems: • Creates single bottleneck • Imaging speeds monitored, exceeding thresholds triggers investigation / scale out • Cache not shared between glance-api nodes
  • 10. Swift and Glance Bandwidth Monitoring / Solutions: • Need to get downloads out of path of control plane (compute direct to image store) • Cache base images – Pre-seed when possible – Can cache images to HV ahead of time for fast-cloning https://wiki.openstack.org/wiki/FastCloningForXenServer • Glance and Swift having shared request IDs would be nice • Shared cache might elevate hit-rate, save bandwidth What about when scaling out doesn’t work? Rearchitecture.
  • 11. Fraud and Non-Payment Fraud • Mark instance as suspended • Still takes capacity • What do? • Account Actioneer Non-Payment • Similar to fraud but worse for capacity! • Try to give customer as much time as possible to return to the fold • Same overall strategy as fraud but instances kept longer
  • 12. Road Testing nodes before enabling • New Cell – Bypass URLs (cell-specific API nodes) • Different nova.conf not using cells – compute_api_class=nova.compute.api.API # before • Cell tenant restrictions • Existing Cell/Rekick - Not as easy :( – How to ensure customer builds don’t land on box that isn’t road tested?
  • 13. Managing the Capacity Management ● Supply Chain/Resource Pipeline ● Impact from Product Development ● Gaps/Challenges from upstream
  • 14. Capacity Pipeline • Large Customer Requests • Triggers – % Used – # Largest Slots per flavor • IPv4 Addresses – Cells and scheduler unaware :( – Auditor + Resolver • Control Plane (runs on OpenStack too)
  • 15. Product Implications • Keep up with code deploys (hotpatches) • Adjusting provisioning playbooks to: – new flavor types – new configurations/applications (quantum- >neutron, nova-conductor) – control plane changes (10g glance) – new hardware manufacturers (OCP) • Non production environments
  • 16. Upstream Challenges • Disabled flag for cells – Blueprint: http://bit.do/CellDisableBP – Bug: http://bit.do/CellDisableBug • Build to “disabled” host – Testing after a re-provision – Testing for adding new capacity to existing cell • Scheduling based on IP capacity – New scheduler service? – Currently handled by outside service “Resolver”, similar to Entropy • General “Cells as first class citizen” effort led by alaski
  • 17. Questions? THANK YOU RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM © RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM