Chef Push in 2015
Mark Anderson, 2015-04-01
Mark Anderson, Engineer, Chef
The basics of Chef Push
If you want to run a command on a set of nodes
• `knife ssh` can be problematic
• Key distribution/revocation
• Access control/User accounts
• Difficult to audit
• Extra work required if the node is behind firewall
• Doesn’t really scale very far past tens of nodes
• None of the alternative systems suited our needs
Why Chef Push?
• We wanted a remote execution system that is
• Robust under network and client failure
• Gates execution on a quorum being available
• Provides presence information
• Scales to hundreds if not thousands of nodes
• Is integrated with the Chef authentication and authorization system
• Works behind firewalls and NAT
Why Chef Push?
• knife job start --quorum 90% 'chef-client' --search 'role:webapp'
• Finds all nodes with role webapp
• Submits a job to the push server.
• Checks quorum; 90% of the nodes listed must be available
• Starts job chef-client on available nodes
• Gathers success and failures
• And will do this for ten nodes...or a thousand
Push jobs in a command line
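The quorum gate in the steps above can be sketched roughly as follows. This is a minimal illustration, not the push server's actual logic: `quorum_met` and its arguments are hypothetical names, and rounding the required count up is an assumption.

```python
def quorum_met(targeted, available, quorum_percent):
    """Return True if enough of the targeted nodes are available.

    Assumption: the required count is rounded up, so 90% of 11 nodes means 10.
    """
    if not targeted:
        return False
    # Integer ceiling of len(targeted) * quorum_percent / 100.
    required = (len(targeted) * quorum_percent + 99) // 100
    ready = len(set(targeted) & set(available))
    return ready >= required


nodes = [f"web{i}" for i in range(10)]
print(quorum_met(nodes, nodes[:9], 90))  # 9 of 10 available -> True
print(quorum_met(nodes, nodes[:8], 90))  # 8 of 10 available -> False
```

Only nodes that were both targeted and reported available count toward the quorum, which is why the two lists are intersected.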
The lifecycle of a job
[Diagram: job lifecycle between server and clients. Job accepted, send command, clients ACK, wait for quorum, start exec, clients exec, collect results.]
• Erlang service
• Extends the Chef REST API
• Job creation and tracking
• Push client configuration
• Controls the clients via ZeroMQ
• Heartbeats to track node availability
• Command execution
• All ZeroMQ packets are signed
Chef Push Server
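One way to picture heartbeat-based availability tracking is the sketch below. It is an assumption-laden illustration, not the Erlang server's implementation: `PresenceTracker`, the 10 second interval, and the "three missed heartbeats means down" rule are all hypothetical.

```python
class PresenceTracker:
    """Treats a node as unavailable once it has missed several heartbeats."""

    def __init__(self, interval=10.0, max_missed=3):
        self.interval = interval      # expected seconds between heartbeats
        self.max_missed = max_missed  # hypothetical tolerance before "down"
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def is_available(self, node, now):
        """A node is available if its last heartbeat is recent enough."""
        seen = self.last_seen.get(node)
        if seen is None:
            return False
        return (now - seen) <= self.interval * self.max_missed


tracker = PresenceTracker()
tracker.heartbeat("web1", now=0.0)
print(tracker.is_available("web1", now=25.0))  # True: within 30 seconds
print(tracker.is_available("web1", now=45.0))  # False: heartbeats missed
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read a monotonic clock.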
• Simple Ruby client
• Receives heartbeats from the server
• Sends back heartbeats to the server
• Executes commands
• Configuration requirements are minimal
• The client initiates all connections to the server
• Most configuration comes from a Chef API call to the config endpoint
• Using that info, the client opens ZeroMQ connections to the server
Chef Push Client
Chef Push Networking
[Diagram: components are the message switch, heartbeat generator, REST API, and client; links are labelled HTTPS, PUB/SUB, DEALER, and ROUTER.]
• All control for push is via extensions to the Chef API
• Node status
• Job control
• start
• stop
• status
• Job listing
Chef Push knife extension
• Access rights controlled by groups
• ‘push_job_writers’ group controls job creation and
deletion
• ‘push_job_readers’ group controls read access to
job status and results
• Whitelist for commands
• The client rejects commands that aren’t on the
whitelist
• We’d like to do finer grained access control in the
future
Access control
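The client-side whitelist check can be sketched as below. This assumes the whitelist maps a job name to the command that is actually executed; the `restart-web` entry and the `resolve_command` helper are hypothetical.

```python
# Hypothetical whitelist: job name -> command the client is allowed to run.
WHITELIST = {
    "chef-client": "chef-client",
    "restart-web": "service nginx restart",  # illustrative entry only
}


def resolve_command(job_name, whitelist=WHITELIST):
    """Return the whitelisted command, or refuse anything not listed."""
    try:
        return whitelist[job_name]
    except KeyError:
        raise PermissionError(f"command not on whitelist: {job_name!r}") from None


print(resolve_command("chef-client"))
try:
    resolve_command("rm -rf /")
except PermissionError as err:
    print(err)
```

Because the lookup is by exact name, a requester cannot smuggle extra arguments into a permitted command.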
• Version 1.0 scales to 2k nodes
• Works with Chef 12
• Open source since Fall 2014
• We’ve been working on new features since last
spring
• But Chef 12 had to go out first
• Required features from Enterprise Chef
• Open sourcing Chef Push would have been pretty meaningless without an open source server
Status:
New Features in Chef Push 2.0
• Breaking change to the protocol
• End to end encryption of every packet
• Required for us to implement parameter passing
and output return features
• Built on the ZeroMQ4 implementation of CurveCP
• CurveCP provides a framework which is
• Fast
• Crypto hardened against modern attacks
• Forward secrecy
• We still bootstrap the authentication using the Chef
Client key
End to End Encryption
Enhanced control for the job execution environment
• A config file of up to 100k
• Effective User
• Working directory
• Environment variables
• User defined variables
• Special variables for
• job id
• job file location
Command environment and config files
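A rough sketch of running a command with a controlled environment, in the spirit of the list above. The variable name `PUSH_JOB_ID` and the `run_job` helper are hypothetical, and switching the effective user is left out because it needs elevated privileges.

```python
import os
import subprocess
import sys


def run_job(command, job_id, env=None, cwd=None):
    """Run `command` with user-defined variables plus a job id variable."""
    job_env = dict(os.environ)
    job_env.update(env or {})          # user-defined variables
    job_env["PUSH_JOB_ID"] = job_id    # hypothetical special variable name
    return subprocess.run(
        command, env=job_env, cwd=cwd, capture_output=True, text=True
    )


child = "import os; print(os.environ['PUSH_JOB_ID'], os.environ['COLOR'])"
result = run_job([sys.executable, "-c", child], job_id="job-123",
                 env={"COLOR": "blue"})
print(result.stdout.strip())  # job-123 blue
```

The job id is set last so that a user-defined variable cannot overwrite it.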
• New flag for job
• capture_output: boolean
• Capture is all or nothing
• All nodes in the job
• Both stdout and stderr
• Stored on server with job description
• No streaming output … yet
Command output capture
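The all-or-nothing capture flag might look like this on the executing side. It is a sketch under assumptions: the `execute` helper and the record layout are hypothetical, and only the `capture_output` flag name comes from the slide.

```python
import subprocess
import sys


def execute(command, capture_output=False):
    """Run a command; keep stdout and stderr only when capture is requested."""
    proc = subprocess.run(command, capture_output=capture_output, text=True)
    record = {"exit_code": proc.returncode}
    if capture_output:
        # Both streams are kept together, never just one of them.
        record["stdout"] = proc.stdout
        record["stderr"] = proc.stderr
    return record


demo = [sys.executable, "-c",
        "import sys; print('out'); print('err', file=sys.stderr)"]
record = execute(demo, capture_output=True)
print(sorted(record))  # ['exit_code', 'stderr', 'stdout']
```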
Two event feeds
• Per org feed
• Job start
• Job completion summary
• Runs forever
• Per job feed with fine grained execution data
• Job voting start
• Quorum votes by node
• Job start
• Completion state by node
• Job completion
Server Sent Event Feeds
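A consumer of such a feed needs to split the stream into events. The sketch below parses the standard Server-Sent Events text format; the event names and JSON payloads in the sample are hypothetical, since the slides do not give the actual feed contents.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


feed = [
    "event: job_start\n",
    'data: {"job": "abc123"}\n',
    "\n",
    ": keep-alive\n",
    "event: job_complete\n",
    'data: {"job": "abc123", "status": "complete"}\n',
    "\n",
]
events = list(parse_sse(feed))
for name, payload in events:
    print(name, payload)
```

Because the parser works on any iterable of lines, it can be pointed at a streaming HTTP response for the per-org feed, which never ends.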
• Previously we’ve been advertising around 2k as the
limit
• 10k connected nodes demonstrated
• 10 sec heartbeats
• c3.2xlarge chef server in standalone mode
• Push server consumes 2 cores and about 2GB
• Up to 1k nodes in a single job
• Around 1.5-2k nodes we start seeing some stampede problems
• Not done scaling; there are a few tweaks left to do
Stable at 10k connected nodes
Demo some improvements
• That test was done with real push clients
• 20 m3.2xlarge nodes,
• Each running 500 docker containers
• But we also do a lot of testing using a simulator
• Understanding the limits of our current system
• SystemTap is amazing for this kind of work
Current work: Scalability and Stability drive
Axes of scaling tested
• # of active clients
• Heartbeat rate for a client
• Number of clients in a single job
Below 10k clients there is a pretty linear trade-off between
heartbeat rate and number of connected clients;
heartbeats/sec was a useful metric
Must use care to avoid stampedes in job execution
Scaling and Tuning
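The heartbeats/sec metric is simple arithmetic, sketched here. The function name is hypothetical, and the model counts only client heartbeats, ignoring command traffic.

```python
def heartbeats_per_sec(clients, interval_sec):
    """Aggregate heartbeat rate for `clients` nodes, one beat per interval."""
    return clients / interval_sec


# The demonstrated configuration: 10k nodes with 10 second heartbeats.
print(heartbeats_per_sec(10_000, 10))  # 1000.0
# Half the clients at twice the heartbeat rate give the same load.
print(heartbeats_per_sec(5_000, 5))    # 1000.0
```

This is the linear trade described above: halving the client count leaves room to double the heartbeat rate at the same aggregate load.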
• A port in ZeroMQ is bound to a single thread
• All communications go through a single ‘command
switch’
• Client heartbeats, and all command messages go
through the switch
• The switch ended up being a bottleneck at around
2k messages/sec
• Experiment: multiple command switches
• Exercises some weaknesses in the ZeroMQ -
Erlang interface
• Not as big of a win as hoped, ended up being more
complex than we’d like
Lessons from scaling
Nearly feature complete but:
• Remaining work for new features
• Knife push extensions for everything
• Documentation
• Windows testing and stability
• Committed to making Windows a first class citizen
• CentOS 7
• Polish around installation and cookbooks
• Upgrade tooling for 1.0->2.0
• Bug fixes
• Please file bugs
Remaining work for 2.0
Roadmap for 2.1 and beyond
• Currently we support
• Ubuntu 10.04, 12.04, 14.04 LTS
• CentOS 5, 6, and 7 soon
• Windows (client only)
• Investigating client support for
• AIX
• Solaris
Platform Support
• Key rotation support
• Multiple keys break some assumptions around
how we auth in push
• Needs fixes on Chef Server as well as Push
• Better access control
• Controlling access on a node by node basis
• Examining persistent jobs as a first class object
with their own ACLs - look for the RFC
Features for 2.x releases
• Integration into Chef Client package
• Delayed joining the two because of the protocol
breaking changes in 2.0
• Future server versions will be backward
compatible.
Features for 2.x releases
Scaling
• Rate limited job execution
• Prevent stampede effect
• Protects both push and chef server
• Starting 1k chef client runs at once is a bad idea
anyways
• Per-job and server global limits
• Multiple socket command switch
• Biggest scaling bottleneck
• Infrastructure for distributed server
Features for 2.x releases
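Rate-limited job execution could be approximated by starting nodes in batches with a pause between them. This is only a sketch of the idea: the batch size, the delay, and the `start_rate_limited` helper are hypothetical, and the slides do not say how the limit will actually be implemented.

```python
import time


def batches(nodes, batch_size):
    """Split the node list into consecutive chunks of at most batch_size."""
    for i in range(0, len(nodes), batch_size):
        yield nodes[i:i + batch_size]


def start_rate_limited(nodes, batch_size, delay_sec, start, sleep=time.sleep):
    """Start the job on one batch at a time, pausing between batches."""
    for index, batch in enumerate(batches(nodes, batch_size)):
        if index:
            sleep(delay_sec)  # spread the load instead of a single stampede
        for node in batch:
            start(node)


nodes = [f"node{i}" for i in range(1000)]
started, pauses = [], []
# A recording stand-in replaces time.sleep so the demo finishes instantly.
start_rate_limited(nodes, batch_size=100, delay_sec=5,
                   start=started.append, sleep=pauses.append)
print(len(started), len(pauses))  # 1000 9
```

Injecting `sleep` keeps the pacing testable without waiting for real time to pass.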
• Move push connections to front ends in tiered Chef
• Push will be running on all of the front end nodes
• We expect this to improve scaling
• Better HA support
• Move to a true active-active model on BE
• Scaling
• Our goal is to scale with Chef server
Future major releases - 3.x and beyond
Protocol changes required
• Complex networks are difficult; proxies are hard
• ZeroMQ was helpful at first, but we are hitting limitations
• Stability problems at scale
• Erlang doesn’t need a lot of what ZeroMQ brings
• Backward compatibility will be a priority
Future major releases - 3.x and beyond
• Office hours
• Currently Monday and Wednesday 12:00PST
• chef-push is the master repository
• github.com/chef/chef-push
• File issues here
• Specific issues and PRs are fine to file against the
individual repos
• Pull requests always welcome
• RFCs for major new features
From the talk: Inside the Chef Push Jobs Service - ChefConf 2015

Compliance Automation with Inspec Part 1Compliance Automation with Inspec Part 1
Compliance Automation with Inspec Part 1
 
Application Automation with Habitat
Application Automation with HabitatApplication Automation with Habitat
Application Automation with Habitat
 
Achieving DevOps Success with Chef Automate
Achieving DevOps Success with Chef AutomateAchieving DevOps Success with Chef Automate
Achieving DevOps Success with Chef Automate
 
Nike pop up habitat
Nike pop up   habitatNike pop up   habitat
Nike pop up habitat
 
Nike popup compliance workshop
Nike popup compliance workshopNike popup compliance workshop
Nike popup compliance workshop
 
The caseforawesome
The caseforawesomeThe caseforawesome
The caseforawesome
 

Recently uploaded

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Inside the Chef Push Jobs Service - ChefConf 2015

  • 2. Chef Push in 2015 Mark Anderson, 2015-04-01
  • 4. The basics of Chef Push
  • 5. If you want to run a command on a set of nodes • `knife ssh` can be problematic • Key distribution/revocation • Access control/User accounts • Difficult to audit • Extra work required if the node is behind firewall • Doesn’t really scale very far past tens of nodes • None of the alternative systems suited our needs Why Chef Push?
  • 6. • We wanted a remote execution system that is • Robust under network and client failure • Gates execution on a quorum being available • Provides presence information • Scale to hundreds if not thousands of nodes • Integrated with Chef authentication and authorization system • Works behind firewalls and NAT Why Chef Push?
  • 7. • knife job start --quorum 90% 'chef-client' --search 'role:webapp' • Finds all nodes with role webapp • Submits a job to the push server • Checks quorum; 90% of the nodes listed must be available • Starts job chef-client on available nodes • Gathers successes and failures • And will do this for ten nodes...or a thousand Push jobs in a command line
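The quorum gate described on this slide can be sketched in a few lines of Python. This is an illustration only, not the push server's actual logic: the function names are hypothetical, and rounding the required node count up is an assumption.

```python
import math
from typing import Dict, List


def quorum_met(available: int, targeted: int, quorum_percent: float) -> bool:
    """Return True when enough of the targeted nodes are available."""
    if targeted == 0:
        return False
    # Assumption: the required count is rounded up to a whole node.
    required = math.ceil(targeted * quorum_percent / 100)
    return available >= required


def plan_job(nodes: Dict[str, bool], quorum_percent: float) -> List[str]:
    """nodes maps node name -> whether the node is currently available.

    Returns the nodes the job would start on, or an empty list when
    the quorum is not reached and the job must not start.
    """
    ready = sorted(name for name, up in nodes.items() if up)
    if not quorum_met(len(ready), len(nodes), quorum_percent):
        return []
    return ready
```

With ten targeted nodes and a 90% quorum, the job starts when nine nodes are available and is refused when only eight are.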
  • 8. The lifecycle of a job Server Client Job Accepted Send Command Clients ACK Wait for Quorum Start Exec Clients Exec Collect Results
  • 9. • Erlang service • Extends the Chef REST API • Job creation and tracking • Push client configuration • Controls the clients via ZeroMQ • Heartbeats to track node availability • Command execution • All ZeroMQ packets are signed Chef Push Server
  • 10. • Simple Ruby client • Receives heartbeats from the server • Sends back heartbeats to the server • Executes commands • Configuration requirements are minimal • The client initiates all connections to the server • Most configuration is via a Chef API call to the config endpoint • Using that info, opens ZeroMQ connections to the server Chef Push Client
  • 11. Chef Push Networking Message switch Heartbeat generator REST API Client HTTPS PUB/SUB DEALER ROUTER
  • 12. • All control for push is via extensions to the Chef API • Node status • Job control • start • stop • status • Job listing Chef Push knife extension
  • 13. • Access rights controlled by groups • ‘push_job_writers’ group controls job creation and deletion • ‘push_job_readers’ group controls read access to job status and results • Whitelist for commands • The client rejects commands that aren’t on the whitelist • We’d like to do finer grained access control in the future Access control
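As a rough sketch of the client-side whitelist check on this slide: the client looks up the requested job name and refuses anything it does not recognize. The whitelist entries, function names, and return strings below are all hypothetical; the real client is written in Ruby and its configuration format is not shown in these slides.

```python
from typing import Mapping, Optional

# Hypothetical whitelist: job name -> command line the client will run.
WHITELIST: Mapping[str, str] = {
    "chef-client": "chef-client",
    "restart-web": "service nginx restart",
}


def resolve_command(requested: str,
                    whitelist: Mapping[str, str] = WHITELIST) -> Optional[str]:
    """Return the command for a whitelisted job name, or None to reject."""
    return whitelist.get(requested)


def handle_request(requested: str) -> str:
    """Describe what the client would do; nothing is actually executed."""
    command = resolve_command(requested)
    if command is None:
        return f"rejected: {requested!r} is not whitelisted"
    return f"accepted: would run {command!r}"
```

Mapping names to commands, rather than accepting arbitrary command strings, means the server can only ask for actions the node's owner has approved in advance.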
  • 14. • Version 1.0 scales to 2k nodes • Works with Chef 12 • Open source since Fall 2014 • We’ve been working on new features since last spring • But Chef 12 had to go out first • Required features from Enterprise Chef • Open sourcing Chef Push is pretty meaningless without an open source server Status:
  • 15. New Features in Chef Push 2.0
  • 16. • Breaking change to the protocol • End to end encryption of every packet • Required for us to implement parameter passing and output return features • Built on the ZeroMQ4 implementation of CurveCP • CurveCP provides a framework which is • Fast • Crypto hardened against modern attacks • Forward secrecy • We still bootstrap the authentication using the Chef Client key End to End Encryption
  • 17. Enhanced control for the job execution environment • A config file up to 100k • Effective user • Working directory • Environment variables • User defined variables • Special variables for • job id • job file location Command environment and config files
  • 18. • New flag for job • capture_output: boolean • Capture is all or nothing • All nodes in the job • Both stdout and stderr • Stored on server with job description • No streaming output … yet Command output capture
  • 19. Two event feeds • Per org feed • Job start • Job completion summary • Runs forever • Per job feed with fine grained execution data • Job voting start • Quorum votes by node • Job start • Completion state by node • Job completion Server Sent Event Feeds
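Both feeds are delivered as Server-Sent Events, so a consumer needs to parse the standard `event:` / `data:` line format. The sketch below is a minimal parser for that wire format; the event names in the usage example are invented for illustration and are not the names the push server emits.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from Server-Sent Events lines.

    Handles the 'event' and 'data' fields and comment lines; other
    fields such as 'id' and 'retry' are ignored in this sketch.
    """
    event = "message"          # default event type when none is given
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue           # comment line, often used as a keepalive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical feed content, for illustration only.
feed = ["event: job_start\n", "data: {\"job\": \"abc\"}\n", "\n"]
for name, payload in parse_sse(feed):
    print(name, payload)
```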
  • 20. • Previously we’ve been advertising around 2k as the limit • 10k connected nodes demonstrated • 10 sec heartbeats • c3.2xlarge Chef server in standalone mode • Push server consumes 2 cores and about 2GB • Up to 1k nodes in a single job • Around 1.5-2k nodes we start seeing some stampede problems • Not done scaling; there are a few tweaks left to do Stable at 10k connected nodes
  • 22. • That test was done with real push clients • 20 m3.2xlarge nodes • Each running 500 docker containers • But we also do a lot of testing using a simulator • Understanding the limits of our current system • SystemTap is amazing for this kind of work Current work: Scalability and Stability drive
  • 23. Axes of scaling tested • # of active clients • Heartbeat rate for a client • Number of clients in a single job Below 10k clients there is a pretty linear trade between heartbeat rate and number of connected clients; heartbeats/sec was a useful metric. Must use care to avoid stampedes in job execution Scaling and Tuning
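The linear trade between heartbeat rate and client count can be made concrete with a little arithmetic. The sketch below counts only client-to-server heartbeats (an assumption) and uses hypothetical function names; the 10k-node, 10-second figures come from the slides.

```python
def heartbeats_per_sec(clients: int, interval_sec: float) -> float:
    """Aggregate client heartbeat rate arriving at the server."""
    return clients / interval_sec


def max_clients(budget_per_sec: float, interval_sec: float) -> int:
    """Clients that fit in a fixed heartbeats/sec budget at a given interval."""
    return int(budget_per_sec * interval_sec)


# 10k connected nodes with 10 sec heartbeats, as in the demonstrated test.
rate = heartbeats_per_sec(10_000, 10)
print(rate)

# Holding that rate fixed, halving the interval halves the supportable clients.
print(max_clients(rate, 5))
```

Under these assumptions the demonstrated configuration works out to 1,000 heartbeats per second, and the same budget supports 5,000 clients at a 5-second interval.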
  • 24. • A port in ZeroMQ is bound to a single thread • All communications go through a single ‘command switch’ • Client heartbeats, and all command messages go through the switch • The switch ended up being a bottleneck at around 2k messages/sec • Experiment: multiple command switches • Exercises some weaknesses in the ZeroMQ - Erlang interface • Not as big of a win as hoped, ended up being more complex than we’d like Lessons from scaling
  • 25. Nearly feature complete but: • Remaining work for new features • Knife push extensions for everything • Documentation • Windows testing and stability • Committed to making Windows a first class citizen • CentOS 7 • Polish around installation and cookbooks • Upgrade tooling for 1.0->2.0 • Bug fixes • Please file bugs Remaining work for 2.0
  • 26. Roadmap for 2.1 and beyond
  • 27. • Currently we support • Ubuntu 10.04, 12.04, 14.04 LTS • CentOS 5, 6, and 7 soon • Windows (client only) • Investigating client support for • AIX • Solaris Platform Support
  • 28. • Key rotation support • Multiple keys breaks some assumptions around how we auth in push • Needs fixes on Chef Server as well as Push • Better access control • Controlling access on a node by node basis • Examining persistent jobs as a first class object with their own ACLs - look for the RFC Features for 2.x releases
  • 29. • Integration into Chef Client package • Delayed joining the two because of the protocol breaking changes in 2.0 • Future server versions will be backward compatible. Features for 2.x releases
  • 30. Scaling • Rate limited job execution • Prevent stampede effect • Protects both push and chef server • Starting 1k chef client runs at once is a bad idea anyways • Per-job and server global limits • Multiple socket command switch • Biggest scaling bottleneck • Infrastructure for distributed server Features for 2.x releases
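One way to picture rate-limited job execution is to split the target nodes into waves and start each wave after a fixed gap. This is only a sketch of the general idea, not the planned Chef Push design; the wave size, gap, and function names are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def waves(nodes: Sequence[str], per_wave: int) -> Iterator[List[str]]:
    """Split nodes into consecutive groups of at most per_wave nodes."""
    if per_wave < 1:
        raise ValueError("per_wave must be at least 1")
    for start in range(0, len(nodes), per_wave):
        yield list(nodes[start:start + per_wave])


def schedule(nodes: Sequence[str], per_wave: int,
             gap_sec: float) -> List[Tuple[float, List[str]]]:
    """Return (start offset in seconds, nodes) for each wave."""
    return [(index * gap_sec, wave)
            for index, wave in enumerate(waves(nodes, per_wave))]


# 1k nodes started 100 at a time, 30 seconds apart (illustrative numbers).
plan = schedule([f"node{i}" for i in range(1000)], per_wave=100, gap_sec=30)
print(len(plan), plan[-1][0])
```

Spreading the starts this way keeps both the push server and the Chef server from receiving a thousand simultaneous chef-client runs.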
  • 31. • Move push connections to front ends in tiered Chef • Push will be running on all of the front end nodes • Expected to improve scaling • Better HA support • Move to a true active-active model on BE • Scaling • Our goal is to scale with Chef server Future major releases - 3.x and beyond
  • 32. Protocol changes required • Complex networks difficult; proxies are hard • ZeroMQ was helpful at first, but hitting limitations • Stability problems at scale • Erlang doesn’t need a lot of what ZeroMQ brings • Backward compatibility will be a priority Future major releases - 3.x and beyond
  • 33. • Office hours • Currently Monday and Wednesday 12:00 PST • chef-push is the master repository • github.com/chef/chef-push • File issues here • Specific issues and PRs are fine to file against the individual repos • Pull requests always welcome • RFCs for major new features