Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

For many years, Facebook managed its systems with cfengine2. With many individual clusters over 10k nodes in size, a slew of different constantly-changing system configurations, and small teams, this system was showing its age and the complexity was steadily increasing, limiting its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management – open source, proprietary, and a potential home-grown solution – we built a system based on the open-source project Chef. The evaluation process involved understanding the direction we wanted to take in managing the next many iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure.

Phil Dibowitz - Phil Dibowitz has been working in systems engineering for 12 years and is currently a production engineer at Facebook. Initially, he worked on the traffic infrastructure team, automating load balancer configuration management, as well as designing and building the production IPv6 infrastructure. He now leads the team responsible for rebuilding the configuration management system from the ground up. Prior to Facebook, he worked at Google, where he managed the large Gmail environment, and at Ticketmaster, where he co-authored and open sourced a configuration management tool called Spine. He also contributes to, and maintains, various open source projects and has spoken at conferences and LUGs on a variety of topics from Path MTU Discovery to X509.



Usage Rights

© All Rights Reserved


Presentation Transcript

  • Really large scale systems configuration Config Management @ Facebook Phil Dibowitz Atmosphere – May 2014
  • Configuration Management Experience ● Co-authored Spine (2005-2007) ● Authored Provision Scale Experience ● Ticketmaster, Google, Facebook Passionate about scaling configuration management Who am I?
  • Scaling
  • Scaling Configuration Management How many homogeneous systems can I maintain? How many heterogeneous systems can I maintain? How many people are needed? Can I safely delegate delta configuration?
  • The Goal
  • The Goal ● 4 people ● Hundreds of thousands of heterogeneous systems ● Service owners own/adjust relevant settings
  • What did we need?
  • 1. Basic Scalable Building Blocks
  • 1. Basic Scalable Build Blocks Distributed! Everything on the client (duh!)
  • 1. Basic Scalable Build Blocks Distributed! Everything on the client (duh!) Deterministic! The system you want on every run Idempotent! Only the necessary changes Extensible! Tied into internal systems Flexible! No dictated workflow
  • 2. Configuration as Data
  • 2. Configuration as Data Service Owner I want • shared mem • VIP • core files somewhere else • service running • less/more/no nscd caching
  • 2. Configuration as Data Service owners may not know: • How to configure VIPs • Optimal sysctl settings • Network settings • Authentication settings
  • 3. Flexibility
  • 3. Flexibility • Super-fast prototyping • Internal assumptions can be changed - easily • Extend in new ways – easily • Adapt to our workflow ...
  • 3. Flexibility - Example • Template /etc/sysctl.conf • Build a hash of default sysctls • Provide these defaults early in “run” • Let any engineer munge the bits they want • /etc/sysctl.conf template interpolated “after”
  • Picking a tool
  • Why Chef? Easier to see from a problem with Chef
  • Problem (for us): ● Scale issues with 15k-node clusters ● Standard solution sucks (disable ohai plugins)
  • Solution (for us): whitelist_node_attrs • New cookbook re-opens • Deletes non-white-listed attrs before saving • Have as much data as you want during the run • We send < 1kb back to the server! Code available:
  • Chef: whitelist_node_attrs
        class Chef
          class Node
            alias_method :old_save, :save
            # Overwrite Chef's save to whitelist; it doesn't get "later" than this
            def save
              Chef::Log.info('Whitelisting node attributes')
              whitelist = self[:whitelist].to_hash
              # Filter every precedence level against the same whitelist
              self.default_attrs = Whitelist.filter(self.default_attrs, whitelist)
              self.normal_attrs = Whitelist.filter(self.normal_attrs, whitelist)
              self.override_attrs = Whitelist.filter(self.override_attrs, whitelist)
              self.automatic_attrs = Whitelist.filter(self.automatic_attrs, whitelist)
              # Hand the trimmed node to the original save
              old_save
            end
          end
        end
  • Chef: whitelist_node_attrs Well... that’s flexible! (we did this a few times)
  • Our desired workflow
  • Our Desired Workflow • Provide API for anyone, anywhere to extend configs by munging data structures • Engineers don’t need to know what they’re building on, just what they want to change • Engineers can change their systems without fear of changing anything else • Testing should be easy • And...
  • Something Different “Moving idempotency up” Or “Idempotent Systems”
  • Moving Idempotency Up ● Idempotent records can get stale ● Remove cron/sysctl/user/etc. resource ● Resulting entry not removed => stale entries ● Idempotent systems control a set of configs ● Remove cron/sysctl/user/etc. ● No longer rendered in config
  • Idempotent Records vs. Systems This is a pain:
  • Idempotent Records vs. Systems This is better:
  • Case Study
  • Case Study: sysctl ● fb_sysctl/attributes/default.rb ● Provides defaults looking at hw, kernel, etc. ● fb_sysctl/recipes/default.rb ● Defines a template ● fb_sysctl/templates/default/sysctl.conf.erb ● 3-line template
  • Case Study: sysctl
    Template:
        # Generated by Chef, do not edit directly!
        <%- node['fb']['fb_sysctl'].to_hash.keys.sort.each do |key| %>
        <%= key %> = <%= node['fb']['fb_sysctl'][key] %>
        <%- end %>
    Result:
        # Generated by Chef, do not edit directly!
        ...
        net.ipv6.conf.eth0.accept_ra = 1
        net.ipv6.conf.eth0.accept_ra_pinfo = 0
        net.ipv6.conf.eth0.autoconf = 0
        ...
  • Case Study: sysctl
    In the cookbook for the DB servers (fb_database/recipes/default.rb):
        node.default['fb']['fb_sysctl']['kernel.shmmax'] = 19541180416
        node.default['fb']['fb_sysctl']['kernel.shmall'] = 5432001
  • Case Study: sysctl How does this help us scale? • Delegation is simple, thus... • Fewer people need to manage configs, therefore... • Significantly better heterogeneous scale
  • Other Examples
    Want IPv6?
        node.default['fb']['fb_networking']['want_ipv6'] = true
    Want to know what kind of network?
        node.is_layer3?
    New cronjob?
        node.default['fb']['fb_cron']['jobs']['myjob'] = {
          'time' => '*/15 * * * *',
          'command' => 'thing',
          'user' => 'myservice',
        }
  • Other Examples: DSR [diagram: Internet, LB, Web, Web, Web]
  • Other Examples: DSR node.add_dsr_vip('')
  • Our Chef Infrastructure
  • Our Chef Infrastructure Open Source Chef and Enterprise Chef
  • Our Chef Infrastructure - Customizations • Stateless Chef Servers • No search • No databags • Separate Failure Domains • Tiered Model
  • Production: Global [diagram: SVN Repo, Code, Review, Lint, Cluster 1, Cluster 2, Cluster 3, Cluster 4]
  • Production: Cluster [diagram: SVN, Chef BE 1 (grocery-delivery), Chef BE 2 (grocery-delivery), LB, Chef FE 1, Chef FE 2, Chef FE 3, Web, SVC, DB]
  • Assumptions • Server is basically stateless • Node data not persistent • No databags • Grocery-delivery keeps roles/cookbooks in sync • Chef only knows about the cluster it is in
  • Implementation Details •Grocery-delivery runs on every backend, every minute •Persistent data needs to come from FB systems of record (SOR) •Ohai is tied into necessary SORs •Runlist is forced on every run by node
  • Implementation Details: Client ● Report Handlers feed data into monitoring: ● Last exception seen ● Success/Failure of run ● Number of resources ● Time to run ● Time since last run ● Other system info
  • Implementation Details: Server ● chef-server-stats gathers data for monitoring: ● Stats (postgres, authz [opc], etc.) ● Errors (nginx, erchef, etc.) ● Grocery-delivery syncs cookbooks, roles, (databags): ● Pluggable and configurable ● Both open-source: ●
  • But does it scale?
  • Scale • Cluster size ~10k+ nodes • Lots of clusters • 15 minute convergence (14 min splay) • grocery-delivery runs every minute
  • Scale - Erchef Pre-erchef vs Post-erchef (Enterprise Chef 1.4+)
  • Scale - Erchef
  • Scale – Open Source Chef Server 11 Let’s throw more than a cluster at a Chef instance! (Open Source Chef 11.0)
  • Scale – Open Source Chef Server 11
  • Testing: Multi-prong approach • Foodcritic pre-commits for sanity • Chefspec for unit-testing of core/critical cookbooks • Taste-tester for full test-in-prod
  • Testing: taste-tester •Manage a local chef-zero instance •Track changes to local git repo, upload deltas •Point real production servers at chef-zero for testing •Configurable, with plug-in framework
  • Testing: taste-tester
        desktop$ taste-tester test -s
    Sync your repo to chef-zero; Test on a prod machine
    Run Chef on test server
        server1# chef-client
    Fix bugs, re-sync
        desktop$ vim … ; taste-tester upload
    Available at
  • Our source... •Chef-server-stats – Get perf stats from chef-servers •Taste-tester – Our testing system (new!) •Grocery-delivery – Our cookbook syncing system (rewritten and re-released!) •Between-meals – changes-tracking lib for TT and GD (new!) •Example cookbooks – fb_cron and fb_sysctl (new!) • – Ideas in this talk in text form (new!)
  • Our source... Coming soon... releases of TT, BM, GD
  • Lessons
  • Lessons • Idempotent systems > idempotent records • Delegating delta config == easier heterogeneity • Full programming languages > restrictive DSLs • Scale is more than just a number of clients • Easy abstractions are critical • Testing against real systems is useful and necessary
  • Summary So how about those types of scale?
  • Summary
    How many homogeneous systems can you maintain? >> 17k
    How many heterogeneous systems can you maintain? > 17k
    How many people are needed? ~4
    Can you safely delegate delta configuration? Yes
  • Thanks ● Opscode ● Adam Jacob, Chris Brown, Steven Danna, & the erchef and bifrost teams ● Andrew Crump ● foodcritic rules! ● Everyone I work(ed) with ● Aaron, Bethanye, David, KC, Larry, Marcin, Pedro, Tyler
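
Added example (not from the talk or the whitelist_node_attrs cookbook): a sketch of the attribute filtering that the "Solution (for us): whitelist_node_attrs" slides describe, applied to each precedence level before the node would be saved. `filter_attrs`, `filter_node`, the whitelist layout and the sample node data are assumptions constructed for illustration.

```ruby
require 'json'

# Hypothetical helper: keep only whitelisted keys. A rule of true keeps the
# whole subtree; a nested Hash descends and filters the next level.
def filter_attrs(data, whitelist)
  return data unless data.is_a?(Hash) && whitelist.is_a?(Hash)

  whitelist.each_with_object({}) do |(key, rule), kept|
    next unless rule && data.key?(key)

    kept[key] = rule.is_a?(Hash) ? filter_attrs(data[key], rule) : data[key]
  end
end

LEVELS = %i[default normal override automatic].freeze

# Filter every precedence level from its own data, never from another level's.
def filter_node(levels, whitelist)
  LEVELS.to_h { |level| [level, filter_attrs(levels.fetch(level, {}), whitelist)] }
end

# Constructed sample: what a client might hold in memory during a run.
NODE = {
  default: { 'fb' => { 'fb_sysctl' => { 'kernel.shmmax' => 19541180416 } } },
  normal: {},
  override: {},
  automatic: {
    'fqdn' => 'web001.example.com',
    'platform' => 'centos',
    'etc' => { 'passwd' => { 'root' => { 'uid' => 0 } } },
    'network' => { 'interfaces' => { 'eth0' => { 'mtu' => '1500' } } },
    'kernel' => { 'release' => '3.10.0', 'modules' => { 'ext4' => {} } }
  }
}.freeze

# Constructed whitelist: only these attributes leave the client.
WHITELIST = { 'fqdn' => true, 'platform' => true,
              'kernel' => { 'release' => true } }.freeze

def saved_payload
  filter_node(NODE, WHITELIST)
end

if __FILE__ == $PROGRAM_NAME
  puts "full node: #{JSON.generate(NODE).bytesize} bytes"
  puts "saved:     #{JSON.generate(saved_payload).bytesize} bytes"
  puts JSON.pretty_generate(saved_payload)
end
```

Everything outside the whitelist (account data, interface details, loaded modules) stays available during the run but is never persisted on the server.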
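
Added example (not code from the talk or from fb_sysctl): the "Moving Idempotency Up" and "Case Study: sysctl" slides, contrasting record-style convergence, which leaves a stale kernel.shmall line once a cookbook stops declaring it, with rendering the whole file from the attribute hash. The helper names are hypothetical; the sysctl keys and values are the ones shown on the slides.

```ruby
HEADER = '# Generated by Chef, do not edit directly!'

# Render the file the way the slide's 3-line template does: sorted "key = value".
def render(settings)
  [HEADER] + settings.keys.sort.map { |key| "#{key} = #{settings[key]}" }
end

# Read an existing sysctl.conf back into a hash, skipping comments and blanks.
def parse(lines)
  lines.each_with_object({}) do |line, settings|
    next if line.strip.empty? || line.start_with?('#') || !line.include?('=')

    key, value = line.split('=', 2).map(&:strip)
    settings[key] = value
  end
end

# Idempotent records: each declared entry is ensured present, but entries that
# are no longer declared are never looked at, so they survive.
def converge_records(existing_lines, declared)
  settings = parse(existing_lines)
  declared.each { |key, value| settings[key] = value }
  render(settings)
end

# Idempotent system: the file is a pure function of the declared data.
def converge_system(_existing_lines, declared)
  render(declared)
end

# Run 1 declares three settings; run 2 is the same cookbook after an engineer
# deletes the kernel.shmall line from the recipe.
RUN1 = { 'kernel.shmmax' => 19541180416, 'kernel.shmall' => 5432001,
         'net.ipv6.conf.eth0.accept_ra' => 1 }.freeze
RUN2 = RUN1.reject { |key, _| key == 'kernel.shmall' }.freeze

# Converge twice and report keys left in the file that nothing declares.
def stale_keys(converge)
  file = converge.call([], RUN1)
  file = converge.call(file, RUN2)
  parse(file).keys - RUN2.keys
end

if __FILE__ == $PROGRAM_NAME
  puts "records: stale = #{stale_keys(method(:converge_records)).inspect}"
  puts "system:  stale = #{stale_keys(method(:converge_system)).inspect}"
end
```

Stale entries are drift that review of the repository cannot see: the setting is still applied on the host although no cookbook declares it.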