Really large scale systems configuration
Config Management @ Facebook
Phil Dibowitz
Atmosphere – May 2014
For many years, Facebook managed its systems with cfengine2. With many individual clusters over 10k nodes in size, a slew of different constantly-changing system configurations, and small teams, this system was showing its age and the complexity was steadily increasing, limiting its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management – open source, proprietary, and a potential home-grown solution – we built a system based on the open-source project Chef. The evaluation process involved understanding the direction we wanted to take in managing the next many iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure.

Phil Dibowitz - Phil Dibowitz has been working in systems engineering for 12 years and is currently a production engineer at Facebook. Initially, he worked on the traffic infrastructure team, automating load balancer configuration management, as well as designing and building the production IPv6 infrastructure. He now leads the team responsible for rebuilding the configuration management system from the ground up. Prior to Facebook, he worked at Google, where he managed the large Gmail environment, and at Ticketmaster, where he co-authored and open sourced a configuration management tool called Spine. He also contributes to, and maintains, various open source projects and has spoken at conferences and LUGs on a variety of topics from Path MTU Discovery to X509.

  1. Really large scale systems configuration Config Management @ Facebook Phil Dibowitz Atmosphere – May 2014
  2. Who am I? Configuration Management Experience ● Co-authored Spine (2005-2007) ● Authored Provision. Scale Experience ● Ticketmaster, Google, Facebook. Passionate about scaling configuration management
  3. http://coolinterestingstuff.com/amazing-space-images/ Scaling
  4. Scaling Configuration Management How many homogeneous systems can I maintain? How many heterogeneous systems can I maintain? How many people are needed? Can I safely delegate delta configuration?
  5. The Goal http://www.prathidhwani.org/sports/goal-2012-the-beautiful-game-is-back/
  6. The Goal ● 4 people ● Hundreds of thousands of heterogeneous systems ● Service owners own/adjust relevant settings
  7. What did we need?
  8. https://www.asburyseminary.edu/elink/my-profile-on-the-hub/ 1. Basic Scalable Building Blocks
  9. 1. Basic Scalable Building Blocks Distributed! Everything on the client (duh!)
  10. 1. Basic Scalable Building Blocks Distributed! Everything on the client (duh!) Deterministic! The system you want on every run. Idempotent! Only the necessary changes. Extensible! Tied into internal systems. Flexible! No dictated workflow.
  11. http://www.greenbookblog.org/2012/03/21/big-data-opportunity-or-threat-for-market-research/ 2. Configuration as Data
  12. 2. Configuration as Data Service Owner http://www.flickr.com/photos/laurapple/7370381182/ I want • shared mem • VIP • core files somewhere else • service running • less/more/no nscd caching
  13. 2. Configuration as Data Service owners may not know: • How to configure VIPs • Optimal sysctl settings • Network settings • Authentication settings
  14. http://livelovesweat.wordpress.com/2011/12/07/the-importance-of-flexibility/ 3. Flexibility
  15. 3. Flexibility • Super-fast prototyping • Internal assumptions can be changed easily • Extend in new ways easily • Adapt to our workflow
  16. 3. Flexibility - Example • Template /etc/sysctl.conf • Build a hash of default sysctls • Provide these defaults early in the “run” • Let any engineer munge the bits they want • /etc/sysctl.conf template interpolated “after”
  17. http://www.flickr.com/photos/75905404@N00/7126147125/ Picking a tool
  18. Why Chef? Easier to see from a problem with Chef
  19. Problem (for us): node.save() ● Scale issues with 15k-node clusters ● Standard solution sucks (disable ohai plugins)
  20. Solution (for us): whitelist_node_attrs • New cookbook re-opens Chef::Node.save() • Deletes non-white-listed attrs before saving • Have as much data as you want during the run • We send < 1kb back to the server! Code available: https://github.com/opscode-cookbooks/whitelist-node-attrs
  21. Chef: whitelist_node_attrs
      class Chef
        class Node
          alias_method :old_save, :save
          # Overwrite Chef's node.save to whitelist. Doesn't get "later" than this.
          def save
            Chef::Log.info('Whitelisting node attributes')
            whitelist = self[:whitelist].to_hash
            self.default_attrs = Whitelist.filter(self.default_attrs, whitelist)
            self.normal_attrs = Whitelist.filter(self.normal_attrs, whitelist)
            self.override_attrs = Whitelist.filter(self.override_attrs, whitelist)
            self.automatic_attrs = Whitelist.filter(self.automatic_attrs, whitelist)
            old_save
          end
        end
      end
  22. Chef: whitelist_node_attrs Well... that’s flexible! (we did this a few times)
  23. Our desired workflow
  24. Our Desired Workflow • Provide API for anyone, anywhere to extend configs by munging data structures • Engineers don’t need to know what they’re building on, just what they want to change • Engineers can change their systems without fear of changing anything else • Testing should be easy • And...
  25. Something Different “Moving idempotency up” Or “Idempotent Systems” http://www.flickr.com/photos/esi_design/4548531839/sizes/l/in/photostream/
  26. Moving Idempotency Up ● Idempotent records can get stale ● Remove a cron/sysctl/user/etc. resource ● The resulting entry is not removed => stale entries ● Idempotent systems control a set of configs ● Remove a cron/sysctl/user/etc. entry ● It is no longer rendered in the config
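The cron contrast above can be sketched in plain Ruby. This is a hedged illustration of the "idempotent system" idea, not Facebook's actual fb_cron cookbook: the whole crontab is rendered from one hash of desired jobs, so deleting a job from the hash removes it from the file on the next run, leaving no stale records behind.

```ruby
# Hypothetical helper: render an entire crontab from a hash of desired
# jobs. The file is fully owned by the tool, so anything not in the
# hash simply stops being rendered -- the system, not each record,
# is idempotent.
def render_crontab(jobs)
  lines = ['# Generated file, do not edit directly!']
  jobs.sort.each do |name, job|
    lines << "# #{name}"
    lines << "#{job['time']} #{job['user']} #{job['command']}"
  end
  lines.join("\n") + "\n"
end

jobs = {
  'myjob' => {
    'time' => '*/15 * * * *', 'user' => 'myservice', 'command' => 'thing'
  },
}
puts render_crontab(jobs)

# Delete the job from the desired state...
jobs.delete('myjob')
# ...and the next render no longer contains it: no stale entry to clean up.
puts render_crontab(jobs)
```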
  27. Idempotent Records vs. Systems This is a pain:
  28. Idempotent Records vs. Systems This is better:
  29. Case Study
  30. Case Study: sysctl ● fb_sysctl/attributes/default.rb ● Provides defaults looking at hw, kernel, etc. ● fb_sysctl/recipes/default.rb ● Defines a template ● fb_sysctl/templates/default/sysctl.conf.erb ● 3-line template
  31. Case Study: sysctl
      Template:
        # Generated by Chef, do not edit directly!
        <%- node['fb']['fb_sysctl'].to_hash.keys.sort.each do |key| %>
        <%= key %> = <%= node['fb']['fb_sysctl'][key] %>
        <%- end %>
      Result:
        # Generated by Chef, do not edit directly!
        ...
        net.ipv6.conf.eth0.accept_ra = 1
        net.ipv6.conf.eth0.accept_ra_pinfo = 0
        net.ipv6.conf.eth0.autoconf = 0
        ...
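The template above can be exercised with nothing but Ruby's standard ERB library. A minimal sketch, with invented sample values standing in for Chef's node object, and `-%>` trim markers added so the loop lines don't emit blanks:

```ruby
require 'erb'

# Sample attributes standing in for the Chef node (values invented).
node = {
  'fb' => {
    'fb_sysctl' => {
      'net.ipv6.conf.eth0.autoconf'  => 0,
      'net.ipv6.conf.eth0.accept_ra' => 1,
    },
  },
}

# The 3-line template from the slide, with trailing trim markers.
template = <<~'TPL'
  # Generated by Chef, do not edit directly!
  <%- node['fb']['fb_sysctl'].to_hash.keys.sort.each do |key| -%>
  <%= key %> = <%= node['fb']['fb_sysctl'][key] %>
  <%- end -%>
TPL

rendered = ERB.new(template, trim_mode: '-').result(binding)
puts rendered
# Prints a sorted sysctl.conf:
#   # Generated by Chef, do not edit directly!
#   net.ipv6.conf.eth0.accept_ra = 1
#   net.ipv6.conf.eth0.autoconf = 0
```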
  32. Case Study: sysctl In the cookbook for the DB servers (fb_database/recipes/default.rb): node.default['fb']['fb_sysctl']['kernel.shmmax'] = 19541180416 node.default['fb']['fb_sysctl']['kernel.shmall'] = 5432001
  33. Case Study: sysctl How does this help us scale? • Delegation is simple, thus... • Fewer people need to manage configs, therefore... • Significantly better heterogeneous scale
  34. Other Examples
      Want IPv6? node.default['fb']['fb_networking']['want_ipv6'] = true
      Want to know what kind of network? node.is_layer3?
      New cronjob? node.default['fb']['fb_cron']['jobs']['myjob'] = { 'time' => '*/15 * * * *', 'command' => 'thing', 'user' => 'myservice', }
  35. Other Examples: DSR (diagram: Internet, LB, three Web hosts)
  36. Other Examples: DSR node.add_dsr_vip('10.1.1.2')
  37. Our Chef Infrastructure
  38. Our Chef Infrastructure Open Source Chef and Enterprise Chef
  39. Our Chef Infrastructure - Customizations • Stateless Chef Servers • No search • No databags • Separate Failure Domains • Tiered Model
  40. Production: Global (diagram: Code, Review, Lint, SVN Repo, Clusters 1–4)
  41. Production: Cluster (diagram: SVN, Chef BE 1 and Chef BE 2 running grocery-delivery, LB, Chef FE 1–3, Web/SVC/DB clients)
  42. Assumptions • Server is basically stateless • Node data not persistent • No databags • Grocery-delivery keeps roles/cookbooks in sync • Chef only knows about the cluster it is in
  43. Implementation Details • Grocery-delivery runs on every backend, every minute • Persistent data needs to come from FB systems of record (SOR) • Ohai is tied into necessary SORs • Runlist is forced on every run by node
  44. Implementation Details: Client ● Report Handlers feed data into monitoring: ● Last exception seen ● Success/Failure of run ● Number of resources ● Time to run ● Time since last run ● Other system info
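A sketch of the kind of report handler behind the client-metrics list above. Real Chef handlers subclass Chef::Handler and receive a run_status object; here run_status is stubbed with a Struct so the sketch runs standalone, and the metric names are invented rather than Facebook's actual ones.

```ruby
require 'json'

# Stand-in for Chef's run_status (real handlers get this from
# Chef::Handler); stubbed so the sketch runs without Chef installed.
RunStatus = Struct.new(:success, :elapsed_time, :all_resources,
                       :updated_resources, :exception, keyword_init: true)

# Turn a run's outcome into a flat metrics hash for a (hypothetical)
# monitoring pipeline: success/failure, timing, resource counts,
# last exception seen.
def run_metrics(run_status)
  {
    'success'        => run_status.success ? 1 : 0,
    'run_time_s'     => run_status.elapsed_time.round(1),
    'resources'      => run_status.all_resources.length,
    'updated'        => run_status.updated_resources.length,
    'last_exception' => run_status.exception.to_s,
  }
end

status = RunStatus.new(success: true, elapsed_time: 42.3,
                       all_resources: %w[r1 r2 r3],
                       updated_resources: %w[r2],
                       exception: nil)
puts JSON.pretty_generate(run_metrics(status))
```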
  45. Implementation Details: Server ● chef-server-stats gathers data for monitoring: ● Stats (postgres, authz [opc], etc.) ● Errors (nginx, erchef, etc.) ● Grocery-delivery syncs cookbooks, roles, (databags): ● Pluggable and configurable ● Both open-source: ● https://github.com/facebook/chef-utils
  46. But does it scale?
  47. Scale • Cluster size ~10k+ nodes • Lots of clusters • 15 minute convergence (14 min splay) • grocery-delivery runs every minute
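Back-of-envelope math on what those numbers mean for one cluster's Chef backends. A sketch only: the ~10k-node figure is from the slide, and the steady state assumes the 14-minute splay spreads check-ins evenly across the interval.

```ruby
nodes    = 10_000     # approximate cluster size from the slide
interval = 15 * 60.0  # 15-minute convergence interval, in seconds

# With splay spreading runs evenly, the steady-state check-in rate is:
runs_per_second = nodes / interval
puts format('~%.1f chef-client runs/sec per cluster', runs_per_second)
# Prints: ~11.1 chef-client runs/sec per cluster
```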
  48. Scale - Erchef Pre-erchef vs post-erchef (Enterprise Chef 1.4+)
  49. Scale - Erchef
  50. Scale – Open Source Chef Server 11 Let’s throw more than a cluster at a Chef instance! (Open Source Chef 11.0)
  51. Scale – Open Source Chef Server 11
  52. Testing: Multi-prong approach • Foodcritic pre-commits for sanity • Chefspec for unit-testing of core/critical cookbooks • Taste-tester for full test-in-prod
  53. Testing: taste-tester • Manage a local chef-zero instance • Track changes to local git repo, upload deltas • Point real production servers at chef-zero for testing • Configurable, with plug-in framework
  54. Testing: taste-tester
      Sync your repo to chef-zero; test on a prod machine:
        desktop$ taste-tester test -s server1.example.com
      Run Chef on the test server:
        server1# chef-client
      Fix bugs, re-sync:
        desktop$ vim … ; taste-tester upload
      Available at http://github.com/facebook/chef-utils
  55. Our source... • Chef-server-stats – Get perf stats from chef-servers • Taste-tester – Our testing system (new!) • Grocery-delivery – Our cookbook-syncing system (rewritten and re-released!) • Between-meals – Change-tracking lib for TT and GD (new!) • Example cookbooks – fb_cron and fb_sysctl (new!) • Philosophy.md – Ideas in this talk in text form (new!)
  56. Our source... Coming soon: rubygems.org releases of TT, BM, GD
  57. Lessons
  58. Lessons • Idempotent systems > idempotent records • Delegating delta config == easier heterogeneity • Full programming languages > restrictive DSLs • Scale is more than just a number of clients • Easy abstractions are critical • Testing against real systems is useful and necessary
  59. Summary So how about those types of scale?
  60. Summary How many homogeneous systems can you maintain? >> 17k How many heterogeneous systems can you maintain? > 17k How many people are needed? ~4 Can you safely delegate delta configuration? Yes
  61. Thanks ● Opscode ● Adam Jacob, Chris Brown, Steven Danna, & the erchef and bifrost teams ● Andrew Crump ● foodcritic rules! ● Everyone I work(ed) with ● Aaron, Bethanye, David, KC, Larry, Marcin, Pedro, Tyler
  62. INFRASTRUCTURE
