Configuration management 101 - A tale of disaster recovery using CFEngine 3




  1. 1. RMLL 2011 @ Strasbourg, France. A tale of disaster recovery. CFEngine everyday, practices and tools. Nicolas Charles, Jonathan Clarke
  2. 2. About the speakers. Nicolas Charles: CFEngine contributor, CFEngine "Community Champion" (C3), Scala developer. Jonathan Clarke: CFEngine contributor, contributor to various LDAP FLOSS projects, sysadmin. But we get on pretty well! (mostly...)
  3. 3. Agenda: 1) Configuration Management 101 2) A tale of disaster recovery 3) Our choice of tool 4) About CFEngine 3
  4. 4. A bit about Configuration Management...
  5. 5. Configuration management: what is it? Configuration Management is a field of management that focuses on establishing and maintaining consistency of a system (..) throughout its life. Software configuration management is the task of tracking and controlling changes in the software. Sources:
  6. 6. Why configuration management? A server crashed. Install a new one, people can't work without it! OK, it'll be done in about two days... There's a new critical security patch we must deploy on all our servers! Get it out quickly! Right, I'll put the whole team on it.
  7. 7. Why configuration management? Reproducibility, Industrialization, Automation
  8. 8. Why configuration management? How do we set up service X? Ask Jim, he's the expert on that. But he left the company... Huh, this server has been logging errors for a few weeks. Oh? I think Michael changed something on it recently... He'll tell you what it was. Damn, he's on vacation!
  9. 9. Why configuration management? Documentation, History, Building-up knowledge
  10. 10. Why configuration management? An intruder just stole our data using a vulnerability in a module we don't need... I thought the project specification ensured that we disabled that? Er, it did, but we enabled it to solve a problem and forgot to disable it afterwards... sorry...
  11. 11. Why configuration management? Vigilance, Automatic repairs, Alerts
  12. 12. Why configuration management? I don't understand how this server is set up. It doesn't match our best practices. Oh, that's a legacy server... Give me details on our current security policy. Well, it's a collection of little things, here and there... Ah... Well, OK. Tell me: is it fully applied on all our critical servers? Er...
  13. 13. Why configuration management? Rationalization, Normalization, Control
  14. 14. Configuration management: Reproducibility, Industrialization, Automation; Documentation, History, Building-up knowledge; Vigilance, Automatic repairs, Alerts; Rationalization, Normalization, Control
  15. 15. Disaster Recovery: an ill-fated tale from the recent past (case study)
  16. 16. Before the disaster... Our company's IT infrastructure. Small company, small requirements: web site, email, Git repository, Redmine... Small company, small budget: all on one hosted server.
  17. 17. Asking for trouble? Just one hosted server! Critical services! No, a "safe" configuration: redundant hardware, 3-disk RAID-5 array; all services automatically installed and set up using Configuration Management; backups: daily (several off-site locations); several VMs to separate services.
  18. 18. A critical failure: 2 hard drives fail simultaneously → RAID-5 array is down → Almost all services fail immediately → "The end of the world as we know it" → Need to rebuild everything NOW
  19. 19. Recovering Step 1: Panic! Step 2: Get a new server Step 3: Reinstall base OS + virtualization Step 4: Restore VM configuration whoops Step 4: Re-create the VMs manually Step 5: Reinstall each OS in each VM...    
  20. 20. Recovering Step 6: Installation Configuration Management Step 7: Sit back and watch all the services coming back online as if by magic! Step 8: Huh, wheres my data? Step 9: Manually restore backups Step 10: Make a list of missing data...    
  21. 21. Lessons learned: 1) Hard disks fail reliably. 2) Restoring virtualization setups: ● Backing up the config files would have helped ● Need CM tools to describe the desired state! (CFEngine Nova does this) 3) Configuration Management should tie in to our backup system. 4) Backups were lacking some files: always test!
  22. 22. Wishlist and discussion. Integrating Configuration Management tools and backup systems is a crucial step for CM to be efficient for disaster recovery. What do others do? Provisioning VMs and their resources (disks, network) should be automated too. Cloud providers are one solution. What about "plain" virtualization?
  23. 23. Configuration Management Tools: what we chose, and why
  24. 24. Our choice. Back in mid-2009, we needed a configuration management tool. Criteria: open source; multi-platform agent (including Windows); resilient; non-disruptive.
  25. 25. Our choice: candidates. CFEngine 3, Puppet, Chef
  26. 26. Our choice: candidates. CFEngine 3. More on this choice later...
  27. 27. A bit about CFEngine 3... Sources: across the Internet   
  28. 28. CFEngine: History. Source: f-cfengine-chef-and-puppet
  29. 29. CFEngine 3: Intro. Configuration management software, written in C. Two versions: Community (GPL v3) and Nova (closed source). Nova is Community + extra features; some features are released in Community. Backed by CFEngine AS, a Norway-based company founded in 2009.
  30. 30. CFEngine 3: Features. Multi-platform; multi-agent technology; adapted to heterogeneous environments; lightweight, non-intrusive; autonomous; fault-tolerant; highly scalable; progressive roll-out; large user base and community.
  31. 31. CFEngine 3: Components. cf-agent: runs on all managed hosts; applies configuration (this is the heart); can connect to cf-serverd to get policies / files. cf-serverd: distributes policies and files; must be run on policy server(s); usually run on all hosts to enable remote runs. cf-monitord: collects statistics on all nodes.
  32. 32. Memory usage: daemon consumption on managed hosts
  33. 33. CFEngine 3: Usage examples. Large companies; critical systems: Joint Australia Tsunami Warning Centre; personal computers; mobile devices: Nokia N900; underwater devices: army submarines; small and medium companies...; Community
  34. 34. Feature: Multi-platform. Define a configuration for all operating systems (Windows, Linux). Make it "transparent" (forget about the complexity). An existing standard library handles the differences between each OS and distribution.
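As a minimal sketch of how one policy can cover several platforms, the fragment below picks a package name based on CFEngine's built-in operating system classes (`debian`, `redhat`). The bundle name, the variable name and the package names are illustrative assumptions, not taken from the talk.

```cf3
body common control
{
  bundlesequence => { "web_package_name" };
}

# Hypothetical bundle: chooses a value per platform, then reports it.
bundle agent web_package_name
{
  vars:
    # Only evaluated on Debian-family hosts
    debian::
      "apache_pkg" string => "apache2";

    # Only evaluated on Red Hat-family hosts
    redhat::
      "apache_pkg" string => "httpd";

  reports:
    debian|redhat::
      "Web server package on this platform: $(apache_pkg)";
}
```

The same bundle is distributed to every host; each agent only acts on the promises whose class context is true locally.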
  35. 35. CFEngine 3: Promises. Configuration rules are called promises: a "promise" to be in the desired state. The CFEngine agent handles the steps to get there: convergence. Promise theory is based on research done at the University of Oslo.
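To make the idea of a promise concrete, here is a small sketch of a `files` promise describing a desired state rather than a sequence of commands. The file path, bundle name and body name are assumptions chosen for illustration.

```cf3
body common control
{
  bundlesequence => { "motd_example" };
}

# Hypothetical bundle: describes the state /etc/motd should be in.
bundle agent motd_example
{
  files:
    # Promise: the file exists and has the permissions below.
    # If that is already true, the agent changes nothing.
    "/etc/motd"
      create => "true",
      perms  => motd_perms;
}

# Hypothetical perms body: mode and ownership to converge to.
body perms motd_perms
{
  mode   => "644";
  owners => { "root" };
  groups => { "root" };
}
```

Running the agent repeatedly should leave the file untouched once it matches the description, which is what convergence means in practice.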
  36. 36. Feature: File editing. Only change what you need to: You like your distribution's defaults? You have various different systems already set up and just need to change something? Search for lines and replace/delete/add them. Only change one field in a file (/etc/passwd for example).
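Line-based editing of this kind could look roughly like the sketch below, which uses an `edit_line` bundle to delete one line and make sure another is present, leaving the rest of the file as the distribution shipped it. The target file and the setting are assumptions picked for illustration; the file is expected to exist already.

```cf3
body common control
{
  bundlesequence => { "sshd_example" };
}

# Hypothetical bundle: edits a single setting in an existing file.
bundle agent sshd_example
{
  files:
    "/etc/ssh/sshd_config"
      edit_line => disable_root_login;
}

# Hypothetical edit_line bundle: only the matching lines are touched.
bundle edit_line disable_root_login
{
  delete_lines:
    # Regular expression, matched against whole lines
    "PermitRootLogin\s+yes";

  insert_lines:
    # Added only if an identical line is not already present
    "PermitRootLogin no";
}
```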
  37. 37. Feature: Complex tasks. Powerful class system to trigger promises: based on the node itself, based on time, based on whatever you might imagine. Complex workflows can be created.
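A short sketch of classes as triggers: `Hr02` is a built-in time class (true between 02:00 and 02:59), while `has_apache_conf` is a hypothetical class defined from a condition on the node. The bundle name, file path and report texts are assumptions for illustration.

```cf3
body common control
{
  bundlesequence => { "class_example" };
}

# Hypothetical bundle: promises guarded by class expressions.
bundle agent class_example
{
  classes:
    # Class based on the node itself: set if this file exists
    "has_apache_conf" expression => fileexists("/etc/apache2/apache2.conf");

  reports:
    # Class based on time: only true during the 02:00 hour
    Hr02::
      "It is between 02:00 and 02:59: a nightly task could run here";

    # Custom class defined above
    has_apache_conf::
      "This node has an Apache configuration file";

    # Classes combine with . (and), | (or) and ! (not)
    has_apache_conf.Hr02::
      "Apache host during the nightly window";
}
```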
  38. 38. Configuration example: install the LAMP stack
      bundle agent caller
      {
        vars:
          # Packages to install
          "pkg_list" slist => { "httpd", "php5", "mysql" };

        packages:
          # "generic" is a package_method body from the CFEngine standard library
          "${pkg_list}"
            package_method => generic,
            package_policy => "addupdate";
      }
  39. 39. Thank you! RMLL 2011
  40. 40. CFEngine 3: Features. According to the KU Leuven comparative study of configuration management systems: very mature; cross-platform (*BSD, AIX, HP-UX, Linux, Mac OS X, Solaris, Windows); strongly distributed; based on state description and convergence; very high scalability (> 10000 nodes); very small footprint. Source: