Configuration management 101 - A tale of disaster recovery using CFEngine 3
RMLL 2011 @Strasbourg, France
A tale of disaster recovery
CFEngine everyday, practices and tools
Nicolas Charles <firstname.lastname@example.org>
Jonathan Clarke <email@example.com>
About the speakers
Nicolas Charles
● CFEngine contributor
● CFEngine "Community Champion" (C3)
● Scala Developer
Jonathan Clarke
● CFEngine contributor
● Contributor to various LDAP FLOSS projects
● Sysadmin
But we get on pretty well! (mostly...)
Agenda
1) Configuration Management 101
2) A tale of disaster recovery
3) Our choice of tool
4) About CFEngine 3
Configuration management
What is it?
● Configuration Management is a field of management that focuses on establishing and maintaining consistency of a system (..) throughout its life
● Software configuration management is the task of tracking and controlling changes in the software
Sources:
http://en.wikipedia.org/wiki/Configuration_management
http://en.wikipedia.org/wiki/Software_configuration_management
Why configuration management?
"A server crashed. Install a new one, people can't work without it!"
"OK, it'll be done in about two days..."
"There's a new critical security patch we must deploy on all our servers! Get it out quickly!"
"Right, I'll put the whole team on it."
Why configuration management?
"How do we set up service X?"
"Ask Jim, he's the expert on that."
"But he left the company..."
"Huh, this server has been logging errors for a few weeks."
"Oh? I think Michael changed something on it recently... He'll tell you what it was."
"Damn, he's on vacation!"
Why configuration management?
● Documentation
● History
● Building-up knowledge
Why configuration management?
"An intruder just stole our data using a vulnerability in a module we don't need... I thought the project specification ensured that we disabled that?"
"Er, it did, but we enabled it to solve a problem and forgot to disable it afterwards... sorry..."
Why configuration management?
"I don't understand how this server is set up. It doesn't match our best practices."
"Oh, that's a legacy server..."
"Give me details on our current security policy."
"Well, it's a collection of little things, here and there..."
"Ah... Well, OK. Tell me: is it fully applied on all our critical servers?"
"Er..."
Why configuration management?
● Rationalization
● Normalization
● Control
Configuration management
● Reproducibility, Industrialization, Automation
● Documentation, History, Building-up knowledge
● Vigilance, Automatic repairs, Alerts
● Rationalization, Normalization, Control
Disaster Recovery
An ill-fated tale from the recent past (CASE STUDY)
Before the disaster...
Our company's IT infrastructure
● Small company: small requirements
  ● Web site, email
  ● Git repository, Redmine...
● Small company: small budget
  ● All on one hosted server
Asking for trouble?
● Just one hosted server!
● Critical services!
No, a "safe" configuration:
● Redundant hardware, 3 disk RAID-5 array
● All services automatically installed and set up using Configuration Management
● Backups: daily (several off-site locations)
● Several VMs to separate services
A critical failure
2 hard drives fail simultaneously
→ RAID-5 array is down
→ Almost all services fail immediately
→ "The end of the world as we know it"
→ Need to rebuild everything NOW
Recovering
Step 1: Panic!
Step 2: Get a new server
Step 3: Reinstall base OS + virtualization
Step 4: Restore VM configuration (whoops!)
Step 4: Re-create the VMs manually
Step 5: Reinstall each OS in each VM...
Recovering
Step 6: Install Configuration Management
Step 7: Sit back and watch all the services coming back online as if by magic!
Step 8: Huh, where's my data?
Step 9: Manually restore backups
Step 10: Make a list of missing data...
Lessons learned
1) Hard disks fail reliably
2) Restoring virtualization setups:
  ● Backing up the config files would have helped
  ● Need CM tools to describe the desired state! (CFEngine Nova does this)
3) Configuration Management should tie in to our backup system
4) Backups were lacking some files: always test!
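The "always test!" lesson can be made routine with a small check that compares what must be recoverable against what the backup actually contains. The following is only a sketch: the function name, the paths, and the idea of a flat manifest of backed-up paths are assumptions for illustration, not a description of the backup system used in the case study.

```python
def missing_from_backup(required_paths, backup_manifest):
    """Return the required paths that do not appear in the backup manifest."""
    backed_up = set(backup_manifest)
    return sorted(p for p in required_paths if p not in backed_up)


# Hypothetical example data: paths we expect to be able to restore...
required = [
    "/etc/vm-configs/web.conf",
    "/srv/git",
    "/var/lib/redmine/files",
]
# ...and the list of paths the last backup run reports having saved.
manifest = ["/srv/git", "/var/lib/redmine/files"]

missing = missing_from_backup(required, manifest)
for path in missing:
    print("NOT IN BACKUP:", path)
```

Run after every backup, a check like this turns "Step 10: Make a list of missing data" into something done before the disaster rather than after it.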
Wishlist and discussion
● Integrating Configuration Management tools and backup systems is a crucial step for CM to be efficient for disaster recovery
  ● What do others do?
● Provisioning VMs and their resources (disks, network) should be automated too
  ● Cloud providers are one solution
  ● What about "plain" virtualization?
Configuration Management Tools
What we chose, and why
Our choice
Back in mid-2009
● Needed a configuration management tool
● Criteria:
  ● Open source
  ● Multi-platform agent (including Windows)
  ● Resilient
  ● Non-disruptive
Our choice: candidates
● CFEngine 3
More on this choice later...
A bit about CFEngine 3...
Sources: across the Internet
CFEngine: History
Source: http://verticalsysadmin.com/blog/uncategorized/relative-origins-of-cfengine-chef-and-puppet
CFEngine 3: Intro
● Configuration management software
● Written in C
● Two versions:
  ● Community (GPL v3)
  ● Nova (closed source): Community + extra features
  ● Some features released in Community
● Backed by CFEngine AS, a Norway-based company founded in 2009
CFEngine 3: Features
● Multi-platform
● Multi-agent technology
● Adapted to heterogeneous environments
● Lightweight, non-intrusive
● Autonomous
● Fault-tolerant
● Highly scalable
● Progressive roll-out
● Large user base and community
CFEngine 3: Components
● cf-agent
  ● Runs on all managed hosts
  ● Applies configuration: this is the heart
  ● Can connect to cf-serverd to get policies / files
● cf-serverd
  ● Distributes policies and files
  ● Must be run on policy server(s)
  ● Usually run on all hosts to enable remote runs
● cf-monitord
  ● Collects statistics on all nodes
Memory usage
Daemon consumption on managed hosts
CFEngine 3: Usage examples
● Large companies
● Critical systems: Joint Australian Tsunami Warning Centre
● Personal computers
● Mobile devices: Nokia N900
● Underwater devices: army submarines
● Small and medium companies...
● Community
Feature: Multi-platform
● Define a configuration for all operating systems
  ● Windows, Linux
● Make it "transparent" (forget about the complexity)
● Existing standard library handling the differences between each OS and distribution
CFEngine 3: Promises
● Configuration rules are called promises
● "Promise" to be in the desired state
● The CFEngine agent handles the steps to get there: convergence
● Promise theory is based on research done at the University of Oslo
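The idea of convergence can be illustrated outside CFEngine with a few lines of Python. This is a toy model, not CFEngine policy syntax and not how cf-agent is implemented: the `Promise` class, the `converge` function, and the dictionary standing in for a host are all assumptions made for the example. Each promise describes a desired state, and the agent only acts when that state is not already reached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Promise:
    """A desired state, with a way to check it and a way to repair it."""
    name: str
    is_kept: Callable[[Dict[str, str]], bool]
    repair: Callable[[Dict[str, str]], None]


def converge(promises: List[Promise], state: Dict[str, str]) -> List[str]:
    """Check every promise once; repair only those not kept. Return repaired names."""
    repaired = []
    for promise in promises:
        if not promise.is_kept(state):
            promise.repair(state)
            repaired.append(promise.name)
    return repaired


# Hypothetical host state: a plain dict standing in for a real system.
host = {"ntp_service": "stopped", "motd": "welcome"}

promises = [
    Promise("ntp running",
            lambda s: s.get("ntp_service") == "running",
            lambda s: s.update(ntp_service="running")),
    Promise("motd set",
            lambda s: s.get("motd") == "welcome",
            lambda s: s.update(motd="welcome")),
]

first_run = converge(promises, host)   # repairs only what is out of line
second_run = converge(promises, host)  # nothing left to do
print(first_run, second_run)
```

The second run changes nothing because the host is already in the promised state, which is why running the agent repeatedly is safe.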
Feature: File editing
● Only change what you need to
  ● You like your distribution's defaults?
  ● You have various different systems already set up and just need to change something?
● Search for lines and replace/delete/add them
● Only change one field in a file
  ● /etc/passwd for example
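To make "only change one field in a file" concrete, here is a minimal Python sketch of field-level editing on passwd-style text. It is an illustration of the idea only, not CFEngine's own file-editing mechanism; the function name `set_field`, its parameters, and the sample accounts are assumptions for this example.

```python
def set_field(text, key, field_index, new_value, sep=":"):
    """Set one field on the line whose first field equals `key`.

    Returns (new_text, changed). Every other line and field is left untouched.
    """
    changed = False
    out = []
    for line in text.splitlines():
        fields = line.split(sep)
        if (fields[0] == key
                and len(fields) > field_index
                and fields[field_index] != new_value):
            fields[field_index] = new_value
            line = sep.join(fields)
            changed = True
        out.append(line)
    return "\n".join(out), changed


# Hypothetical passwd-style content (7 colon-separated fields per line).
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:1000:Alice:/home/alice:/bin/sh"
)

# Change only alice's login shell (field index 6).
edited, changed = set_field(sample, "alice", 6, "/bin/bash")
# Applying the same edit again is a no-op.
again, changed_again = set_field(edited, "alice", 6, "/bin/bash")
print(edited)
```

The rest of the file, including the distribution's defaults, stays exactly as it was.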
Feature: Complex tasks
● Powerful class system to trigger promises
  ● Based on the node itself
  ● Based on time
  ● Based on whatever you might imagine
● Complex workflows can be created
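The class mechanism can be pictured as a set of labels describing the node and the current time, with each promise guarded by the labels it requires. The sketch below is a simplified Python model, not CFEngine's class expression syntax: the function names, the hostname, and the guard format (a list of names, with a leading "!" for negation) are assumptions chosen for illustration.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def discover_classes(hostname, os_name, now):
    """Build the set of classes describing this node at this moment."""
    return {
        "any",
        os_name.lower(),
        hostname.lower().replace("-", "_"),
        DAYS[now.weekday()],
        "Hr%02d" % now.hour,
    }


def guard_matches(guard, classes):
    """True if every term of the guard holds; '!name' means 'name is not defined'."""
    for term in guard:
        if term.startswith("!"):
            if term[1:] in classes:
                return False
        elif term not in classes:
            return False
    return True


# Hypothetical node, observed on a Monday at 02:30.
classes = discover_classes("web-01", "Linux", datetime(2011, 7, 11, 2, 30))

nightly_on_linux = guard_matches(["linux", "Hr02"], classes)
windows_only = guard_matches(["windows"], classes)
not_on_mondays = guard_matches(["linux", "!Monday"], classes)
print(sorted(classes), nightly_on_linux, windows_only, not_on_mondays)
```

Combining node-based and time-based classes in one guard is what allows workflows such as "run this only on Linux web servers during the night".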
CFEngine 3: Features
According to the KU Leuven comparative study of configuration management systems:
● Very mature
● Cross-platform (*BSD, AIX, HP-UX, Linux, Mac OS X, Solaris, Windows)
● Strongly distributed
● Based on state description and convergence
● Very high scalability (> 10000 nodes)
● Very small footprint
Source: http://distrinet.cs.kuleuven.be/software/sysconfigtools/overview