
A tale of Disaster Recovery (Cfengine everyday, practices and tools)



After a brief presentation of configuration management (CM) basics, we start with an ill-fated tale from the recent past about disaster recovery (also known as a case study, if you must): how our CM saved us, how it didn't, and what could have been done better. This could lead to a discussion about best practices.

We use Cfengine 3. We will introduce the software and outline its main differences from other open source CM tools before explaining why we like this choice. But Cfengine is not the whole story: what enables us to manage our configuration completely are the practices and tools we've built around it.

Published in: Technology


  1. FOSDEM 2011, Brussels, Belgium. A tale of disaster recovery: Cfengine everyday, practices and tools. Nicolas Charles, Jonathan Clarke
  2. About the speakers: Nicolas Charles and Jonathan Clarke. Cfengine contributor; OpenLDAP committer; Cfengine "Community Champion" (C3); Scala developer; sysadmin. But we get on pretty well! (mostly...)
  3. Agenda
       1) Configuration Management 101
       2) Our choice of tool
       3) A tale of disaster recovery
       4) Introducing Cfengine 3
       5) Why we love Cfengine 3
  4. A bit about Configuration Management...
  5. Configuration management: what is it?
       • Configuration management is a field of management that focuses on establishing and maintaining consistency of a system (..) throughout its life
       • Software configuration management is the task of tracking and controlling changes in the software
     Sources:
  6. Configuration management: why is it useful?
       • Control changes
       • Reproduce configurations over time and across nodes
       • Audit and keep history data
       • Repair automatically
  7. Configuration Management Tools: what we chose, and why
  8. Our choice. Back in mid-2009, we needed a configuration management tool. Criteria:
       • Open source
       • Multi-platform agent (including Windows)
       • Resilient
       • Non-disruptive
  9. Our choice: candidates
       • Cfengine 3
       • Puppet
       • Chef
  10. Our choice: candidates. Cfengine 3. More on this choice later...
  11. Disaster Recovery: an ill-fated tale from the recent past (CASE STUDY)
  12. Before the disaster... Our company's IT infrastructure
       • Small company, small requirements: web site, email, Git repository, Redmine...
       • Small company, small budget: all on one hosted server
  13. Asking for trouble? Just one hosted server! Critical services! No, a "safe" configuration:
       • Redundant hardware, 3-disk RAID-5 array
       • All services automatically installed and set up using Configuration Management
       • Backups: daily (several off-site locations)
       • Several VMs to separate services
  14. A critical failure: 2 hard drives fail simultaneously
       → RAID-5 array is down
       → Almost all services fail immediately
       → "The end of the world as we know it"
       → Need to rebuild everything NOW
  15. Recovering
       • Step 1: Panic!
       • Step 2: Get a new server
       • Step 3: Reinstall base OS + virtualization
       • Step 4: Restore VM configuration... whoops
       • Step 4: Re-create the VMs manually
       • Step 5: Reinstall each OS in each VM...
  16. Recovering
       • Step 6: Install Configuration Management
       • Step 7: Sit back and watch all the services coming back online as if by magic!
       • Step 8: Huh, where's my data?
       • Step 9: Manually restore backups
       • Step 10: Make a list of missing data...
  17. Lessons learned
       1) Hard disks fail reliably
       2) Restoring virtualization setups:
            • Backing up the config files would have helped
            • Need CM tools to describe the desired state! (Cfengine Nova does this)
       3) Configuration Management should tie in to our backup system
       4) Backups were lacking some files: always test!
  18. Wishlist and discussion
       • Integrating Configuration Management tools and backup systems is a crucial step for CM to be efficient for disaster recovery. What do others do?
       • Provisioning VMs and their resources (disks, network) should be automated too. Cloud providers are one solution. What about "plain" virtualization?
  19. A bit about Cfengine 3... Sources: across the Internet
  20. Cfengine: History. Source:
  21. Cfengine 3: Intro
       • Configuration management software
       • Written in C
       • Two versions: Community (GPL v3) and Nova (closed source: Community + extra features)
       • Backed by Cfengine AS, a Norway-based company founded in 2009
  22. Cfengine 3: Features. According to the KU Leuven comparative study of configuration management systems:
       • Very mature
       • Cross-platform (*BSD, AIX, HP-UX, Linux, Mac OS X, Solaris, Windows)
       • Strongly distributed
       • Based on state description and convergence
       • Very high scalability (> 10000 nodes)
       • Very small footprint
     Source:
  23. Cfengine 3: Components
       • cf-agent: runs on all managed hosts; applies the configuration (this is the heart); can connect to cf-serverd to get policies and files
       • cf-serverd: distributes policies and files; must be run on the policy server(s); usually run on all hosts to enable remote runs
       • cf-monitord: collects statistics on all nodes
  24. Cfengine 3: Promises
       • Configuration rules are called promises
       • A "promise" to be in the desired state
       • The Cfengine agent handles the steps to get there: convergence
       • Promise theory is based on research done at the University of Oslo
  25. Cfengine 3: Usage examples
       • Large companies (Facebook, AMD, …)
       • Critical systems: Joint Australian Tsunami Warning Centre
       • Personal computers
       • Mobile devices: Nokia N900
       • Underwater devices: army submarines
       • Small and medium companies...
  26. Why we love Cfengine 3... Sources: our experience and opinions
  27. Memory usage: daemon consumption on managed hosts
  28. Multi-platform
       • Define a configuration for all operating systems (Windows, Linux)
       • Make it "transparent" (forget about the complexity)
       • An existing standard library handles the differences between each OS and distribution
  29. File editing
       • Only change what you need to. You like your distribution's defaults? You have various different systems already set up and just need to change something?
       • Search for lines and replace/delete/add them
       • Only change one field in a file (/etc/passwd for example...)
  30. Complex tasks
       • Powerful class system to trigger promises: based on the node itself, based on time, based on whatever you might imagine
       • Complex workflows can be created
  31. Thank you! FOSDEM 2011, Configuration Management room, and those brave enough to wake up early
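
Three ideas from the slides can be sketched together in a few lines of Python: convergence ("Cfengine 3: Promises"), changing a single field in a file ("File editing") and class guards ("Complex tasks"). This is only an illustration under stated assumptions, not Cfengine's implementation or its policy syntax. The function names `discover_classes`, `promise_field` and `run_if` are hypothetical, the outcome labels borrow Cfengine's kept/repaired vocabulary loosely, and the sample file is a throwaway passwd-style file in a temporary directory rather than the real /etc/passwd.

```python
import datetime
import platform
import tempfile
from pathlib import Path


def discover_classes(now=None):
    """Return a set of 'classes': labels that are true for this host right now.

    Hypothetical helper, loosely inspired by node-based and time-based classes.
    """
    now = now or datetime.datetime.now()
    return {
        platform.system().lower(),  # node-based, e.g. "linux" or "windows"
        now.strftime("%A"),         # time-based, e.g. "Monday"
        f"Hr{now.hour:02d}",        # time-based, e.g. "Hr09"
    }


def promise_field(path, key, field_index, value, sep=":"):
    """Converge one field of the line whose first field equals `key`.

    Returns "kept" if the field already had the desired value, "repaired"
    if the file had to be edited, and "not_kept" if no matching line exists.
    Every other line and field is left untouched.
    """
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        fields = line.split(sep)
        if fields[0] != key or field_index >= len(fields):
            continue
        if fields[field_index] == value:
            return "kept"  # already in the desired state: do nothing
        fields[field_index] = value
        lines[i] = sep.join(fields)
        Path(path).write_text("\n".join(lines) + "\n")
        return "repaired"
    return "not_kept"


def run_if(classes, required, promise, *args):
    """Run `promise` only when every class in `required` is currently defined."""
    if required <= classes:
        return promise(*args)
    return "skipped"


if __name__ == "__main__":
    # Throwaway sample file; field 6 of a passwd-style line is the login shell.
    sample = Path(tempfile.mkdtemp()) / "passwd"
    sample.write_text(
        "root:x:0:0:root:/root:/bin/bash\n"
        "alice:x:1000:1000:Alice:/home/alice:/bin/sh\n"
    )
    classes = discover_classes()
    guard = {platform.system().lower()}  # a guard that is true on this host
    print(run_if(classes, guard, promise_field, sample, "alice", 6, "/bin/bash"))
    print(run_if(classes, guard, promise_field, sample, "alice", 6, "/bin/bash"))
```

Run as a script, the first call reports `repaired` and the second reports `kept`. That is the point of convergence: the rule describes the desired state rather than a sequence of commands, so applying it repeatedly is harmless.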