How Percolate uses CFEngine to Manage AWS Stateless Infrastructure


Published in: Technology

  1. 1. CFEngine on AWS: a Stateless Infrastructure Laurent Raufaste @_LR_
  2. 2. Hello Ops
  3. 3. I work at Percolate
  4. 4. We are a tech company Percolate helps brands create content at a social scale
  5. 5. We are a SaaS We live in the cloud
  6. 6. We use a bunch of servers • 5% serving data • 10% doing chores • 85% working on data
  7. 7. Those 85% do • ingest data • digest data • close to RT
  8. 8. It’s expensive We need to act smart to keep the business sustainable
  9. 9. CFEngine
  10. 10. #WTF is #CFE ? A tool to gently "dictate" what your infrastructure should be #dontgetmadmarkburgess
  11. 11. Some history • 1993 CFEngine • 2005 Puppet • 2006 EC2 • 2008 CFEngine 3 • 2009 Chef
  12. 12. A simple example Our Redis policy
  13. 13. Chef, Puppet, Ansible Why CFEngine ?
  14. 14. Convergence Keep promises if it can. No need to start from a known state.
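The convergence idea on this slide can be illustrated outside CFEngine. What follows is a minimal Python sketch, not CFEngine code and not Percolate's actual policy: the helper `ensure_line` and the file name `redis.conf` are hypothetical, chosen only to show an operation that inspects the current state, repairs it only when needed, and is safe to run repeatedly from any starting state.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be repaired, False if the promise
    was already kept. Hypothetical helper, for illustration only.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            content = handle.read()

    if line in content.splitlines():
        return False  # already in the desired state: change nothing

    # Start on a fresh line if the file does not end with a newline.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{separator}{line}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "redis.conf")  # hypothetical file
        print(ensure_line(target, "maxmemory 1gb"))  # first run repairs
        print(ensure_line(target, "maxmemory 1gb"))  # second run is a no-op
```

The starting state does not matter here: a missing file, an empty file, or a file that already contains the line all end in the same place.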
  15. 15. Portability Same policies on Solaris, GNU/Linux, *BSD, AIX, HP-UX, Windows, OSX, …
  16. 16. It’s old CFEngine v1 was released in 1993. Like a “teddy bear”, it’s reassuring: it has been used for this long without any big problem, cf. OpenBSD’s “2 holes since 1996”
  17. 17. Let’s focus Here come the deal breakers
  18. 18. Dedicated Language The CFEngine DSL has been tailored for this purpose, with no legacy, and is based on promise theory
  19. 19. Documented Infrastructure Solves the outdated and useless doc problem
  20. 20. Documented Infrastructure • grep the whole cluster • what's in there is what's live • no need to SSH • knowledge is shared • history is kept • company is more valuable
  21. 21. Scalability We want to build for success, not failure We hope what we build will succeed
  22. 22. Scalability • Decentralized by nature • Can scale both ways • Largest cluster is X00,000s • m1.small on AWS
  23. 23. Reusability It lets us build things that last and can be reused
  24. 24. Reusability • DRY • Build service/server blocks • Reuse them on live, staging, dev • Change them once and for all
  25. 25. Footprint It’s tailored for the job
  26. 26. Footprint • Package to install is < 3MB • Largest binary is 320kB (96% C, 3% C++) • The server is just letting clients download policies • Clients are trying to apply the policies locally
  27. 27. It’s GPL It’s free (libre) and always will be. It’s in Debian, so it passed the DFSG test: the fastest way to check.
  28. 28. Open & active community You can open bug reports and submit Pull Requests on Github, a must nowadays
  29. 29. Here’s what CFEngine allows us to do
  30. 30. Pwn our infrastructure We don’t let it pwn us
  31. 31. Normalized Infrastructure Minimize redundancy and dependency
  32. 32. Being unpredictable, it’s fun Like the Netflix Chaos Monkey, I randomly kill instances
  33. 33. Maintain costs 2011-2013: Employees x10, Clients x20, Servers x2, Infrastructure cost x1.2
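Taking the multipliers on this slide at face value, the relative infrastructure cost per client and per server follows from simple division. A quick sketch of that arithmetic; the variable names are illustrative, the figures are the slide's.

```python
# Growth multipliers for 2011-2013, as stated on the slide.
clients_growth = 20
servers_growth = 2
infra_cost_growth = 1.2

# Infrastructure cost per client and per server, relative to 2011.
cost_per_client = infra_cost_growth / clients_growth
cost_per_server = infra_cost_growth / servers_growth

print(f"cost per client: x{cost_per_client:.2f}")  # x0.06
print(f"cost per server: x{cost_per_server:.2f}")  # x0.60
```

In other words, under these figures the infrastructure cost per client fell to about 6% of its 2011 level, and the cost per server to about 60%.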
  34. 34. Keep your infrastructure homogeneous Don’t let exceptions waste your time
  35. 35. Not scared of changes Ops should not slow things down
  36. 36. Ops at Percolate
  37. 37. Ops are not DevOps Ops are sysadmins that do their job well: Build+Automate+Maintain+Monitor+Document
  38. 38. Devs are not DevOps Ask your devs for the commands, then make them a policy
  39. 39. Same infrastructure on all environments Live policies are used to build staging, smaller & fewer instances, and it’s always up to date
  40. 40. Same infrastructure on all environments It takes a few mins to get a small replica of live on your workstation, and it’s always up to date
  41. 41. GitHub Flow applied to Ops • Develop in a branch • Test (Vagrant) • Review (Pull Request) • Merge • Deploy
  42. 42. Ops use IaaS+Metal to provide a PaaS to devs Be the Heroku or the GAE of your team
  43. 43. It does not solve it all Pieces we added around CFEngine
  44. 44. Bootstrapping CFEngine is missing the bootstrap process, is it really its job ? We did it in-house, in Python/Bash
  45. 45. Bootstrapping • Request an instance • Name it • Install CFEngine • CFEngine handles the rest
  46. 46. Bootstrapping We define all our servers in an INI file
  47. 47. Bootstrapping Everything can be overridden per instance type
  48. 48. Bootstrapping Easy to define, easy to launch
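The transcript does not include the INI file itself, so the following is only a sketch of how "defaults plus per-server overrides" could look with Python's standard `configparser`. Every section name, key, and value below is an assumption made for illustration, not Percolate's actual inventory format.

```python
import configparser
from typing import Dict

# Hypothetical inventory: the real file is not shown in the slides.
INVENTORY = """
[DEFAULT]
instance_type = m1.small
image = ubuntu
count = 1

[redis.live]
instance_type = m1.large

[web.staging]
count = 2
"""


def load_inventory(text: str) -> Dict[str, Dict[str, str]]:
    """Return one settings dict per server section.

    Values from [DEFAULT] apply everywhere unless a section overrides them.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # sections() excludes DEFAULT; dict(parser[name]) merges the defaults in.
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    for name, settings in load_inventory(INVENTORY).items():
        print(name, settings)
```

With this layout, adding a server is a new section header, and a section only lists what differs from the defaults.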
  49. 49. We don’t use CFEngine for complex stuff 3 ordered dependencies max, e.g. “Hell” or deploy a Python app with on-demand pip requirements
  50. 50. Naming convention to leverage CFEngine classes • [id.][subrole.]role.environment
  51. 51. Naming convention to leverage CFEngine classes • Our DNS is our inventory • We leverage it with a coordination service (AWS Tags (does not scale), Zookeeper, …)
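As an illustration of how a name in the `[id.][subrole.]role.environment` form can be split into its parts, here is a minimal Python sketch. It is not Percolate's code. It assumes that the optional `id` is purely numeric (the slides do not say how `id` and `subrole` are told apart), and the host names used are invented examples.

```python
from typing import Dict, Optional


def parse_hostname(name: str) -> Dict[str, Optional[str]]:
    """Split '[id.][subrole.]role.environment' into named parts.

    Assumption (not from the slides): an id, when present, is all digits.
    """
    parts = name.split(".")
    if not 2 <= len(parts) <= 4 or not all(parts):
        raise ValueError(f"unexpected name: {name!r}")

    environment = parts.pop()  # last label
    role = parts.pop()         # label before the environment

    instance_id = None
    subrole = None
    if parts and parts[0].isdigit():
        instance_id = parts.pop(0)
    if parts:
        subrole = parts.pop(0)
    if parts:
        raise ValueError(f"unexpected name: {name!r}")

    return {
        "id": instance_id,
        "subrole": subrole,
        "role": role,
        "environment": environment,
    }


if __name__ == "__main__":
    for host in ("redis.live", "master.redis.staging", "2.master.redis.live"):
        print(host, parse_hostname(host))
```

Each part can then be used to decide which policies apply to a host, for example everything for the `redis` role in the `live` environment.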
  52. 52. Server Structure • Application layer • CFE: Specialized layer (Role) • CFE: Basic layer (Environment) • Pristine Ubuntu • EC2
  53. 53. CFEngine does not take care of it all It takes care of all the basics
  54. 54. CFEngine does not take care of it all It makes sure the complex pieces are there and operational
  55. 55. We started with the simple and obvious syslog, smtp, ... you don’t want to fail big
  56. 56. We finished with the critical When we reached the big stuff, it was easy, and we had all the bricks to reuse
  57. 57. Achievements
  58. 58. Recap of previous benefits • Documentation • Scalability • Reusability • Easy and fast to change
  59. 59. But our huge win is ...
  60. 60. Our infrastructure has no state What’s the big deal ?
  61. 61. Our infrastructure has no state • Policies in git • App code in git • Data in datastores • No backup: Images are cache
  62. 62. No instance backup at all ? 2 exceptions: • S3 for cryptic generated config files (Jenkins) • EBS for large non-vital changing data (RabbitMQ)
  63. 63. We are independent No state is left on AWS (no AMI), so we can migrate away for better prices, stability, features, mood
  64. 64. We know and hear everything But tell everyone to shut up (email). When something happens, you'll know. Your goal is silence: 0 email.
  65. 65. We don’t push to deploy It does not scale. We update the live version and every server updates itself. You can do this if your infrastructure is limpid, CFEnginized.
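The pull model on this slide can be pictured with a small in-memory simulation. This is a sketch under stated assumptions, not how CFEngine is implemented: `PolicyHub` and `Server` are hypothetical classes, there is no network, and "applying" a policy is reduced to recording its version.

```python
class PolicyHub:
    """Holds the currently published policy version (hypothetical)."""

    def __init__(self) -> None:
        self.version = 0

    def publish(self) -> int:
        """Publish a new live version; nothing is pushed to servers."""
        self.version += 1
        return self.version


class Server:
    """A server that checks the hub on its own schedule (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied_version = 0

    def run_once(self, hub: PolicyHub) -> bool:
        """Pull and apply the live version if it differs from ours.

        Returns True when an update was applied, False otherwise.
        """
        if hub.version == self.applied_version:
            return False
        self.applied_version = hub.version
        return True


if __name__ == "__main__":
    hub = PolicyHub()
    fleet = [Server(f"{i}.web.live") for i in range(3)]  # invented names
    hub.publish()
    for server in fleet:
        print(server.name, "updated:", server.run_once(hub))
```

The hub never needs to know how many servers exist or whether they are reachable; each server catches up the next time it runs, which is why this approach keeps working as the fleet grows.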
  66. 66. We are resilient Anything can go down; it will come back up and rebuild itself automatically. It happens nightly.
  67. 67. We can change our shape Upgrading a server takes 2 commands: 1. Launch a beefier instance with the same name 2. Kill the weak one
  68. 68. We use spot instances, it’s cheap! We can launch and kill any server anytime. It happens while we sleep.
  69. 69. For the smaller instance types
  70. 70. We are almost there Some free tips
  71. 71. Watch Mark’s videos They are pretty dense, e.g. “The Promise of System Configuration” enlightened me
  72. 72. Buy Diego’s book Don’t bother with anything else, it will give you the “I understand” feeling we all love
  73. 73. Work with CFEngine at Percolate We are hiring JS, Mobile, Python, Data, Ops
  74. 74. Thank you Laurent Raufaste @_LR_