CFEngine on AWS:
a Stateless Infrastructure

Laurent Raufaste @_LR_
Hello Ops
I work at Percolate
We are a tech company

Percolate helps brands create
content at a social scale
We are a SaaS

We live in the cloud
We use a bunch of servers
• 5% serving data
• 10% doing chores
• 85% working on data
Those 85%
• ingest data
• digest data
• close to real time
It’s expensive

We need to act smart
to keep the business
sustainable
CFEngine
#WTF is #CFE?

A tool to gently "dictate"
what your infrastructure
should be
#dontgetmadmarkburgess
Some history
• 1993 CFEngine
• 2003 Puppet
• 2006 EC2
• 2008 CFEngine 3
• 2009 Chef
A simple example

Our Redis policy
Chef, Puppet, Ansible
Why CFEngine?
Convergence
It keeps its promises whenever it can.
No need to start from a known state.
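To make convergence concrete, here is a minimal Python sketch of the idea (not CFEngine syntax): a promise inspects the current state and repairs it only when it differs, so it can run any number of times from any starting state. The `FilePromise` class, the file name and its content are hypothetical; the "kept"/"repaired" outcomes loosely mirror CFEngine's own reporting vocabulary.

```python
import os
import tempfile


class FilePromise:
    """Hypothetical promise: 'this file exists with exactly this content'."""

    def __init__(self, path, content):
        self.path = path
        self.content = content

    def is_kept(self):
        # Inspect the current state instead of assuming a known starting point.
        if not os.path.exists(self.path):
            return False
        with open(self.path, encoding="utf-8") as f:
            return f.read() == self.content

    def keep(self):
        """Converge toward the desired state; return 'kept' or 'repaired'."""
        if self.is_kept():
            return "kept"  # nothing to do: running again changes nothing
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(self.content)
        return "repaired"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "redis.conf")
    promise = FilePromise(path, "maxmemory 1gb\n")
    first = promise.keep()   # file missing: repaired
    second = promise.keep()  # already correct: kept
    with open(path, "w", encoding="utf-8") as f:
        f.write("maxmemory 2gb\n")  # simulate drift
    third = promise.keep()   # drift detected: repaired
    print(first, second, third)  # repaired kept repaired
```

The same `keep()` call works whether the file is missing, correct or drifted, which is the "no known starting state" property.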
Portability
Same policies on Solaris,
GNU/Linux, *BSD, AIX,
HP-UX, Windows, OSX, …
It’s old
CFEngine v1 was released in 1993.
Like a “teddy bear”, it’s reassuring:
it has been used this long without
any big problem, cf. OpenBSD’s
“2 holes since 1996”
Let’s focus

Here come the
deal breakers
Dedicated Language

The CFEngine DSL was tailored for this
purpose: no legacy, and based on Promise Theory
Documented Infrastructure

Solves the problem of outdated, useless documentation
Documented Infrastructure
• grep the whole cluster
• what's in there is what's live
• no need to SSH
• knowledge is shared
• history is kept
• the company is more valuable
Scalability

We want to build for success, not failure
We hope what we build will succeed
Scalability
• Decentralized by nature
• Can scale both up and down
• Largest cluster: hundreds of thousands of hosts
• m1.small on AWS
Reusability

It lets us build things that last and can be reused
Reusability
• DRY
• Build service/servers blocks
• Reuse them on live, staging, dev
• Change them once, for every environment
Footprint

It’s tailored for the job
Footprint
• Package to install is < 3MB
• Largest binary is 320kB
(96% C, 3% C++)
• The server just lets clients download policies
• Clients try to apply the policies locally
It’s GPL

It’s free (libre) and always will be. It’s in Debian, so it
passed the DFSG test: the fastest way to check.
Open & active community

You can open bug reports and submit Pull
Requests on GitHub, a must nowadays
Here’s what CFEngine
allows us to do
Pwn our infrastructure

We don’t let it pwn us
Normalized Infrastructure

Minimize redundancy and dependency
Being unpredictable, it’s fun

Like the Netflix Chaos Monkey,
I randomly kill instances
Maintain costs

2011-2013: Employees x10, Clients x20, Servers x2,
Infrastructure cost x1.2
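Read as ratios against 2011, the figures above imply the following (simple arithmetic on the slide's numbers, no additional data):

```latex
\frac{\text{cost per client}_{2013}}{\text{cost per client}_{2011}}
  = \frac{1.2}{20} = 0.06,
\qquad
\frac{\text{cost per server}_{2013}}{\text{cost per server}_{2011}}
  = \frac{1.2}{2} = 0.6
```

Infrastructure cost per client fell to roughly 6% of its 2011 level, and cost per server to 60%.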
Keep your infrastructure
homogeneous

Don’t let exceptions waste your time
Not scared of changes

Ops should not slow things down
Ops at Percolate
Ops are not DevOps

Ops are sysadmins that do their job well:
Build+Automate+Maintain+Monitor+Document
Devs are not DevOps

Ask your devs for the commands,
then make them a policy
Same infrastructure
on all environments
Live policies are used to
build staging, with smaller and
fewer instances, so it’s
always up to date
Same infrastructure
on all environments
It takes a few minutes to get
a small replica of live on
your workstation, and
it’s always up to date
GitHub Flow applied to Ops
• Develop in a branch
• Test (Vagrant)
• Review (Pull Request)
• Merge
• Deploy
Ops use IaaS+Metal to
provide a PaaS to devs
Be the Heroku or the
GAE of your team
It does not solve it all
Pieces we added around
CFEngine
Bootstrapping

CFEngine lacks a bootstrap process (is that
really its job?). We built ours in-house, in Python/Bash
Bootstrapping
• Request an instance
• Name it
• Install CFEngine
• CFEngine handles the rest
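The four bootstrapping steps can be sketched in Python against a fake, in-memory cloud. `FakeCloud`, `bootstrap()`, the `.com` domain and the hub address are hypothetical; a real version would call your IaaS API and actually run the installer. `cf-agent --bootstrap <hub>` is the CFEngine 3 command that points a fresh agent at its policy server, though its exact form depends on the CFEngine version.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    instance_type: str
    hostname: str = ""
    steps: list = field(default_factory=list)  # commands we would run on the host


class FakeCloud:
    """Hypothetical in-memory stand-in for an IaaS API such as EC2."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.dns = {}  # hostname -> instance_id: "our DNS is our inventory"

    def request_instance(self, instance_type):
        return Instance(f"i-{next(self._counter):06x}", instance_type)


def bootstrap(cloud, role, environment, instance_type, policy_hub):
    # 1. Request an instance
    inst = cloud.request_instance(instance_type)
    # 2. Name it, following [id.][subrole.]role.environment
    inst.hostname = f"{inst.instance_id}.{role}.{environment}.com"
    cloud.dns[inst.hostname] = inst.instance_id
    # 3. Install CFEngine (recorded here, not executed)
    inst.steps.append("install the cfengine package")
    inst.steps.append(f"cf-agent --bootstrap {policy_hub}")
    # 4. CFEngine handles the rest: the agent converges the host from its name
    return inst


cloud = FakeCloud()
worker = bootstrap(cloud, "worker", "live", "m1.small", "10.0.0.5")
print(worker.hostname)  # i-000001.worker.live.com
```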
Bootstrapping

We define all our servers in an INI file
Bootstrapping

Everything can be overridden per instance type
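A minimal sketch of such a file, read with Python's standard `configparser`: keys in `[DEFAULT]` apply to every server kind unless a section overrides them. The section names, keys and values are hypothetical; the deck does not show Percolate's actual file.

```python
import configparser

# Hypothetical server definitions: [DEFAULT] holds the baseline,
# each section overrides only what differs for that kind of server.
SERVERS_INI = """
[DEFAULT]
environment = live
instance_type = m1.small
count = 1
spot = no

[api]
count = 4
instance_type = m1.large

[worker]
count = 20
spot = yes

[smtp]
"""


def load_servers(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    servers = {}
    for name in parser.sections():  # sections() excludes DEFAULT
        sec = parser[name]           # missing keys fall back to DEFAULT
        servers[name] = {
            "environment": sec.get("environment"),
            "instance_type": sec.get("instance_type"),
            "count": sec.getint("count"),
            "spot": sec.getboolean("spot"),
        }
    return servers


servers = load_servers(SERVERS_INI)
print(servers["smtp"])  # inherits every value from [DEFAULT]
```

An empty section such as `[smtp]` is enough to declare a server kind that uses only the defaults.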
Bootstrapping

Easy to define, easy to launch
We don’t use CFEngine for
complex stuff

3 ordered dependencies max; beyond that it’s “Hell”, e.g. deploying a
Python app with on-demand pip requirements
Naming convention to leverage
CFEngine classes
• [id.][subrole.]role.environment
• smtp.live.com
• i-1ab345.worker.live.com
• i-23f432.api.staging.com
• lb.api.staging.com
Naming convention to leverage
CFEngine classes
• Our DNS is our inventory
• We leverage it with a
coordination service (AWS Tags, which do not scale; Zookeeper; …)
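A hedged Python sketch of how a name following this convention could be split into parts and turned into class names a policy can test. It assumes the trailing `.com` is just the domain, that environments come from a known set, and that instance IDs are recognizable by their `i-` prefix; `parse_hostname()` and `classes_for()` are hypothetical helpers. `canonify()` imitates CFEngine's function of the same name, which replaces characters that are invalid in class names with underscores.

```python
import re

ENVIRONMENTS = {"live", "staging", "dev"}   # assumption: known environments
INSTANCE_ID = re.compile(r"^i-[0-9a-f]+$")  # assumption: EC2-style instance IDs


def canonify(text):
    # Like CFEngine's canonify(): make a string safe to use as a class name.
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def parse_hostname(fqdn, domain="com"):
    """Split [id.][subrole.]role.environment[.domain] into its parts."""
    labels = fqdn.split(".")
    if labels and labels[-1] == domain:
        labels = labels[:-1]
    if len(labels) < 2:
        raise ValueError(f"not enough labels in {fqdn!r}")
    *prefix, role, environment = labels
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment in {fqdn!r}")
    instance_id = subrole = None
    if prefix and INSTANCE_ID.match(prefix[0]):
        instance_id = prefix.pop(0)
    if prefix:
        subrole = prefix.pop(0)
    if prefix:
        raise ValueError(f"too many labels in {fqdn!r}")
    return {"instance_id": instance_id, "subrole": subrole,
            "role": role, "environment": environment}


def classes_for(fqdn):
    """Class names a policy could branch on (hypothetical scheme)."""
    p = parse_hostname(fqdn)
    classes = {p["environment"], p["role"], f"{p['role']}_{p['environment']}"}
    if p["subrole"]:
        classes.add(f"{p['subrole']}_{p['role']}")
    if p["instance_id"]:
        classes.add(p["instance_id"])
    return {canonify(c) for c in classes}


for host in ["smtp.live.com", "i-1ab345.worker.live.com", "lb.api.staging.com"]:
    print(host, sorted(classes_for(host)))
```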
Server Structure
• Application layer
• CFE: Specialized layer (Role)
• CFE: Basic layer (Environment)
• Pristine Ubuntu
• EC2
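The layering can be sketched as building an ordered list of bundles, in the spirit of CFEngine's `bundlesequence`: basic-layer bundles for the environment first, then the role's specialized bundles. All bundle names are hypothetical; EC2 and the pristine Ubuntu image sit below CFEngine, and the application layer sits above it.

```python
# Basic layer, keyed by environment (hypothetical bundle names).
BASIC_LAYER = {
    "live": ["ntp", "ssh", "syslog", "smtp_relay", "monitoring"],
    "staging": ["ntp", "ssh", "syslog", "smtp_relay"],
}

# Specialized layer, keyed by role (hypothetical bundle names).
ROLE_LAYER = {
    "api": ["nginx", "python_app"],
    "worker": ["python_app", "queue_consumer"],
    "smtp": ["postfix"],
}


def bundle_sequence(role, environment):
    """Ordered bundles for a host: basic layer first, then the role layer.

    Layers below (EC2, pristine Ubuntu) and above (the application itself)
    are outside CFEngine and therefore not listed here.
    """
    if environment not in BASIC_LAYER:
        raise ValueError(f"unknown environment: {environment}")
    if role not in ROLE_LAYER:
        raise ValueError(f"unknown role: {role}")
    # List concatenation builds a new list, so the layer tables stay untouched.
    return BASIC_LAYER[environment] + ROLE_LAYER[role]


print(bundle_sequence("api", "staging"))
```

Putting the basic layer first means every role starts from the same foundation, which is what lets staging reuse the live policies.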
CFEngine does not
take care of it all
It takes care of all
the basics
CFEngine does not
take care of it all
It makes sure the
complex pieces are
there and operational
We started with the
simple and obvious

syslog, smtp, ... you
don’t want to fail big
We finished with the critical

By the time we reached the big stuff, it was easy:
we had all the bricks to reuse
Achievements
Recap of previous benefits
• Documentation
• Scalability
• Reusability
• Easy and fast to change
But our huge win is ...
Our
infrastructure
has no state

What’s the big deal?
Our infrastructure has no state
• Policies in git
• App code in git
• Data in datastores
• No backups: images are just a cache
No instance backup at all?

2 exceptions:
S3 for cryptic generated config files (Jenkins)
EBS for large, non-vital, changing data (RabbitMQ)
We are independent

No state is left on AWS (no AMIs), so we can migrate away
for better prices, stability, features, or mood
We know and hear everything

But tell everyone to shut up (email). When
something happens, you'll know. Your goal is
silence: 0 email.
We don’t push to deploy

Pushing does not scale. We update the live version and
every server updates itself. You can do this if your
infrastructure is transparent, i.e. CFEnginized.
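The pull model can be simulated in a few lines of Python. The `PolicyServer` and `Node` classes and the version strings are hypothetical: the hub only publishes a version, and each node compares it with what it last applied on its own periodic agent run. Deploying is just changing the published version.

```python
from dataclasses import dataclass


@dataclass
class PolicyServer:
    """Hypothetical hub: it only publishes a version and never contacts nodes."""
    live_version: str


@dataclass
class Node:
    name: str
    applied_version: str = ""

    def run_agent(self, hub):
        """One periodic agent run: pull, compare, converge if needed."""
        if self.applied_version == hub.live_version:
            return False  # already up to date: nothing to do
        self.applied_version = hub.live_version  # stands in for applying policies
        return True


hub = PolicyServer(live_version="v41")
nodes = [Node(f"i-{n:06x}.worker.live.com") for n in range(1, 4)]
for node in nodes:
    node.run_agent(hub)

hub.live_version = "v42"  # "deploy": update the live version, push nothing
changed = [node.run_agent(hub) for node in nodes]  # each node's next run
again = [node.run_agent(hub) for node in nodes]    # later runs are no-ops
print(changed, again)  # [True, True, True] [False, False, False]
```

The hub's work does not grow with the number of nodes it must contact, since it contacts none; each node carries the cost of its own update.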
We are resilient

Anything can go down, it will go up and rebuild
itself automatically - It happens nightly.
We can change our shape

Upgrading a server takes 2 commands:
1. Launch a beefier instance with the same name
2. Kill the weak one
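Because nothing lives on the instance, a resize is only a name swap. Below is a hypothetical in-memory sketch (`FakeCloud`, `resize()`); a real version would call the IaaS API and, in practice, wait for the new instance to finish converging before killing the old one.

```python
class FakeCloud:
    """Hypothetical in-memory IaaS with a DNS-style name registry."""

    def __init__(self):
        self.instances = {}  # instance_id -> instance type
        self.dns = {}        # hostname -> instance_id
        self._next_id = 1

    def launch(self, hostname, instance_type):
        instance_id = f"i-{self._next_id:06x}"
        self._next_id += 1
        self.instances[instance_id] = instance_type
        self.dns[hostname] = instance_id  # the name now points at the new instance
        return instance_id

    def kill(self, instance_id):
        del self.instances[instance_id]


def resize(cloud, hostname, new_type):
    old_id = cloud.dns[hostname]
    new_id = cloud.launch(hostname, new_type)  # 1. beefier instance, same name
    cloud.kill(old_id)                         # 2. kill the weak one
    return new_id


cloud = FakeCloud()
cloud.launch("smtp.live.com", "m1.small")
new_id = resize(cloud, "smtp.live.com", "m1.large")
print(cloud.dns["smtp.live.com"], cloud.instances)  # i-000002 {'i-000002': 'm1.large'}
```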
We use spot instances, it’s
cheap!

We can launch and kill any server anytime. It
happens while we sleep.
For the smaller instance types
We are almost there
Some free tips
Watch Mark Burgess’s videos

They’re pretty dense; “The Promise of System
Configuration” enlightened me
Buy Diego Zamboni’s book, “Learning CFEngine 3”

Don’t bother with anything else: it will give you the “I
understand” feeling we all love
Work with CFEngine at Percolate

We are hiring JS, Mobile, Python, Data, Ops
Thank you

Laurent Raufaste @_LR_