Test driven infrastructure development
Tomas Doran
bobtfish@bobtfish.net
@bobtfish
Puppetconf 2013
Test driven infrastructure development (2 - puppetconf 2013 edition)

  1. 1. Test driven Infrastructure development Tomas Doran bobtfish@bobtfish.net @bobtfish Puppetconf 2013
  2. 2. Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  3. 3. •High availability! Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  4. 4. •High availability! •Automated testing of all infrastructure changes Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  5. 5. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  6. 6. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  7. 7. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes •Continuous integration and deployment for infrastructure Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!
  8. 8. So who the hell am I?
  9. 9. Dev Infrastructure automation nut! Ex-backend web developer, Ex-security, currently fixing puppet at Yelp!
  10. 10. Dev / Ops State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  11. 11. Dev / Ops •Developer viewpoint State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  12. 12. Dev / Ops •Developer viewpoint •Grass IS greener State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  13. 13. Dev / Ops •Developer viewpoint •Grass IS greener State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  14. 14. Dev / Ops •Developer viewpoint •Grass IS greener •Think of your infra as an agile software project... State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  15. 15. Dev / Ops •Developer viewpoint •Grass IS greener •Think of your infra as an agile software project... •What workflow do I want? State of repeatability and testing in infrastructures is generally shocking. Leads to systems/operations teams being averse to change and conservative - slows the business down! Why isn’t your infrastructure an agile software project?
  16. 16. The state of the art Going to talk about how I think the generally accepted way of doing some things is fundamentally broken! But let’s start with a simple description of the issues I’m worrying about.
  17. 17. CM = state machine Each change puppet makes (or attempts to make) is a state transition. Each circle represents the configuration state of the server on disc + services running etc..
  18. 18. Non deterministic This is the key observation here - you don’t know which way puppet’s gonna jump :) In this case - it doesn’t matter, as the two operations are orthogonal.
  19. 19. Convergent! Convergence is when each run of puppet takes you nearer to 0 changes, but the next run makes additional changes.. The classic way to screw this up is to miss a dependency in your code.
  20. 20. Convergent! Of course, this doesn’t happen - the first step goes BANG, then mysql gets installed, creates /etc/mysql. The second puppet run _then_ sets the config up..
  21. 21. err: /Stage[main]//File[/etc/mysql/my.cnf]/ ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/my.cnf.puppettmp_3706 at /home/tdoran/test.pp:4 Aaand in your puppet logs, you get.
  22. 22. Purple text of rage! err: /Stage[main]//File[/etc/mysql/my.cnf]/ ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/my.cnf.puppettmp_3706 at /home/tdoran/test.pp:4 THE PURPLE TEXT OF RAGE
  23. 23. Convergent! (Shamelessly stolen from https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott.pdf) Aaand your machine is convergent - i.e. it gets towards the desired state in a number of steps..
  24. 24. •before •require •subscribe •notify As I noted, this all happens as you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for it to be totally silent. It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
  25. 25. Fixable! •before •require •subscribe •notify As I noted, this all happens as you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for it to be totally silent. It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
  26. 26. Fixable! •before •require •subscribe •notify What about an entire infrastructure? The $64,000 question is....
  27. 27. A whole stack Let’s start simple, but semi realistic. Gonna ignore databases. Gonna ignore monitoring. Gonna ignore the n[eo]twork.
  28. 28. Exported resources Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  29. 29. Exported resources • Inter machine dependencies Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  30. 30. Exported resources • Inter machine dependencies • Unidirectional! Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  31. 31. Exported resources • Inter machine dependencies • Unidirectional! • Known graph - webs, proxies, lbs Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  32. 32. Exported resources • Inter machine dependencies • Unidirectional! • Known graph - webs, proxies, lbs • Puppetroll (github.com/youdevise/puppetroll) Each layer of systems can publish data to the systems which depend on it. (I.e. webs register, proxies find the webs + register themselves, lbs then find the proxy). Given you know the dependencies - you can get consistent runs by ordering them.
  33. 33. Exported resources (Shameless ripoff of http://xkcd.com/1171/ ) Ordering dependent. Hard to test (in isolation). Slooow (have to run in order)
  34. 34. Co-dependence And if we really are talking about entire infrastructures... Then maybe we need some of these.
  35. 35. Co-dependence :( You _know_ that if everything is dynamically configured that you’re gonna have to do multiple puppet runs per server... Do we _really_ want to keep running puppet till it stops changing things?
  36. 36. The solution - an external model Use your software model to generate a set of machines for an environment. And generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  37. 37. The solution - an external model • Represent system as a set of ruby classes Use your software model to generate a set of machines for an environment. And generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  38. 38. The solution - an external model • Represent system as a set of ruby classes • DSL for describing environments Use your software model to generate a set of machines for an environment. And generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  39. 39. The solution - an external model • Represent system as a set of ruby classes • DSL for describing environments • Dependencies Use your software model to generate a set of machines for an environment. And generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  40. 40. The solution - an external model • Represent system as a set of ruby classes • DSL for describing environments • Dependencies • Domain knowledge Use your software model to generate a set of machines for an environment. And generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)
  41. 41. This is a simplified / minimal example jenkins environment - just 4 machines (2 web apps, 2 load balancers)
  42. 42. ENC data! Our external node classifier generates this for each of the 4 machines, which translates to puppet code run on the server. Note how every server gets all of its dependencies. There’s a companion data structure sent to the agent which actually provisions the virtual
  43. 43. Call tree looks something like this: Model all the nodes, allocate all their IPs. Make calls to KVM servers to provision machines.. VMs start, boot, run puppet, send cert to puppetmaster, --waitforcert. Central provisioning asks ‘do we have a cert’, waits - signs it. Looks up DNS and ENC to
  44. 44. Automate all the things Suddenly, I have massive power. I can write a small script to bring up a whole production-like environment, run tests against it, tear it down. I can do this against the latest puppet changes, and only promote them to run on production servers when the tests pass!
  45. 45. BDD infrastructure Behavior driven development - given I have a high level model of the systems comprising an infrastructure, I can then write equally high level tests to assert the behavior of that infrastructure
  46. 46. BDD infrastructure • Given For example...
  47. 47. BDD infrastructure • Given – the Service has finished being provisioned
  48. 48. BDD infrastructure • Given – the Service has finished being provisioned • And
  49. 49. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing
  50. 50. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When
  51. 51. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service
  52. 52. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then
  53. 53. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing
  54. 54. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing • And
  55. 55. BDD infrastructure • Given – the Service has finished being provisioned • And – all monitoring related to the service is passing • When – we destroy a single member of the service • Then – we expect all monitoring at the service level to be passing • And – we expect all monitoring at the single machine level to be failing Yes, I am suggesting regression testing your load balancer setup...
  56. 56. Is this for real?
  57. 57. Is this for real? •Yes!
  58. 58. Is this for real? •Yes! • We actually built this, the core parts are on github
  59. 59. Is this for real? •Yes! • We actually built this, the core parts are on github • Deployed real applications to production at TIM Group
  60. 60. •High availability! •Automated testing of all infrastructure changes •Entirely repeatable application environments •High confidence in changes •Continuous integration and deployment for infrastructure This is my promised land!
  61. 61. Questions? • https://devblog.timgroup.com/2013/06/14/exported-resources-considered-harmful/ • https://devblog.timgroup.com/2013/06/26/scenario-testing-infrastructures/ • https://github.com/youdevise/provisioning-tools • https://github.com/youdevise/stackbuilder
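
To make the external model from slides 36 to 42 a bit more concrete, here is a minimal sketch in Ruby. It is not the stackbuilder API: the `Service` and `Environment` classes, the `role::` class names, the `dependencies` and `logicalenv` keys and the hostname scheme are all assumptions invented for illustration. The only point it tries to show is that the hostnames of a machine's dependencies can be computed from the model before any machine exists, so each node's ENC data can arrive complete.

```ruby
require 'yaml'

# Hypothetical model classes for illustration only; not the stackbuilder API.
class Service
  attr_reader :name, :instances, :depends_on

  def initialize(name, instances:, depends_on: [])
    @name = name
    @instances = instances
    @depends_on = depends_on
  end

  # Hostnames are derived from the model, so they are known before any VM exists.
  # The naming scheme here is an assumption.
  def hostnames(env)
    (1..instances).map { |i| format('%s-%s-%03d', env, name, i) }
  end
end

class Environment
  attr_reader :name, :services

  def initialize(name)
    @name = name
    @services = {}
  end

  # Tiny DSL: declare a service and the services it depends on.
  def service(name, instances:, depends_on: [])
    @services[name] = Service.new(name, instances: instances, depends_on: depends_on)
    self
  end

  # Build ENC-style data for every machine. Each machine is handed the
  # hostnames of everything it depends on, rather than discovering them
  # through exported resources on a later puppet run.
  def enc_data
    services.values.each_with_object({}) do |svc, out|
      deps = svc.depends_on.map { |dep| [dep.to_s, services.fetch(dep).hostnames(name)] }.to_h
      svc.hostnames(name).each do |host|
        out[host] = {
          'classes' => { "role::#{svc.name}" => { 'dependencies' => deps } },
          'parameters' => { 'logicalenv' => name }
        }
      end
    end
  end
end

# A 4 machine environment in the spirit of slide 41: 2 web apps, 2 load balancers.
ci = Environment.new('ci')
ci.service(:jenkinsapp, instances: 2)
ci.service(:lb, instances: 2, depends_on: [:jenkinsapp])

enc = ci.enc_data
puts enc['ci-lb-001'].to_yaml
```

Because the load balancers are told about both web apps up front, nothing in this sketch depends on the order in which machines run puppet, which is the property the exported resources approach lacks.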

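
The Given / And / When / Then scenario from slides 45 to 55 can be sketched the same way. This is only an in-memory stand-in, assuming a hypothetical `FakeService` class: a real test would drive the provisioning tools and query the actual monitoring system, and none of the method names below come from the TIM Group code.

```ruby
# In-memory stand-in for a provisioned service (hypothetical, for illustration).
class FakeService
  attr_reader :members

  def initialize(hostnames)
    @members = hostnames.map { |h| [h, :up] }.to_h
  end

  def destroy(hostname)
    @members[hostname] = :destroyed
  end

  # Machine-level check: passes only while that machine is up.
  def machine_check_passing?(hostname)
    @members.fetch(hostname) == :up
  end

  # Service-level check: passes while at least one member can still serve.
  def service_check_passing?
    @members.values.any? { |state| state == :up }
  end
end

def scenario
  results = {}
  # Given the Service has finished being provisioned
  service = FakeService.new(%w[ci-jenkinsapp-001 ci-jenkinsapp-002])
  # And all monitoring related to the service is passing
  results[:before] = service.service_check_passing? &&
                     service.members.keys.all? { |h| service.machine_check_passing?(h) }
  # When we destroy a single member of the service
  service.destroy('ci-jenkinsapp-001')
  # Then we expect all monitoring at the service level to be passing
  results[:service_after] = service.service_check_passing?
  # And we expect all monitoring at the single machine level to be failing
  results[:machine_after] = service.machine_check_passing?('ci-jenkinsapp-001')
  results
end

outcome = scenario
puts outcome
```

The scenario passes when `:before` and `:service_after` are true and `:machine_after` is false, i.e. the service survives losing one member while the monitoring still notices the dead machine.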