Orchestration? You Don't Need Orchestration. What You Want is Choreography.

Orchestration-track talk at CfgMgmtCamp 2016

  • Tell them who you are
    Thanks for coming
    Talk origins: Jez
  • What exactly *is* orchestration? Who wants to take a crack at defining that?

    4.IX Orchestra Sinfonica Nazionale della Rai
    Creative Commons by MITO SettembreMusica
  • But since I’m in the orchestration track I’d better try to define it so that I actually have a talk, right?

    Here is the definition I'll be using for the rest of the talk.

    And then I’m still going to tell you how and why that breaks down.
  • IBM Cloud Orchestrator
    HP Operations Orchestration
    VMWare vRealize Orchestrator
  • Orchestration in practice:

    Deploy or change a database first
    Load it with data
    Deploy some middleware
    Deploy some frontends

    There are really only two times in which orchestration comes into play:
    Initial setup
    Redeployment (change the stack)
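To make the definition concrete, here is a minimal Ruby sketch of "an ordered set of operations across a set of independent machines", driven from one central place. The step names mirror the list above. The `Step` struct, the host names, and the simulated failure of `app2` are all hypothetical and for illustration only.

```ruby
# Hypothetical orchestrator: ordered operations run across independent
# machines from one central place. Hosts and actions are made up.
Step = Struct.new(:name, :hosts, :action)

# Runs steps strictly in order. Every host in a step must succeed before
# the next step starts, and the first failure halts the whole run.
# Returns [names_of_completed_steps, name_of_failed_step_or_nil].
def orchestrate(steps)
  completed = []
  steps.each do |step|
    ok = step.hosts.all? { |host| step.action.call(host) }
    return [completed, step.name] unless ok
    completed << step.name
  end
  [completed, nil]
end

always_ok = ->(_host) { true }
steps = [
  Step.new("deploy database", ["db1"], always_ok),
  Step.new("load data", ["db1"], always_ok),
  # simulate app2 being unreachable; app1 has already been changed by then
  Step.new("deploy middleware", ["app1", "app2"], ->(host) { host != "app2" }),
  Step.new("deploy frontends", ["web1"], always_ok)
]

done, failed = orchestrate(steps)
puts "completed: #{done.join(', ')}"
puts "failed at: #{failed}"
```

The run stops at "deploy middleware". By that point `app1` has already been changed and `app2` has not, so the fleet is left half-deployed. The orchestrator only knows that a step failed.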
  • Existing tools for orchestration as defined:
    Ansible
    Terraform

    Ok yeah and this is probably the last time you’ll see some Ansible code on a slide from Chef 
  • Where do you orchestrate from?

    A system admin’s laptop
    An orchestration “node” (CLI or server), e.g. HashiCorp Atlas
  • Great, ok, so everyone clear on how I’m defining orchestration?

    What are some problems with the approaches I just mentioned?
  • Problem #1: Resilience

    Treating machines all connected via an unreliable network as an atomic unit to which updates must be applied in full, or not at all
    This *used* to work when you had a small fleet and/or your network was mostly reliable (e.g. on a LAN) - not so good in a cloud
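The arithmetic behind "this used to work on a small fleet" is easy to sketch. Assume each host independently applies an update successfully with probability p. This is an idealization, since real network failures are often correlated. An all-or-nothing update across n hosts then succeeds with probability p^n. At a per-host rate of 99.9%, that is roughly 0.99 for 10 hosts, 0.90 for 100, 0.37 for 1,000, and effectively zero for 10,000.

```ruby
# Probability that an all-or-nothing update succeeds everywhere, assuming
# each of n hosts independently succeeds with probability p.
def atomic_success_probability(p, n)
  p**n
end

[10, 100, 1_000, 10_000].each do |n|
  printf("%6d hosts: %.4f\n", n, atomic_success_probability(0.999, n))
end
```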
  • Problem #2: Failure is hard to recover from

    Orchestration operations are also often not convergent or idempotent, particularly because you’re specifying operations and not state
    When an orchestration operation fails, recovery processes are often manual
    What parts do I roll back?
    What parts can I roll forward instead?
    Need to understand the whole map in order to know what's impacted, and manually trace the digraph to figure out what to do
    It's not enough for you to understand direct deps, you also need to understand and analyze transitive deps. This makes doing orchestration at any reasonable scale a daunting proposition.
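Tracing transitive dependencies is a graph walk. The Ruby sketch below assumes a hypothetical, hand-written map from each component to the components that depend on it. It does a breadth-first search to collect everything a failure in one component could affect. In a real orchestration setup, a human usually has to reconstruct this map during an incident.

```ruby
require "set"

# Hypothetical dependency map: each key lists the components that depend on it.
DEPENDENTS = {
  "database"   => ["middleware"],
  "middleware" => ["frontend", "reporting"],
  "frontend"   => [],
  "reporting"  => []
}.freeze

# Breadth-first walk collecting every component transitively affected
# by a change or failure in `component`.
def impacted_by(component, graph)
  seen = Set.new
  queue = [component]
  until queue.empty?
    current = queue.shift
    graph.fetch(current, []).each do |dep|
      queue << dep if seen.add?(dep) # add? returns nil if already seen
    end
  end
  seen.to_a
end

p impacted_by("database", DEPENDENTS)
```

A failed database change reaches `reporting` even though `reporting` never talks to the database directly.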
  • Problem #3: Orchestrator is a SPOF

    What happens when I lose my laptop?
    Often these orchestration operations depend on stored state (like Terraform’s tfstate file) – if I lose that, I’m kinda screwed
  • Problem #4: Orchestration doesn’t really scale

    It’s great to be able to do orchestration across a small number of machines, but what about a giant cluster? I’d be waiting forever for individual SSH operations
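A back-of-the-envelope Ruby sketch shows why pushing from one orchestrator hurts at scale. The host count and per-host time are made-up examples, and the model assumes uniform hosts and no failures. Tools like Ansible run tasks on several hosts at once (its `forks` setting), but every connection still fans out from a single machine.

```ruby
# Wall-clock time to push one change over SSH from a single orchestrator.
# Assumes every host takes the same time and nothing fails.
def push_time_seconds(hosts:, seconds_per_host:, parallelism: 1)
  batches = (hosts.to_f / parallelism).ceil # hosts handled `parallelism` at a time
  batches * seconds_per_host
end

serial   = push_time_seconds(hosts: 5_000, seconds_per_host: 10)
parallel = push_time_seconds(hosts: 5_000, seconds_per_host: 10, parallelism: 50)
puts "serial:   #{format('%.1f', serial / 3600.0)} hours"
puts "50 forks: #{format('%.1f', parallel / 60.0)} minutes"
```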
  • Let’s do the wave – everyone know what that is?

    Let’s do it like this first – when I point at the people in your “column” you’re going to raise your arms
    Ok now you’re going to look at the person to the left of you and when they raise their arms you’re going to do that as well
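The second version of the wave can be sketched in Ruby as a tiny simulation. Nobody is pointed at by a conductor. Each person watches only their left-hand neighbor and raises their arms one tick later. The tick-based model is an illustrative assumption.

```ruby
# Choreographed wave: each person reacts only to the neighbor on their
# left, one tick after that neighbor raised their arms. Person 0 starts.
def choreographed_wave(people)
  raised_at = Array.new(people) # tick at which each person raised arms (nil = not yet)
  raised_at[0] = 0
  tick = 0
  until raised_at.all?
    tick += 1
    seen = raised_at.dup # everyone reacts to what they saw on the previous tick
    (1...people).each do |i|
      raised_at[i] ||= tick if seen[i - 1]
    end
  end
  raised_at
end

p choreographed_wave(6)
```

The wave completes without any central pointer. In this simple model, though, one absent person would stall it indefinitely. That is why autonomous actors also need to promise some behavior in the face of others' failure.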

  • When first thinking about this topic and why I dislike orchestration so much, I realized that it is because orchestration is in conflict with “promise theory”
    Which is what configuration management is based on!
    Orchestration and config management are actually orthogonal
  • To illustrate obligations versus promises, I’m going to use my cat
  • Imagine you are going on a holiday, like I’m coming here to speak to you at CM camp, and I go to my neighbor: “You will feed my cat”
    And then I leave
    This is me imposing desired behavior by an act of force onto my neighbor. What are the chances that my cat will be fed?
  • Now imagine I go to my neighbor and convince him to give me a commitment that he will feed my cat while I am here in Belgium
    This is getting a “promise” from that neighbor: the neighbor is now acting of his own free will, yet I'm more likely to get my cat fed.
  • This is true in life, too. While we think of laws as being obligations, they are actually individual commitments to abide by them.
    A law that can’t secure an ongoing commitment from a broad population to follow it (societal norms) will eventually fail, e.g. marijuana laws in the United States.
  • Burgess and Bergstra go on to state that promise-based systems work better in the *world*
    We individually as IT professionals want autonomy in our work, right?
    We “promise” to perform our duties and our employers “promise” to pay us
    Yet we then turn around and build systems that are very command-and-control and wonder why they are brittle
  • So you see, the problems I pointed out in orchestration are core to the approach: the orchestrator is trying to make promises as a whole about the behavior of others, even though those others are, by definition, autonomous agents. Networked systems over unreliable networks are exactly that!

    You can regard orchestration as being incompatible with some underlying principles of configuration management. That’s not to say that people don’t still want to do it: it’s a natural behavior, and it’s “easier”.
  • So what we in society would call “individual autonomy”, applied to distributed systems, we might call “choreography”. Think about ballet dancers.

    So choreography vis-a-vis computers is asking a group of distributed systems to converge on a desired end state and possibly change that end state in the presence of others’ changes.

    We just want to modify the implementation so that it’s promise-based, not obligation-based. What are the characteristics of these autonomous agents and how would they behave?
  • Make progress towards promised desired state
    - Expressed in code

    Expose interfaces to allow others to verify promises
    Directly, without a mediator (at the edge)

    Promise to take certain behaviors in the face of failure of others
    Intelligent recovery may cause its own original promises to be broken, but this fans out across a fleet of machines
    Promise fulfillment failure has intelligent recovery
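The three characteristics above can be sketched in Ruby as a single class. The agent converges towards a desired state expressed in code and exposes a method peers can call directly to verify its promise. It also runs its own handler when a watched peer is not keeping its promise. The class, method names, and state keys are hypothetical and do not come from any existing CM tool.

```ruby
# Hypothetical autonomous actor in the promise-theory sense.
class Agent
  attr_reader :name

  def initialize(name, desired_state)
    @name = name
    @desired = desired_state # promised desired state, expressed in code
    @state = {}              # actual state, starts empty
    @watched = {}            # peer => handler to run if that peer breaks its promise
  end

  # Make progress towards the promised state: fix one divergent key per
  # call, so convergence is incremental and safe to repeat.
  def converge_step
    key, value = @desired.find { |k, v| @state[k] != v }
    @state[key] = value if key
  end

  # Interface others call directly, with no mediator, to verify the promise.
  def promise_kept?
    @desired.all? { |k, v| @state[k] == v }
  end

  # Promise a behavior of our own for when a peer fails to keep its promise.
  def watch(peer, &on_broken)
    @watched[peer] = on_broken
  end

  def check_peers
    @watched.each { |peer, handler| handler.call(peer) unless peer.promise_kept? }
  end
end

db  = Agent.new("db", { "schema" => 2, "mode" => "in-service" })
app = Agent.new("app", { "serving" => true })
reactions = []
app.watch(db) { |peer| reactions << "#{peer.name} not ready: app degrades gracefully" }

app.check_peers              # db has not converged yet, so app reacts locally
2.times { db.converge_step } # db makes progress on its own schedule
app.check_peers              # db now keeps its promise; no new reaction
puts reactions
```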
  • How might we do a deploy as in the previous slide using choreography instead of orchestration?
    Let’s just talk about the update (redeploy) case, the starting-from-scratch case will be self-evident after
    Should be able to change any one component and others react.

    e.g. on DB server, start update of schema
    Report maintenance mode to the Java application
    Java application finishes up activities within a timeout, possibly sends maintenance responses to frontend app
    Load the updated schema
    Report normal in-service mode to the Java application
    Java application notices this in real time, starts serving again

    The downside: yes, your app developers need to make things more resilient to failure. There shouldn’t be any need to restart anything to do this deploy, for example; you’re only modifying policy on the DB server and the other actors are responding in real time.
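The schema-update sequence can be sketched in Ruby. The DB server changes only its own policy and announces its mode, and the application decides for itself how to react. In-process callbacks stand in for whatever real network notification mechanism would carry these announcements, which this sketch does not model. `AppServer` is a hypothetical Ruby stand-in for the Java application.

```ruby
# The DB server announces its mode; it does not command anyone.
class DatabaseServer
  def initialize
    @subscribers = []
  end

  # Peers register interest in this server's announced mode.
  def on_mode_change(&block)
    @subscribers << block
  end

  # Announce maintenance, load the new schema (the block), then announce
  # that normal service has resumed.
  def update_schema
    announce(:maintenance)
    yield if block_given?
    announce(:in_service)
  end

  private

  def announce(mode)
    @subscribers.each { |subscriber| subscriber.call(mode) }
  end
end

# Stand-in for the Java application: it reacts to announcements itself.
class AppServer
  attr_reader :log

  def initialize(db)
    @serving = true
    @log = []
    db.on_mode_change { |mode| react(mode) }
  end

  def serving?
    @serving
  end

  private

  def react(mode)
    case mode
    when :maintenance
      @serving = false
      @log << "finishing in-flight work, sending maintenance responses"
    when :in_service
      @serving = true
      @log << "serving again"
    end
  end
end

db  = DatabaseServer.new
app = AppServer.new(db)
db.update_schema { app.log << "schema loaded" }
puts app.log
```

The application is never restarted. Its behavior follows from its own reaction to the announcements.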
  • How are we doing in CM with respect to being able to describe the type of policy I’m talking about?

    We’re super good at the first item – we’re nailing that.
    Pretty meh on the second – we don’t really allow others to verify promises except via a central point of coordination like a Chef Server or a Puppet Master
    Pretty bad on the last item
  • When it blows up now, recovery scenarios can be gnarly, especially at scale, which leads into problem #2
  • Still very, very early days
    Choreography tools are primitive, many are hard to use
    Very loose coupling between CM and choreography - still an externality to CM systems, which seems weird to me. CM systems are very good at defining policy on a particular node, but can't ask peers to change their policies?
    Still very centralized management of CM policy. Chef is no exception, where the Chef Server is effectively a policy SPOF.

    So now we have solutions for limited modification of system policy via choreography, but the rest of CM is still using orchestration-type primitives

    Also app developers have some part in this:
    Many legacy applications are still tightly coupled and depend on orchestration
    Many developers still continue to write applications like this
    Applications don’t know how to respond in real-time to events
    Process supervision systems like systemd aren’t peer-aware
  • Would like to see CM systems do better at coordinating across a fleet – between nodes, between containers, etc. This includes not only defining interdependent policies like the ones I mentioned, but also the ability to recover from promise failure / unreliability, e.g. I detect that others are being “untruthful” in execution of their promises and I can take some recovery action
    Want this to work without reliance on external systems like ZK, Consul, etc. – not that these are bad, just another thing an admin needs to set up and worry about
    Whatever is built has to work for workloads of both long and short lifespans (containers) – CM is not just traditional convergent CM of long-lived workloads
    Dynamic join/leave mechanics and policy propagation must happen over unreliable networks without a central coordinator – implies a gossip protocol with policies provided as a rumor, like Serf
    Implies that strong but easy-to-use security must be built in
    Interesting pilot project: James Shubin spoke yesterday in the Main Track, and I encourage you to check out his project “mgmt” at https://ttboj.wordpress.com/2016/01/18/next-generation-configuration-mgmt/

    Now, we do have James Shubin here at CM camp, and I think he takes exception to my use of the term “choreography”, saying that it’s overloaded. But I have yet to see IBM Choreographer or HP Choreographer or really any product named that, so I’m going to argue that I’m safe for now
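Policy propagation "as a rumor" without a central coordinator can be illustrated with a toy push-gossip simulation in Ruby. It is in the spirit of Serf but much simpler than Serf's actual SWIM-based protocol. Each round, every node that already has the policy change sends it to a few random peers, and each message may be lost. The fanout, loss rate, and seed are arbitrary assumptions.

```ruby
# Toy push gossip: a policy change starts on node 0 and spreads with no
# central coordinator. Each round, every informed node sends the rumor
# to `fanout` random peers; each message is lost with probability `loss`.
def gossip_rounds(nodes:, fanout: 3, loss: 0.1, seed: 42)
  rng = Random.new(seed)                    # seeded so runs are repeatable
  informed = [true] + [false] * (nodes - 1) # node 0 originates the rumor
  rounds = 0
  until informed.all?
    rounds += 1
    senders = informed.count(true)          # nodes that knew the rumor at round start
    (senders * fanout).times do
      next if rng.rand < loss               # message dropped by the unreliable network
      informed[rng.rand(nodes)] = true
    end
  end
  rounds
end

puts gossip_rounds(nodes: 1_000)
```

Despite dropped messages, the rumor reaches every node in a small number of rounds. Push gossip of this kind scales roughly with the logarithm of the cluster size, and no single node is required for it to finish.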

    1. Who Needs Orchestration? What You Want is Choreography! Julian Dunn – Product Manager, Chef Software, Inc. – @julian_dunn
    2. What is orchestration?
    3. An ordered set of operations, across a set of independent machines, connected to an orchestrator only via a network.
    4. Humans acting on Visio acting on machines / Humans acting on code acting on machines
    5. An ordered set of operations, defined in code, across a set of independent machines, connected to an orchestrator only via a network.
    6.
        tasks:
          - name: remove host from LB
            shell: /usr/local/bin/remove_host {{ ansible_hostname }}
            delegate_to: loadbalancer.example.com
          - name: deploy code
            git: repo=http://github.com/foo/bar.git dest=/var/www/html/
            notify:
              - restart apache
          - name: add host to LB
            shell: /usr/local/bin/add_host {{ ansible_hostname }}
            delegate_to: loadbalancer.example.com
    7. Failure is hard to recover from
    8. Mark Burgess, the father of Promise Theory
    9. Promises versus obligations according to P. Elliot Kitten
    10. “You will feed my cat”
    11. “Will you promise to feed my cat?”
    12. “Obligations are far from being a reliable tool for ensuring compliance. If a law-giver wanted to ensure the compliance of an agent, a better strategy would be to obtain a promise from the agent and to convince it to view the intention as a commitment since the law-giver could never know whether the agent had indeed committed to the body of the obligation.” – Bergstra & Burgess, “A Static Theory of Promises”
    13. Individual autonomy beats command-and-control
    14. Trying to make promises about the behavior of the other nodes
    15. What is choreography?
    16–17. Autonomous actors:
        • Make progress towards promised desired state
        • Expose interfaces to allow others to verify promises
        • Can promise to take certain behaviors in the face of failure of others
    18. Chef Search has a foundation
        backends = search(:node, 'role[web]')
        # problem: slow update time
        # problem: depends on single point of synchronization
        template '/etc/haproxy/haproxy.cfg' do
          variables(:backends => backends)
          …
          notifies :reload, 'service[haproxy]'
        end
        # problem: policy only updated whenever Chef converge happens
    19. What about this?
        # on the backends themselves
        service 'httpd' do
          action :start
          # proposed, not real Chef: notifies has no :nodes option today
          notifies :create, 'template[/etc/haproxy/haproxy.cfg]', :nodes => 'loadbalancers'
        end
    20. Chef, Puppet, Ansible, Salt, etc. +
    21. What of the future?
        • Better coordination across a fleet
        • Less reliance on external real-time state systems
        • Useful for workloads of short and long lifespans
        • Strong security built-in
        • “mgmt” – https://ttboj.wordpress.com/
    22. Conclusions
        • Don’t build or choose things requiring orchestration
        • Build choreography into configuration management systems
        • Make configuration management systems fleet-aware, not just node-aware
