Event Driven Automation and Workflows
Dmitri Zimine
CTO, StackStorm
#Stack_Storm
About myself
• Past:
– Opalis Software (now aka M$ SC Orchestrator)
– VMware
• Present:
– StackStorm CTO & co-founder
– Mistral core team member
– I don’t ops (but most Stormers do)
Agenda
1. High level: Brief History of Event Driven Automation
2. Into the weeds: Workflow patterns for IT automation
[History slides: Business Process Management, then logos of VMware, CA, BMC, HP (OpsWare), Cisco, Microsoft, Citrix]
The Problem is Bigger than it was 5 years ago
More tools…
Still…
• Manual operations
• Custom scripts
Solution
• Event Driven Automation – with a modern twist
– FBAR at Facebook (saving 1532 hours/day)
– SaltConf – Event Driven Infrastructure
– Microsoft – new Azure Automation (RunBooks)
Solution: Event Driven Automation
[Diagram: Event Driven Automation. Sensors watch Infrastructure, Cloud, Applications, Tools, Processes and emit Triggers; Rules match Triggers and call Actions and Workflows.]
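To make the diagram concrete, here is a toy Python sketch of the trigger → rule → action path. It is not StackStorm's implementation or API: `Trigger`, `Rule`, `dispatch`, the trigger name `monitoring.disk_alert`, and its payload fields are all hypothetical, and the sensor is replaced by a hand-built event.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Trigger:
    """An event emitted by a sensor: a trigger type plus a payload."""
    name: str
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class Rule:
    """Maps a trigger type and a criteria predicate to an action."""
    trigger_name: str
    criteria: Callable[[dict[str, Any]], bool]
    action: Callable[[dict[str, Any]], Any]


def dispatch(trigger: Trigger, rules: list[Rule]) -> list[Any]:
    """Run the action of every rule whose trigger type and criteria match."""
    results = []
    for rule in rules:
        if rule.trigger_name == trigger.name and rule.criteria(trigger.payload):
            results.append(rule.action(trigger.payload))
    return results


def cleanup_logs(payload: dict[str, Any]) -> str:
    """Stub action (or workflow) called by the rule; a real one would act on the host."""
    return f"cleanup_logs on {payload['host']}"


# Hypothetical rule: when disk usage on a host goes above 90%, clean up logs.
rules = [
    Rule(
        trigger_name="monitoring.disk_alert",
        criteria=lambda p: p.get("percent_used", 0) > 90,
        action=cleanup_logs,
    )
]

# A sensor would normally emit this event; here it is built by hand.
event = Trigger("monitoring.disk_alert", {"host": "web1", "percent_used": 95})
print(dispatch(event, rules))  # ['cleanup_logs on web1']
```

Keeping the criteria separate from the action is what lets the same action or workflow be reused by many rules.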
WORKFLOWS
Zoom In on Workflow and Get Practical
• From now on, I focus on workflow
• Reminder: EDA != Workflow, but Workflow is a big part of it.
Patterns vs Practice
• ~100 patterns: http://www.workflowpatterns.com/
• Practice (IMAO): only a few are sufficient
• Workflows do two things well:
– Keep state
– Carry data across systems
Basic: Sequence
...
tasks:
  t1_update_config:
    action: core.remote_sudo
    input:
      cmd: sed -i -e"s/keepalive_timeout
      hosts: my_webserver.example.com
    on-complete: t2_cleanup_logs
  t2_cleanup_logs:
    action: core.remote_sudo
    input:
      cmd: rm /var/log/nginx/
      hosts: my_webserver.example.com
    on-complete: t3_restart_service
  t3_restart_service:
    action: core.remote_sudo cmd="servic
[Diagram: t1 → t2 → t3]
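A minimal Python sketch of what the engine does with `on-complete` in a sequence: run a task, then follow its link to the next one. The task table and the stub lambdas are hypothetical stand-ins for the `core.remote_sudo` calls above; a real workflow engine would also record each task's state.

```python
from typing import Callable, Optional

# Hypothetical task table: name -> (action stub, task named in on-complete or None).
TaskTable = dict[str, tuple[Callable[[], str], Optional[str]]]

tasks: TaskTable = {
    "t1_update_config": (lambda: "config updated", "t2_cleanup_logs"),
    "t2_cleanup_logs": (lambda: "logs removed", "t3_restart_service"),
    "t3_restart_service": (lambda: "service restarted", None),
}


def run_sequence(table: TaskTable, start: str) -> list[str]:
    """Run tasks one at a time, following each on-complete link until it is None."""
    executed: list[str] = []
    current: Optional[str] = start
    while current is not None:
        action, on_complete = table[current]
        action()  # result discarded here; a real engine would persist it as task state
        executed.append(current)
        current = on_complete
    return executed


print(run_sequence(tasks, "t1_update_config"))
# ['t1_update_config', 't2_cleanup_logs', 't3_restart_service']
```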
Basic: Data Passing
[Diagram: t1 → t2, passing code=0 and msg="Some string.."]
examples.data_pass:
  input:
    - host
  tasks:
    t1_diagnose:
      action: diag.run_mysql_diag
      input:
        host: <% $.host %>
      publish:
        - msg: <% $.t1_diagnose.stdout.summary %>
      on-complete: t2_post_to_chat
    t2_post_to_chat:
      action: chatops.say
      input:
        header: Returned <% $.t1_diagnose.code %>
        details: <% $.msg %>
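A toy model of data passing, assuming the workflow context is a plain dictionary: the task result is stored under the task name (as `$.t1_diagnose.code` suggests), and `publish` copies one field into a new variable (`$.msg`). The stub functions and their return values are made up for illustration.

```python
from typing import Any


def t1_diagnose(host: str) -> dict[str, Any]:
    """Stub for a diagnostic action; returns a result shaped like a command's output."""
    return {"code": 0, "stdout": {"summary": f"{host}: 3 slow queries"}}


def t2_post_to_chat(header: str, details: str) -> str:
    """Stub for a chat action; formats the message instead of sending it."""
    return f"{header}\n{details}"


def data_pass(host: str) -> str:
    # Workflow context starts with the workflow input, like $.host.
    ctx: dict[str, Any] = {"host": host}
    # Run t1 and keep its full result under the task name.
    ctx["t1_diagnose"] = t1_diagnose(ctx["host"])
    # 'publish' copies one piece of the result into the context as $.msg.
    ctx["msg"] = ctx["t1_diagnose"]["stdout"]["summary"]
    # t2 reads both the published variable and t1's raw result.
    return t2_post_to_chat(header=f"Returned {ctx['t1_diagnose']['code']}", details=ctx["msg"])


print(data_pass("db1"))
```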
Basic: Conditions
[Diagram: t1 → t2 on success; t1 → t3 on failure]
tasks:
  ...
  t1_deploy:
    action: ops.deploy_fleet
    on-success: t2_post_to_chat
    on-failure: t3_page_admin
  t2_post_to_chat:
    action: chatops.say
    input:
      header: Successfully deployed <% $.t1_diag
  t3_page_admin:
    action: pagerduty.launch_incident
    input:
      details: Have to wake up dude... <% $.msg %>
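A sketch of `on-success` / `on-failure` branching, assuming an action "fails" when it raises an exception. `run_with_handlers`, the stub deploy functions, and the version string are hypothetical; how Mistral actually detects a failed action is not modeled here.

```python
from typing import Callable


def run_with_handlers(
    action: Callable[[], str],
    on_success: Callable[[str], str],
    on_failure: Callable[[Exception], str],
) -> str:
    """Run an action, then take the on-success or the on-failure branch."""
    try:
        result = action()
    except Exception as exc:  # this sketch treats any exception as task failure
        return on_failure(exc)
    return on_success(result)


def deploy_ok() -> str:
    return "fleet v42"


def deploy_broken() -> str:
    raise RuntimeError("3 hosts unreachable")


def post_to_chat(result: str) -> str:
    return f"chat: Successfully deployed {result}"


def page_admin(error: Exception) -> str:
    return f"page: deploy failed ({error})"


print(run_with_handlers(deploy_ok, post_to_chat, page_admin))
print(run_with_handlers(deploy_broken, post_to_chat, page_admin))
```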
Basic: Conditions on Data
[Diagram: t1 → t2 when t1.code == 0; t1 → t3 when t1.code > 0]
t1_diagnose:
  action: ops.run_mysql_diag
  publish:
    - code: <% $.t1_diagnose.return_code %>
  on-complete:
    - t2_post_to_chat: <% $.code == 0 %>
    - t3_page_mysql_admin: <% $.code > 0 %>
t2_post_to_chat:
  action: chatops.say
  input:
    header: "mysql checked, OK"
t3_page_mysql_admin:
  action: pagerduty.launch_incident
  input:
    details: Have to wake up dude... <% $.t1_diagnose.stdout %>
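The guarded `on-complete` list can be read as: evaluate each expression against the context and start every task whose guard is true. Below is a toy Python version of that reading, with Python lambdas standing in for the YAQL expressions (an assumption made for illustration; `next_tasks` and `TRANSITIONS` are hypothetical).

```python
from typing import Any, Callable

# Mirrors:
#   on-complete:
#     - t2_post_to_chat: <% $.code == 0 %>
#     - t3_page_mysql_admin: <% $.code > 0 %>
Transition = tuple[str, Callable[[dict[str, Any]], bool]]

TRANSITIONS: list[Transition] = [
    ("t2_post_to_chat", lambda ctx: ctx["code"] == 0),
    ("t3_page_mysql_admin", lambda ctx: ctx["code"] > 0),
]


def next_tasks(ctx: dict[str, Any], transitions: list[Transition]) -> list[str]:
    """Return every task whose guard holds; several may fire, or none."""
    return [task for task, guard in transitions if guard(ctx)]


print(next_tasks({"code": 0}, TRANSITIONS))  # ['t2_post_to_chat']
print(next_tasks({"code": 2}, TRANSITIONS))  # ['t3_page_mysql_admin']
```

Neither guard covers a negative `code`, so in this sketch a negative code starts no follow-up task at all. Guards on data should be exhaustive if every outcome needs handling.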
THAT’S THE BASICS!
SUFFICIENT.
THERE’S MORE…
More: Parallel Execution
[Diagram: t1 fans out to t2, t3, t4 in parallel]
...
t1_do_build:
  action: cicd.do_build_and_packages
  on-success:
    - t2_test_ubuntu14
    - t3_test_fedora20
    - t4_test_rhel6
t2_test_ubuntu14:
  action: cicd.deploy_and_test distro="UBUNTU14"
t3_test_fedora20:
  action: cicd.deploy_and_test distro="F20"
t4_test_rhel6:
  action: cicd.deploy_and_test distro="RHEL6"
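A sketch of fan-out using Python's `concurrent.futures`: all three test runs start together, so the total time is close to the slowest branch rather than the sum of all branches. `deploy_and_test` is a stub with a fixed sleep, not the real `cicd.deploy_and_test` action.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deploy_and_test(distro: str) -> str:
    """Stub for a long deploy-and-test run; the sleep stands in for real work."""
    time.sleep(0.2)
    return f"{distro}: passed"


def fan_out(distros: list[str]) -> list[str]:
    """Start one branch per distro at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(distros))) as pool:
        return list(pool.map(deploy_and_test, distros))


start = time.monotonic()
results = fan_out(["UBUNTU14", "F20", "RHEL6"])
elapsed = time.monotonic() - start
print(results)
print(f"took {elapsed:.2f}s")  # near one branch's 0.2s, not the 0.6s of a sequential run
```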
More: Join
[Diagram: parallel branches t2, t3, t4 (started by t1) converge on t5]
16 ways to join
More: Join – Simple Merge
[Diagram: t2, t3, t4 each transition to t5]
...
t2_test_ubuntu14:
  action: cicd.deploy_and_test distro="UBUNTU14"
  on-success: t5_post_status
t3_test_fedora20:
  action: cicd.deploy_and_test distro="F20"
  on-success: t5_post_status
t4_test_rhel6:
  action: cicd.deploy_and_test distro="RHEL6"
  on-success: t5_post_status
t5_post_status:
  action: chatops.say
  input:
    header: Test completed!
http://www.workflowpatterns.com/patterns/control/basic/wcp5.php
Simple Merge
[Diagram: t5 appears once per incoming branch]
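Answering "how many times will t5 run?" for the simple merge: a toy model, assuming that with no `join` every incoming transition starts the target task anew. `simple_merge` and the branch outcomes are hypothetical.

```python
import random


def simple_merge(outcomes: dict[str, bool]) -> list[str]:
    """Toy merge with no `join`: every successful branch starts t5 again.

    `outcomes` maps each branch task to True (succeeded) or False (failed).
    Returns one log entry per run of t5.
    """
    branches = list(outcomes)
    random.shuffle(branches)  # parallel branches finish in no fixed order
    runs = []
    for branch in branches:
        if outcomes[branch]:  # on-success: t5_post_status
            runs.append(f"t5_post_status (from {branch})")
    return runs


runs = simple_merge({"t2_test_ubuntu14": True, "t3_test_fedora20": True, "t4_test_rhel6": True})
print(len(runs))  # 3: "Test completed!" would be posted three times
```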
More: Join – AND Join
[Diagram: t2, t3, t4 all converge on t5]
...
t2_test_ubuntu14:
  action: cicd.deploy_and_test distro="UBUNTU14"
  on-success: t5_tag_release
t3_test_fedora20:
  action: cicd.deploy_and_test distro="F20"
  on-success: t5_tag_release
t4_test_rhel6:
  action: cicd.deploy_and_test distro="RHEL6"
  on-success: t5_tag_release
t5_tag_release:
  join: all
  action: cicd.tag_release
http://www.workflowpatterns.com/patterns/control/new/wcp33.php
Full AND Join
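The same question for `join: all`, as a toy model: the join task fires once, and only when every inbound branch has arrived. How a real engine treats a failed branch is not modeled; in this sketch a failure simply means the join never fires. `and_join` is a hypothetical helper.

```python
def and_join(outcomes: dict[str, bool]) -> int:
    """Toy `join: all`: return how many times t5 runs (at most once).

    t5 fires only when every inbound branch has succeeded and arrived.
    """
    expected = set(outcomes)
    arrived: set[str] = set()
    runs = 0
    for branch, succeeded in outcomes.items():
        if succeeded:  # on-success: t5_tag_release
            arrived.add(branch)
            if arrived == expected:  # the last inbound transition is in
                runs += 1
    return runs


print(and_join({"t2_test_ubuntu14": True, "t3_test_fedora20": True, "t4_test_rhel6": True}))  # 1
print(and_join({"t2_test_ubuntu14": True, "t3_test_fedora20": False, "t4_test_rhel6": True}))  # 0
```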
More: Join - Discriminator
[Diagram: t2, t3, t4 each transition to t5 on failure]
...
t2_test_ubuntu14:
  action: cicd.deploy_and_test distro="UBUNTU14"
  on-failure: t5_report_and_fail
t3_test_fedora20:
  action: cicd.deploy_and_test distro="F20"
  on-failure: t5_report_and_fail
t4_test_rhel6:
  action: cicd.deploy_and_test distro="RHEL6"
  on-failure: t5_report_and_fail
t5_report_and_fail:
  join: one
  action: chatops.say header="FAILURE!"
  on-complete: fail
http://www.workflowpatterns.com/patterns/control/advanced_branching/wcp9.php
Discriminator
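And for `join: one` wired to `on-failure`: the first failing branch fires t5, which reports and fails the workflow, and later arrivals are ignored. This is a toy model; `discriminator` and its input format, (branch, succeeded) pairs in completion order, are assumptions of this sketch.

```python
def discriminator(completions: list[tuple[str, bool]]) -> list[str]:
    """Toy `join: one` on failure: return the branches that actually fired t5.

    `completions` lists (branch, succeeded) in the order branches finished.
    Only the first failure fires t5; later arrivals are ignored.
    """
    fired: list[str] = []
    for branch, succeeded in completions:
        if not succeeded and not fired:  # on-failure: t5_report_and_fail
            fired.append(branch)  # t5 posts "FAILURE!" and then fails the workflow
    return fired


print(discriminator([
    ("t3_test_fedora20", False),
    ("t2_test_ubuntu14", True),
    ("t4_test_rhel6", False),
]))  # ['t3_test_fedora20']: t5 runs once, not twice
```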
More: Multiple Data
[Diagram: t1 → t2, passing ip_list=[...]]
...
t1_get_ip_list:
  action: myaws.allocate_floating_ips num=4
  publish:
    - ip_list: <% $.t1_get_ip_list.ips %>
  on-complete: t2_create_vms
t2_create_vms:
  with-items: ip in <% $.ip_list %>
  action: myaws.create_vms ip=<% $.ip %>
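A sketch of what `with-items` does: run the same action once per list item, in parallel, with results kept in item order. The `concurrency` cap, the `create_vm` stub, and the IP addresses are made up for this example and are not taken from Mistral.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def create_vm(ip: str) -> str:
    """Stub for a VM-creation action; returns a fake VM id built from the IP."""
    return "vm-" + ip.replace(".", "-")


def with_items(items: list[str], action: Callable[[str], str], concurrency: int = 2) -> list[str]:
    """Run `action` once per item, at most `concurrency` at a time.

    Results are returned in item order, whichever call finishes first.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(action, items))


ip_list = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]  # made-up addresses
print(with_items(ip_list, create_vm))
```

Capping concurrency matters for real infrastructure calls, where launching hundreds of items at once can hit API rate limits.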
And More Details…
• Nesting
– Nothing special to say, except:
– It has input and output
– A nested workflow is an action, not a task
• Retries, Waits, Pause/Resume
• Default task policies
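Of these details, retries are the easiest to show in a few lines. A minimal retry-policy sketch, assuming a task is retried when its action raises, with a fixed delay between attempts; `with_retry` and `flaky_health_check` are hypothetical, and Mistral's own retry options are not reproduced here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(action: Callable[[], T], count: int, delay: float) -> T:
    """Call `action`; if it raises, sleep `delay` seconds and try again.

    Makes at most `count` retries (count + 1 calls), then re-raises the last error.
    """
    for attempt in range(count + 1):
        try:
            return action()
        except Exception:
            if attempt == count:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_health_check() -> str:
    """Hypothetical check that fails twice while a service starts, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service not up yet")
    return "healthy"


print(with_retry(flaky_health_check, count=5, delay=0.01), calls["n"])  # healthy 3
```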
Recap: Workflow Operations
• Sequence
• Data passing
• Conditions (on data)
• Parallel execution
• Joins
• Multiple Data Items
What else
• Other than pattern support:
• Reliability
• Manageability – API, CLI, DSL, infra as code…
• Good to have: good GUI
Summary
• Event Driven Automation is coming back
– with a new twist
• EDA > Workflow, but Workflow is a key component
• Shameless plug: StackStorm covers it all
• OpenSource Event Automation Platform
• Github: github.com/stackstorm/st2
• Twitter: Stack_Storm
• IRC: #stackstorm on FreeNode
• www.stackstorm.com

Event Driven Automation Meetup, May 14, 2015

Editor's Notes

  • #3 Why listen to me… I created one of the legacy RunBook automation products. Now I am set on fixing my past mistakes. Core member of the Mistral team.
  • #5 It all started with Business Process Automation.
  • #6 Software was applied to business, and BPM came to life. The body of computer-science research on workflow dates from the late 90s: Petri nets, math, workflow nomenclature, definitions, patterns all started there.
  • #7 TIBCO: who was the first to apply it to IT systems? Enterprise message bus… IT automation.
  • #8 Others picked up the idea, Run Book Automation
  • #14 Servers took days to deploy (and tickets were the way to go). Docker deploys in split seconds. Speed is addictive: we now hate JIRA and love Slack and ChatOps.
  • #15 Tools: way more
  • #18 Tools: way more
  • #20 Close the loop: O.O.D.A
  • #22 Why workflows are better than scripts: I leave the proof to the reader as an exercise (actually, Brian covered it).
  • #24 Walk you through these patterns, with Mistral as the example.
  • #27 Pre-conditions, post-conditions
  • #28 Pre-conditions, post-conditions. For the simple case both work; for advanced patterns, one is more friendly than the other.
  • #30 Example: run full deployment and e2e tests on 3 platforms. You can do it sequentially, but it takes forever.
  • #31 How many times is t5 gonna run?
  • #32 How many times is t5 gonna run?
  • #33 How many times is chatops.say gonna run?
  • #34 How many times is t5 gonna run now? Once!
  • #35 How many times is t5 gonna run now? Once!
  • #36 Cool: watch, ma, the multiple data items are running in parallel! And the final data. Check concurrency.
  • #37 There are a few more nuances within these patterns, which, in the interest of time, I just mention in passing:
  • #38 This is the minimal set that gives enough power but keeps it simple to create, track, and reason.