Event Driven Automation and
Workflows for Auto-remediation
© 2016 BROCADE COMMUNICATIONS SYSTEMS, INC.
About myself
Past
• Opalis Software (now Microsoft System Center Orchestrator)
• VMware
• OpenStack Mistral core team member
• StackStorm founder & CTO
Present:
• Automation and Integration @ Brocade
Agenda
• Brief History of Event Driven Automation and Workflows
• How it works
• What can be automated
• Workflows - detailed
• Workflow based automation vs alternatives
Automation starts with the workflow
“Workflow is a set of tasks strung together to achieve some meaningful business objective.”
Business Process Management
Apply BPM to IT Automation?
The TIBCO Integration Platform
Hype Cycle for Real-Time Infrastructure, 2008
BMC
BMC
CA
Cisco
VMware
Citrix
OpsWare HP
Microsoft
The problem is bigger
than it was 5 years ago
Speed
Tools
More Tools…
Still…
• Manual operations
• Custom scripts
Event Driven Automation 2.0
FBAR (saving 13,680 hours/day)
Naoru
Nurse
Winston (powered by StackStorm)
Azure Automation
Mistral workflow service
StackStorm automation platform
OODA loop: Observe -> Orient -> Decide -> Act
Ingredients
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring
Actions, Sensors
Workflows, Rules
Ops Support
Automation Example
Actors: Service Monitoring, Automation, Incident Management, Engineer
• Event: “low disk on web301”
• Web301 is “low disk”
• Resolve known cases, fast. Is it /var/log? Clean up!
• Unknown problem, need a human
• Wake up, buddy. Something real is going on…
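The low-disk flow above can be sketched in a few lines of Python. This only illustrates the decision logic (resolve the known case, escalate the rest); it is not StackStorm's rule format. The `Event` fields and the `cleanup` and `page` callables are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    host: str     # e.g. "web301"
    trigger: str  # e.g. "low_disk"
    path: str     # filesystem path reported as full (hypothetical field)


def handle(event, cleanup, page):
    """Remediate the known case automatically; escalate anything else to a human."""
    if event.trigger == "low_disk" and event.path.startswith("/var/log"):
        cleanup(event.host, event.path)  # known case: clean up, nobody is paged
        return "remediated"
    page(event.host, f"{event.trigger} on {event.path}")  # unknown problem
    return "escalated"
```

In a real deployment `cleanup` and `page` would be actions from integration packs; here they are plain callables so the branching can be exercised on its own.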
What can be automated?
• Security checks
– On malware detection in a VM, isolate the network port on a switch
• App blue-green deployment
– On Jenkins tests passed, bring up a new VM cluster, deploy and configure the app, set the load balancer to send a % of traffic to the new app, monitor, roll forward or back out
• Networking
– On BGP peer down: collect troubleshooting data, post to Slack and create a JIRA ticket
– On link aggregation member error: check load; if the capacity of the rest of the LAG bundle is enough, disable the link with the error
• OpenStack
– Orphan VM clean-up: on orphans detected, shut down, email the owner, keep for a few days, delete
– VM evacuation on HW failures: on host RAID failure, get the list of impacted VMs, email VM owners, evacuate VMs, create a JIRA ticket for hardware replacement
• NFV
– Nokia, Ericsson, AT&T, with Mistral and OpenStack
• Service remediation
– Cassandra “node down” recovery: on a ring node dying, deploy a new node, configure it, add it to the ring
– Remediating RabbitMQ, Galera cluster, MySQL, and more…
What can be automated?
From: Practice of Cloud System Administration, by Thomas Limoncelli
Benefits
• Avoid failures (fixing on computer time, not human time)
• Reduce incident MTTR (Mean Time To Recover)
• Reduce risk of human error (no fat fingers)
On-call, Without Automation
2:00 AM PagerDuty alert
2:02 AM Engineer wakes up
2:07 AM Logs in and ACKs
2:10 AM Studies the alert
2:15 AM Checks runbook
2:20 AM Runs diagnostics
2:30 AM Fixes the problem
On-call With Winston
Times: 2:00 AM, 2:05 AM, 2:05 AM, 2:15 AM
Labels: False Positive, Winston, Assisted Diagnostics, Fixed the problem
Benefits
Uses event-driven automation and workflows with Brocade Workflow Composer to run a Virtual Desktop Service
• Virus Detection: 80% reduction in ops man-hours to detect, isolate and resolve
• Adding tenant: 70% reduction in man-hours
• Environment Verification: time to verify reduced by 50%, 120% verification coverage
• Threshold Monitoring: 40% decrease in incidents caused by lack of resources
• Troubleshooting: data collection time reduced by 40%
• Network Troubleshooting (congestion, loops): 80% reduction in man-hours, minimizing operational mistakes
“Sleep Better at Night: OpenStack Cloud Auto-Healing” @ OpenStack Summit Barcelona
Mirantis: Auto-remediating 2,000 node OpenStack cluster at Symantec with StackStorm
Benefits
• Reduce MTTR (Mean Time To Resolution)
• Avoid failures (fixing on computer time, not human time)
• Reduce risk of human error (no fat fingers)
• Positive team impact
– Avoid pager fatigue and team burn-out
– Turn from reactive to proactive (break reactive vicious cycle)
– Capture operational knowledge – as code
Into Details: Workflows
Workflows
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring
Actions, Sensors
Workflows, Rules
Ops Support
MISTRAL
N.B: Event Driven Automation > Workflow,
but Workflow is a key element.
Key Workflow Patterns
• Theory: ~100 patterns - http://www.workflowpatterns.com/
• Practice: in my opinion, only a few are sufficient for IT and data center automation
Basic: Sequence
...
tasks:
    t1_update_config:
        action: core.remote_sudo
        input:
            cmd: sed -i -e"s/keepalive_timeout
            hosts: my_webserver.example.com
        on-complete: t2_cleanup_logs
    t2_cleanup_logs:
        action: core.remote_sudo
        input:
            cmd: rm /var/log/nginx/
            hosts: my_webserver.example.com
        on-complete: t3_restart_service
    t3_restart_service:
        action: core.remote_sudo cmd="servic
t1 -> t2 -> t3
Basic: Data Passing
examples.data_pass:
    input:
        - host
    tasks:
        t1_diagnose:
            action: diag.run_mysql_diag
            input:
                host: <% $.host %>
            publish:
                - msg: <% $.t1_diagnose.stdout.summary %>
            on-complete: t2_post_to_chat
        t2_post_to_chat:
            action: chatops.say
            input:
                header: Returned <% $.t1_diagnose.code %>
                details: <% $.msg %>
t1 -> t2, passing t1.code=0 and msg="Some string.."
Basic: Conditions
tasks:
    ...
    t1_deploy:
        action: ops.deploy_fleet
        on-success: t2_post_to_chat
        on-failure: t3_page_admin
    t2_post_to_chat:
        action: chatops.say
        input:
            header: Successfully deployed <% $.t1_diag
    t3_page_admin:
        action: pagerduty.launch_incident
        input:
            description: Have to wake up dude...
            details: <% $.msg %>
t1 -> t2 on success
t1 -> t3 on failure
Basic: Conditions on Data
t1_diagnose:
    action: ops.run_switch_diag
    publish:
        - code: <% $.t1_diagnose.return_code %>
    on-complete:
        - t2_post_to_chat: <% $.code == 0 %>
        - t3_page_network_admin: <% $.code > 0 %>
t2_post_to_chat:
    action: slack.post
    input:
        header: "Switch <% $.switch %> checked, OK"
t3_page_network_admin:
    action: pagerduty.launch_incident
    input:
        description: Have to wake up dude...
        details: <% $.t1_diagnose.stdout %>
t1 -> t2 when t1.code == 0
t1 -> t3 when t1.code > 0
Sufficient. But there is more…
That’s the basics!
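The three basics (sequence, data passing, conditions on data) fit in a very small interpreter. The sketch below is a toy written for this transcript, not how Mistral or StackStorm execute workflows; `run_workflow`, the task dictionary layout, and the `probe` context key are all assumptions made for illustration.

```python
def run_workflow(tasks, start, context):
    """Run tasks one at a time, following on-success / on-failure transitions."""
    trace = []
    name = start
    while name is not None:
        task = tasks[name]
        trace.append(name)
        try:
            context[name] = task["action"](context)  # data passing via shared context
            transition = task.get("on-success")
        except Exception as exc:
            context[name] = {"error": str(exc)}
            transition = task.get("on-failure")
        # A transition is a task name, None, or a function of the context
        # (a condition on data) that returns the next task name.
        name = transition(context) if callable(transition) else transition
    return trace


def diagnose(ctx):
    # "probe" is a hypothetical callable that returns a diagnostic return code.
    return {"code": ctx["probe"]()}


def pick_next(ctx):
    return "t2_post_to_chat" if ctx["t1_diagnose"]["code"] == 0 else "t3_page_admin"


TASKS = {
    "t1_diagnose": {"action": diagnose, "on-success": pick_next},
    "t2_post_to_chat": {"action": lambda ctx: "switch checked, OK"},
    "t3_page_admin": {"action": lambda ctx: "paged"},
}
```

Calling `run_workflow(TASKS, "t1_diagnose", {"probe": lambda: 0})` returns the list of task names that ran, which is the kind of execution trace that makes a workflow easy to follow.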
More: Parallel Execution
...
t1_do_build:
    action: cicd.do_build_and_packages
    on-success:
        - t2_test_ubuntu14
        - t3_test_fedora20
        - t4_test_rhel6
t2_test_ubuntu14:
    action: cicd.deploy_and_test distro="UBUNTU14"
t3_test_fedora20:
    action: cicd.deploy_and_test distro="F20"
t4_test_rhel6:
    action: cicd.deploy_and_test distro="RHEL6"
t1 -> t2, t3, t4 in parallel
More: Join
t1 -> t2, t3, t4 -> t5
16 ways to join
More: Join—Simple Merge
Reference: http://www.workflowpatterns.com/patterns/control/basic/wcp5.php
...
t2_test_ubuntu14:
    action: cicd.deploy_and_test distro="UBUNTU14"
    on-success: t5_post_status
t3_test_fedora20:
    action: cicd.deploy_and_test distro="F20"
    on-success: t5_post_status
t4_test_rhel6:
    action: cicd.deploy_and_test distro="RHEL6"
    on-success: t5_post_status
t5_post_status:
    action: chatops.say
    input:
        header: Test completed!
Simple Merge: t5 runs once for each of t2, t3, t4 that succeeds (t5, t5, t5)
More: Join—AND Join
Reference: http://www.workflowpatterns.com/patterns/control/new/wcp33.php
Full “AND” Join
...
t2_test_ubuntu14:
    action: cicd.deploy_and_test distro="UBUNTU14"
    on-success: t5_tag_release
t3_test_fedora20:
    action: cicd.deploy_and_test distro="F20"
    on-success: t5_tag_release
t4_test_rhel6:
    action: cicd.deploy_and_test distro="RHEL6"
    on-success: t5_tag_release
t5_tag_release:
    join: all
    action: cicd.tag_release
t2, t3, t4 -> t5 (t5 runs once, after all three succeed)
More: Join—Discriminator
Reference: http://www.workflowpatterns.com/patterns/control/advanced_branching/wcp9.php
Discriminator
...
t2_test_ubuntu14:
    action: cicd.deploy_and_test distro="UBUNTU14"
    on-failure: t5_report_and_fail
t3_test_fedora20:
    action: cicd.deploy_and_test distro="F20"
    on-failure: t5_report_and_fail
t4_test_rhel6:
    action: cicd.deploy_and_test distro="RHEL6"
    on-failure: t5_report_and_fail
t5_report_and_fail:
    join: one
    action: chatops.say header="FAILURE!"
    on-complete: fail
t2, t3, t4 -> t5 (t5 runs once, on the first failure)
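The difference between waiting for all branches and continuing on the first one can be illustrated with Python's standard `concurrent.futures` module. This is a rough analogy rather than Mistral's implementation: `run_branches` and its `join` argument are names made up for this sketch, and "one" here fires on the first branch to finish, whereas the slide's discriminator fires on the first failure.

```python
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_branches(branches, join="all"):
    """Run named callables in parallel and return the results seen when the join fires.

    join="all": wait for every branch (AND join).
    join="one": continue as soon as the first branch finishes (discriminator).
    """
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {pool.submit(fn): name for name, fn in branches.items()}
        when = ALL_COMPLETED if join == "all" else FIRST_COMPLETED
        done, _not_done = wait(list(futures), return_when=when)
        results = {futures[f]: f.result() for f in done}
    # Leaving the "with" block still waits for the remaining threads to finish;
    # their results are simply not part of what the join saw.
    return results
```

A simple merge has no direct equivalent here: it would run the follow-up task once per finished branch instead of once per join.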
More: Multiple Data
...
t1_get_ip_list:
    action: myinventory.allocate_ips num=4
    publish:
        - ip_list: <% $.t1_get_ip_list.ips %>
    on-complete: t2_create_vms
t2_create_vms:
    with-items: ip in <% $.ip_list %>
    action: myaws.create_vms ip=<% $.ip %>
t1 -> t2, passing ip_list=[...]
Recap: Key Workflow Operations
• Sequence
• Data passing
• Conditions (on data)
• Parallel execution
• Joins
• Multiple Data Items
Why not Scripts?
Workflows Are Better in Operations
• Simple to define, reason about, visualize
• Transparent
– State is clear, execution is trackable: running, complete, failed steps
• Reliable
– Workflows are long-running
– Crash tolerance
– “Restart from point of failure”
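“Restart from point of failure” boils down to persisting which steps have finished. The sketch below shows the idea with a JSON file as the state store. It is an illustration only, not how StackStorm or Mistral persist execution state, and `run_with_checkpoint` and `state_path` are hypothetical names.

```python
import json
import os


def run_with_checkpoint(steps, state_path):
    """Run (name, fn) steps in order, recording each completed step in state_path."""
    done = []
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = json.load(fh)  # steps completed by an earlier run
    for name, fn in steps:
        if name in done:
            continue  # already finished before the crash; do not repeat it
        fn()  # if this raises, the state file still lists only finished steps
        done.append(name)
        with open(state_path, "w") as fh:
            json.dump(done, fh)
    return done
```

Running the same call again after a failure skips the steps already recorded and resumes at the one that failed, which a plain script cannot do without extra bookkeeping.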
Why not Legacy RunBook Automation?
DevOps:
Infrastructure as Code
Infrastructure as code
Case Study
• Automated provisioning, 4 data centers
• Before: CPO, operator updates via GUI, click and pray, x4
• After: BWC, dev -> code review -> staging -> QA -> prod
Top predictor of IT performance?
Version control used by Ops
for Ops artifacts!
Designed for DevOps
1. Support infrastructure as code
2. Open Source
3. Scale and reliability
4. Part of tool chain
5. Social coding & collaboration
6. More demanding - requires skills
Part of tool chain
DevOps Tools vs Enterprise Suites
Leverage social coding
Community packs @ StackStorm Exchange
More demanding
Requires skills – CLI, scripting, understanding
Operation Patterns
Capture and share operational patterns as code!
Summary
• Event-driven automation works
– benefits reliable cloud operations
• Automation must be reliable and transparent
– workflows beat scripts
• Infra as code is key
– repeatable, testable, reliable automation
StackStorm
Open source, Apache 2.0
• GitHub: github.com/StackStorm/st2
• Twitter: Stack_Storm
• IRC: #stackstorm on FreeNode
• Slack: stackstorm.slack.com
• www.stackstorm.com
Brocade Workflow Composer
Commercial Edition
• Enterprise features
• Priority support
• brocade.com/bwc
• Docs: bwc-docs.brocade.com
• Network lifecycle automation suite
Questions & Answers

How to Remove Document Management Hurdles with X-Docs?
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Event-driven automation and workflows

  • 1. Event Driven Automation and Workflows for Auto-remediation © 2016 BROCADE COMMUNICATIONS SYSTEMS, INC.
  • 2. About myself. Past: • Opalis Software (now aka M$ SC Orchestrator) • VMware • OpenStack Mistral core team member • StackStorm founder & CTO. Present: • Automation and Integration @ Brocade
  • 3. Agenda • Brief History of Event Driven Automation and Workflows • How it works • What can be automated • Workflows - detailed • Workflow based automation vs alternatives
  • 4. Automation starts with the workflow. “Workflow is a set of tasks strung together to achieve some meaningful business objective”
  • 7. Apply BPM to IT Automation? The TIBCO Integration Platform
  • 9. Hype Cycle for Real-Time Infrastructure, 2008
  • 10. BMC BMC CA Cisco VMware Citrix OpsWare HP Microsoft
  • 13. The problem is bigger than it was 5 years ago
  • 14. Speed
  • 15. Tools
  • 16. More Tools…
  • 18. Event Driven Automation 2.0: FBAR (saving 13,680 hours/day), Naoru, Nurse, Winston (powered by StackStorm), Azure Automation, Mistral workflow service, StackStorm automation platform. Loop: Observe, Orient, Decide, Act.
  • 19. Ingredients. IT domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring. Actions, Sensors, Workflows, Rules, Ops Support.
  • 20. Automation Example. Diagram labels: Service, Monitoring, Automation, Incident Management, Engineer. Web301 is “low disk”. Event: “low disk on web301”. Resolve known cases, fast: Is it /var/log? Clean up! Unknown problem, need a human: “Wake up, buddy. Something real is going on…”
  • 21. What can be automated?
      • Security checks
          – On malware detection in a VM, isolate network port on a switch
      • App blue-green deployment
          – On Jenkins tests passed, bring up new VM cluster, deploy and configure app, set load balancer to send % of traffic to new app, monitor, roll forward, or back out
      • Networking
          – On BGP peer goes down: collect troubleshooting data, post on Slack & create JIRA ticket
          – On link aggregation member error, check load; if capacity of the rest of the LAG bundle is enough, disable link with error
      • OpenStack
          – Orphan VM clean-up: on orphans detected, shut down, email owner, keep for a few days, delete
          – VM evacuation on HW failures: on host RAID failure, get list of impacted VMs, email VM owners, evacuate VMs, create JIRA ticket for hardware replacement
      • NFV
          – Nokia, Ericsson, AT&T, with Mistral and OpenStack
      • Service remediation
          – Cassandra “node down” recovery: on ring node dying, deploy new node, configure, add to the ring
          – Remediating RabbitMQ, Galera cluster, MySQL, and more…
  • 22. What can be automated? From: Practice of Cloud System Administration, by Thomas Limoncelli
  • 24. Benefits • Avoid failures (fixing on computer time, not human time) • Reduce incident MTTR (Mean Time To Recover) • Reduce risk of human error (no fat fingers)
  • 25. On-call, Without Automation. Timeline marks: 2:00 AM, 2:02 AM, 2:07 AM, 2:10 AM, 2:15 AM, 2:20 AM, 2:30 AM. Steps shown: PagerDuty alert; engineer wakes up; logs in and ACKs; checks runbook; studies the alert; runs diagnostics; fixes the problem.
  • 26. On-call With Winston. Timeline marks: 2:00 AM, 2:05 AM, 2:05 AM, 2:15 AM. Outcomes shown: False Positive; Fixed the problem; Assisted Diagnostics.
  • 27. Benefits. Uses event driven automation and workflows with Brocade Workflow Composer to run Virtual Desktop Service:
      • Virus Detection: 80% reduction in ops man-hours to detect, isolate and resolve
      • Adding tenant: 70% reduction in man-hours
      • Environment Verification: 50% time to verify reduced, 120% verification coverage
      • Threshold Monitoring: 40% decrease in incidents caused by lack of resources
      • Troubleshooting: 40% reduced data collection time
      • Network Troubleshooting (congestion, loops): 80% reduction in man-hours, minimizing operational mistakes
  • 28. “Sleep Better at Night: OpenStack Cloud Auto-Healing” @ OpenStack Summit Barcelona Mirantis: Auto-remediating 2,000 node OpenStack cluster at Symantec with StackStorm
  • 29. Benefits • Reduce MTTR (Mean Time to Resolution) • Avoid failures (fixing on computer time, not human time) • Reduce risk of human error (no fat fingers) • Positive team impact – Avoid pager fatigue and team burn-out – Turn from reactive to proactive (break reactive vicious cycle) – Capture operational knowledge – as code
  • 30. Into Details:
  • 31. Workflows
  • 32. Workflows. IT domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring. Actions, Sensors, Workflows, Rules, Ops Support. MISTRAL. N.B.: Event Driven Automation > Workflow, but Workflow is a key element.
  • 33. Key Workflow Patterns • Theory: ~100 patterns - http://www.workflowpatterns.com/ • Practice: IMAO, only a few are sufficient for IT & DC automation
  • 34. Basic: Sequence (t1 → t2 → t3)
      ...
      tasks:
        t1_update_config:
          action: core.remote_sudo
          input:
            cmd: sed -i -e"s/keepalive_timeout
            hosts: my_webserver.example.com
          on-complete: t2_cleanup_logs
        t2_cleanup_logs:
          action: core.remote_sudo
          input:
            cmd: rm /var/log/nginx/
            hosts: my_webserver.example.com
          on-complete: t3_restart_service
        t3_restart_service:
          action: core.remote_sudo cmd="servic
  • 35. Basic: Data Passing (t1 → t2, carrying t1.code=0 and msg=“Some string..”)
      examples.data_pass:
        input:
          - host
        tasks:
          t1_diagnose:
            action: diag.run_mysql_diag
            input:
              host: <% $.host %>
            publish:
              - msg: <% $.t1_diagnose.stdout.summary %>
            on-complete: t2_post_to_chat
          t2_post_to_chat:
            action: chatops.say
            input:
              header: Returned <% $.t1_diagnose.code %>
              details: <% $.msg %>
  • 36. Basic: Conditions (t1 → t2 on success, t1 → t3 on failure)
      tasks:
        ...
        t1_deploy:
          action: ops.deploy_fleet
          on-success: t2_post_to_chat
          on-failure: t3_page_admin
        t2_post_to_chat:
          action: chatops.say
          input:
            header: Successfully deployed <% $.t1_diag
        t3_page_admin:
          action: pagerduty.launch_incident
          input:
            details: Have to wake up dude... <% $.msg %>
  • 37. Basic: Conditions on Data (t1 → t2 when t1.code == 0, t1 → t3 when t1.code > 0)
      t1_diagnose:
        action: ops.run_switch_diag
        publish:
          - code: <% $.t1_diagnose.return_code %>
        on-complete:
          - t2_post_to_chat: <% $.code == 0 %>
          - t3_page_network_admin: <% $.code > 0 %>
      t2_post_to_chat:
        action: slack.post
        input:
          header: "Switch <% $.switch %> checked, OK"
      t3_page_network_admin:
        action: pagerduty.launch_incident
        input:
          details: Have to wake up dude... <% $.t1_diagnose.stdout %>
  • 38. That’s the basics! Sufficient. But there is more…
  • 39. More: Parallel Execution (t1 fans out to t2, t3, t4)
      ...
      t1_do_build:
        action: cicd.do_build_and_packages
        on-success:
          - t2_test_ubuntu14
          - t3_test_fedora20
          - t4_test_rhel6
      t2_test_ubuntu14:
        action: cicd.deploy_and_test distro="UBUNTU14"
      t3_test_fedora20:
        action: cicd.deploy_and_test distro="F20"
      t4_test_rhel6:
        action: cicd.deploy_and_test distro="RHEL6"
  • 40. More: Join (t1 fans out to t2, t3, t4, which converge on t5)
  • 41. More: Join. 16 ways to join (t1 fans out to t2, t3, t4, which converge on t5)
  • 42. More: Join, Simple Merge (t2, t3, t4 each trigger t5, so t5 runs three times). http://www.workflowpatterns.com/patterns/control/basic/wcp5.php
      ...
      t2_test_ubuntu14:
        action: cicd.deploy_and_test distro="UBUNTU14"
        on-success: t5_post_status
      t3_test_fedora20:
        action: cicd.deploy_and_test distro="F20"
        on-success: t5_post_status
      t4_test_rhel6:
        action: cicd.deploy_and_test distro="RHEL6"
        on-success: t5_post_status
      t5_post_status:
        action: chatops.say
        input:
          header: Test completed!
  • 43. More: Join, Full “AND” Join (t5 runs once, after t2, t3 and t4). http://www.workflowpatterns.com/patterns/control/new/wcp33.php
      ...
      t2_test_ubuntu14:
        action: cicd.deploy_and_test distro="UBUNTU14"
        on-success: t5_tag_release
      t3_test_fedora20:
        action: cicd.deploy_and_test distro="F20"
        on-success: t5_tag_release
      t4_test_rhel6:
        action: cicd.deploy_and_test distro="RHEL6"
        on-success: t5_tag_release
      t5_tag_release:
        join: all
        action: cicd.tag_release
  • 44. More: Join, Discriminator (t5 runs once, on the first of t2, t3, t4 to arrive). http://www.workflowpatterns.com/patterns/control/advanced_branching/wcp9.php
      ...
      t2_test_ubuntu14:
        action: cicd.deploy_and_test distro="UBUNTU14"
        on-failure: t5_report_and_fail
      t3_test_fedora20:
        action: cicd.deploy_and_test distro="F20"
        on-failure: t5_report_and_fail
      t4_test_rhel6:
        action: cicd.deploy_and_test distro="RHEL6"
        on-failure: t5_report_and_fail
      t5_report_and_fail:
        join: one
        action: chatops.say header="FAILURE!"
        on-complete: fail
  • 45. More: Multiple Data (t1 → t2, with t2 run once per item of ip_list=[...])
      ...
      t1_get_ip_list:
        action: myinventory.allocate_ips num=4
        publish:
          - ip_list: <% $.t1_get_ip_list.ips %>
        on-complete: t2_create_vms
      t2_create_vms:
        with-items: ip in <% $.ip_list %>
        action: myaws.create_vms ip=<% $.ip %>
  • 46. Recap: Key Workflow Operations • Sequence • Data passing • Conditions (on data) • Parallel execution • Joins • Multiple Data Items
  • 48. Why not Scripts? • Simple to define, reason about, visualize • Transparent – state is clear, execution is trackable: running, complete, failed steps
  • 50. Workflows Better in Operations • Simple to define, reason about, visualize • Transparent – state is clear, execution is trackable: running, complete, failed steps • Reliable – Workflows are long-running – Crash tolerance – “Restart from point of failure”
  • 51. Why not Legacy RunBook Automation? DevOps: Infrastructure as Code
  • 52. Infrastructure as code. Case Study • Automated provisioning, 4 data centers • Before: CPO, operator updates via GUI, click and pray, x4 • After: BWC, dev -> code review -> staging -> QA -> prod
  • 53. Infrastructure as code. Top predictor of IT performance? Version control used by Ops for Ops artifacts!
  • 54. Designed for DevOps 1. Support infrastructure as code 2. Open Source 3. Scale and reliability 4. Part of tool chain 5. Social coding & collaboration 6. More demanding - requires skills
  • 55. Part of tool chain: DevOps Tools vs Enterprise Suites
  • 56. Leverage social coding Community packs @ StackStorm exchange
  • 57. More demanding: requires skills – CLI, scripting, understanding
  • 58. Operation Patterns. Capture and share operational patterns as code!
  • 62. Summary • Event-driven automation works – benefits to reliable cloud operations • Automation must be reliable and transparent – workflows beat scripts • Infra as code is key – repeatable, testable, reliable automation
  • 63. StackStorm: OpenSource, Apache 2.0 • GitHub: github.com/StackStorm/st2 • Twitter: Stack_Storm • IRC: #stackstorm on FreeNode • stackstorm.slack.com on Slack • www.stackstorm.com. Brocade Workflow Composer: Commercial Edition • Enterprise features • Priority support • brocade.com/bwc • docs: bwc-docs.brocade.com • Network lifecycle automation suite
  • 64. Questions & Answers
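
The "Automation Example" slide (a low-disk event on web301) can be illustrated with a minimal Python sketch. This is not StackStorm code: the `Event` class, its `culprit` field, and the `clean_logs`, `page_on_call` and `handle` functions are hypothetical names made up for illustration. It only shows the decision the slide describes: resolve the known case automatically, and escalate anything else to a human.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A monitoring event (hypothetical shape, for illustration only)."""
    host: str
    trigger: str   # e.g. "low_disk"
    culprit: str   # directory using the most space (assumed to be known)


def clean_logs(event: Event) -> str:
    # Stand-in for a real remediation action such as a remote log clean-up.
    return f"cleaned /var/log on {event.host}"


def page_on_call(event: Event) -> str:
    # Stand-in for escalating to an incident management system.
    return f"paged on-call: {event.trigger} on {event.host}"


def handle(event: Event) -> str:
    """Rule: fix the known case automatically, escalate everything else."""
    if event.trigger == "low_disk" and event.culprit == "/var/log":
        return clean_logs(event)
    return page_on_call(event)


if __name__ == "__main__":
    print(handle(Event("web301", "low_disk", "/var/log")))
    print(handle(Event("web301", "low_disk", "/data")))
```

In a real deployment the rule and the actions would live in the automation platform rather than in one function; the sketch keeps them together only to make the branching visible.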

Editor's Notes

  1. And now with Winston. They started using Winston for Cassandra auto-remediation, and it grew into remediation-as-a-service (presented at QCon). Winston gets the alert and uses its rule engine to decide what the right action is. The action then analyses the issue, and if it is identified as a false positive, there is no need to page the on-call. In another use case, Winston identifies that it can fix the issue itself; when it does, again, there is no need to page the on-call. The last use case, the one we want you to focus on, is assisted diagnostics. While the on-call is being paged, Winston runs a series of pre-defined diagnostics and prepares a report, so that when the on-call logs in to the system they have comprehensive information, such as the Discovery status, a list of recent exceptions or errors, or any other relevant context, to help them make a decision faster.
  2. Now let’s talk about workflows
  3. Remember that workflow is a part of event driven automation… but a very important part.
  4. Sequence: tasks run one after another. Typical remediation sequence: update config, clean the logs, restart the server. Note the workflow definition: name of the task, action with input, transition. Simple, concise, readable YAML.
  5. Data passing: a workflow's ability to carry data downstream, and to refer to that data efficiently, is the key. In this example, troubleshooting results obtained by task 1 are published to chatops by task 2. We can refer to the task results directly, or “publish” a named variable for convenience. This funny syntax here is YAQL (Yet Another Query Language); we preferred it over Jinja for extensibility and type support.
  6. Simple conditions: deploy the app; on success, post to chat; on failure, page the admin on call.
  7. Conditions can be based on data: this workflow runs a switch diagnostic action, which may be just a shell script, and acts based on the return code. This is the most common pattern.
  8. And that’s it! In my view, that set of patterns is sufficient. To make it “efficient”, we may want few more patterns.
  9. Parallel task execution. This example is from our own CI: we use StackStorm to build StackStorm. When it is built and packaged, I deploy and test it on 3 operating systems. Obviously, in parallel.
  10. Now that the execution is split in parallel, how to join it? How to get this Humpty Dumpty back together again? It’s not easy.
  11. According to workflow patterns, there are 16 ways to join. How many times t5 is going to run, and how, depends on the type of join.
  12. Simple merge. T5 runs 3 times, once for each upstream execution. That’s what I want here: report the completion of each of the parallel tasks.
  13. Now: to tag the release, we want the tests on all 3 operating systems to have passed. That is what the “AND” join pattern will do.
  14. If, on the other hand, any of the OS tests fails, we don’t wait for the rest to call it a failure. In this example, t5 also runs only once, but it will do so on whichever upstream task comes first, and the workflow moves on. This join is called “discriminator”, because US legal compliance people didn’t review workflow pattern language yet…
  15. Finally, “multiple data”. People ask “can a workflow have loops?” My answer is “it can, but you don’t want it”. If all you need is the same action run on a set of data, use this pattern. In Mistral, the keyword for it is “with-items”. Here, task 1 gets the list of available IP addresses from the inventory system, and task 2 uses them as an input to the create VM action. Here is a cool thing about Mistral workflows: actions run in parallel, AND you can control concurrency.
  16. That’s it, that’s all you need. This is the minimal set that gives enough power but keeps workflows simple to create, track, and reason about.
  17. Apply DevOps to automation itself.
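
The three join types described in notes 12 to 14 differ mainly in how many times the downstream task t5 starts. The Python sketch below is a toy model of that counting rule, not the Mistral engine: the function name `runs_of_t5` and the join labels `"merge"`, `"all"` and `"one"` are illustrative assumptions (the labels mirror the `join: all` and `join: one` keywords shown on the slides, with `"merge"` standing for a task with no join keyword).

```python
def runs_of_t5(join: str, upstream_arrivals: list[bool]) -> int:
    """Count how many times the downstream task t5 starts.

    join: "merge" (simple merge), "all" (AND join) or "one" (discriminator).
    upstream_arrivals: one boolean per upstream task, True when that task
    took the transition that leads to t5.
    """
    arrivals = sum(upstream_arrivals)
    if join == "merge":
        # Simple merge: t5 starts once for every upstream arrival.
        return arrivals
    if join == "all":
        # AND join: t5 starts once, and only if every upstream task arrived.
        return 1 if arrivals == len(upstream_arrivals) else 0
    if join == "one":
        # Discriminator: t5 starts once, on the first arrival; later ones are ignored.
        return 1 if arrivals >= 1 else 0
    raise ValueError(f"unknown join type: {join}")


if __name__ == "__main__":
    all_passed = [True, True, True]
    one_failed_to_arrive = [True, True, False]
    print(runs_of_t5("merge", all_passed))
    print(runs_of_t5("all", one_failed_to_arrive))
    print(runs_of_t5("one", one_failed_to_arrive))
```

The model ignores timing and ordering, which a real engine has to handle; it is meant only to make the "how many times does t5 run" question from note 11 concrete.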