Presentation by Damon Edwards, co-founder of Rundeck at USENIX LISA in San Francisco, CA on November 3, 2017
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Damon Edwards, co-founder of Rundeck, presentation at Nexus Conf 2018 on how Security teams can help Operations and, in turn, help themselves.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Modern Operations: Solving DevOps’ Last Mile Problem Rundeck
Damon Edwards, co-founder of Rundeck, presentation at Nike internal DevOps Day in Beaverton, OR on June 18, 2018.
This talk looks at the forces that fundamentally undermine operations work and what needs to be addressed if enterprises are going to get the most out of their digital transformation and DevOps initiatives.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Operations as a Service: Because Failure Still Happens Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at All Day DevOps on October 24, 2017.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Keeping Your DevOps Transformation From Crushing Your Ops Capacity Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at DevOps Enterprise Summit in San Francisco, November 13, 2017
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Ops Happens: Improving Incident Response Using DevOps and SRE PracticesRundeck
Damon Edwards, co-founder of Rundeck, presents at Interop ITX in Las Vegas on May 3, 2018.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Self-Service Operations: Because Failure Still Happens (Developer Edition)Rundeck
Keynote presentation at DevNet Create 2017 by Damon Edwards, co-founder of Rundeck.
Agile and DevOps have provided plenty of lessons for how to speed up the pace of application delivery and the frequency of application deployment. But delivery and deployment only covers one part of the day-to-day life of developers in large enterprises. What about what happens after deployment? In many enterprises, increasing the pace of delivery and frequency of deployment has just increased the operational support load, work interrupts, and context switching that were already cutting deeply into development teams' time.
This talk will focus on the successful design patterns that high-performing, large scale organizations have applied to reduce the operational burden and support costs across their entire organization. Specifically, we’ll look at how they apply DevOps principles to improving the post-deployment lifecycle and how Developers play the key role in reducing the difficultly and cost of operations activity for everyone.
Incident Management in the Age of DevOps and SRE Rundeck
Damon Edwards, co-founder of Rundeck, presents at Salt Lake City DevOps Meetup, November 13, 2019.
There is no doubt that DevOps has changed how we deliver software. But what about after deployment? Whether you are in a traditional operations organization or a “you build it, you run it” team, how do you mobilize, resolve, and learn from incidents? This talk will look at how high performing organizations have applied DevOps and SRE practices to shorten incidents and reduce escalations. Less frustration for the engineers. Lower costs for the business. Everybody wins.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
SRE for Everyone: Making Tomorrow Better Than Today Rundeck
Keynote presentation at DevOps Days Austin 2019 by Damon Edwards, co-founder of Rundeck.
Wouldn't everyone doing operations work love more time to focus on exciting projects? Build out new platforms, improve performance, contribute to open source projects, pay down tech debt, level-up their automation — all things that add value to your company and advance your career.
But instead, we find ourselves buried in interruptions and repetitive work. Imagine the things you could do, if you just had the time to get to it.
This talk is about applying ideas from the SRE movement that can be applied to any organization. Ideas that can help us all make tomorrow better than today.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Damon Edwards, co-founder of Rundeck, presentation at Nexus Conf 2018 on how Security teams can help Operations and, in turn, help themselves.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Modern Operations: Solving DevOps’ Last Mile Problem Rundeck
Damon Edwards, co-founder of Rundeck, presentation at Nike internal DevOps Day in Beaverton, OR on June 18, 2018.
This talk looks at the forces that fundamentally undermine operations work and what needs to be addressed if enterprises are going to get the most out of their digital transformation and DevOps initiatives.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Operations as a Service: Because Failure Still Happens Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at All Day DevOps on October 24, 2017.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Keeping Your DevOps Transformation From Crushing Your Ops Capacity Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at DevOps Enterprise Summit in San Francisco, November 13, 2017
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Ops Happens: Improving Incident Response Using DevOps and SRE PracticesRundeck
Damon Edwards, co-founder of Rundeck, presents at Interop ITX in Las Vegas on May 3, 2018.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Self-Service Operations: Because Failure Still Happens (Developer Edition)Rundeck
Keynote presentation at DevNet Create 2017 by Damon Edwards, co-founder of Rundeck.
Agile and DevOps have provided plenty of lessons for how to speed up the pace of application delivery and the frequency of application deployment. But delivery and deployment only covers one part of the day-to-day life of developers in large enterprises. What about what happens after deployment? In many enterprises, increasing the pace of delivery and frequency of deployment has just increased the operational support load, work interrupts, and context switching that were already cutting deeply into development teams' time.
This talk will focus on the successful design patterns that high-performing, large scale organizations have applied to reduce the operational burden and support costs across their entire organization. Specifically, we’ll look at how they apply DevOps principles to improving the post-deployment lifecycle and how Developers play the key role in reducing the difficultly and cost of operations activity for everyone.
Incident Management in the Age of DevOps and SRE Rundeck
Damon Edwards, co-founder of Rundeck, presents at Salt Lake City DevOps Meetup, November 13, 2019.
There is no doubt that DevOps has changed how we deliver software. But what about after deployment? Whether you are in a traditional operations organization or a “you build it, you run it” team, how do you mobilize, resolve, and learn from incidents? This talk will look at how high performing organizations have applied DevOps and SRE practices to shorten incidents and reduce escalations. Less frustration for the engineers. Lower costs for the business. Everybody wins.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
SRE for Everyone: Making Tomorrow Better Than Today Rundeck
Keynote presentation at DevOps Days Austin 2019 by Damon Edwards, co-founder of Rundeck.
Wouldn't everyone doing operations work love more time to focus on exciting projects? Build out new platforms, improve performance, contribute to open source projects, pay down tech debt, level-up their automation — all things that add value to your company and advance your career.
But instead, we find ourselves buried in interruptions and repetitive work. Imagine the things you could do, if you just had the time to get to it.
This talk is about applying ideas from the SRE movement that can be applied to any organization. Ideas that can help us all make tomorrow better than today.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
SysAdmin to SRE: Solving the Last Mile ProblemRundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Days Dallas on August 20, 2019.
Some DevOps transformations flourish, but others are stalling. Why is that? This talk will make the case that Operations is the most predictable differentiator.
So much of the energy in DevOps has been about activities that start in Dev and move towards Ops — continuous delivery, deployment pipelines, automated testing, and of course, the unofficial mantra of “deploy, deploy, deploy. “However, post-deployment, too many DevOps transformations maintain the status quo and leave questionable Operations practices in place.
Now along comes a new vision for Operations called SRE (a.k.a. Site Reliability Engineering)… But SRE seems almost too good to be true!
SREs are cover much of what systems administrators used to do, but get to spend most of their time doing engineering work that adds enduring value to their company? How is it that SREs’ don’t get caught up in the interruptions, repetitive work, and drudgery that consumes so much of our time? And how do companies use SRE to do so much more with the same or less headcount?
This talk will take a close look at what SRE is, what SRE isn’t, and how SRE avoids the pitfalls that have plagued traditional Ops work. Finally, we’ll break down the principles behind the SRE movement and highlight how early examples are proving that DevOps + SRE = the end-to-end speed and quality promised since the early days of DevOps.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Self-Service Operations: Because Ops Still HappensRundeck
Keynote Presentation by Damon Edwards, co-founder of Rundeck, at DevOps Days Austin , May 4, 2017.
Deployment is a solved problem. Sure there is still work to be done, but the DevOps community has successfully proven that anyone can both scale deployment automation and distribute the capability to execute deployments. Now, we have to turn our attention to the next critical constraint: What happens after deployment?
We all know that failure is inevitable and is coming our way at any moment. How do we respond quickly and effectively to those failures? What works when there is just a small set of teams or an isolated system to manage will quickly break down when the organization grows in size and complexity. But on the other hand, what has been commonly practiced in large-scale enterprises is proving to be too cumbersome, too silo dependent, and simply too slow for today's business needs.
How do we rapidly respond to incidents and recover complex interdependent systems while working within an equally complex and interdependent organization? How do Ops teams embrace the DevOps and Agile inspired demand for speed while maintaining quality and control?
This talk examines the trial-and-error lessons learned by some forward-thinking enterprises who are currently streamlining how they:
-Resolve incidents
-Reduce friction between teams
-Divide up operational responsibilities
-Improve the quality of their ongoing operations (and organizational learning)
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Some DevOps transformations flourish, but many others are stalling. Why is that?
Damon Edwards, co-founder at Rundeck, makes the case that Operations is the difference maker.
As presented at Comcast DevOps Days Denver 2019
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Helping Ops Help You: Development’s Role in Enabling Self-Service OperationsRundeck
Presented by Damon Edwards, co-founder of Rundeck, at JAX DevOps and Finance London, April 5, 2017.
DevOps has provided plenty of lessons for how to speed up the pace of delivery and frequency of deployments. But, delivery and deployment only covers one part of the day-to-day life for developers in large enterprises.
What about what happens after deployment? In most cases, increasing the pace of delivery and frequency of deployment just increases the operational support load, work interrupts, and context switching that has always cut deeply into a development team’s time.
This talk focuses on the successful design patterns that high-performing, large scale organizations have applied to reduce the operational burden and support costs across their entire organization. Specifically, we’ll look at how they apply DevOps principles to improving the post-deployment lifecycle and how Developers play the key role in reducing the difficultly and cost of operations activity for everyone.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Tickets Make Operations Work Unnecessarily MiserableRundeck
Presentation by Damon Edwards, co-founder of Rundeck, at Interop ITX 2019 Las Vegas
Ticket-driven request queues have become the default way of working in operations for a long time and are the cornerstone of most operations management and ITSM strategies. But what if ticket queues are actually the source of much of the dysfunction, bottlenecks, and capacity issues that have traditionally plagued our organizations? This session will examine the dark side of ticket queues, including hidden costs and how they undermine DevOps and SRE transformations. Then we’ll explore alternative strategies high-performing operations organizations use to minimize dependence on tickets queues.
Incident Management in the Age of DevOps and SRE Rundeck
Presented by Damon Edwards, co-founder of Rundeck, at QCon San Francisco 2019.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Clearing the Way For SRE In the Enterprise Rundeck
As presented by Damon Edwards, co-founder of Rundeck, at SREcon in Dusseldorf, Germany on 30 Aug 2018.
Video available here:
https://www.usenix.org/conference/srecon18europe/presentation/edwards
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Making Tomorrow Better than Today - Unlocking the Full Potential of OperationsRundeck
Keynote presentation by Damon Edward, co-founder of Rundeck, at DevOps Days Salt Lake City on May 15, 2019.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Incident Management in the Age of DevOps and SRE Rundeck
Keynote presentation at DevOps Con Munich, December 3, 2019, presented by Damon Edwards, co-founder of Rundeck.
Responding to incidents has always been the core job of Operations. With the rise of DevOps and SRE, how Operations work gets done — and who is doing the work — is changing. This talk will look at how high-performing organizations are applying DevOps and SRE practices to shorten incidents and reduce escalations.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Damon Edwards, co-founder of Rundeck, presentation at NewOps Days in Raleigh, NC on December 4, 2018.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
How to bootstrap an SRE team into your company. How to hire them, what to have them work on and how to interact with them as a team. Finally some thought on general practices to consider before your SREs arrive. There are also kitten pictures.
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today Rundeck
Damon Edwards, co-founder of Rundeck, talks at SCALE 17x on March 9, 2019 in Pasadena, CA.
Wouldn't everyone in operations love more time to work on exciting projects? Build out new platforms, improve performance, contribute to open source projects focus on security, level-up their automation — all things that add value to your companies and advance your career. But instead, the life of a traditional systems administrator is often buried in interruptions and repetitive work. Imagine the things you could do, if you just had the time to get to it.
Then along comes a new way of working and a new role called Site Reliability Engineering (SRE). But SRE almost seems too good to be true! People are doing what systems administrators used to do, but getting to spend more than 50% of their time doing engineering work that adds enduring value to their company? How can less than half of these SREs' time be wasted on the interruptions, repetitive work, and drudgery that seem to consume most of the traditional systems administrator's time? And do this with the same or less headcount?
This talk will first take a close look at what SRE is and what SRE isn't. We will break down the principles behind the SRE movement and highlight where SRE departs from the current conventional wisdom of Operations and Systems Administration work. You'll learn about key concepts like Toil, SLOs, Error Budgets, and Shared Responsibility Models.
Next, we'll look at how to move to an SRE style of working. We'll look at how traditional operations beliefs and practices can leave organizational scar tissue that is difficult to overcome. We'll examine examples of how silos, excessive toil, reliance on queues, and incorrectly applied governance models undermine the adoption of SRE principles and practices in the enterprise. We'll also look at the individual skills and mindset changes that you'll need to adopt an SRE way of working.
You'll leave this talk with an appreciation for how SRE can create the capacity you need to make tomorrow better than today.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
From development to operations, interrupt-driven work and failed deployments erode productivity and reduces time available to innovate and add strategic value. Join Damon Edwards, Chief Product Officer at Rundeck and Anders Wallgren, VP of Technology Strategy at CloudBees as they discuss proven practices that can help teams:
Deliver software services more efficiently, without the toil
Use self-service to avoid “death by a thousand service desk tickets”
Eliminate variance and drift and while increasing security and uptime
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Operations: The Last Mile Problem For DevOpsRundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Enterprise Summit London 2018
View the video here:
https://www.youtube.com/watch?v=dp76E7j0FdQ&t=755s
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Patrick McDonnell
There was a time not long ago when Etsy was laden with barriers, silos, broken communication, and noncooperation. This talk will focus on the various stages of Etsy's cultural development from the early days to present. We will tell of how Etsy overcame numerous challenges and built a strong company culture while continuing to scale.
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2luk9iS.
Tammy Butow shares her experiences using chaos engineering to build resilient systems, when they couldn’t build their systems from scratch. Filmed at qconlondon.com.
Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Previously, she led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers.
SysAdmin to SRE: Solving the Last Mile ProblemRundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Days Dallas on August 20, 2019.
Some DevOps transformations flourish, but others are stalling. Why is that? This talk will make the case that Operations is the most predictable differentiator.
So much of the energy in DevOps has been about activities that start in Dev and move towards Ops — continuous delivery, deployment pipelines, automated testing, and of course, the unofficial mantra of “deploy, deploy, deploy. “However, post-deployment, too many DevOps transformations maintain the status quo and leave questionable Operations practices in place.
Now along comes a new vision for Operations called SRE (a.k.a. Site Reliability Engineering)… But SRE seems almost too good to be true!
SREs are cover much of what systems administrators used to do, but get to spend most of their time doing engineering work that adds enduring value to their company? How is it that SREs’ don’t get caught up in the interruptions, repetitive work, and drudgery that consumes so much of our time? And how do companies use SRE to do so much more with the same or less headcount?
This talk will take a close look at what SRE is, what SRE isn’t, and how SRE avoids the pitfalls that have plagued traditional Ops work. Finally, we’ll break down the principles behind the SRE movement and highlight how early examples are proving that DevOps + SRE = the end-to-end speed and quality promised since the early days of DevOps.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Self-Service Operations: Because Ops Still HappensRundeck
Keynote Presentation by Damon Edwards, co-founder of Rundeck, at DevOps Days Austin , May 4, 2017.
Deployment is a solved problem. Sure there is still work to be done, but the DevOps community has successfully proven that anyone can both scale deployment automation and distribute the capability to execute deployments. Now, we have to turn our attention to the next critical constraint: What happens after deployment?
We all know that failure is inevitable and is coming our way at any moment. How do we respond quickly and effectively to those failures? What works when there is just a small set of teams or an isolated system to manage will quickly break down when the organization grows in size and complexity. But on the other hand, what has been commonly practiced in large-scale enterprises is proving to be too cumbersome, too silo dependent, and simply too slow for today's business needs.
How do we rapidly respond to incidents and recover complex interdependent systems while working within an equally complex and interdependent organization? How do Ops teams embrace the DevOps and Agile inspired demand for speed while maintaining quality and control?
This talk examines the trial-and-error lessons learned by some forward-thinking enterprises who are currently streamlining how they:
-Resolve incidents
-Reduce friction between teams
-Divide up operational responsibilities
-Improve the quality of their ongoing operations (and organizational learning)
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Some DevOps transformations flourish, but many others are stalling. Why is that?
Damon Edwards, co-founder at Rundeck, makes the case that Operations is the difference maker.
As presented at Comcast DevOps Days Denver 2019
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Helping Ops Help You: Development’s Role in Enabling Self-Service OperationsRundeck
Presented by Damon Edwards, co-founder of Rundeck, at JAX DevOps and Finance London, April 5, 2017.
DevOps has provided plenty of lessons for how to speed up the pace of delivery and frequency of deployments. But, delivery and deployment only covers one part of the day-to-day life for developers in large enterprises.
What about what happens after deployment? In most cases, increasing the pace of delivery and frequency of deployment just increases the operational support load, work interrupts, and context switching that has always cut deeply into a development team’s time.
This talk focuses on the successful design patterns that high-performing, large scale organizations have applied to reduce the operational burden and support costs across their entire organization. Specifically, we’ll look at how they apply DevOps principles to improving the post-deployment lifecycle and how Developers play the key role in reducing the difficultly and cost of operations activity for everyone.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Tickets Make Operations Work Unnecessarily MiserableRundeck
Presentation by Damon Edwards, co-founder of Rundeck, at Interop ITX 2019 Las Vegas
Ticket-driven request queues have become the default way of working in operations for a long time and are the cornerstone of most operations management and ITSM strategies. But what if ticket queues are actually the source of much of the dysfunction, bottlenecks, and capacity issues that have traditionally plagued our organizations? This session will examine the dark side of ticket queues, including hidden costs and how they undermine DevOps and SRE transformations. Then we’ll explore alternative strategies high-performing operations organizations use to minimize dependence on tickets queues.
Incident Management in the Age of DevOps and SRE Rundeck
Presented by Damon Edwards, co-founder of Rundeck, at QCon San Francisco 2019.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Clearing the Way For SRE In the Enterprise Rundeck
As presented by Damon Edwards, co-founder of Rundeck, at SREcon in Dusseldorf, Germany on 30 Aug 2018.
Video available here:
https://www.usenix.org/conference/srecon18europe/presentation/edwards
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Making Tomorrow Better than Today - Unlocking the Full Potential of OperationsRundeck
Keynote presentation by Damon Edward, co-founder of Rundeck, at DevOps Days Salt Lake City on May 15, 2019.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Incident Management in the Age of DevOps and SRE Rundeck
Keynote presentation at DevOps Con Munich, December 3, 2019, presented by Damon Edwards, co-founder of Rundeck.
Responding to incidents has always been the core job of Operations. With the rise of DevOps and SRE, how Operations work gets done — and who is doing the work — is changing. This talk will look at how high-performing organizations are applying DevOps and SRE practices to shorten incidents and reduce escalations.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Damon Edwards, co-founder of Rundeck, presentation at NewOps Days in Raleigh, NC on December 4, 2018.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
How to bootstrap an SRE team into your company. How to hire them, what to have them work on and how to interact with them as a team. Finally some thought on general practices to consider before your SREs arrive. There are also kitten pictures.
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today Rundeck
Damon Edwards, co-founder of Rundeck, talks at SCALE 17x on March 9, 2019 in Pasadena, CA.
Wouldn't everyone in operations love more time to work on exciting projects? Build out new platforms, improve performance, contribute to open source projects focus on security, level-up their automation — all things that add value to your companies and advance your career. But instead, the life of a traditional systems administrator is often buried in interruptions and repetitive work. Imagine the things you could do, if you just had the time to get to it.
Then along comes a new way of working and a new role called Site Reliability Engineering (SRE). But SRE almost seems too good to be true! People are doing what systems administrators used to do, but getting to spend more than 50% of their time doing engineering work that adds enduring value to their company? How can less than half of these SREs' time be wasted on the interruptions, repetitive work, and drudgery that seem to consume most of the traditional systems administrator's time? And do this with the same or less headcount?
This talk will first take a close look at what SRE is and what SRE isn't. We will break down the principles behind the SRE movement and highlight where SRE departs from the current conventional wisdom of Operations and Systems Administration work. You'll learn about key concepts like Toil, SLOs, Error Budgets, and Shared Responsibility Models.
Next, we'll look at how to move to an SRE style of working. We'll look at how traditional operations beliefs and practices can leave organizational scar tissue that is difficult to overcome. We'll examine examples of how silos, excessive toil, reliance on queues, and incorrectly applied governance models undermine the adoption of SRE principles and practices in the enterprise. We'll also look at the individual skills and mindset changes that you'll need to adopt an SRE way of working.
You'll leave this talk with an appreciation for how SRE can create the capacity you need to make tomorrow better than today.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
From development to operations, interrupt-driven work and failed deployments erode productivity and reduces time available to innovate and add strategic value. Join Damon Edwards, Chief Product Officer at Rundeck and Anders Wallgren, VP of Technology Strategy at CloudBees as they discuss proven practices that can help teams:
Deliver software services more efficiently, without the toil
Use self-service to avoid “death by a thousand service desk tickets”
Eliminate variance and drift and while increasing security and uptime
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Operations: The Last Mile Problem For DevOpsRundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Enterprise Summit London 2018
View the video here:
https://www.youtube.com/watch?v=dp76E7j0FdQ&t=755s
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Patrick McDonnell
There was a time not long ago when Etsy was laden with barriers, silos, broken communication, and noncooperation. This talk will focus on the various stages of Etsy's cultural development from the early days to present. We will tell of how Etsy overcame numerous challenges and built a strong company culture while continuing to scale.
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2luk9iS.
Tammy Butow shares her experiences using chaos engineering to build resilient systems, when they couldn’t build their systems from scratch. Filmed at qconlondon.com.
Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Previously, she led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers.
This is the video of me giving this presentation: https://youtu.be/fxorF66SDmo
What does a lead developer do, and how do they function on a software development team? This videos takes you through an average lead developer's calendar and their daily interactions, in order to gain a better insight into their function and methodologies.
BizOps Done Right: Breaking DevOps Silos to Deliver Great User ExperiencesKlaus Enzenhofer
Learn how breaking down IT silos and business data barriers within your DevOps strategy can directly improve your customer experience and provide significant value in agility, business responsiveness and success. You’ll learn:
How companies are connecting “feedback loops” across their digital delivery pipelines to improve business synchronization and success
pecific areas in the delivery pipeline where you can add best practices for better performing touchpoints and faster responding apps – removing the possibility of frustrated end-users
Insight into how you can gain a holistic view across the digital user journey, for increased visibility into decisions and improvements
Chicago Docker Meetup Presentation - MediaflyMediafly
Bryan Murphy's presentation from the 2nd Chicago Docker meetup on March 12, 2014 at Mediafly HQ. In his presentation, Bryan explains how we use Docker right now at Mediafly in production.
SaltConf 2015: Salt stack at web scale: Better, Stronger, FasterThomas Jackson
This talk will discuss best practices for scaling SaltStack from thousands to hundreds of thousands of minions. But the devil is in the details and how do you scale without losing performance and making sure it all works? At LinkedIn we've learned some valuable lessons as we've grown our SaltStack footprint. We'll discuss how to run SaltStack, how to not run SaltStack, and how we've contributed to the Salt project to help make it better, stronger and faster.
Youtube: https://www.youtube.com/watch?v=qjFOY-QrW_k
The DevOps Pay Raise: Quantifying Your Value to Move Up the Laddertlevey
DevOps, when done right, usually goes unnoticed. It's only when something breaks that all eyes turn to IT.
If your boss only sees you when the app is down, however, that's not really doing your career any favors. In this session we'll talk about how to prove your value to the organization by looking at the positive side -- that is, how much money you've saved your company.
Continuous Delivery for Python Developers – PyCon OttoPeter Bittner
Continuous Delivery sounds easy in theory, but it’s hard to do in practice. There are myriads of things you can and should do to get your code delivered faster, reliably. We look at what we can do as Python developers, or as a small or mid-sized team to make the industrialized software development production chain come true.
https://getnobullshit.com/
Nous savons tous développer une API mais avons-nous bien intégré toutes les problématiques?
Son aspect organisationnel et humain, sa gouvernance, ses contraintes business et d'opérabilité (SLA, SLO, SLI), son release management, ses méthodes de requêtage, sa sécurité (ses performances, sa mise à l'échelle), ses différents types de test, sa documentation, son versioning (compatibilité, changelog), son monitoring — et bien plus encore — de cette API une fois en production ?
Durant ce talk, c'est plus de 30 points d'attentions rarement évoqué que je vous propose d'aborder, à la lumière de retours d'expériences provenant de tech-leader comme Uber, Stripe, Facebook et Google mais aussi d'entreprise française de la petite startup à la PME.
Continuous Delivery and Automated Operations on k8s with keptnAndreas Grabner
Slidedeck from Vienna DevOps & Security Meetup. This talk is keptn - an open source event driven control plane for continuous delivery and automated operations for kubernetes
FIPS 140-2 is a cryptographic functionality standard that is often required when handling sensitive information, particularly in government and regulated industries. Support for FIPS was a community request that has been addressed thanks to numerous contributors. Come hear about the journey taken by the Node.js community to allow building and testing a FIPS capable runtime. The talk will cover FIPS requirements, and the specific enablement, test updates, and changes to dependencies that were required. We’ll include code samples and walkthroughs, and end with an overview of how to use FIPS capable Node.js through sample deployments in the cloud.
“Serverless” can be defined as a couple simple things: 1 - It’s a programming model for structuring applications as functions and events (basically a manifestation of microservices). 2 - It’s a cloud business model, where use is billed by the function call instead of by the provisioned server, so apps only pay when they run and for how long they run, eliminating over-provisioning and typically reducing costs.
In this talk, we’ll cover the what, why and how of serverless, and learn more about it through running code.
Throughout the session, we’ll focus on how the serverless model is being leveraged in the real world - not just toy functions and demos. Legacy enterprise apps - which are typically monolithic, written by large teams of Java and .Net devs, and resembling a bit of a mud ball - are being shaved down to take advantage of serverless, and we’ll be sharing some early results from those efforts. We'll discuss examples of how Fortune 50 companies are building their serverless projects on the Kubernetes and Mesos clouds they have already deployed.
Le terme “Serverless” a plusieurs significations: 1 - un modèle de programmation pour structurer les applications en tant que fonctions et événements (essentiellement une manifestation de microservices); et 2 - Il s'agit d'un modèle d'entreprise Cloud, où l'utilisation est facturée par l'appel de fonction plutôt que par le serveur provisionné, de sorte que les applications ne paient que lorsqu'elles fonctionnent et pour combien de temps elles courent, éliminant le sur-provisionnement et réduisant les coûts associés.
Dans ce discours, nous allons couvrir le quoi, le pourquoi et comment de Serverless, et en savoir plus à ce sujet en exécutant le code. Nous nous concentrerons sur la façon dont le modèle Serverless est utilisé dans le monde réel - pas seulement les fonctions et démos. Les applications d'entreprise héritées - qui sont généralement monolithiques, écrites par de grandes équipes de développeurs Java et .Net et ressemblant à un peu une grande boule de boue - sont rasées pour profiter de Serverless, et nous partagerons des résultats préliminaires de ces efforts.
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0Joakim Lindbom
Corporations are struggling with overly complex systems and system landscapes. DevOps is presented as one piece of the puzzle to go for much leaner and simpler landscapes - all in order to increase the readiness for change and innovation.
The presentation also discusses the the basic thought error behind organising according to Design-Build-Run, which is the basis for most ICT IM outsourcing.
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
Rundeck offers powerful runbook automation. Most Runbooks are complicated multi-step processes. We will show various examples of how to share data from one step to another through the use of Log Filters.
Come join this session to learn how to:
Use different types of Log Filters to gather variables from your Job Steps
Gather variables and use the values in other Job Steps
Use the Result Data feature to format your output in a consistent format regardless of the log output.
Most of what Rundeck does is via one of it’s plugins. There are already over 100+ plugins to perform various services including executing commands on nodes, performing step in a workflow, or sending notification about job status. There may be instances where you need to write your own plugin to perform a specific step or action. In this session, will walk through the steps for writing our own plugin.
In this session you'll learn:
Review the structure of plugin
How to use the structure and what information you need to include in other files to make your plugin work
How to write a simple plugin example using java
How to reply and use your plugin
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
Operations teams depend on a mixture of tools to keep their systems running. One popular pairing for Rundeck users is integrating Ansible playbooks into Rundeck to orchestrate and schedule workflows across multiple tools.
Join us for this Lunch and Learn event to learn how you can use Rundeck to create runbooks that span your existing Ansible playbooks -- as well as any other scripts, tools, APIs, or systems commands, to respond to incidents or perform Operations tasks.
Join us to learn:
Benefits of using Rundeck and Ansible together
How to configure your Rundeck to use the Ansible plugin
Tips for getting started with the integration
And see a demo of the integration
This event is recommended for beginners.
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult to achieve team objectives.
Join this talk to learn:
How to safely delegate the management of your cloud deployment (to developers and other end users) with self-service operations.
How to create powerful runbooks with guardrails that leverage existing scripting languages, infrastructure, and tools to remove bottlenecks that form around repetitive tasks.
Strategies for getting started with self-service.
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
Join us this month for an AMA discussion followed by a live Q&A led by technical experts from Rundeck’s engineering, product, and solution engineering teams. Experts are available to provide advice on your technical architecture, give recommendations for operational best practices, review current Github issues, or dive into the open source code itself.
Don’t miss the opportunity to learn Rundeck product best practices and ask experts your questions about Rundeck.
https://www.rundeck.com/rundeck-office-hours
Secure IT infrastructure is well protected by access keys, passwords, and other credentials. Admins need these secrets to gain access, as does any automation executed by Rundeck. Rundeck has rich support for secrets management with native key storage, as well as integrations with best-of-breed standardized solutions. In this webinar, we’ll cover best practices for working with Rundeck’s runbook automation platform in securing IT infrastructure. We’ll explore the secrets management options in Rundeck and we’ll highlight a new plugin with Thycotic Secret Server for Privileged Access Management.
In this webinar, we will demonstrate:
How Rundeck works with underlying secrets of the systems it manages
New Rundeck plugins that allow users to protect privileged accounts with enterprise-grade, privileged access management solutions
How you can use Rundeck plugins with HashiCorp Vault, Thycotic, and CyberArk as keys for jobs and other Rundeck configurations
In this session we will give a live walkthrough covering new capabilities released in Rundeck 3.4. Learn about security & compliance improvements we’ve made including the ability to organize secrets management by project -- so now each Runbook can access a different set of passwords and keys for its access control list (ACL). We also have a new plug-in for Thycotic users to manage secrets. Rundeck 3.4 now allows for queueing of jobs when those jobs must be run serially. Finally, we’ll discuss our vision for the future of Rundeck, and our primary development themes for the next year.
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum - goals which many struggle with at scale.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult or impossible to achieve team objectives.
Join this talk to learn:
-How to safely delegate the management of your Azure deployment (to developers and other colleagues) with self-service operations.
-How to create powerful runbooks with guardrails that leverage existing scripting languages (including PowerShell), infrastructure, and tools to remove the human from the bottleneck that forms around repetitive tasks.
-Strategies for getting started
-And how to create an Easy Button to handle the repetitive tasks that are interrupting your flow of work.
As presented by Jesse Houldsworth at PowerShell + DevOps Global Summit 2021
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
On Demand Viewing: https://www.rundeck.com/super-charge-reliability
To win in today’s digital age, organizations need to balance product reliability and feature delivery with dynamic business needs and legacy and multi-cloud environments. Automation, as a main SRE practice, scales product reliability practices by reducing tedious tasks related to production operations, freeing up engineers to work on innovation.
Whether you are in a traditional operations organization or a “you build it, you run it” team, this webinar will explore strategies for increasing automation to improve your Operations so you can continue to create excellent experiences for your customers.
-How you can reduce MTTR and eliminate toil with Self-Service Operations
-Common workflow challenges and opportunities
-How you can use Runbook Automation to enable Self-Service Operations
-Ways to leverage existing assets and workflows by integrating Rundeck with existing toolsets
-See a demo of real world cases
https://youtu.be/4jAf6cbxsgo
As operators, it’s our job to monitor infrastructure, systems and applications and only wake up humans for tasks machines can’t fix on their own. Automated remediation pairs monitoring and runbook automation, giving you a monitoring system that can trigger operational actions with runbook automation to shorten incident response times and avoid alert fatigue.
Rundeck Director of Product Management Forrest Evans and Sensu Developer Advocate Todd Campbell discuss the key role automated remediation plays in the monitoring journey, with live demos of both the Rundeck and Sensu integrations. You’ll learn all about monitoring as code workflows with the Sensu Observability Pipeline and how to deliver runbook automation with Rundeck — and see how the two together can help you achieve automated remediation.
Failure is inevitable. But are you incurring more downtime and disruption than necessary? Legacy incident response techniques have difficulty keeping up with the increasing pace of change and skyrocketing complexity of today’s application environments.
During this webinar, you’ll learn about modern incident response techniques that can dramatically shorten incidents and reduce escalations. Join the experts from Rundeck and PagerDuty as they share:
*How a real-time operations platform intelligently manages alerts and on-call mobilization, delivering the right people the right information at the right time
*How runbook automation gives front-line response teams self-service access to run automated workflows – or runbooks – that diagnose and resolve incidents without escalating to an expert.
*How to automatically detect, diagnose, and resolve incidents without human intervention.
https://youtu.be/9yYwTPMRSOY
Nathan Fluegel, head of Customer Success at Rundeck, talks clustering and high availability. We'll show how to deploy Rundeck servers in a clustered configuration with Rundeck Enterprise.
https://youtu.be/PmBIGP3M9sI
Understand how to migrate your Rundeck environment from the community edition to Enterprise, including the pros and cons of each migratory approach.
In this webinar, you will learn how to:
-Determine which migration approach is most appropriate for your environment
-Shift from a single-server to clustered environment
-Migrate jobs and projects while keeping a clean install
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
Damon Edwards (Rundeck) presentation from TechStrongConf on June 4, 2020.
Learn more: https://www.rundeck.com/business-continuity-for-digital-operations
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
7. 1. Use Lean thinking and analyze like any process
8. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
9. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
M
M
M
M
10. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
TS
TS
TS
TS
TS
TS TSM
M
M
M
11. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
TS
TS
TS
TS
TS
TS TS
W
W
W W
W
M
M
M
M
12. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
TS
TS
TS
TS
TS
TS TS
W
W
W W
W
EP
EP
M
M
M
M
13. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
TS
TS
TS
TS
TS
TS TS
W
W
W W
W
EP
EP
H HH
M
M
M
M
14. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
What got in the way? (Wastes)
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
H - Heroics
Structured analysis building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
TS
TS
TS
TS
TS
TS TS
PD
PD
PD
W
W
W W
W
EP
EP
H HH
M
M
M
M
16. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
17. MTTD?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
18. MTTD?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Detect?
19. MTTD?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Detect? Diagnose!
20. MTTD?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Detect? Diagnose!
Diagnose > Detect
21. MTTR?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
22. MTTR?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Restore?
23. MTTR?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Restore?
Repair!
DEV
24. MTTR?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Restore?
Restore is good. Repair (fix) is ultimate measure.
Repair!
DEV
25. Cost?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
HC: 2 HC: 9 HC: 2 HC: 3 HC: 4 HC: 1.5 HC: 1.5 HC: 1.5
26. Cost?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
HC: 2 HC: 9 HC: 2 HC: 3 HC: 4 HC: 1.5 HC: 1.5 HC: 1.5
Context:
2
Context:
10
Context:
6
Context:
7
Context:
10
Context:
12
Context:
13
Context:
13
27. Cost?
1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Don’t underestimate the expense of context switching!
HC: 2 HC: 9 HC: 2 HC: 3 HC: 4 HC: 1.5 HC: 1.5 HC: 1.5
Context:
2
Context:
10
Context:
6
Context:
7
Context:
10
Context:
12
Context:
13
Context:
13
28. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
3. “Shift Left” control and decision making
29. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Shift-Left
30. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Solve here or here
Shift-Left
31. 1
-NOC alerts
-Biz Manager
phone call
-Escalate to
L2
-L2 bridge call
-SysAdmin try
this & that
-Consensus:
Foo service
-Interrupt Foo
app dev
-Find sysadmin
with access
-Multiple
roundtrips to
get logs
2 3
-Identify
incorrect
restarts
-Attempt to get
middleware to
restart
-Need approval
4
-Escalate to
SVP
-VPs try to
assess impact
-Approve
5 6
-Find
middleware
engineer
-Review docs
-Trial & error
-Need API
tests per policy
-Find tester
-Pass tests
7
-Middleware
calls restart
done
-NOC sees
green
-Ticket closed
8
10:00am 12:00pm 2:00pm 2:30pm 5:00pm 6:00pm 9:30pm 10:30pm
+00:30
after alert
+02:30
after alert
+04:30
after alert
+05:00
after alert
+07:30
after alert
+08:30
after alert
+12:00
after alert
+13:00
after alert
Ticket
Context Wagon
NOC (Bob)
Biz Manager
NOC (Bob)
Biz Manager
SysAdmin (Steve)
7 x L2
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo L2
SysAdmin (Lee)
Middleware Mngr.
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Cust. Engage.
(Varsha)
Solve here or here NOT HERE
Shift-Left
33. Empower those closest to the issue
escalate escalate
1° 2° 3°
escalate
4°
Push the ability to take action this direction
34. Empower those closest to the issue
escalate escalate
1° 2° 3°
escalate
4°
Push the ability to take action this direction
But what gets
in the way?
35. Empower those closest to the issue
escalate escalate
1° 2° 3°
escalate
4°
Push the ability to take action this direction
SilosBut what gets
in the way?
36. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
3. “Shift Left” control and decision making
4. Get rid of silos
37. Silos ruin everything
Backlog Context
I need X
Backlog
I do X
Requests
for X
Silo A
Priorities
Context
Priorities
Silo B
Tools Tools
38. Silos ruin everything
Backlog Context
I need X
Backlog
I do X
Requests
for X
Silo A
Priorities
Context
Priorities
Silo B
Tools Tools
43. Popular: Replace Silos with Cross Functional Team
Dev/Test Release OperatePlanning
Cross-Functional Teams
Cross-Functional Teams
Cross-Functional Teams
45. EnvironmentsDBAs Network Security NOC
Dev/Test Release OperatePlanning
Popular: Replace Silos with Cross Functional Team
Cross-Functional Teams
Cross-Functional Teams
Cross-Functional Teams
46. EnvironmentsDBAs Network Security NOC
Dev/Test Release OperatePlanning
Team A
(Dev)
Team B
(Ops)
Ticket
System
??
Popular: Replace Silos with Cross Functional Team
Cross-Functional Teams
Cross-Functional Teams
Cross-Functional Teams
47. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
3. “Shift Left” control and decision making
4. Get rid of silos
5. Establish Operations as a Service
50. … replace with Operations as a Service design pattern
Team A
(Dev)
Team B
(Ops)Ticket
System
Operations
as a
Service
Execute
On Demand
Define
Procedures
Vet
Procedures
Define
Policies
Actual Exceptions
Execute
On Demand
51. Changes how your organization thinks
about automated procedures…
52. Automated procedures are comprised of three parts
Definition of the automated procedure
Execution of the automated procedure
Governance of the automated procedure
Define
Execute
Govern
53. Automated procedures are comprised of three parts
Definition of the automated procedure
Execution of the automated procedure
Governance of the automated procedure
Define
Execute
Govern
(security, oversight, compliance, etc.)
60. fdfd
Operations as a Service
Operations
as a
Service
ED G
Team B
(Ops)
Vet
Procedures
Define
Policies
Execute
On Demand
Team A
(Dev)
Define
Procedures
Execute
On Demand
61. fdfd
Operations as a Service
Move definition, execution, and governance to where you
get the most effective use of labor and best flow of work
Operations
as a
Service
ED G
Team B
(Ops)
Vet
Procedures
Define
Policies
Execute
On Demand
Team A
(Dev)
Define
Procedures
Execute
On Demand
62. Rundeck: Open Source Platform For Operations as a Service
#! ! "# $
Scripts APIs Tools Cloud VMs Containers
Orchestration &
Scheduling of Workflows
Collect and
Process Output
Infrastructure
details and state
from multiple
sources
Config.
Man.
CMDB
Monitor.
Metrics
Cloud
Corp
Directory
Authentication
and roles
ITSM Tickets, work
status, approvals
>_
Create workflows ● Define ACL policies ● Execute workflows
Web GUI API CLI
64. Step 1: Establish a Secure Ops Hub
Operations as a Service
Engineers get visibility
and controlled self-service
Secrets
Ops Procedures
“Status”
“Firewall Change”
"Restart"
deny
allow
Identity Audit Logs
Infrastructure view
Service health
System metrics
Ops Support use for
remediation procedures
Inventory and Health
Execute
+ Monitoring Tools
Security and Ops manages
access, configuration, and compliance
65. Step 2: Establish a SDLC for Ops Procedures
Operations as a Service
Engineers get visibility
and controlled self-service
Secrets
Ops Procedures
“Status”
“Firewall Change”
"Restart"
deny
allow
Identity Audit Logs
Infrastructure view
Service health
System metrics
Ops Support use for
remediation procedures
Inventory and Health
Execute
Source Code
Repo
if (($state==wait))
then
kill -9 $PID
fi
Change
Product Engineers
produce automated
procedures and health
checks.
RISKY
Automated Procedures
and Health Checks
FIX
Code review
+ Monitoring Tools
Security and Ops manages
access, configuration, and compliance
66. Step 3: Connect with Enterprise Management Systems
Service Desk
CustomersOps Support get
visibility and audit trail
updated by support tools
Service Ticket
Execute
Software
Supply Chain
Ops integrate
with artifact
flow
Operations as a Service
Engineers get visibility
and controlled self-service
Secrets
Ops Procedures
“Status”
“Firewall Change”
"Restart"
deny
allow
Identity Audit Logs
Infrastructure view
Service health
System metrics
Ops Support use for
remediation procedures
Inventory and Health
Source Code
Repo
if (($state==wait))
then
kill -9 $PID
fi
Change
Product Engineers
produce automated
procedures and health
checks.
RISKY
Automated Procedures
and Health Checks
FIX
Code review
+ Monitoring Tools
Security and Ops manages
access, configuration, and compliance
67. Step 4: Make Compliance Really Happy
Service Desk
CustomersOps Support get
visibility and audit trail
updated by support tools
Service Ticket
Execute
Software
Supply Chain
Ops integrate
with artifact
flow
Who reviewed it? Who ran it? When? Where? Approval trail?
Who created the procedure?
Who created the policy?
Operations as a Service
Engineers get visibility
and controlled self-service
Secrets
Ops Procedures
“Status”
“Firewall Change”
"Restart"
deny
allow
Identity Audit Logs
Infrastructure view
Service health
System metrics
Ops Support use for
remediation procedures
Inventory and Health
Source Code
Repo
if (($state==wait))
then
kill -9 $PID
fi
Change
Product Engineers
produce automated
procedures and health
checks.
RISKY
Automated Procedures
and Health Checks
FIX
Code review
+ Monitoring Tools
Security and Ops manages
access, configuration, and compliance
68. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
3. “Shift Left” control and decision making
4. Get rid of silos
5. Establish Operations as a Service
6. Encourage Developers to think “Operable First”
70. Operations is the business (unless you literally sell packaged software)
Encourage Developers to think “Operable First”
71. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Encourage Developers to think “Operable First”
72. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Build configurability into the service, don’t externalize it
Encourage Developers to think “Operable First”
73. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Build configurability into the service, don’t externalize it
Demand “prod-like” environments everywhere
Encourage Developers to think “Operable First”
74. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Build configurability into the service, don’t externalize it
Demand “prod-like” environments everywhere
Make any handoff between teams “verification-driven”
Encourage Developers to think “Operable First”
75. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Build configurability into the service, don’t externalize it
Demand “prod-like” environments everywhere
Make any handoff between teams “verification-driven”
Create immutable versioned artifacts and use standard packaging
Encourage Developers to think “Operable First”
76. Operations is the business (unless you literally sell packaged software)
Deployability, configurability, monitoring are service features
Build configurability into the service, don’t externalize it
Demand “prod-like” environments everywhere
Make any handoff between teams “verification-driven”
Create immutable versioned artifacts and use standard packaging
Integration tests over unit tests
Encourage Developers to think “Operable First”
77. Let’s look at a company improving it’s
incident response capability…
84. escalate
1° 2° 3° 4°
escalate escalate
1. New org, support, and escalation model
85. escalate
1° 2° 3° 4°
escalate escalate
1. New org, support, and escalation model
EMT ER Trauma
Surgeon
Specialist
Surgeon
TOC (NOC) SRE
Production Eng.
Scrum Teams
Data Services
Platform Eng.
Global Network
> 15 min > 30 min > 60 min
86. 2. Key: Push the ability to take action closest to the problem
escalate
1° 2° 3° 4°
escalate escalate
1. New org, support, and escalation model
EMT ER Trauma
Surgeon
Specialist
Surgeon
TOC (NOC) SRE
Production Eng.
Scrum Teams
Data Services
Platform Eng.
Global Network
> 15 min > 30 min > 60 min
87. 2. Key: Push the ability to take action closest to the problem
escalate
1° 2° 3° 4°
escalate escalate
1. New org, support, and escalation model
EMT ER Trauma
Surgeon
Specialist
Surgeon
TOC (NOC) SRE
Production Eng.
Scrum Teams
Data Services
Platform Eng.
Global Network
> 15 min > 30 min > 60 min
3. Longterm investment in operability
(deployment, configuration, monitoring, automated runbooks)
88. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
89. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
• Automated Ops procedures written/
vetted by the delivery teams
90. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
• Automated Ops procedures written/
vetted by the delivery teams
• Ops remained in full control of what
can run and security policy
91. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
• Automated Ops procedures written/
vetted by the delivery teams
• Ops remained in full control of what
can run and security policy
• Empowered NOC and other support
teams with self-service ops tasks
92. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
• Automated Ops procedures written/
vetted by the delivery teams
• Ops remained in full control of what
can run and security policy
• Empowered NOC and other support
teams with self-service ops tasks
• Empowered developers with limited
self-service operations
93. “Support at the Edge”
Sources: https://www.youtube.com/watch?v=_hr4KiB19bQ
http://rundeck.org/stories/mark_maun.html
• Automated Ops procedures written/
vetted by the delivery teams
• Ops remained in full control of what
can run and security policy
• Empowered NOC and other support
teams with self-service ops tasks
• Empowered developers with limited
self-service operations
• Combined with new incident response
and escalation model
94. 1. Use Lean thinking and analyze like any process
2. Make sure you are measuring the right things
3. “Shift Left” control and decision making
4. Get rid of silos
5. Establish Operations as a Service
6. Encourage Developers to think “Operable First”
Recap!