Presentation by Damon Edwards, co-founder of Rundeck, at Nike's internal DevOps Day in Beaverton, OR on June 18, 2018.
This talk looks at the forces that fundamentally undermine operations work and what needs to be addressed if enterprises are going to get the most out of their digital transformation and DevOps initiatives.
See a Demo of Rundeck Enterprise:
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: https://www.linkedin.com/company/rundeck-inc
Security Validation through Continuous Delivery at Verizon - DEV403 - re:Inve... - Amazon Web Services
In this session, Verizon and Stelligent demonstrate techniques and approaches on how to validate your security infrastructure during the development process through Continuous Security, and keep it that way through AWS Lambda auto-remediation. Verizon and Stelligent present a hands-on demo of these techniques, and a deep dive into the code that enables these technologies.
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018 - Amazon Web Services
In modern, microservices-based applications, it’s critical to have end-to-end observability of each microservice and the communications between them in order to quickly identify and debug issues. In this session, we cover the techniques and tools to achieve consistent, full-application observability, including monitoring, tracing, logging, and service mesh.
Customer case - Dynatrace Monitoring Redefined - Michel Duruel
One of the largest airlines in the world chose Dynatrace; here is the customer case.
Including:
Vision and Goal / Challenges / Requirements / Why Dynatrace is Unique / ROI and TCO / Rollout Status / Solution Screenshots
Dynatrace redefined monitoring with AI powered 3rd Generation APM, User Experience Monitoring & Continuous Improvement, Cloud-native, Full Stack, Auto Everything, End-to-End, Easiest to Implement, Use and Maintain
This presentation provides an overview of the options for off-shoring activities, mainly from the IT field, to India.
It discusses the Off-shoring Maturity Model (OMM), which allows a company to set up and manage the ODC successfully with the help of a competent partner like Sangsoft.
Adopting Kubernetes for production has huge impacts on operations at all levels. We present our pattern for formalizing cluster operations as a separate role from infrastructure and application operations, and explore the impact on the role of the SRE.
Application Performance Monitoring is a mandatory discipline in any production environment today. But due to the heterogeneous nature of modern applications, it faces many challenges.
Note: This presentation was made for a 2008 seminar.
The technical debt metaphor is useful in capturing the long-term impacts of
tradeoffs taken during software maintenance between productivity (getting
something done sooner) and maintainability (degradation of the code's
quality over time). This webinar on Technical Debt will present
techniques and insights that help software engineers to identify and track
technical debt in their projects. We will outline how business and product
quality goals should affect the choice of approaches (and combinations of
approaches) for managing technical debt. More specifically, we will discuss
a set of automated approaches based on static code analysis that are likely
to spot problems in source code that have real impact on productivity and
defect proneness. Based on previous empirical studies, we will give further
advice on which types of debt can be found by these tools, and which types
are not yet detectable.
DevOps Evolution - The Next Generation? - Marc Hornbeek
Where is DevOps in its maturity? Is DevOps near the beginning of its life, in the middle, mature, near end-of-life, or near extinction? What does the next generation look like? This presentation posits the next generation will be a new level of process optimization driven by coupling analytics with DevOps pipeline tools and associated role shifts.
This talk explains a proven approach to assessing SRE practices for an organization. The approach uses a 9-pillar model and a 7-step transformation blueprint to determine the current state of SRE practices and to set a roadmap to improve SRE practices towards industry best practices.
Getting started with Site Reliability Engineering (SRE) - Abeer R
"Getting started with Site Reliability Engineering (SRE): A guide to improving systems reliability at production"
This is an intro guide that shares some of the common concepts of SRE with a non-technical audience. We will look at both the technical and organizational changes that should be adopted to increase operational efficiency, ultimately enabling global optimizations such as minimizing downtime and improving systems architecture and infrastructure:
- Improving incident response
- Defining error budgets
- Better monitoring of systems
- Getting the best out of systems alerting
- Eliminating manual, repetitive actions (toil) through automation
- Designing better on-call shifts/rotations
How to design the role of the Site Reliability Engineer (who effectively works between application development teams and operations support teams)
Transforming Consumer Banking with a 100% Cloud-Based Bank (FSV204) - AWS re:... - Amazon Web Services
Customer demands for higher levels of service and value, constantly evolving technology capabilities, and stringent regulatory requirements are all powerful forces reshaping retail banking. Built exclusively on AWS, Starling Bank’s 100% cloud-based, mobile-only banking solution satisfies regulators in terms of its resilience, security, and reliability. It also satisfies consumers by giving them greater control over their data, streamlining the account opening process, accelerating payments, and providing access to innovative new services developed from scratch with open APIs, a developer platform, integration with Apple Pay, Google Pay, and Fitbit Pay, and a custom backend ledger and payments integrations. Starling Bank is leading the open banking revolution. In this session, learn how Starling Bank delivers value to their customers and innovates at a very fast pace in a sector that can be slow to evolve.
Marlabs Capabilities Overview: Application Maintenance Support Services - Marlabs
Marlabs application development and support services include application design, development, systems integration/consolidation, re-engineering, and implementation of packages.
Learn how Site24x7 gives you end-to-end application performance visibility for your Java, .NET and Ruby web transactions with metrics of all components starting from URLs to SQL queries.
What does a Maturity Curve for Enterprise Adoption of Agile and DevOps look like? Where would an organization like yours rank on the curve? Are there specific areas of improvement you might want to consider?
In this presentation I will talk about what SRE and DevOps are and what reliability is. I will also cover the reliability approach used for Competitive Gaming at Wargaming and show a few cases.
Presentation about IT managed services and solutions being offered by IISGL.
At IISGL, we have a fully consultative approach. We want
to understand your business, its pain points and
ambitions. We can then utilize that knowledge,
dovetailing with our years of extensive experience of
the technologies available, to provide you with a custom
solution.
It is not too complicated to keep a new project at good code quality for half a year, or maybe even a year. But what if a team works on a project for years? Or even “better”: you need to support and grow a large project inherited from another team. The presentation describes Continuous Inspection, the main measures of code quality that will make your life better, and how to cook it with SonarQube.
This talk was prepared and delivered as a 15-minute lightning talk at XP Days 2016 in Kiev.
Think that DevOps is just for product? Think again.
In this webinar, ITSM expert John Custy shows you how to apply DevOps principles to your IT org. This event is for anyone involved in the support and development of IT systems and services. The keys to higher-performing services are so simple, they might surprise you.
Watch the full webinar here: http://atlassian.com/help-desk/how-to-run-it-support-devops-way
Brought to you by JIRA Service Desk. Learn more: http://atlassian.com/service-desk
Dynatrace: New Approach to Digital Performance Management - Gartner Symposium... - Michael Allen
New cloud stacks, containers, micro-services, automation and DevOps are driving an explosion of application code and infrastructure complexity. It's now nearly impossible to solve the Digital Application Performance Management challenges with traditional tools and approaches. Hear how we are delivering on our vision for Digital performance management, and how the role of digital virtual assistants might transcend into your enterprise. Meet D.A.V.I.S.
SRE (service reliability engineer) on big DevOps platform running on the clou... - DevClub_lv
SRE (service reliability engineer). The talk explains the SRE philosophy and the principles of production engineering and operations in clouds.
(Language – English)
Pavlo is the ADOP (Accenture DevOps Platform) Service Reliability Team Lead and an SRE practitioner. He has more than 18 years of IT experience in Ops and Dev.
Operations: The Last Mile Problem For DevOps - Rundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Enterprise Summit London 2018
View the video here:
https://www.youtube.com/watch?v=dp76E7j0FdQ&t=755s
Ops Happens: Improving Incident Response Using DevOps and SRE Practices - Rundeck
Damon Edwards, co-founder of Rundeck, presents at Interop ITX in Las Vegas on May 3, 2018.
Tickets Make Operations Work Unnecessarily Miserable - Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at Interop ITX 2019 Las Vegas
Ticket-driven request queues have long been the default way of working in operations and are the cornerstone of most operations management and ITSM strategies. But what if ticket queues are actually the source of much of the dysfunction, bottlenecks, and capacity issues that have traditionally plagued our organizations? This session will examine the dark side of ticket queues, including hidden costs and how they undermine DevOps and SRE transformations. Then we’ll explore alternative strategies high-performing operations organizations use to minimize dependence on ticket queues.
Making Tomorrow Better than Today - Unlocking the Full Potential of Operations - Rundeck
Keynote presentation by Damon Edwards, co-founder of Rundeck, at DevOps Days Salt Lake City on May 15, 2019.
Some DevOps transformations flourish, but many others are stalling. Why is that?
Damon Edwards, co-founder at Rundeck, makes the case that Operations is the difference maker.
As presented at Comcast DevOps Days Denver 2019
"Product Architecture: failures and lessons learnt" - Royi Benyossef @Product... - Product of Things
Product architecture is the scheme by which the function of a product is allocated to physical components. The process, which includes building out a software and hardware product while simultaneously conducting market research, receiving customer feedback, and developing the hardware, must be informed and strategic. In his session, Royi will discuss the various architectures his team had to develop in order to achieve different, yet optimal, product versions for the Vidmind product. For each product version, Royi covers where they went wrong, elaborates on what the company did to resolve these challenges in the next version, and describes the outcome of each change that was implemented.
Failure Happens: Improving Incident Response In Enterprises - Rundeck
Presentation by Damon Edwards, co-founder of Rundeck, at USENIX LISA in San Francisco, CA on November 3, 2017
SysAdmin to SRE: Solving the Last Mile Problem - Rundeck
Presented by Damon Edwards, co-founder of Rundeck, at DevOps Days Dallas on August 20, 2019.
Some DevOps transformations flourish, but others are stalling. Why is that? This talk will make the case that Operations is the most predictable differentiator.
So much of the energy in DevOps has been about activities that start in Dev and move towards Ops: continuous delivery, deployment pipelines, automated testing, and of course, the unofficial mantra of “deploy, deploy, deploy.” However, post-deployment, too many DevOps transformations maintain the status quo and leave questionable Operations practices in place.
Now along comes a new vision for Operations called SRE (a.k.a. Site Reliability Engineering)… But SRE seems almost too good to be true!
SREs cover much of what systems administrators used to do, but get to spend most of their time doing engineering work that adds enduring value to their company? How is it that SREs don’t get caught up in the interruptions, repetitive work, and drudgery that consume so much of our time? And how do companies use SRE to do so much more with the same or less headcount?
This talk will take a close look at what SRE is, what SRE isn’t, and how SRE avoids the pitfalls that have plagued traditional Ops work. Finally, we’ll break down the principles behind the SRE movement and highlight how early examples are proving that DevOps + SRE = the end-to-end speed and quality promised since the early days of DevOps.
Scaling Up Lookout was originally presented at Lookout's Scaling for Mobile event on July 25, 2013. R. Tyler Croy is a Senior Software Engineer at Lookout, Inc. Lookout has grown immensely in the last year. We've doubled the size of the company—added more than 80 engineers to the team, support 45+ million users, have over 1000 machines in production, see over 125,000 QPS and more than 2.6 billion requests/month. Our analysts use Hadoop, Hive, and MySQL to interactively manipulate multibillion row tables. With that, there are bound to be some growing pains and lessons learned.
The Ember.js Framework - Everything You Need To Know - All Things Open
All Things Open 2014 - Day 2
Thursday, October 23rd, 2014
Yehuda Katz
Founder of Tilde
Front Dev 1
The Ember.js Framework - Everything You Need To Know
SaltConf 2015: Salt stack at web scale: Better, Stronger, Faster - Thomas Jackson
This talk will discuss best practices for scaling SaltStack from thousands to hundreds of thousands of minions. But the devil is in the details: how do you scale without losing performance while making sure it all works? At LinkedIn we've learned some valuable lessons as we've grown our SaltStack footprint. We'll discuss how to run SaltStack, how to not run SaltStack, and how we've contributed to the Salt project to help make it better, stronger and faster.
YouTube: https://www.youtube.com/watch?v=qjFOY-QrW_k
Crossroads of Asynchrony and Graceful Degradation - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VmbI3t.
Nitesh Kant describes how embracing asynchrony in the Netflix applications, from networking to business processing, creates gracefully degrading and highly resilient applications. Filmed at qconsf.com.
Nitesh Kant is an engineer in Netflix’s Edge Gateway team, working on Netflix’s asynchronous Inter Process Communication stack. He is the author of RxNetty which forms the core of this stack and is currently moving Zuul to this new architecture.
Expecto Performa! The Magic and Reality of Performance Tuning - Atlassian
In the enterprise there are rarely simple solutions to highly nuanced problems that satisfy all needs. Several customers might each ask "How do I make Jira/Confluence faster?" and each require a different answer. Using this example, this talk will pick apart the inputs, outputs, concerns, and realities of answering a short question with a long answer. We'll then discuss real-world examples from our own internal instances, to give you a taste of the process we've gone through to solve our own performance problems, and to show why there is no simple playbook; "it depends" on a lot! The key takeaways are:
* The importance of having a shared definition of performance
* The importance of having agreed-upon priorities, including what isn't important
* The importance of measuring (allthethings) and understanding them
* The thing you think is the problem might not be the problem, and vice versa.
* The real world and the ideal world tend to look nothing alike!
Energy proportionality is key to reducing the Total Cost of Ownership (TCO) of Warehouse Scale Computer (WSC) systems, yet it is difficult to achieve in practice. Typical WSC hardware usually does not meet this principle. Furthermore, critical services (e.g. billing) require all servers to remain up regardless of the current traffic intensity. These two issues make existing power management techniques ineffective at reducing energy use at WSC scale. We present Hybrid Performance-aware Power-capping Orchestrator (HyPPO), a distributed Observe Decide Act (ODA) control loop for optimizing the energy proportionality of distributed containerized infrastructures. This first version of HyPPO uses Kubernetes resource metrics (e.g. milli-CPU consumption) to dynamically adjust node power consumption, while respecting the Service Level Agreement (SLA) defined by the containerized application owners.
Similar to Modern Operations: Solving DevOps’ Last Mile Problem (20)
Rundeck Community Office Hours: Using Variables with Job Steps - Rundeck
Rundeck offers powerful runbook automation. Most runbooks are complicated multi-step processes. We will show various examples of how to share data from one step to another through the use of Log Filters.
Come join this session to learn how to:
- Use different types of Log Filters to gather variables from your Job Steps
- Gather variables and use the values in other Job Steps
- Use the Result Data feature to format your output in a consistent format regardless of the log output
Most of what Rundeck does is done via one of its plugins. There are already more than 100 plugins that perform various services, including executing commands on nodes, performing a step in a workflow, or sending notifications about job status. There may be instances where you need to write your own plugin to perform a specific step or action. In this session, we will walk through the steps for writing our own plugin.
In this session you'll learn:
- The structure of a plugin
- How to use the structure and what information you need to include in other files to make your plugin work
- How to write a simple plugin example using Java
- How to deploy and use your plugin
Lunch and learn: Getting started with Rundeck & Ansible - Rundeck
Operations teams depend on a mixture of tools to keep their systems running. One popular pairing for Rundeck users is integrating Ansible playbooks into Rundeck to orchestrate and schedule workflows across multiple tools.
Join us for this Lunch and Learn event to learn how you can use Rundeck to create runbooks that span your existing Ansible playbooks -- as well as any other scripts, tools, APIs, or systems commands, to respond to incidents or perform Operations tasks.
Join us to learn:
- Benefits of using Rundeck and Ansible together
- How to configure your Rundeck to use the Ansible plugin
- Tips for getting started with the integration
And see a demo of the integration.
This event is recommended for beginners.
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult to achieve team objectives.
Join this talk to learn:
How to safely delegate the management of your cloud deployment (to developers and other end users) with self-service operations.
How to create powerful runbooks with guardrails that leverage existing scripting languages, infrastructure, and tools to remove bottlenecks that form around repetitive tasks.
Strategies for getting started with self-service.
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
Join us this month for an AMA discussion followed by a live Q&A led by technical experts from Rundeck’s engineering, product, and solution engineering teams. Experts are available to provide advice on your technical architecture, give recommendations for operational best practices, review current Github issues, or dive into the open source code itself.
Don’t miss the opportunity to learn Rundeck product best practices and ask experts your questions about Rundeck.
https://www.rundeck.com/rundeck-office-hours
Secure IT infrastructure is well protected by access keys, passwords, and other credentials. Admins need these secrets to gain access, as does any automation executed by Rundeck. Rundeck has rich support for secrets management with native key storage, as well as integrations with best-of-breed standardized solutions. In this webinar, we’ll cover best practices for working with Rundeck’s runbook automation platform in securing IT infrastructure. We’ll explore the secrets management options in Rundeck and we’ll highlight a new plugin with Thycotic Secret Server for Privileged Access Management.
In this webinar, we will demonstrate:
How Rundeck works with underlying secrets of the systems it manages
New Rundeck plugins that allow users to protect privileged accounts with enterprise-grade, privileged access management solutions
How you can use Rundeck plugins with HashiCorp Vault, Thycotic, and CyberArk as keys for jobs and other Rundeck configurations
In this session we will give a live walkthrough covering new capabilities released in Rundeck 3.4. Learn about security & compliance improvements we’ve made including the ability to organize secrets management by project -- so now each Runbook can access a different set of passwords and keys for its access control list (ACL). We also have a new plug-in for Thycotic users to manage secrets. Rundeck 3.4 now allows for queueing of jobs when those jobs must be run serially. Finally, we’ll discuss our vision for the future of Rundeck, and our primary development themes for the next year.
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum - goals which many struggle with at scale.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult or impossible to achieve team objectives.
Join this talk to learn:
-How to safely delegate the management of your Azure deployment (to developers and other colleagues) with self-service operations.
-How to create powerful runbooks with guardrails that leverage existing scripting languages (including PowerShell), infrastructure, and tools to remove the human from the bottleneck that forms around repetitive tasks.
-Strategies for getting started
-And how to create an Easy Button to handle the repetitive tasks that are interrupting your flow of work.
As presented by Jesse Houldsworth at PowerShell + DevOps Global Summit 2021
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
On Demand Viewing: https://www.rundeck.com/super-charge-reliability
To win in today’s digital age, organizations need to balance product reliability and feature delivery with dynamic business needs and legacy and multi-cloud environments. Automation, as a main SRE practice, scales product reliability practices by reducing tedious tasks related to production operations, freeing up engineers to work on innovation.
Whether you are in a traditional operations organization or a “you build it, you run it” team, this webinar will explore strategies for increasing automation to improve your Operations so you can continue to create excellent experiences for your customers.
-How you can reduce MTTR and eliminate toil with Self-Service Operations
-Common workflow challenges and opportunities
-How you can use Runbook Automation to enable Self-Service Operations
-Ways to leverage existing assets and workflows by integrating Rundeck with existing toolsets
-See a demo of real world cases
https://youtu.be/4jAf6cbxsgo
As operators, it’s our job to monitor infrastructure, systems and applications and only wake up humans for tasks machines can’t fix on their own. Automated remediation pairs monitoring and runbook automation, giving you a monitoring system that can trigger operational actions with runbook automation to shorten incident response times and avoid alert fatigue.
Rundeck Director of Product Management Forrest Evans and Sensu Developer Advocate Todd Campbell discuss the key role automated remediation plays in the monitoring journey, with live demos of both the Rundeck and Sensu integrations. You’ll learn all about monitoring as code workflows with the Sensu Observability Pipeline and how to deliver runbook automation with Rundeck — and see how the two together can help you achieve automated remediation.
Failure is inevitable. But are you incurring more downtime and disruption than necessary? Legacy incident response techniques have difficulty keeping up with the increasing pace of change and skyrocketing complexity of today’s application environments.
During this webinar, you’ll learn about modern incident response techniques that can dramatically shorten incidents and reduce escalations. Join the experts from Rundeck and PagerDuty as they share:
*How a real-time operations platform intelligently manages alerts and on-call mobilization, delivering the right people the right information at the right time
*How runbook automation gives front-line response teams self-service access to run automated workflows – or runbooks – that diagnose and resolve incidents without escalating to an expert.
*How to automatically detect, diagnose, and resolve incidents without human intervention.
https://youtu.be/9yYwTPMRSOY
Nathan Fluegel, head of Customer Success at Rundeck, talks clustering and high availability. We'll show how to deploy Rundeck servers in a clustered configuration with Rundeck Enterprise.
https://youtu.be/PmBIGP3M9sI
Understand how to migrate your Rundeck environment from the community edition to Enterprise, including the pros and cons of each migratory approach.
In this webinar, you will learn how to:
-Determine which migration approach is most appropriate for your environment
-Shift from a single-server to clustered environment
-Migrate jobs and projects while keeping a clean install
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
Damon Edwards (Rundeck) presentation from TechStrongConf on June 4, 2020.
Learn more: https://www.rundeck.com/business-continuity-for-digital-operations
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I have been wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology can be managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss which cloud/on-premises strategy we may need to make it work on our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insight into which approaches I have already got working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
15. SRE
“It’s a problem
with the Foo
service”
SRE
SRE
Foo
SRE
SRE
SRE
SRE
Bridge
Call
Biz
Manager
Foo
Service
No.
NOC
(Bob)
Update
Ticket
Ticket
Foo
Lead Dev
+ add
12:00pm
NOC (Bob)
Biz Manager
Foo SRE
Ticket
Context Wagon
Can you
fix it?
17. k
Foo
Lead Dev
(Karen)
I’m going to need
more log files
Ticket
SysAdmin Team
+ add
Update
Ticket
Chat
“Can someone with
access to Foo Service
in Prod01 help me with
ticket #42516?”
SysAdmin
(Lee) Ticket
“logs
attached”
Foo
Lead Dev
(Karen)
Ticket
“no the
other ones”
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Ticket
Context Wagon
18. Foo
Lead Dev
(Karen)
Logs
-Who restarted these services? (and why?)
-They didn’t use the correct environment
variables!
-This entire service pool needs to be restarted!
Ticket
Update
Ticket
NOC
(Bob)
Update
Ticket
Ticket
Middleware Team
+ add
“Middleware, please
urgent restart this entire
app pool with the correct
environment variable”
2:00pm
Ticket
Context W
19. “Middleware, please
urgent restart this entire
app pool with the correct
environment variable”
NOC
(Bob)
Middleware
Manager
(Melissa)
No way. It’s the middle
of the day! You need
business approval.
NOC
(Bob)
Update
Ticket
Ticket
SVP for Line of
Business
+ add
(S
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
Ticket
Context Wagon
Ticket
Context Wagon
2:30pm
20. Update
Ticket
Ticket
SVP for Line of
Business
+ add
SVP
(Susan)
Chief of
Staff
Tech VP
Tech VP
Update
Ticket
Ticket
“Restart approved”
Customer
impact?
Ticket
Middlewa
Manage
(Melissa
Wh
prod
5:00pm
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Ticket
Context Wagon
21. Share
point
proved”
Ticket
Middleware
Manager
(Melissa)
Who knows these
production services
the best?
Ellen!
Middleware Middleware
(Scott)
Ellen
to
Europe
office
Middleware
(Scott)
Trial and error
.doc
5:00pm
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Ticket
Context Wagon
22. Share
point
Middleware
(Scott)
Trial and error
.doc
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
ket
Context Wagon
Middleware
(Scott)
Bar
Service
10 min Middleware
(Scott)
Waiting for
Acme Service
Acme startup
failed
Bar
Service
6:00pm
26. -Bar app startup timed out. Error says can’t
connect to Acme service.
- I looked at Acme but it seems to be running
-Is this error message correct? Why can’t Bar
connect?
Ticket
Update
Ticket
Middleware
(Scott)
Bar SRE
+ add
Bar SRE
(Linda)
Middleware
(Scott)
-URGENT: Network
connection issue
between Bar and
Acme
Ticket
Update
Ticket
Network
SRE Team
+ add
6:45
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Bar SRE (Linda)
Ticket
Context Wagon
The new environment pre-flight
check is preventing startup.
Looks like Bar’s connection to
Acme is being blocked.
27. Bar SRE
(Linda)
Middleware
(Scott)
-URGENT: Network
connection issue
between Bar and
Acme
Ticket
Update
Ticket
Network
SRE Team
+ add
Bar
Lead Dev
6:45pm
ob)
ager
nager
ev (Karen)
E
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Bar SRE (Linda)
Customers are
calling. What
is going on?
The new environment pre-flight
check is preventing startup.
Looks like Bar’s connection to
Acme is being blocked.
Bar
Lead Dev
(Liu)
Business
Managers
I can comment out
the test… But the
CD pipeline only
goes to QA ENV!
28. Network Dir
(Carlos)
Middleware
(Scott)
Carlos, I need a favor.
Can you escalate?
Middleware
Manager
(Melissa)
Customers are
calling. What
is going on?
Last week..
Net SRE
VP
VP
Priority!
Different
Incident!
Net SRE Net SRE
Net SRE
It's the network!
Business
Managers
Your
network is
broken!
Business
Managers
We are already
working on it!
Network VPs
29. Network
SRE
(Hari)
The firewall is
blocking the traffic
You’ll have to take
it up with the
Firewall Team
-URGENT: Firewall is
blocking connection
between Bar and Acme
Ticket
Open
Firewall
Ticket
Firewall
Team
+ add
Firewall Engineer
(Freddie)
Middleware
(Scott)
Paging on-call…
Open bridge…
Can’t be the firewall, it hasn’t
changed since last Thursday.
No, it's the firewall.
8:00p
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Bar SRE (Linda)
Network PM (Carlos)
Network SRE (Bob)
Ticket
Context Wagon
30. Firewall Engineer
(Freddie)
Middleware
(Scott)
Firewall Engineer
(Freddie)
Middleware
(Scott)
Can’t be the firewall, it hasn’t
changed since last Thursday.
No, it's the firewall.
There was a rule change last
Thursday that would stop Bar
from talking to Acme.
Can you change it back?
Sure we make changes on
Thursday…
Chief of
Staff
SVP and VPs are livid… this was
supposed to be a safe change!!
Freddie, we’ve got customers calling.
Update
Firewall
Ticket
Firewall Engineer
(Freddie)
8:00pm
31. SVP and VPs are livid… this was
supposed to be a safe change!!
Freddie, we’ve got customers calling.
ESCALATE:
Emergency
production firewall
rule change review
Ticket
Update
Firewall
Ticket
NetSec
+ add
Firewall Engineer
(Freddie)
Paging on-call…
NetSec
(Nicole)
This is production so I’ll have
to get others on the Network
CAB…
Chief of
Staff
Firewall
(Freddie)
Middleware
(Scott)
Customer outage!
… I’ll call SVP Susan
Middleware
Manager
VP
VP
Bar
Lead Dev
9:00pm
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAd
Middle
SVP
Chief o
2 x Tec
Ticket
Context Wagon
32. I’ll have
Network
Chief of
Staff
Firewall
(Freddie)
Middleware
(Scott)
Customer outage!
APPROVE: Emergency
firewall rule change
Ticket
Update
Firewall
Ticket
NetSec
(Nicole)
… I’ll call SVP Susan
Middleware
Manager
VP
VP
Bar
Lead Dev
Firewall
(Freddie)
Net L2
(Bob)
Middl
(Sc
Firewall
change
Restart Bar
9:30pm
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Bar SRE (Linda)
Network PM (Carlos)
Network SRE (Bob)
Firewall (Freddie)
Ticket
Context Wagon
NetSec (Nicole)
35. e
Ticket
“APIs OK”
Middleware
(Scott)
Update
Ticket
Ticket
“Services
restarted OK”
NOC
NOC
Lights are green…
I guess it is fixed.
Close
Ticket
NOC
(Bob)
Zzz
11:30pm
N
NOC (Bob)
Biz Manager
App Manager
Lead Dev (Karen)
Foo SRE
SysAdmin (Lee)
Middleware Manager
SVP
Chief of Staff
2 x Tech VP
Middleware (Scott)
Bar SRE (Linda)
Network PM (Carlos)
Network SRE (Bob)
Firewall (Freddie)
Ticket
Context Wagon
NetSec (Nicole)
Cust. Engmt. (Varsha)
37. NOC
Lights are green…
I guess it is fixed.
Close
Ticket
NOC
(Bob)
Zzz
Next Day
SVP
(Susan)
Whose fault is this?!
Why are we so bad at change?
What additional processes
and approvals are you
adding to never let this
happen again?!
VP
VP
Dir
Dir
VP
Dir
VP
Scott)
da)
Carlos)
(Bob)
die)
NetSec (Nicole)
Cust. Engmt. (Varsha)
39. We’ve invested in Cloud, Agile,
DevOps, Containers…
Why does everything still take too
long and cost too much?
Executive Team
Our transformation has
largely ignored Ops
47. [Slide graphic: the six waste labels below, each repeated many times]
Waiting, Defects, Motion/Manual, Task Switching, Partially Done, Extra Process
61. “We need better tools”
“We need more people”
Follow the conventional wisdom:
62. “We need better tools”
“We need more people”
“We need more discipline and attention to detail”
Follow the conventional wisdom:
63. “We need better tools”
“We need more people”
“We need more discipline and attention to detail”
“We need more change reviews/approvals”
Follow the conventional wisdom:
70. Backlog Information
I need X
Priorities
Tools
Silos
Backlog
I do X
Requests
for X
Silo A
Information
Priorities
Silo B
Tools
71. Silos cause disconnects and mismatches
Backlog Information
I need X
Priorities
Tools
Backlog
I do X
Requests
for X
Silo A
Information
Priorities
Silo B
Tools
Context
Context
Process
Process
Tooling
Tooling
Capacity
Capacity
74. Function A
Function B
Function C
Becomes siloed labor pools of functional specialists
Requests fulfilled by semi-
manual or manual effort
Primary management focus is
on protecting team capacity
76. How do we cover for our silos' disconnects and mismatches?
Silo A Silo B
77. How do we cover for our silos' disconnects and mismatches?
Silo A Silo B
Ticket
Queue
78. ??
Silo A Silo B
We all know how well that works
Ticket
Queue
79. Request queues are an expensive way to manage work
Ticket
Queue
Queues Create…
Longer Cycle Time
Increased Risk
More Variability
More Overhead
Lower Quality
Less Motivation
Adapted from Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development
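The first item on the list, longer cycle time, can be illustrated with Little's Law, a standard queueing result that the slide itself does not cite: average time in the system equals average work in progress divided by average throughput. The sketch below uses made-up ticket counts and a hypothetical function name purely for illustration.

```python
def average_cycle_time_days(tickets_in_queue: float, tickets_closed_per_day: float) -> float:
    """Little's Law: average time in system = average WIP / average throughput."""
    if tickets_closed_per_day <= 0:
        raise ValueError("throughput must be positive")
    return tickets_in_queue / tickets_closed_per_day


# Hypothetical numbers: the team closes 10 tickets a day in both cases.
short_queue = average_cycle_time_days(tickets_in_queue=20, tickets_closed_per_day=10)
long_queue = average_cycle_time_days(tickets_in_queue=120, tickets_closed_per_day=10)

print(f"20 tickets waiting:  {short_queue} days on average")
print(f"120 tickets waiting: {long_queue} days on average")
```

With throughput held constant, every ticket added to the queue lengthens the average wait for all of them, which is why a growing request queue shows up as longer cycle time before anyone works more slowly.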
84. Ticket queues become “snowflake makers”
??
Silo A Silo B
Ticket
Queue
Snowflakes
(each unique, technically acceptable but unreproducible and brittle)
87. Excessive toil prevents fixing the system
“Toil is the kind of work tied to running a production
service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and
that scales linearly as a service grows.”
-Vivek Rau
Google
88. Excessive toil prevents fixing the system
Toil Engineering Work
E.W.Toil
Reduce toil
Improve the business
No capacity to reduce toil
No capacity to improve business
Toil at manageable percentage of capacity
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
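The two states on this slide come down to simple capacity arithmetic. The sketch below is an illustration only: the team size, weekly hours, toil percentages, and the function name are assumptions, not figures from the talk.

```python
def engineering_hours(total_hours: int, toil_percent: int) -> float:
    """Hours left for engineering work after toil takes its share of capacity."""
    if not 0 <= toil_percent <= 100:
        raise ValueError("toil_percent must be between 0 and 100")
    return total_hours * (100 - toil_percent) / 100


# Hypothetical team: 5 people at 40 hours a week.
team_hours = 5 * 40

manageable = engineering_hours(team_hours, toil_percent=30)
bankrupt = engineering_hours(team_hours, toil_percent=95)

print(f"Toil at 30%: {manageable} hours a week to reduce toil and improve the business")
print(f"Toil at 95%: {bankrupt} hours a week, shared by the whole team")
```

At the high end there is almost no time left to automate anything away, so the toil percentage cannot come down on its own. That is the trap the slide labels “Engineering Bankruptcy”.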
91. Toil Impacts Development As Well
• 2016-2017 study of development teams
• 14 enterprises (insurance, healthcare, finance, travel, retail)
• Tech org headcount: 900 - 6800
92. Toil Impacts Development As Well
“28% - 63% of development teams’ total
time was consumed by operations toil.”
• 2016-2017 study of development teams
• 14 enterprises (insurance, healthcare, finance, travel, retail)
• Tech org headcount: 900 - 6800
93. Toil Impacts Development As Well
“28% - 63% of development teams’ total
time was consumed by operations toil.”
Waiting for environments
Incident escalations
Rework due to env. differences
Handoffs
Network issues
Requests for information
Broken lower environments
Change meetings
And more…
• 2016-2017 study of development teams
• 14 enterprises (insurance, healthcare, finance, travel, retail)
• Tech org headcount: 900 - 6800
97. All work is contextual
rm -rf $PATHNAME
John
Allspaw
98. All work is contextual
rm -rf $PATHNAME Is this dangerous?
John
Allspaw
103. All work is contextual
rm -rf $PATHNAME
Answer is always
“it depends”
John
Allspaw
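One way to make “it depends” concrete: the same command gets a different answer once the value of the variable and the environment are known. The rules, the list of critical paths, and the function name below are hypothetical and chosen only to illustrate the point. They are not a real safety check.

```python
from pathlib import PurePosixPath

# Hypothetical set of paths this sketch treats as catastrophic to delete.
CRITICAL_PATHS = {PurePosixPath("/"), PurePosixPath("/etc"), PurePosixPath("/var/lib")}


def assess_rm_rf(pathname: str, environment: str) -> str:
    """Judge `rm -rf $PATHNAME` using context that the command alone does not show."""
    if not pathname:
        return "unclear: PATHNAME is empty"
    if PurePosixPath(pathname) in CRITICAL_PATHS:
        return "dangerous: critical system path"
    if environment == "prod":
        return "needs review: production data may live here"
    return "probably fine: scratch data outside production"


for value, env in [("/tmp/build-cache", "dev"), ("/tmp/build-cache", "prod"), ("/", "dev"), ("", "dev")]:
    print(f"PATHNAME={value!r} in {env}: {assess_rm_rf(value, env)}")
```

None of the inputs that drive the answer appear in the command itself. The person closest to the work usually knows them, while an approver several steps removed usually does not.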
104. escalate
1° 2° 3° 4°
escalate escalate or
Context
Where are decisions made? Who can take action?
105. Low trust + approvals = illusion of control
Ticket
System
106. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
107. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
108. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
109. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
…subtract the too removed to judge (“mostly guessing”)
112. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
…subtract the too removed to judge (“mostly guessing”)
How many are you left with?
113. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
…subtract the too removed to judge (“mostly guessing”)
How many are you left with?
How many were the right call?
114. Low trust + approvals = illusion of control
Ticket
System
Add up the total number of approval requests and
…subtract the info radiators (“I need to be in the loop”)
…subtract the CYAs (“Prove you followed the process”)
…subtract the too removed to judge (“mostly guessing”)
How many are you left with?
How many were the right call?
How many got rejected?
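The subtraction exercise on this slide can be run against an export from a ticket system. The sketch below assumes each approval request has already been labelled with a reason. The labels, field names, and sample counts are invented for illustration and are not data from the talk.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels for approvals that add no real control.
NOT_REAL_CONTROL = {"info_radiator", "cya", "too_removed"}


def approval_tally(requests: List[Dict]) -> Dict[str, int]:
    """Count the approvals left after subtracting the three categories on the slide."""
    reasons = Counter(request["reason"] for request in requests)
    total = len(requests)
    left = total - sum(reasons[reason] for reason in NOT_REAL_CONTROL)
    rejected = sum(
        1
        for request in requests
        if request["reason"] not in NOT_REAL_CONTROL and request["rejected"]
    )
    return {"total": total, "left": left, "rejected": rejected}


# Made-up sample of 100 approval requests.
sample = (
    [{"reason": "info_radiator", "rejected": False}] * 45
    + [{"reason": "cya", "rejected": False}] * 30
    + [{"reason": "too_removed", "rejected": False}] * 20
    + [{"reason": "informed_judgment", "rejected": False}] * 4
    + [{"reason": "informed_judgment", "rejected": True}] * 1
)

tally = approval_tally(sample)
print(tally)
```

“How many were the right call?” cannot be computed from the ticket data alone, because it needs the outcome of each change. The sketch therefore stops at the counts.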
118. Obvious: Get rid of as many silos as possible
Old Silo A Old Silo B Old Silo C Old Silo D
119. Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Obvious: Get rid of as many silos as possible
120. Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Obvious: Get rid of as many silos as possible
“Horizontal” shared
responsibility, not
everyone do everything!
121. Shared and dedicated responsibility is key
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Development Team 1
Development Team 2
Development Team n
SRE
Team
Clear handoff requirements
Error budget consequences
“Netflix”
Model
“Google”
Model
123. Shared and dedicated responsibility is key
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Development Team 1
Development Team 2
Development Team n
SRE
Team
Clear handoff requirements
Error budget consequences
“Netflix”
Model
“Google”
Model
Same
high-quality,
high-velocity
results!
124. But what about the cross-cutting concerns?
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Specialist Capabilities
Specialist Capabilities
Specialist Capabilities
125. But what about the cross-cutting concerns?
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Specialist Capabilities
Specialist Capabilities
Specialist Capabilities
Ticket Queue
Ticket Queue
Ticket Queue
126. But what about the cross-cutting concerns?
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Specialist Capabilities
Specialist Capabilities
Specialist Capabilities
Ticket Queue
Ticket Queue
Ticket Queue
Ticket Queue
Ticket Queue
Ticket Queue
127. Operations as a Service: Turn handoffs into self-service
Cross-Functional Product Team 1, Ops (embedded)
Cross-Functional Product Team 2, Ops (embedded)
Cross-Functional Product Team n, Ops (embedded)
On Demand
On Demand
On Demand
On Demand
Operations as a Service: Ops (builds & operates)
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
128. Operations as a Service: Works with any org model
Development Team 1
Development Team 2
Development Team n
Ops/SRE Team
On Demand
On Demand
On Demand
On Demand
Operations as a Service: Ops (builds & operates)
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
130. Use tickets only for what they are good for
1. Documenting true problems/issues/exceptions
Ticket System
131. Use tickets only for what they are good for
1. Documenting true problems/issues/exceptions
2. Routing for necessary approvals
Ticket System
132. Use tickets only for what they are good for
1. Documenting true problems/issues/exceptions
2. Routing for necessary approvals
Not as a general-purpose work management system!
Ticket System
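One way to read the two ticket uses above is as a routing rule: only true exceptions and necessary approvals become tickets, and everything else goes to self-service. The sketch below assumes hypothetical request fields (`is_exception`, `needs_approval`); it is not taken from any ticketing product.

```python
from enum import Enum


class Route(Enum):
    TICKET_ISSUE = "ticket: document the problem or exception"
    TICKET_APPROVAL = "ticket: route for a necessary approval"
    SELF_SERVICE = "self-service: no ticket needed"


def route(request: dict) -> Route:
    """Decide where a request goes; the field names are hypothetical."""
    if request.get("is_exception", False):
        return Route.TICKET_ISSUE
    if request.get("needs_approval", False):
        return Route.TICKET_APPROVAL
    # Routine, repeatable work should not sit in a queue.
    return Route.SELF_SERVICE


print(route({"name": "disk failure on db-01", "is_exception": True}).name)
print(route({"name": "production firewall change", "needs_approval": True}).name)
print(route({"name": "restart staging service"}).name)
```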
133. Security or compliance “in the way”?
Cross-Functional Product Team 1, Ops (embedded)
Cross-Functional Product Team 2, Ops (embedded)
Cross-Functional Product Team n, Ops (embedded)
On Demand
On Demand
On Demand
On Demand
Operations as a Service: Ops (builds & operates)
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Build in Security Here
Build in Compliance Here
134. “Shift Left” the ability to take action
escalate
1° 2° 3° 4°
escalate escalate or
135. “Shift Left” the ability to take action
Push the ability to take action this direction
escalate
1° 2° 3° 4°
escalate escalate or
136. “Shift Left” the ability to take action
Push the ability to take action this direction
escalate
1° 2° 3° 4°
escalate escalate or
OaaS Enablement and tooling
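A rough sketch of what "shifting left" the ability to take action could look like as policy: each action has a minimum support tier allowed to run it, and publishing a vetted self-service job lowers that tier. The action names, tier numbers, and the `handle` function are all hypothetical.

```python
def handle(action: str, tier: int, min_tier: dict) -> str:
    """Return 'act' if this tier may run the action, else 'escalate'.

    Unknown actions default to the highest tier (4), so they escalate.
    """
    required = min_tier.get(action, 4)
    return "act" if tier >= required else "escalate"


# Before: only tier 3 and above may restart a service.
before = {"restart_service": 3, "failover_database": 4}

# After: a vetted self-service job lets tier 1 take the same action.
after = {**before, "restart_service": 1}

print(handle("restart_service", tier=1, min_tier=before))
print(handle("restart_service", tier=1, min_tier=after))
print(handle("failover_database", tier=1, min_tier=after))
```

The experts still define and maintain the procedure; what moves left is the permission to execute it.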
139. Reduce Toil
1. Track toil levels for each team
2. Set toil limits for each team
140. Reduce Toil
1. Track toil levels for each team
2. Set toil limits for each team
3. Fund efforts to reduce toil (with emphasis on teams over toil limits)
141. Reduce Toil
1. Track toil levels for each team
2. Set toil limits for each team
3. Fund efforts to reduce toil (with emphasis on teams over toil limits)
Bonus: Use Service Level Objectives, Error Budgets, and other lessons from SRE
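The three steps above can be sketched as a small report: record toil per team, compare it with a limit, and list the teams over the limit first when funding toil reduction. The team names and hours are made up, and the 50% default limit is an assumption you would set per team.

```python
from dataclasses import dataclass


@dataclass
class TeamToil:
    name: str
    toil_hours: float      # step 1: tracked toil for the period
    capacity_hours: float  # total working hours for the period

    @property
    def toil_ratio(self) -> float:
        return self.toil_hours / self.capacity_hours


def teams_over_limit(teams, limit: float = 0.5):
    """Step 2 and 3: teams above the limit, worst first."""
    over = [t for t in teams if t.toil_ratio > limit]
    return sorted(over, key=lambda t: t.toil_ratio, reverse=True)


# Hypothetical numbers for one month.
teams = [
    TeamToil("platform", toil_hours=120, capacity_hours=400),
    TeamToil("database", toil_hours=300, capacity_hours=400),
    TeamToil("network", toil_hours=260, capacity_hours=400),
]
priority = [t.name for t in teams_over_limit(teams)]
print(priority)
```

Sorting by ratio puts the teams closest to having no spare capacity at the front of the funding queue.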
142. Example Operations as a Service Platform (shameless plug)
Scripts, APIs, Tools, Cloud, VMs, Containers
Workflow and Scheduling
Collect and Process Output
Infrastructure details and state: Config. Man., CMDB, Monitor., Metrics, Cloud
Authentication and roles: Corp Directory
Tickets, work status, approvals: ITSM
Create workflows ● Define policies ● Execute workflows
Web GUI, API, CLI
143. Recap
Don’t forget about Ops.
Challenge conventional wisdom.
Leverage the Operations as a Service design pattern
“Shift-Left” control and decision making.
Focus on removing silos and queues
Old Silo A Old Silo B Old Silo C Old Silo D
Cross-Functional Team 1
Cross-Functional Team 2
Cross-Functional Team n
Cross-Functional Product Team 1, Ops (embedded)
Cross-Functional Product Team 2, Ops (embedded)
Cross-Functional Product Team n, Ops (embedded)
On Demand
On Demand
On Demand
On Demand
Operations as a Service: Ops (builds & operates)
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Ops Capability: SRE, Dev, or Specialist
Learn from SRE: Reduce toil to create capacity to change
Toil at manageable percentage of capacity: Toil, Engineering Work (Reduce toil, Improve the business)
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”): Toil, E.W. (No capacity to reduce toil, No capacity to improve business)
Understand the forces undermining operations work