Keynote presentation at DevOps Days Austin 2019 by Damon Edwards, co-founder of Rundeck.
Wouldn't everyone doing operations work love more time to focus on exciting projects? Build out new platforms, improve performance, contribute to open source projects, pay down tech debt, level-up their automation — all things that add value to your company and advance your career.
But instead, we find ourselves buried in interruptions and repetitive work. Imagine the things you could do if you just had the time to get to them.
This talk is about ideas from the SRE movement that can be applied to any organization. Ideas that can help us all make tomorrow better than today.
See a Demo of Rundeck Enterprise:
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
GitHub: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: https://www.linkedin.com/company/rundeck-inc
How to bootstrap an SRE team into your company: how to hire them, what to have them work on, and how to interact with them as a team. Finally, some thoughts on general practices to consider before your SREs arrive. There are also kitten pictures.
<p>From <a href="https://en.wikipedia.org/wiki/Site_reliability_engineering" target="_blank">Wikipedia</a>: Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations whose goals are to create ultra-scalable and highly reliable software systems.</p>
<p>Over the past year, Acquia has built its own SRE team to help our products and services scale with the demands of our growing number of customers. We wish to share our experience so that others can do the same and reap the rewards.</p>
<p>This presentation will discuss how the SRE team came about at Acquia, what achievements we have made so far, and the lessons we have learned along the way. We will then show the steps on how to introduce SRE to your workplace so you can deliver more reliable and scalable services to your customers! We will specifically cover:</p>
<ul>
<li>SRE's basic concepts and history from Google</li>
<li>The management support you will need to get started</li>
<li>Introducing the idea of service level objectives and error budgets</li>
<li>Operational Responsibility Assessments as a tool to measure risk</li>
<li>Creating a Launch Readiness Checklist to standardize and improve product launches</li>
<li>Finding ideal candidates for your SRE team</li></ul>
<p>The intended audience is software engineers, system administrators, and managers who want to improve how they do their work and how their products and services perform.</p>
Getting started with Site Reliability Engineering (SRE) Abeer R
"Getting started with Site Reliability Engineering (SRE): A guide to improving systems reliability at production"
This is an intro guide that shares some of the common concepts of SRE with a non-technical audience. We will look at both the technical and organizational changes that should be adopted to increase operational efficiency, ultimately benefiting global optimizations such as minimizing downtime and improving systems architecture and infrastructure:
- Improving incident response
- Defining error budgets
- Better monitoring of systems
- Getting the best out of systems alerting
- Eliminating manual, repetitive actions (toil) through automation
- Designing better on-call shifts/rotations
How to design the role of the Site Reliability Engineer (who effectively works between application development teams and operations support teams)
An overview of Google's Site Reliability Engineering with a view toward possible incorporation in the IEEE P2675 DevOps security standard. (Creative Commons with credit.)
SRE-iously! Defining the Principles, Habits, and Practices of Site Reliabilit... Tori Wieldt
How do you make DevOps magic when you aren’t Google? This talk will help whether you’re still figuring out how to create a site reliability practice at your company or you’re trying to improve the processes and habits of an existing SRE team.
Site Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa Keet Sugathadasa
When it comes to Site Reliability Engineering (SRE), the resources available online are limited to the books published by Google itself. They share some useful case studies that help us understand what SRE is and the concepts behind it, but they do not clearly explain how to build your own SRE team for your organization. The concept of SRE was cooked fresh within the walls of Google and later released to the general public as a practice for anyone to follow.
In this presentation I would like to give a brief introduction to SRE and explain why it is important to any software engineering organization. It is based on my experience and lessons learned from running a Site Reliability Engineering team for leading organizations in the US and Norway.
I gave this presentation as a Tech Talk while working as an Associate Technical Lead at Creative Software Sri Lanka.
How Small Team Get Ready for SRE (public version) Setyo Legowo
How Urbanindo's small engineering team implements Site Reliability Engineering (SRE) in its daily work, and why we chose SRE instead of ordinary DevOps.
SRE (service reliability engineer) on big DevOps platform running on the clou... DevClub_lv
SRE (service reliability engineer). The talk is to explain the SRE philosophy and the principles of production engineering and operations in clouds.
(Language – English)
Pavlo is the ADOP (Accenture DevOps Platform) Service Reliability Team Lead and an SRE practitioner. He has more than 18 years of IT experience in Ops and Dev.
Overview of Site Reliability Engineering (SRE) & best practices Ashutosh Agarwal
In any software organization, stability and innovation are always at loggerheads: the faster you move, the more things will break. This talk defines what an SRE org looks like at high-tech organizations (Google, Uber).
Adopting Kubernetes for production has huge impacts on operations at all levels. We present our pattern for formalizing cluster operations as a separate role from infrastructure and application operations, and explore the impact on the role of the SRE.
Managing a team and managing a project are nearly synonymous. Teams in particular require effective distribution of responsibilities and roles. Once that is set up, a proper process guides people to make progress. All of this fits into a product lifecycle, which is essential to developing the right product, in the right way, and delivering it at the right time.
According to Google, SRE is what you get when you treat operations as if it’s a software problem. In this video, I briefly explain what is and isn't toil, and how to identify, measure, and eliminate it.
YouTube video here: https://youtu.be/EgpCw15fIK8
Site Reliability Engineer (SRE), We Keep The Lights On 24/7 NUS-ISS
There are many phases in the software development cycle, from requirements to development and testing, but at the tail of the process is an often overlooked aspect: deployment and delivery. With the paradigm shift from delivering on-site software to offering software-as-a-service, Site Reliability Engineering is beginning to take a greater role in product delivery.
This session aims to give a glimpse of the work that goes into site reliability engineering (SRE) and the effort that goes into keeping a service going 24/7.
Independently of the DevOps movement but starting from the same problems, Google developed its own strategy, defining a new role called SRE (Site Reliability Engineer). This introduction explains the history and concepts of this methodology and compares it with the DevOps manifesto, to clarify what it means to adopt DevOps, what it means to be an SRE, what the two share, and where they diverge.
SysAdmin to SRE: Creating Capacity to Make Tomorrow Better Than Today Rundeck
Damon Edwards, co-founder of Rundeck, talks at SCALE 17x on March 9, 2019 in Pasadena, CA.
Wouldn't everyone in operations love more time to work on exciting projects? Build out new platforms, improve performance, contribute to open source projects, focus on security, level up their automation: all things that add value to your company and advance your career. But instead, the life of a traditional systems administrator is often buried in interruptions and repetitive work. Imagine the things you could do if you just had the time to get to them.
Then along comes a new way of working and a new role called Site Reliability Engineering (SRE). But SRE almost seems too good to be true! People are doing what systems administrators used to do, but getting to spend more than 50% of their time doing engineering work that adds enduring value to their company? How can less than half of these SREs' time be wasted on the interruptions, repetitive work, and drudgery that seem to consume most of the traditional systems administrator's time? And do this with the same or less headcount?
This talk will first take a close look at what SRE is and what SRE isn't. We will break down the principles behind the SRE movement and highlight where SRE departs from the current conventional wisdom of Operations and Systems Administration work. You'll learn about key concepts like Toil, SLOs, Error Budgets, and Shared Responsibility Models.
Next, we'll look at how to move to an SRE style of working. We'll look at how traditional operations beliefs and practices can leave organizational scar tissue that is difficult to overcome. We'll examine examples of how silos, excessive toil, reliance on queues, and incorrectly applied governance models undermine the adoption of SRE principles and practices in the enterprise. We'll also look at the individual skills and mindset changes that you'll need to adopt an SRE way of working.
You'll leave this talk with an appreciation for how SRE can create the capacity you need to make tomorrow better than today.
Damon Edwards, co-founder of Rundeck, presentation at NewOps Days in Raleigh, NC on December 4, 2018.
Clearing the Way For SRE In the Enterprise Rundeck
As presented by Damon Edwards, co-founder of Rundeck, at SREcon in Dusseldorf, Germany on 30 Aug 2018.
Video available here:
https://www.usenix.org/conference/srecon18europe/presentation/edwards
NewOps Days Boston 2019 - SysAdmin to SRE: Creating Capacity to Make Tomorrow... Jorn Knuttila
Want to knock down toil in your org? Maybe just renaming people isn't quite the right approach.
These are the slides from the June 2019 NewOps Days in Boston.
Team Capability Assessment PowerPoint Presentation Slides SlideTeam
Hiring the right team is extremely important for any business. So, utilize our team capability assessment PPT slideshow and attract top talent in your organization. Our team capability assessment presentation template helps you create a professional presentation where you can highlight the required capabilities and qualities your team members should have. It is important to recognize that most people will need some help and training to be able to complete the tasks and roles assigned to them. The members of your team should have an equal level of commitment towards the goals and objectives of your organization. They should be able to face any challenges which come in their way while achieving their set goals and targets. Our team capability assessment PPT deck helps you analyze your team’s performance as it includes ready-made templates by which you can measure the performance of your team. Showcase your aspects with this fully editable team capability assessment PowerPoint template. Our Team Capability Assessment PowerPoint Presentation Slides facilitate collation of information. It eases the burden of journalists.
Incident Management in the Age of DevOps and SRE Rundeck
Damon Edwards, co-founder of Rundeck, presents at Salt Lake City DevOps Meetup, November 13, 2019.
There is no doubt that DevOps has changed how we deliver software. But what about after deployment? Whether you are in a traditional operations organization or a “you build it, you run it” team, how do you mobilize, resolve, and learn from incidents? This talk will look at how high performing organizations have applied DevOps and SRE practices to shorten incidents and reduce escalations. Less frustration for the engineers. Lower costs for the business. Everybody wins.
Scrum, XP, and Kanban have been proven to provide step changes in productivity and quality for software teams. However, these methods do not have the native constructs necessary to scale to the challenges of building enterprise-class software systems. What the industry desperately needs is a solution that moves from a set of simplistic, disparate, development-centric methods to a scalable, unified approach that addresses the complex constructs and additional stakeholders in the organization, and enables realization of enterprise-class product or service initiatives via aligned and cooperative solution development.
In this talk, Dean Leffingwell describes how to accomplish this with the Scaled Agile Framework, a publicly accessible knowledge base of proven Lean and Agile practices for enterprise-class software development. He approaches the problem from the perspectives of Lean thinking and the principles of product development flow, illustrating how these core principles help deliver business results at scale while keeping the development system, and the enterprise, lean and responsive to rapidly changing market needs. And since winning is more fun, he’ll also describe some of the personal benefits that come when teams master the art of delivering better enterprise-class software at an ever faster pace.
Incident Management in the Age of DevOps and SRE Rundeck
Presented by Damon Edwards, co-founder of Rundeck, at QCon San Francisco 2019.
VMUG Melbourne - DevOps - Not Just for Open Source and Unicorns Josh Atwell
Presented at the 2017 Melbourne VMUG UserCon event:
https://www.vmug.com/Attend/VMUG-UserCon/Melbourne-VMUG-UserCon-2017
DevOps is sweeping the IT industry like no other movement since server virtualization. It's changing the way IT is operating, how it delivers value, and the economics around IT. A prevailing view is that DevOps is only happening in the most open and unique places. In this talk we will explore the variety of areas where DevOps is being applied successfully. I'll discuss how the tools (VMware, PowerShell, Puppet) and frameworks that support our traditional VMware virtualized environment come to bear in this new framework. I'll dig deeper into the methodologies of DevOps and how they affect the way our virtual environments are managed and drive value. Once we reach the impressive summary slide, attendees should have a stronger view of how their skills apply in this new framework.
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
Rundeck offers powerful runbook automation. Most Runbooks are complicated multi-step processes. We will show various examples of how to share data from one step to another through the use of Log Filters.
Come join this session to learn how to:
Use different types of Log Filters to gather variables from your Job Steps
Gather variables and use the values in other Job Steps
Use the Result Data feature to format your output in a consistent format regardless of the log output.
Most of what Rundeck does is done via one of its plugins. There are already more than 100 plugins that perform various services, including executing commands on nodes, performing steps in a workflow, and sending notifications about job status. There may be instances where you need to write your own plugin to perform a specific step or action. In this session, we will walk through the steps for writing your own plugin.
In this session you'll learn:
The structure of a plugin
How to use the structure and what information you need to include in other files to make your plugin work
How to write a simple example plugin using Java
How to deploy and use your plugin
Lunch and learn: Getting started with Rundeck & Ansible Rundeck
Operations teams depend on a mixture of tools to keep their systems running. One popular pairing for Rundeck users is integrating Ansible playbooks into Rundeck to orchestrate and schedule workflows across multiple tools.
Join us for this Lunch and Learn event to learn how you can use Rundeck to create runbooks that span your existing Ansible playbooks -- as well as any other scripts, tools, APIs, or systems commands, to respond to incidents or perform Operations tasks.
Join us to learn:
Benefits of using Rundeck and Ansible together
How to configure your Rundeck to use the Ansible plugin
Tips for getting started with the integration
And see a demo of the integration
This event is recommended for beginners.
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ... Rundeck
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult to achieve team objectives.
Join this talk to learn:
How to safely delegate the management of your cloud deployment (to developers and other end users) with self-service operations.
How to create powerful runbooks with guardrails that leverage existing scripting languages, infrastructure, and tools to remove bottlenecks that form around repetitive tasks.
Strategies for getting started with self-service.
Rundeck Office Hours: Best Practices Access Control Policies Rundeck
Join us this month for an AMA discussion followed by a live Q&A led by technical experts from Rundeck’s engineering, product, and solution engineering teams. Experts are available to provide advice on your technical architecture, give recommendations for operational best practices, review current GitHub issues, or dive into the open source code itself.
Don’t miss the opportunity to learn Rundeck product best practices and ask experts your questions about Rundeck.
https://www.rundeck.com/rundeck-office-hours
Secure IT infrastructure is well protected by access keys, passwords, and other credentials. Admins need these secrets to gain access, as does any automation executed by Rundeck. Rundeck has rich support for secrets management with native key storage, as well as integrations with best-of-breed standardized solutions. In this webinar, we’ll cover best practices for working with Rundeck’s runbook automation platform in securing IT infrastructure. We’ll explore the secrets management options in Rundeck and we’ll highlight a new plugin with Thycotic Secret Server for Privileged Access Management.
In this webinar, we will demonstrate:
How Rundeck works with underlying secrets of the systems it manages
New Rundeck plugins that allow users to protect privileged accounts with enterprise-grade, privileged access management solutions
How you can use Rundeck plugins with HashiCorp Vault, Thycotic, and CyberArk as keys for jobs and other Rundeck configurations
In this session we will give a live walkthrough covering new capabilities released in Rundeck 3.4. Learn about security & compliance improvements we’ve made including the ability to organize secrets management by project -- so now each Runbook can access a different set of passwords and keys for its access control list (ACL). We also have a new plug-in for Thycotic users to manage secrets. Rundeck 3.4 now allows for queueing of jobs when those jobs must be run serially. Finally, we’ll discuss our vision for the future of Rundeck, and our primary development themes for the next year.
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure... (Rundeck)
Running Operations is not an easy job, especially these days. Ops teams have to ensure excellent user experiences, resolve incidents quickly and help developers stay productive. Yet at the same time, there is also the need to maintain systems security and keep downtime to a minimum - goals which many struggle with at scale.
While advances in cloud computing have helped address some of these challenges, many organizations find it difficult to leverage the cloud at scale because of bottlenecks that form around repetitive tasks, such as developers having to wait for provisioning infrastructure. Despite having access to abundant cloud resources, these speedbumps often make it difficult or impossible to achieve team objectives.
Join this talk to learn:
-How to safely delegate the management of your Azure deployment (to developers and other colleagues) with self-service operations.
-How to create powerful runbooks with guardrails that leverage existing scripting languages (including PowerShell), infrastructure, and tools to remove the human from the bottleneck that forms around repetitive tasks.
-Strategies for getting started
-And how to create an Easy Button to handle the repetitive tasks that are interrupting your flow of work.
As presented by Jesse Houldsworth at PowerShell + DevOps Global Summit 2021
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
On Demand Viewing: https://www.rundeck.com/super-charge-reliability
To win in today’s digital age, organizations need to balance product reliability and feature delivery with dynamic business needs and legacy and multi-cloud environments. Automation, as a main SRE practice, scales product reliability practices by reducing tedious tasks related to production operations, freeing up engineers to work on innovation.
Whether you are in a traditional operations organization or a “you build it, you run it” team, this webinar will explore strategies for increasing automation to improve your Operations so you can continue to create excellent experiences for your customers.
-How you can reduce MTTR and eliminate toil with Self-Service Operations
-Common workflow challenges and opportunities
-How you can use Runbook Automation to enable Self-Service Operations
-Ways to leverage existing assets and workflows by integrating Rundeck with existing toolsets
-See a demo of real world cases
https://youtu.be/4jAf6cbxsgo
As operators, it’s our job to monitor infrastructure, systems and applications and only wake up humans for tasks machines can’t fix on their own. Automated remediation pairs monitoring and runbook automation, giving you a monitoring system that can trigger operational actions with runbook automation to shorten incident response times and avoid alert fatigue.
Rundeck Director of Product Management Forrest Evans and Sensu Developer Advocate Todd Campbell discuss the key role automated remediation plays in the monitoring journey, with live demos of both the Rundeck and Sensu integrations. You’ll learn all about monitoring as code workflows with the Sensu Observability Pipeline and how to deliver runbook automation with Rundeck — and see how the two together can help you achieve automated remediation.
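The pairing described above, where monitoring triggers a runbook and a human is paged only when the machine cannot fix the problem, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Rundeck or Sensu API: the check names, the `RUNBOOKS` table, the runbook functions, and `handle_alert` are all hypothetical.

```python
from typing import Callable


# Hypothetical runbooks: each returns True if it fixed the problem.
def restart_service(host: str) -> bool:
    print(f"restarting service on {host}")
    return True


def clear_temp_files(host: str) -> bool:
    print(f"clearing temp files on {host}")
    return True


# Hypothetical mapping from a monitoring check name to its runbook.
RUNBOOKS: dict[str, Callable[[str], bool]] = {
    "service_down": restart_service,
    "disk_full": clear_temp_files,
}


def handle_alert(check: str, host: str, page_human: Callable[[str], None]) -> str:
    """Try the matching runbook first; page a human only if that is not enough."""
    runbook = RUNBOOKS.get(check)
    if runbook is not None:
        try:
            if runbook(host):
                return "remediated"
        except Exception as exc:
            page_human(f"{check} on {host}: runbook failed: {exc}")
            return "escalated"
    # No runbook exists, or the runbook ran but did not fix the problem.
    page_human(f"{check} on {host}: needs a human")
    return "escalated"


pages: list[str] = []
print(handle_alert("service_down", "web01", pages.append))
print(handle_alert("certificate_expired", "web01", pages.append))
```

Only the second alert reaches the pager, because no runbook is registered for it.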
Failure is inevitable. But are you incurring more downtime and disruption than necessary? Legacy incident response techniques have difficulty keeping up with the increasing pace of change and skyrocketing complexity of today’s application environments.
During this webinar, you’ll learn about modern incident response techniques that can dramatically shorten incidents and reduce escalations. Join the experts from Rundeck and PagerDuty as they share:
*How a real-time operations platform intelligently manages alerts and on-call mobilization, delivering the right people the right information at the right time
*How runbook automation gives front-line response teams self-service access to run automated workflows – or runbooks – that diagnose and resolve incidents without escalating to an expert.
*How to automatically detect, diagnose, and resolve incidents without human intervention.
https://youtu.be/9yYwTPMRSOY
Nathan Fluegel, head of Customer Success at Rundeck, talks clustering and high availability. We'll show how to deploy Rundeck servers in a clustered configuration with Rundeck Enterprise.
https://youtu.be/PmBIGP3M9sI
Understand how to migrate your Rundeck environment from the community edition to Enterprise, including the pros and cons of each migratory approach.
In this webinar, you will learn how to:
-Determine which migration approach is most appropriate for your environment
-Shift from a single-server to clustered environment
-Migrate jobs and projects while keeping a clean install
Business Continuity for Humans: Keeping Your Business Running When Your Peopl... (Rundeck)
Damon Edwards (Rundeck) presentation from TechStrongConf on June 4, 2020.
Learn more: https://www.rundeck.com/business-continuity-for-digital-operations
4. Not that far away, maybe in a company just like yours…
Overloaded. Constant firefighting.
[Figure: Project A and Project B surrounded by a pile of tickets. DUE: Yesterday! DUE: Tomorrow!]
5. Waiting in ticket queues for everything.
Not that far away, maybe in a company just like yours…
8. Things break. Break again. And again.
[Figure: Help! → Ticket → Wait → Interrupt. Later… the same. Later… the same.]
Not that far away, maybe in a company just like yours…
9. Everyone is busy, but it doesn’t get any better.
[Figure: Improvement Project, Incidents, and repeated Business Delivery blocks]
Not that far away, maybe in a company just like yours…
11. Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Not that far away, maybe in a company just like yours…
Executives: “Everything takes too long, costs too much, and breaks too often!”
Have you heard of SRE?
Google does it.
13. “SRE… When you ask software engineers to do operations”
“SRE… Next-generation, cloud-native Operations”
Class SRE implements DevOps
“SRE… When Ops does more engineering than Ops”
21. SysAdmins
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Executive View
“Everything takes too long, costs too much, and breaks too often!”
“Our transformation has largely ignored Ops. Any ideas?”
“Have you heard of SRE? Google does it.”
22. SysAdmins
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
SRE (new name)
Overloaded. Constant firefighting.
Waiting in ticket queues for everything.
Things break. Break again. And again.
Everyone is busy, but it doesn’t get any better.
Executive View
“Everything takes too long, costs too much, and breaks too often!”
“Our transformation has largely ignored Ops. Any ideas?”
“Have you heard of SRE? Google does it.”
23. Changing job titles or adding individual skills
doesn’t make systems administrators SREs.
27. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
Not SRE
Observability
Programming Skills
Distributed Systems Arch.
Blameless Post-Mortems
29. Changing job titles or adding individual skills doesn’t make systems administrators SREs.
SRE is a rethinking of how Operations work gets done.
31. Principles are what makes SRE different
Stephen Thorne, Google, “Principles of SRE”, at DevOps Enterprise Summit London 2018
https://youtu.be/c-w_GYvi0eA
32. Principles are what makes SRE different
1. SRE needs Service Level Objectives, with consequences
36. SLO and Error Budgets: Tools for Shared Responsibility
[Figure: a scale from 0 to 100 showing the Service Level Indicator, the Service Level Objective, and the Error Budget*, shared by DEV, BIZ, and Ops]
(*Use this to improve the service)
SLO takes priority!!
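The relationship between the three terms on this slide can be shown with a little arithmetic. The sketch below assumes a request-based SLI (good events divided by total events) and treats the error budget as the share of events the SLO allows to fail; the class name, field names, and sample numbers are illustrative and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Error budget for one service over one measurement window."""

    slo_percent: float  # Service Level Objective, e.g. 99.9 (must be below 100)
    total_events: int   # all requests seen in the window
    good_events: int    # requests that met the objective

    @property
    def sli_percent(self) -> float:
        """Service Level Indicator: share of good events on the 0-100 scale."""
        return 100.0 * self.good_events / self.total_events

    @property
    def allowed_bad_events(self) -> float:
        """Size of the budget: how many bad events the SLO tolerates."""
        return self.total_events * (100.0 - self.slo_percent) / 100.0

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget still unspent (negative once overspent)."""
        bad_events = self.total_events - self.good_events
        return 1.0 - bad_events / self.allowed_bad_events

    def slo_takes_priority(self) -> bool:
        """True once the budget is spent, so reliability work comes first."""
        return self.remaining_fraction <= 0.0


# Illustrative numbers: 600 failed requests out of one million, SLO of 99.9%.
budget = ErrorBudget(slo_percent=99.9, total_events=1_000_000, good_events=999_400)
print(f"SLI {budget.sli_percent:.2f}%, budget left {budget.remaining_fraction:.0%}")
print("SLO takes priority:", budget.slo_takes_priority())
```

While budget remains, DEV, BIZ, and Ops can spend it on change; once `slo_takes_priority()` returns True, the shared agreement is that reliability work wins.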
38. Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
40. Toil: Name For a Problem We’ve All Felt
“Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”
-Vivek Rau, Google
41. Toil vs. Engineering Work
Toil | Engineering Work
Lacks Enduring Value | Builds Enduring Value
Rote, Repetitive | Creative, Iterative
Tactical | Strategic
Increases With Scale | Enables Scaling
Can Be Automated | Requires Human Creativity
44. Excessive Toil Prevents Fixing the System
Toil at manageable percentage of capacity: Engineering Work (E.W.) has capacity to reduce toil and improve the business.
Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”): no capacity to reduce toil, no capacity to improve business.
Downward spiral is inevitable!
46. Principles of SRE are what set SRE apart
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
51. SRE teams have the ability to regulate their workload
Example:
What if handing off responsibility to SRE/Ops wasn’t a right?
(separate the “running in production” from “run by SRE/Ops”)
“?!?”
57. Where to start (the practical approach)
1. SRE needs Service Level Objectives, with consequences
2. SREs have time to make tomorrow better than today
3. SRE teams have the ability to regulate their workload
Company-wide culture change (hard!)
Company-wide culture change (hard!)
Reduce toil.
Everybody wins!
68. Track toil levels for each team
• Standardize (e.g. meetings and email are “overhead”, not “toil”)
• Track
  • Self-reporting
  • Periodic surveys
  • SM or PM interview/sampling
• Don’t get lost in time tracking weeds!
72. Start reducing toil today
1. Track toil levels for each team
2. Set toil limit for each team (50% is conventional wisdom)
3. Fund efforts to reduce toil (with emphasis on teams already over limit)
Example process: “Code Yellow” (Michael Kehoe and Todd Palino, LinkedIn, at SREcon Americas 2019)
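The three steps above (track, set a limit, fund the teams already over it) can be sketched as a small report. This is a rough illustration under stated assumptions: hours are self-reported per team, overhead such as meetings and email is counted separately from toil, and toil is measured as a share of all reported hours. The team names and numbers are made up.

```python
from dataclasses import dataclass

TOIL_LIMIT = 0.50  # the conventional-wisdom 50% limit named on the slide


@dataclass
class TeamReport:
    """Self-reported hours for one team in one tracking period."""

    team: str
    toil_hours: float
    engineering_hours: float
    overhead_hours: float  # meetings and email: "overhead", not "toil"

    @property
    def toil_fraction(self) -> float:
        """Toil as a share of all reported hours (an assumed definition)."""
        total = self.toil_hours + self.engineering_hours + self.overhead_hours
        return self.toil_hours / total if total else 0.0


def teams_over_limit(
    reports: list[TeamReport], limit: float = TOIL_LIMIT
) -> list[TeamReport]:
    """Return teams above the toil limit, worst first, as funding candidates."""
    over = [r for r in reports if r.toil_fraction > limit]
    return sorted(over, key=lambda r: r.toil_fraction, reverse=True)


# Hypothetical weekly reports.
reports = [
    TeamReport("platform", toil_hours=30, engineering_hours=6, overhead_hours=4),
    TeamReport("database", toil_hours=18, engineering_hours=16, overhead_hours=6),
    TeamReport("network", toil_hours=22, engineering_hours=12, overhead_hours=6),
]

for report in teams_over_limit(reports):
    print(f"{report.team}: {report.toil_fraction:.0%} toil, over the limit")
```

Coarse weekly numbers like these are enough to rank teams; finer time tracking adds cost without changing the decision.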
82. How to enable self-service?
Empower teams to spot and fix the anti-patterns.
83. “Do this for me, do it again, then do it again.”
[Figure. Before: “I need you to do X” → Ticket → Wait → Interrupt → Do X → “Done.” Later… the same request repeats, interrupting your other work each time. After: each request is handled through Self-Service, and your other work continues (x2, x3).]
86. “I could fix it, but I can’t get to it.”
Before: “I could fix it if I could get to it” (Wait, Interrupt)
After: “I’ve got this!” (Self-Service access to the Environment)
87. “The dog-pile.”
[Figure. Before: “I think it’s a problem with db07-store2.uswest.acme” !! Everyone piles onto db07store2.uswest.acme and runs “$ top”. After: a Self-Service job runs “healthcheck store2 -all” on db07store2.uswest.acme.]
90. “I’m an expert, I don’t read the wiki.”
Before: The docs say “Service has changed. Use this flag or bad things will happen!” and “Pause monitoring first or we all get woken up!” Later… “I’ve done this before. I’ve got this…” and runs “restart -doit -now” against the Environment.
After: The same warnings go into an updated Restart Job. Later… “I’ve done this before. I’ve got this.” and runs “restart” through Self-Service. ✅
98. Self-Service Operations Design Pattern (in a nutshell)
• Pull-Based
• Accept tools/languages that teams want to use
• Let people who “push buttons” define the buttons
• Build in security and compliance
• Define “guardrails” to provide work safety
[Figure: Consumer of Ops Capabilities → Self-Service, On Demand → Ops Capability (Specialist Knowledge), Ops Capability (Specialist Knowledge)]
102. Self-Service is ultimately about user experience
1. Work how they want to work (GUI, API, CLI)
2. “Guardrails” (Smart options that helpfully constrain)
3. Dynamic resource model (Up-to-date details of your environment)
[Figure: Consumer of Ops Capabilities → Self-Service, On Demand → Ops Capability (Specialist Knowledge), Ops Capability (Specialist Knowledge)]
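One way to picture guardrails and a dynamic resource model together is a job whose options only accept values drawn from current inventory. The sketch below is a hypothetical illustration, not Rundeck's job format or API: `Option`, `SelfServiceJob`, `RESOURCE_MODEL`, and the node and service names are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Option:
    """One job option, constrained to a set of allowed values (a guardrail)."""

    name: str
    allowed: Callable[[], list[str]]  # evaluated at run time, so choices stay current


@dataclass
class SelfServiceJob:
    """An ops capability packaged so that non-specialists can run it safely."""

    name: str
    action: Callable[[dict[str, str]], str]
    options: list[Option] = field(default_factory=list)

    def run(self, **values: str) -> str:
        # Reject any value outside the allowed set before doing anything.
        for option in self.options:
            choice = values.get(option.name)
            if choice not in option.allowed():
                raise ValueError(
                    f"{option.name}={choice!r} is not allowed; "
                    f"choose one of {option.allowed()}"
                )
        return self.action(values)


# Hypothetical resource model; in practice it would be refreshed from inventory.
RESOURCE_MODEL = {"web01": "stage", "web02": "stage", "db01": "prod"}


def stage_nodes() -> list[str]:
    """Nodes a self-service user may target: stage only, read from the model."""
    return sorted(node for node, env in RESOURCE_MODEL.items() if env == "stage")


restart = SelfServiceJob(
    name="restart-service",
    action=lambda v: f"restarted {v['service']} on {v['node']}",
    options=[
        Option("node", stage_nodes),
        Option("service", lambda: ["nginx", "app"]),
    ],
)

print(restart.run(node="web01", service="nginx"))
```

Because the allowed nodes are computed on each run, adding a stage node to the resource model makes it selectable immediately, while a production node such as `db01` is refused with a message listing the valid choices.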
107. Strategic: Improve incident response times
Jody Mulkey at DOES ‘15 SF: https://youtu.be/USYrDaPEFtM
[Figure: DEV, STAGE, and PROD environments, each with Services, Monitoring, and Scripts/Tools. Dev & QA, NOC/Ops, and Dev reach them through Self-Service. Promote approved jobs. Empower.]
• Reduced MTTR by 92%
• Reduced escalations by 50%
• Reduced overall support costs by 55%
111. Strategic: Reduce compliance burden & improve
Shaun Norris at DOES ‘18 Las Vegas: https://youtu.be/d5IMvK0YHTg
Optimized for compliance
• 86,000+ employees
• 60+ countries
• Highly regulated
[Figure: lines of business (LOB #1, LOB #2, LOB #3, LOB …n) use Self-Service to reach Services and Scripts/Tools in Data Centers and Cloud. Compliance, Consistency.]
12 months:
• Saved 28 person-years of time
• 13,000+ ops tasks in privileged environments that didn’t require a review
• ~200 fewer customer-impacting events