This document outlines Pedro's experience running engineering teams and details the evolution of Unicorn's on-call program over time, from the Stone Age to the Iron Age. It began with everyone being on-call and no formal program. Over time, they introduced dedicated on-call teams, defined compensation and procedures, improved alarm handling, and focused on reducing burnout. The final section reflects on lessons learned, such as tuning alarms, avoiding rushed decisions, and keeping stakeholders informed during incidents.
4. Hi there! I’m Pedro!
• Engineering Director @
• Impact-driven person
• Passionate about People, Technology, and Products
• Agile, Lean and DevOps aficionado
• 10+ years of experience running engineering teams
5. On-call :: Definition
(of a person) able to be contacted in order to provide a professional service if necessary, but not formally on duty.
‘The team is on call 24 hours a day, and is trained in resuscitation techniques and how to use life-saving defibrillators.’
‘If you work in a global organization, you might be on call 24 hours a day for troubleshooting or consulting.’
‘You have to get up in the middle of the night if you're on call.’
21. Tool Age
• Tons of alarms
• False positives (see the broken windows theory: https://en.wikipedia.org/wiki/Broken_windows_theory)
• MTTA (mean time to acknowledge) not tracked
• MTTR “over 9000”
• All systems were on-call (because none was explicitly designated… so all of them were)
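The slide lists MTTA and MTTR as untracked. A minimal Python sketch of how both could be computed from incident timestamps, assuming a hypothetical `Incident` record with trigger, acknowledgement and resolution times. MTTR is measured here from trigger to resolution, which is one of several common definitions (others use recovery or repair time).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; a real paging tool exports its own fields.
    triggered_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def _mean_delta(deltas) -> timedelta:
    # statistics.mean raises StatisticsError if there are no incidents.
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mtta(incidents: list[Incident]) -> timedelta:
    """Mean time to acknowledge: average of (acknowledged - triggered)."""
    return _mean_delta(i.acknowledged_at - i.triggered_at for i in incidents)


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve (assumed definition): average of (resolved - triggered)."""
    return _mean_delta(i.resolved_at - i.triggered_at for i in incidents)


# Usage with two made-up incidents that fired at the same time.
t0 = datetime(2020, 1, 7, 2, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=40)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=80)),
]
print(mtta(incidents))  # 0:05:00
print(mttr(incidents))  # 1:00:00
```

Tracking these two numbers per rota is what makes later claims like "ack under 5 minutes" checkable.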
22. Tool Age
• No compensation (voluntary and pro bono)
29. Bronze Age
• SRE team covering its own rota (the infrastructure one); we rebranded the Ops team as the SRE team
• Development teams with rotas (dedicated to their systems)
• One engineer per rota (no secondaries)
• Engineers on-call (eat your own dog food: you develop it… you maintain it in PROD!)
31. Bronze Age
• Tools: one mobile hotspot per rota (rather than a work smartphone, so that we don’t make people carry two devices) + the VictorOps app
32. Bronze Age
• One-week rotas (four rotas in total)
• Rotas start and end every Tuesday (i.e. end-of-sprint day), aligning the rota calendar with the sprint calendar
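The Tuesday handover rule can be expressed as a small date calculation. This is a sketch only, assuming a handover on the calendar day (no specific handover hour) and a hypothetical `rota_window` helper:

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday == 0, Tuesday == 1


def rota_window(day: date) -> tuple[date, date]:
    """Return [start, end) of the one-week rota containing `day`,
    assuming rotas hand over every Tuesday (end-of-sprint day)."""
    days_since_tuesday = (day.weekday() - TUESDAY) % 7
    start = day - timedelta(days=days_since_tuesday)
    return start, start + timedelta(days=7)


# 2020-01-13 is a Monday, so it belongs to the rota that started on Tuesday 2020-01-07.
print(rota_window(date(2020, 1, 13)))
```

Keeping the handover on the same weekday as the sprint boundary means the engineer coming off call also starts a fresh sprint, rather than carrying on-call fatigue into the middle of one.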
33. Bronze Age
• Only critical systems covered by the program (defined by Engineering and agreed with stakeholders, e.g. Product, Customer Services, Support)
35. Bronze Age
• Incident commander defined: the Incident Commander (IC) holds the high-level state of the incident. They structure the incident response task force, assigning responsibilities according to need and priority
36. Bronze Age
• Weekly fire drills (or, as Google calls them, “Wheel of Misfortune”)
38. Bronze Age
• Alarms fine tuned
• Defined a time to acknowledge (ack) of under 5 minutes
• Redefined thresholds
• Distinguished alarms from notifications: an alarm requires immediate action; a notification can wait until the next day or so
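The alarm/notification split and the 5-minute ack target can be sketched as a routing rule plus a breach check. The `Alert` shape and the destination names `page-on-call` and `next-day-queue` are hypothetical; in practice this logic would live in the paging tool's own routing configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

ACK_TARGET = timedelta(minutes=5)  # "time to ack under 5 minutes"


class Severity(Enum):
    ALARM = "alarm"                # requires immediate action: page someone
    NOTIFICATION = "notification"  # can wait until the next business day


@dataclass
class Alert:
    name: str
    severity: Severity
    fired_at: datetime
    acked_at: Optional[datetime] = None


def route(alert: Alert) -> str:
    """Only alarms wake people up; notifications go to a queue (hypothetical names)."""
    if alert.severity is Severity.ALARM:
        return "page-on-call"
    return "next-day-queue"


def ack_breached(alert: Alert, now: datetime) -> bool:
    """True if an alarm was (or still is) unacknowledged for 5 minutes or more."""
    if alert.severity is not Severity.ALARM:
        return False  # notifications have no out-of-hours ack target
    acked = alert.acked_at or now  # still unacked: measure up to "now"
    return acked - alert.fired_at >= ACK_TARGET
```

Keeping notifications out of the paging path is what keeps the alarm count low enough for a 5-minute ack target to be realistic.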
41. Bronze Age
• Volunteer-based, not compulsory (yeah… we ran into “trouble” and I went on-call because of that: eat your own dog food… lead by example… I took 4 consecutive weeks on-call)
42. Bronze Age
• Engineers participating in multiple rotas
• Avoiding engineers doing rotas back-to-back
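The two rules above (volunteers may join several rotas, but nobody should be on call two weeks in a row) can be sketched as a greedy weekly assignment. This is an illustration under assumptions, not the scheduler the team used: `assign_weeks` is hypothetical, it also forbids covering two rotas in the same week, and being greedy it can fail on volunteer pools where a valid schedule does exist.

```python
from collections import defaultdict


def assign_weeks(volunteers: dict[str, list[str]], weeks: int) -> list[dict[str, str]]:
    """Greedy sketch: each week, pick one engineer per rota such that nobody
    covers two rotas in the same week or is on call in consecutive weeks.

    `volunteers` maps a rota name to the engineers who volunteered for it.
    Among eligible engineers, the one with the fewest shifts so far wins
    (ties broken by name) to spread the load.
    """
    shifts: dict[str, int] = defaultdict(int)
    previous_week: set[str] = set()
    schedule = []
    for _ in range(weeks):
        this_week: dict[str, str] = {}
        busy: set[str] = set()
        for rota, people in volunteers.items():
            eligible = [p for p in people if p not in busy and p not in previous_week]
            if not eligible:
                raise ValueError(f"no eligible volunteer for rota {rota!r}")
            pick = min(eligible, key=lambda p: (shifts[p], p))
            this_week[rota] = pick
            busy.add(pick)
            shifts[pick] += 1
        schedule.append(this_week)
        previous_week = busy  # these people must rest next week
    return schedule


# Hypothetical volunteer pools; "bo" and "cy" volunteer for both rotas.
volunteers = {"infra": ["ana", "bo", "cy"], "payments": ["bo", "cy", "dee"]}
for week in assign_weeks(volunteers, 4):
    print(week)
```

A raised `ValueError` is a useful signal in itself: it means a rota does not have enough volunteers to respect the no-back-to-back rule.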
46. Bronze Age
• Little time to work on the resiliency of systems (hard to prioritize, and hard to complete action points from post-mortems (PMs) during sprints)
47. Bronze Age
• On-call procedures
• Updating the company’s status page
• Keeping the organization/stakeholders informed of the incident status every 5 minutes
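The 5-minute update rule can be checked after the fact (for example during the post-mortem) by listing the update windows that had no stakeholder update. A sketch, assuming update deadlines fall every 5 minutes from the incident start and a hypothetical `missed_update_deadlines` helper:

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=5)


def missed_update_deadlines(started_at, posted_at, now, interval=UPDATE_INTERVAL):
    """Return each deadline (start + k * interval, up to `now`) whose preceding
    window (deadline - interval, deadline] contains no posted update."""
    missed = []
    deadline = started_at + interval
    while deadline <= now:
        window_start = deadline - interval
        if not any(window_start < t <= deadline for t in posted_at):
            missed.append(deadline)
        deadline += interval
    return missed


start = datetime(2020, 1, 7, 2, 0)
updates = [start + timedelta(minutes=4), start + timedelta(minutes=12)]
# Only the 02:05-02:10 window had no update.
print(missed_update_deadlines(start, updates, start + timedelta(minutes=16)))
```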
49. Bronze Age
• Performance reviews completely disassociated from the on-call program (no one gets a worse review for not participating in the program)
50. Bronze Age
• Although we have offices in different time zones, we didn’t use a “follow the sun” strategy (lack of engineers in the US)
51. Bronze Age
• P0s are all-hands on deck, and we are “entitled” to call any engineer who can help
• Panic button on Slack with a Zapier integration
87. Final thoughts
• Don’t make rushed decisions because you are getting too many alerts (e.g. turning off alarms)
88. Final thoughts
• Take advantage of business hours (when you have the entire engineering team at the office) to tackle issues that might come up outside business hours (when you “only” have the on-call engineers available)
89. Final thoughts
• Being on-call doesn’t mean that you need to save the world. We don’t need “Rambos”… so play it safe, stick to the playbooks and don’t make risky decisions under stress
90. Final thoughts
• Don’t hesitate to jump into a (video) call to coordinate the incident resolution (usually Slack is not enough): sync vs async comms
91. Final thoughts
• Don’t forget to keep the stakeholders in the loop (we are in the heat zone… but they are suffering on the sidelines… and they need to know what is happening)
92. Final thoughts
• Action items from (blameless) post-mortems should be tracked, and their execution ensured
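Tracking post-mortem action items can be as simple as a list that is reviewed regularly for overdue items. A minimal sketch with a hypothetical `ActionItem` record and made-up post-mortem identifiers; most teams would use their issue tracker for this instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    postmortem: str  # hypothetical post-mortem identifier, e.g. "PM-1"
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Open action items past their due date, oldest first."""
    return sorted((i for i in items if not i.done and i.due < today), key=lambda i: i.due)


items = [
    ActionItem("PM-1", "add replication-lag alarm", "ana", date(2020, 2, 1)),
    ActionItem("PM-1", "update failover runbook", "bo", date(2020, 1, 15)),
    ActionItem("PM-2", "fix client retry storm", "cy", date(2020, 1, 10), done=True),
]
for item in overdue(items, date(2020, 2, 10)):
    print(item.postmortem, item.description, item.owner)
```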
93. Final thoughts
• Don’t fall into the wishful thinking game: if you believe/suspect that an alarm is triggered by something harmless that you “can’t control” (e.g. a network glitch)… be ready to prove it… otherwise don’t stop investigating the root cause
94. Final thoughts
• Always write PMs (for PEs and PIs) and bear in mind that you should have public versions of the PM (sooner or later your customers will ask for them)
Shifts (I like to call them rotas)
People (I like to call them heroes)
Systems (I like to call them “the critical ones”)
Because you care about your customers
Because you care about your production systems
Because you care about your engineers
Because you care about your company
Because you care about your job