On the road to Engineering excellence
Alex Mrynskyi
Palo Alto Networks
Why?
Our Goals
● Boost developer experience and productivity
● Be able to drive innovation in times of uncertainty
● Become a top performing organization
The ultimate business goal: creating value for the customer!
To learn what customers actually value, we need to keep
experimenting, and our process should support the following:
● Faster feedback loop
● Quick decision making
● Fail fast & learn fast
What is a top performing
organization?
DORA - Deployment frequency
Humanitec - DevOps Benchmarking Study 2023
DORA - Lead Time
Humanitec - DevOps Benchmarking Study 2023
DORA - Mean Time to Recovery (MTTR)
Humanitec - DevOps Benchmarking Study 2023
DORA - Change Failure Rate
Humanitec - DevOps Benchmarking Study 2023
What else?
● Deployment
○ Reliance on Ops to deploy features might indicate lower performance. Close to
90% of top performing teams feel confident deploying independently
● Provisioning infrastructure and managed services
○ Low performing teams disproportionately rely on Ops to provision on a
case-by-case basis
● Standardization
○ 82.19% of top performing teams manage their app config in a standardized way
for all apps
● Infrastructure configuration management
○ 100% of top performing teams store their infrastructure config in a VCS
● Degree of self-service
○ In 83.6% of top performing teams, developers are able to create preview
environments on the fly
Humanitec - DevOps Benchmarking Study 2023
Challenges to implement DORA
● Cultural Resistance
○ Implementing DORA metrics often requires a significant shift in company culture. Teams may resist the
change because it disrupts familiar routines or they fear being judged by the metrics. It requires strong
leadership and buy-in from all team members to overcome this resistance.
● Lack of Tooling
○ To accurately measure DORA metrics, you need tools to track deployments, changes, failures, and
recovery times. If these tools aren't in place, or if they can't integrate with each other, it can be difficult to
collect accurate data.
● Data Quality
○ The value of the metrics depends on the quality of the data being collected. If the data isn't accurate or
complete, it will skew the metrics and lead to incorrect conclusions.
● Interpreting the Data
○ Once you have the data, interpreting it can be a challenge. Without an understanding of what the metrics
mean and how they interact, it's hard to draw meaningful conclusions or make informed decisions.
● Misuse of Metrics
○ Metrics can be misused, leading to negative behaviors. For example, if the goal is to maximize deployment
frequency, teams might deploy changes that aren't valuable just to boost their numbers. It's important to
understand the context and use the metrics as a guide, not a strict rule.
● Lack of Standardization
○ Organizations may struggle to standardize the way they measure and report on DORA metrics. If different
teams or departments use different tools or methods to collect data, it can lead to inconsistencies and
make it difficult to compare performance.
Our Team
Our Journey - 2 years ago
● Scheduled releases (monthly -> bi-weekly)
○ deployed to AWS EC2 instances as debian packages
○ executed by Ops team
● Minimal observability for services in production
● Lack of standardized and reusable components and practices
○ Many different ways to manage configuration, secrets, telemetry
○ Some projects had not been updated for many years
● Manual infrastructure deployment
○ deploying a new region was a major challenge that took months
Our Journey - New Platform
● Monorepo with 50+ services with multiple deployments to production
per day across 5 regions
○ Quality checks from day 1 (Sonar, Code Style, Security tools)
○ Deploy time <15 min with parallel builds
● Highly standardized services based on a new architecture
○ Deployed to Kubernetes using Helm
○ Built-in telemetry (metrics, structured logging, tracing)
● Every merge to master automatically deployed to production
○ 2700+ Unit tests, Integration tests, Contract tests with > 90% coverage
○ multiple regressions were blocked by atomic deployments & E2E tests
● > 50 production deployments in the last 2 weeks
○ Mean time to merge PR - 2 days
Our Journey - Legacy Services
● Modernized ~80% of legacy services
○ Containerization, deploy to k8s, integrated telemetry
● Unified CI/CD is integrated into 10 repos
○ adopted the new platform process, including all quality tools, deployment
pipelines and e2e tests for acceptance
Our Journey - GitLab
Our Journey - Infrastructure
● Terraform monorepo for 80+ services with 200+ terraform state files
● Provide a Service Kit that enables developers to own the complete
infrastructure dependencies for their service
● All terraform operations are automated via Atlantis, and multi-region
environment changes happen in parallel, reducing operation time
from a couple of hours to minutes
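To illustrate why running multi-region changes in parallel shortens the operation, here is a minimal sketch. It is not how Atlantis works internally; `apply_region` is a hypothetical placeholder that only sleeps, and the region names are made up. The point is that wall-clock time approaches that of the slowest region rather than the sum of all regions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder region names (assumption: the deck only says "5 regions").
REGIONS = ["region-a", "region-b", "region-c", "region-d", "region-e"]


def apply_region(region: str) -> str:
    """Hypothetical stand-in for applying an infrastructure change to one region."""
    time.sleep(0.1)  # simulate a slow, I/O-bound operation
    return f"{region}: applied"


def apply_all(regions):
    """Apply the change to every region concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        return list(pool.map(apply_region, regions))


start = time.perf_counter()
results = apply_all(REGIONS)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Run sequentially, the five simulated operations would take about five times as long as one; in the thread pool they overlap.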
Tooling
Photo from Pexels by Kim Stiver
Why is it difficult to measure productivity?
● Engineering is a complex and creative task and measuring the
productivity of any knowledge worker is generally a hard problem
● Different tools and practices (e.g. Monorepos vs polyrepos)
● Complex dependencies between services and infrastructure
● Multiple non-functional requirements (architecture, security, FIPS, etc)
● Data is scattered across multiple tools
Tools & Processes
Apache DevLake
Apache DevLake is an open-source dev data platform that ingests,
analyzes, and visualizes the fragmented data from DevOps tools to
extract insights for engineering excellence, developer experience, and
community growth.
● Collect DevOps data across the entire Software Development Life Cycle
(SDLC) and connect the siloed data with a standard data model.
● Visualize out-of-the-box engineering metrics in a series of use-case
driven dashboards
● Easily extend DevLake to support your data sources, metrics, and
dashboards with a flexible framework for data collection and ETL
● Out-of-the-box support for DORA metrics
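The idea of connecting siloed data through a standard data model can be sketched as follows. This is not DevLake's schema or API; the record shapes, field names, and the `DeploymentEvent` type are all hypothetical, chosen only to show two tools with different formats being mapped onto one common record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentEvent:
    """Hypothetical common record that every source is mapped onto."""
    service: str
    environment: str
    finished_at: datetime
    succeeded: bool


def from_ci_pipeline(record: dict) -> DeploymentEvent:
    """Map a made-up CI pipeline record that uses ISO 8601 timestamps."""
    return DeploymentEvent(
        service=record["project"],
        environment=record["env"],
        finished_at=datetime.fromisoformat(record["finished_at"]),
        succeeded=record["status"] == "success",
    )


def from_deploy_tool(record: dict) -> DeploymentEvent:
    """Map a made-up deploy-tool record that uses Unix epoch seconds."""
    return DeploymentEvent(
        service=record["app"],
        environment=record["target"],
        finished_at=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        succeeded=record["exit_code"] == 0,
    )


events = sorted(
    [
        from_deploy_tool({"app": "billing", "target": "production",
                          "ts": 1704103200, "exit_code": 0}),
        from_ci_pipeline({"project": "search", "env": "production",
                          "finished_at": "2024-01-01T09:00:00+00:00",
                          "status": "failed"}),
    ],
    key=lambda e: e.finished_at,
)
for event in events:
    print(event)
```

Once every source is normalized to the same shape, metrics can be computed and compared across teams without caring which tool produced the data.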
Apache DevLake - Architecture
Demo
Photo from Pexels by Mateusz Walendzik
The Vision - Beyond DORA Metrics
The Vision - Service Catalog
How Spotify does Developer Productivity Engineering with Backstage
The Vision - Platform Engineering
● Provide engineers with the best developer experience
○ Use Backstage (DevClue) as a single pane of glass
● Get a more comprehensive picture that includes more than DORA metrics
○ Code quality metrics (incl security)
○ Production telemetry
○ Cost
● Ability to analyze productivity and quality from different angles
○ Teams vs services
The key to successful metrics implementation is not just to measure performance
but also to use these insights to drive continual learning and improvement


Editor's Notes

  1. Deployment frequency is a metric that tracks how frequently a development team successfully pushes updates into production. The key word in this definition is "successfully". A software development team that continually delivers broken updates or deployments is not performing well. That’s the truth, even if it hurts to hear. This metric is easy to track and very important. Deployment frequency is often the first place a development team starts to make changes. While deployment frequency varies widely among industries and applications, high-performing teams deploy to production every day or multiple times a week.
  2. The term lead time describes the time between the initial code commit and full deployment to production. When your team decides to implement a UI change, how long does this take to get into production? When your team implements a new security feature, how long does testing take before release? Lead time is measured from the first commit of a code change to the moment it is running in the production environment. Lead time can be further broken down by looking at which stage of the change is taking the longest. Is your team spending the most time in development or testing?
  3. Mean Time to Recovery measures the time it takes to recover following an outage, service interruption, or product failure. This is measured from the initial moment of an outage until the incident team has recovered all services and operations. These events are unavoidable to a certain degree, although good management can significantly increase the Mean Time Between Failures (MTBF). Because it’s impossible to avoid incidents completely, you need an incident plan that works. Slow recovery times can impact your organization in more than one way. Your customers will experience a prolonged outage and will view your team negatively for not being able to get the incident resolved. You may lose customers, and the reputation of your brand may be diminished. Additionally, management is less likely to move in an experimental direction if the team cannot keep up with the current, supposedly stable software.
  4. It’s great to have frequent deployments, but what’s the point if your team is constantly rolling back updates? Or even worse, if updates are causing incidents or outages? You should track all deployments that end up as incidents or get rolled back. This is known as the Change Failure Rate (CFR) and is measured as a percentage of all deployments. By tracking Change Failure Rate, you learn how often your team is going back to fix earlier deployments. This alerts you to a quality breakdown somewhere in the code development or deployment process itself.
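
The four definitions in these notes can be turned into a small calculation. The sketch below assumes each deployment is recorded with a commit time, a production time, and a failure flag, and each incident with a start and resolution time. The `Deployment` and `Incident` types and their field names are hypothetical, and real tools may define the boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime  # first commit of the change (hypothetical field)
    deployed_at: datetime   # when the change reached production
    failed: bool            # True if it caused an incident or was rolled back


@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime


def deployment_frequency(deployments, period_days):
    """Successful deployments per day over the observation period."""
    return sum(not d.failed for d in deployments) / period_days


def lead_time(deployments):
    """Mean time from commit to running in production."""
    seconds = mean((d.deployed_at - d.committed_at).total_seconds() for d in deployments)
    return timedelta(seconds=seconds)


def mean_time_to_recovery(incidents):
    """Mean time from the start of an incident to its resolution."""
    seconds = mean((i.resolved_at - i.started_at).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def change_failure_rate(deployments):
    """Percentage of deployments that failed or were rolled back."""
    return 100.0 * sum(d.failed for d in deployments) / len(deployments)


# Made-up sample data covering one week.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(hours=2), failed=False),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4), failed=True),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6), failed=False),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=4), failed=False),
]
incidents = [
    Incident(t0 + timedelta(days=1, hours=4), t0 + timedelta(days=1, hours=5)),
    Incident(t0 + timedelta(days=5), t0 + timedelta(days=5, hours=3)),
]

print("deployments/day:", deployment_frequency(deployments, period_days=7))
print("lead time:", lead_time(deployments))
print("MTTR:", mean_time_to_recovery(incidents))
print("change failure rate (%):", change_failure_rate(deployments))
```

Counting only successful deployments in the frequency follows the emphasis in note 1; whether failed deployments should count toward lead time is a judgment call, and here they do.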