On the road to Engineering
excellence
Alex Mrynskyi
Palo Alto Networks
Why?
Our Goals
● Boost developer experience and productivity
● Be able to drive innovation in times of uncertainty
● Become a top performing organization
The ultimate business goal: creating value for the customer!
To learn what “value” means to a customer, we need to keep
experimenting, and our process should support the following:
● Faster feedback loop
● Quick decision making
● Fail fast & learn fast
What is a top performing
organization?
DORA - Deployment frequency
Humanitec - DevOps Benchmarking Study 2023
DORA - Lead Time
Humanitec - DevOps Benchmarking Study 2023
DORA - Mean Time to Recovery (MTTR)
Humanitec - DevOps Benchmarking Study 2023
DORA - Change Failure Rate
Humanitec - DevOps Benchmarking Study 2023
What else?
● Deployment
○ Reliance on Ops to deploy features might indicate lower performance. Close to
90% of top performing teams feel confident deploying independently
● Provisioning infrastructure and managed services
○ Low performing teams disproportionately rely on Ops to provision on a
case-by-case basis
● Standardization
○ 82.19% of top performing teams manage their app config in a standardized way
for all apps
● Infrastructure configuration management
○ 100% of top performing teams store their infrastructure config in a VCS
● Degree of self-service
○ In 83.6% of top performing teams, developers are able to create preview
environments on the fly
Humanitec - DevOps Benchmarking Study 2023
Challenges to implement DORA
● Cultural Resistance
○ Implementing DORA metrics often requires a significant shift in company culture. Teams may resist the
change because it disrupts familiar routines or they fear being judged by the metrics. It requires strong
leadership and buy-in from all team members to overcome this resistance.
● Lack of Tooling
○ To accurately measure DORA metrics, you need tools to track deployments, changes, failures, and
recovery times. If these tools aren't in place, or if they can't integrate with each other, it can be difficult to
collect accurate data.
● Data Quality
○ The value of the metrics depends on the quality of the data being collected. If the data isn't accurate or
complete, it will skew the metrics and lead to incorrect conclusions.
● Interpreting the Data
○ Once you have the data, interpreting it can be a challenge. Without an understanding of what the metrics
mean and how they interact, it's hard to draw meaningful conclusions or make informed decisions.
● Misuse of Metrics
○ Metrics can be misused, leading to negative behaviors. For example, if the goal is to maximize deployment
frequency, teams might deploy changes that aren't valuable just to boost their numbers. It's important to
understand the context and use the metrics as a guide, not a strict rule.
● Lack of Standardization
○ Organizations may struggle to standardize the way they measure and report on DORA metrics. If different
teams or departments use different tools or methods to collect data, it can lead to inconsistencies and
make it difficult to compare performance.
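One way to reduce the inconsistency described in the last bullet is to map every tool's output onto a single shared record before computing any metric. The sketch below is only an illustration: the `DeploymentEvent` shape, both payload layouts, and all field names are assumptions made up for this example, not the format of any particular CI/CD tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentEvent:
    """Hypothetical shared record that every team reports against."""
    service: str
    environment: str
    finished_at: datetime  # always stored in UTC
    succeeded: bool


def from_ci_payload(payload: dict) -> DeploymentEvent:
    # Assumed layout: ISO 8601 timestamp with an explicit UTC offset
    # and a textual status field.
    return DeploymentEvent(
        service=payload["project"],
        environment=payload["env"],
        finished_at=datetime.fromisoformat(payload["finished_at"]).astimezone(timezone.utc),
        succeeded=payload["status"] == "success",
    )


def from_deploy_log(row: dict) -> DeploymentEvent:
    # Assumed layout: Unix epoch seconds and a process exit code.
    return DeploymentEvent(
        service=row["app"],
        environment=row["stage"],
        finished_at=datetime.fromtimestamp(row["epoch_seconds"], tz=timezone.utc),
        succeeded=row["exit_code"] == 0,
    )


# The same deployment, reported by two different (hypothetical) tools.
a = from_ci_payload({"project": "billing", "env": "prod",
                     "finished_at": "2024-01-01T12:00:00+02:00",
                     "status": "success"})
b = from_deploy_log({"app": "billing", "stage": "prod",
                     "epoch_seconds": 1704103200, "exit_code": 0})
print(a == b)
```

Once both sources produce the same record type, teams that use different tools can be compared on the same definitions of "deployment" and "success".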
Our Team
Our Journey - 2 years ago
● Scheduled releases (monthly -> bi-weekly)
○ deployed to AWS EC2 instances as Debian packages
○ executed by Ops team
● Minimal observability for services in production
● Lack of standardized and reusable components and practices
○ Many different ways to manage configuration, secrets, telemetry
○ Some projects had not been updated for many years
● Manual infrastructure deployment
○ deploying a new region was a major challenge that took months
Our Journey - New Platform
● Monorepo with 50+ services with multiple deployments to production
per day across 5 regions
○ Quality checks from day 1 (Sonar, Code Style, Security tools)
○ Deploy time <15 min with parallel builds
● Highly standardized services based on a new architecture
○ Deployed to Kubernetes using Helm
○ Built-in telemetry (metrics, structured logging, tracing)
● Every merge to master automatically deployed to production
○ 2700+ Unit tests, Integration tests, Contract tests with > 90% coverage
○ multiple regressions were blocked by atomic deployments & E2E tests
● > 50 production deployments in the last 2 weeks
○ Mean time to merge PR - 2 days
Our Journey - Legacy Services
● Modernized ~80% of legacy services
○ Containerization, deploy to k8s, integrated telemetry
● Unified CI/CD is integrated into 10 repos
○ adopted the new platform process, including all quality tools, deployment
pipelines, and e2e tests for acceptance
Our Journey - GitLab
Our Journey - Infrastructure
● Terraform monorepo for 80+ services with 200+ Terraform state files
● Provide a Service Kit that enables developers to own the complete
infrastructure dependencies for their service
● All Terraform operations are automated via Atlantis, and multi-region
environment changes happen in parallel, reducing operation time
from a couple of hours to minutes
Tooling
Photo from Pexels by Kim Stiver
Why is it difficult to measure productivity?
● Engineering is a complex and creative task and measuring the
productivity of any knowledge worker is generally a hard problem
● Different tools and practices (e.g. Monorepos vs polyrepos)
● Complex dependencies between services and infrastructure
● Multiple non-functional requirements (architecture, security, FIPS, etc)
● Data is scattered across multiple tools
Tools & Processes
Apache DevLake
Apache DevLake is an open-source dev data platform that ingests,
analyzes, and visualizes the fragmented data from DevOps tools to
extract insights for engineering excellence, developer experience, and
community growth.
● Collect DevOps data across the entire Software Development Life Cycle
(SDLC) and connect the siloed data with a standard data model.
● Visualize out-of-the-box engineering metrics in a series of use-case
driven dashboards
● Easily extend DevLake to support your data sources, metrics, and
dashboards with a flexible framework for data collection and ETL
● Out-of-the-box support for DORA metrics
Apache DevLake - Architecture
Demo
Photo from Pexels by Mateusz Walendzik
The Vision - Beyond DORA Metrics
The Vision - Service Catalog
How Spotify does Developer Productivity Engineering with Backstage
The Vision - Platform Engineering
● Provide engineers with the best developer experience
○ Use Backstage (DevClue) as a single pane of glass
● Get a more comprehensive picture that includes more than DORA metrics
○ Code quality metrics (incl security)
○ Production telemetry
○ Cost
● Ability to analyze productivity and quality from different angles
○ Teams vs services
The key to successful metrics implementation is not just to measure performance
but also to use these insights to drive continual learning and improvement


Editor's Notes

  1. Deployment frequency is a metric that tracks how frequently a development team successfully pushes updates into production. The key word in this definition is "successfully". A software development team that continually delivers broken updates or deployments is not performing well. That’s the truth, even if it hurts to hear. This metric is easy to track and very important. Deployment frequency is often the first place a development team may start to make changes. While deployment frequency varies widely among industries and applications, high-performing teams deliver code to production daily or several times a week.
  2. The term lead time describes the time between the initial code commit and full deployment to production. When your team decides to implement a UI change, how long does it take to get into production? When your team implements a new security feature, how long does testing take before release? Lead time is measured from when a team starts working on a code change to the moment it is in the production environment. Lead time can be further broken down by looking at which stage of a change takes the longest. Is your team spending the most time in development or in testing?
  3. Mean Time to Recovery measures the time it takes to recover following an outage, service interruption, or product failure. This is measured from the initial moment of an outage until the incident team has recovered all services and operations. These events are unavoidable to a certain degree, although good management can significantly increase the Mean Time Between Failures (MTBF). Because it’s impossible to avoid incidents completely, you need an incident plan that works. Slow recovery times can impact your organization in more than one way. Your customers will experience a prolonged outage and will view your team negatively for not being able to get the incident resolved. You may lose customers, and the reputation of your brand may be diminished. Additionally, management is less likely to move in an experimental direction if the team cannot keep up with the current, supposedly stable software.
  4. It’s great to have frequent deployments, but what’s the point if your team is constantly rolling back updates? Or even worse, if updates are causing incidents or outages? You should track all deployments that end up as incidents or get rolled back. This is known as the Change Failure Rate (CFR) and is measured as a percentage. By tracking Change Failure Rate, you learn how often your team is going back to fix earlier deployments. This alerts you to a quality breakdown somewhere in the code development or deployment process itself.
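
The four definitions in these notes can be turned into a small calculation. The following is a minimal sketch, not how any specific tool computes the metrics: the `Deployment` record and its field names are hypothetical, and for simplicity it assumes an outage starts at the moment the failing deployment reaches production.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real tools expose different fields.
    commit_time: datetime                      # first commit of the change
    deploy_time: datetime                      # change reached production
    failed: bool = False                       # caused an incident or was rolled back
    restored_time: Optional[datetime] = None   # service recovered (if it failed)


def deployment_frequency(deploys, days):
    """Successful deployments per day over a window of `days` days."""
    return sum(1 for d in deploys if not d.failed) / days


def mean_lead_time(deploys):
    """Average time from commit to production."""
    total = sum((d.deploy_time - d.commit_time for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys):
    """Percentage of deployments that failed or were rolled back."""
    return 100.0 * sum(1 for d in deploys if d.failed) / len(deploys)


def mean_time_to_recovery(deploys):
    """Average time from a failing deployment to recovery.

    Assumption: the outage begins at deploy_time.
    """
    recovered = [d for d in deploys if d.failed and d.restored_time is not None]
    if not recovered:
        return None
    total = sum((d.restored_time - d.deploy_time for d in recovered), timedelta())
    return total / len(recovered)


# Illustrative data only: four deployments in one week, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
hour, day = timedelta(hours=1), timedelta(days=1)
deploys = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               restored_time=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 3 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 3 * hour),
]

print(deployment_frequency(deploys, days=7))
print(mean_lead_time(deploys))
print(change_failure_rate(deploys))
print(mean_time_to_recovery(deploys))
```

Counting only successful deployments in the frequency follows note 1, and reporting the failure rate as a percentage follows note 4. A real pipeline would take incident start times from an incident tracker rather than from the deployment timestamp.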