3. Our Goals
● Boost developer experience and productivity
● Be able to drive innovation in times of uncertainty
● Become a top performing organization
The ultimate business goal – Creating Value for the Customer!
To know what counts as “value” to a customer, we need to keep
experimenting, and our process should support the following:
● Faster feedback loop
● Quick decision making
● Fail fast & learn fast
5. DORA - Deployment frequency
Humanitec - DevOps Benchmarking Study 2023
6. DORA - Lead Time
Humanitec - DevOps Benchmarking Study 2023
7. DORA - Mean Time to Recovery (MTTR)
Humanitec - DevOps Benchmarking Study 2023
8. DORA - Change Failure Rate
Humanitec - DevOps Benchmarking Study 2023
9. What else?
● Deployment
○ Reliance on Ops to deploy features might indicate lower performance. Close to
90% of top performing teams feel confident deploying independently
● Provisioning infrastructure and managed services
○ Low performing teams disproportionately rely on Ops to provision on a
case-by-case basis
● Standardization
○ 82.19% of top performing teams manage their app config in a standardized way
for all apps
● Infrastructure configuration management
○ 100% of top performing teams store their infrastructure config in a VCS
● Degree of self-service
○ In 83.6% of top performing teams, developers can create preview
environments on the fly
Humanitec - DevOps Benchmarking Study 2023
10. Challenges in implementing DORA metrics
● Cultural Resistance
○ Implementing DORA metrics often requires a significant shift in company culture. Teams may resist the
change because it disrupts familiar routines or they fear being judged by the metrics. It requires strong
leadership and buy-in from all team members to overcome this resistance.
● Lack of Tooling
○ To accurately measure DORA metrics, you need tools to track deployments, changes, failures, and
recovery times. If these tools aren't in place, or if they can't integrate with each other, it can be difficult to
collect accurate data.
● Data Quality
○ The value of the metrics depends on the quality of the data being collected. If the data isn't accurate or
complete, it will skew the metrics and lead to incorrect conclusions.
● Interpreting the Data
○ Once you have the data, interpreting it can be a challenge. Without an understanding of what the metrics
mean and how they interact, it's hard to draw meaningful conclusions or make informed decisions.
● Misuse of Metrics
○ Metrics can be misused, leading to negative behaviors. For example, if the goal is to maximize deployment
frequency, teams might deploy changes that aren't valuable just to boost their numbers. It's important to
understand the context and use the metrics as a guide, not a strict rule.
● Lack of Standardization
○ Organizations may struggle to standardize the way they measure and report on DORA metrics. If different
teams or departments use different tools or methods to collect data, it can lead to inconsistencies and
make it difficult to compare performance.
12. Our Journey - 2 years ago
● Scheduled releases (monthly -> bi-weekly)
○ deployed to AWS EC2 instances as Debian packages
○ executed by the Ops team
● Minimal observability for services in production
● Lack of standardized and reusable components and practices
○ Many different ways to manage configuration, secrets, telemetry
○ Some projects had not been updated for many years
● Manual infrastructure deployment
○ deploying a new region was a major challenge that took months
13. Our Journey - New Platform
● Monorepo with 50+ services with multiple deployments to production
per day across 5 regions
○ Quality checks from day 1 (Sonar, Code Style, Security tools)
○ Deploy time <15 min with parallel builds
● Highly standardized services based on a new architecture
○ Deployed to Kubernetes using Helm
○ Built-in telemetry (metrics, structured logging, tracing)
● Every merge to master automatically deployed to production
○ 2700+ Unit tests, Integration tests, Contract tests with > 90% coverage
○ multiple regressions were blocked by atomic deployments & E2E tests
● >50 production deployments in the last 2 weeks
○ Mean time to merge a PR: 2 days
14. Our Journey - Legacy Services
● Modernized ~80% of legacy services
○ Containerization, deployment to k8s, integrated telemetry
● Unified CI/CD is integrated into 10 repos
○ adapted the new platform process, including all quality tools, deployment
pipelines, and E2E tests for acceptance
16. Our Journey - Infrastructure
● Terraform monorepo for 80+ services with 200+ Terraform state files
● Provide a Service Kit that enables developers to own the complete
infrastructure dependencies for their service
● All Terraform operations are automated via Atlantis, and multi-region
environment changes happen in parallel, reducing operation time
from a couple of hours to minutes (a sketch of the fan-out idea follows)
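Atlantis handles this orchestration in our setup; the minimal Python sketch below only illustrates the fan-out idea that turns a sequential, hours-long multi-region rollout into one bounded by the slowest region. The region list and the envs/<region> directory layout are hypothetical, not our actual repository structure.

# Illustration only: fan "terraform plan" out across regions concurrently
# instead of running them one after another.
import subprocess
from concurrent.futures import ThreadPoolExecutor

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]  # hypothetical

def plan_region(region):
    # One Terraform root module per region (hypothetical envs/<region> layout).
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-no-color"],
        cwd=f"envs/{region}",
        capture_output=True,
        text=True,
    )
    return region, result.returncode

# Wall-clock time is roughly the slowest single region, not the sum of all.
with ThreadPoolExecutor(max_workers=len(REGIONS)) as pool:
    for region, code in pool.map(plan_region, REGIONS):
        status = "ok" if code == 0 else f"failed ({code})"
        print(f"{region}: plan {status}")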
18. Why is it difficult to measure productivity?
● Engineering is a complex and creative task and measuring the
productivity of any knowledge worker is generally a hard problem
● Different tools and practices (e.g., monorepos vs. polyrepos)
● Complex dependencies between services and infrastructure
● Multiple non-functional requirements (architecture, security, FIPS, etc)
● Data is scattered across multiple tools
20. Apache DevLake
Apache DevLake is an open-source dev data platform that ingests,
analyzes, and visualizes the fragmented data from DevOps tools to
extract insights for engineering excellence, developer experience, and
community growth.
● Collect DevOps data across the entire Software Development Life Cycle
(SDLC) and connect the siloed data with a standard data model.
● Visualize out-of-the-box engineering metrics in a series of use-case
driven dashboards
● Easily extend DevLake to support your data sources, metrics, and
dashboards with a flexible framework for data collection and ETL
● Out-of-the-box support for DORA metrics
24. The Vision - Service Catalog
How Spotify does Developer Productivity Engineering with Backstage
25. The Vision - Platform Engineering
● Provide engineers with the best developer experience
○ Use Backstage (DevClue) as a single pane of glass
● Get a more comprehensive picture that goes beyond DORA metrics
○ Code quality metrics (incl. security)
○ Production telemetry
○ Cost
● Ability to analyze productivity and quality from different angles
○ Teams vs services
The key to successful metrics implementation is not just to measure performance
but also to use these insights to drive continual learning and improvement
Editor's Notes
Deployment frequency is a metric that tracks how frequently a development team successfully pushes updates into production. The key word in this definition is successful. A software development team that continually delivers broken updates or deployments is not good. That’s the truth, even if it hurts to hear.
This metric is easy to track and very important. Deployment frequency is often the first place a development team starts making changes. While deployment frequency will vary widely among industries and applications, high-performing teams deliver code to production anywhere from multiple times a week to multiple times a day.
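As a back-of-the-envelope illustration of the arithmetic, here is a minimal Python sketch; the timestamps are made-up sample data, and in practice they would come from your CI/CD tool's deployment history.

# Deployment frequency = successful deployments per week over a window.
from datetime import datetime

successful_deployments = [          # made-up sample data
    datetime(2023, 9, 4, 10, 12),
    datetime(2023, 9, 5, 16, 40),
    datetime(2023, 9, 11, 9, 5),
    datetime(2023, 9, 14, 13, 30),
]

window_days = (max(successful_deployments) - min(successful_deployments)).days or 1
per_week = len(successful_deployments) / (window_days / 7)
print(f"Deployment frequency: {per_week:.1f} successful deploys/week")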
The term lead time describes the time from the initial code commit to full deployment in production. When your team decides to implement a UI change, how long does it take to get into production? When your team implements a new security feature, how long does testing take before release?
Lead time is measured from when a team starts working on a code change to the moment it is in the production environment. Lead time can be further broken down by looking at what stage of change development is taking the longest. Is your team spending the most time in development or testing?
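A minimal sketch of the calculation, with made-up commit/deploy pairs (in practice these would be joined from your VCS and deployment history); lead time is usually reported as a median to dampen outliers.

# Lead time for changes = time from first commit to running in production.
from datetime import datetime
from statistics import median

changes = [  # (committed, deployed) pairs, made-up sample data
    (datetime(2023, 9, 4, 9, 0), datetime(2023, 9, 5, 14, 0)),
    (datetime(2023, 9, 6, 11, 0), datetime(2023, 9, 6, 17, 30)),
    (datetime(2023, 9, 8, 8, 15), datetime(2023, 9, 11, 10, 0)),
]

lead_times_hours = [(deployed - committed).total_seconds() / 3600
                    for committed, deployed in changes]
print(f"Median lead time: {median(lead_times_hours):.1f} hours")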
Mean Time to Recovery measures the time it takes to recover following an outage, service interruption, or product failure.
This is measured from the initial moment of an outage until the incident team has recovered all services and operations. These events are unavoidable to a certain degree, although good management can significantly lengthen the Mean Time Between Failures (MTBF). Because it’s impossible to avoid incidents completely, you need an incident plan that works.
Slow recovery times can impact your organization in more than one way. Your customers will experience a prolonged outage and will view your team negatively for not being able to get the incident resolved. You may lose customers, and the reputation of your brand may be diminished. Additionally, management is less likely to move in an experimental direction if the team cannot keep up with the current, supposedly stable software.
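A minimal sketch of the calculation, with made-up incident windows; in practice the start and resolution timestamps would come from your incident management tool.

# MTTR = average time from the start of an incident to full recovery.
from datetime import datetime

incidents = [  # (started, resolved) pairs, made-up sample data
    (datetime(2023, 9, 5, 14, 0), datetime(2023, 9, 5, 14, 45)),
    (datetime(2023, 9, 9, 2, 10), datetime(2023, 9, 9, 5, 40)),
]

total_downtime_min = sum((resolved - started).total_seconds() / 60
                         for started, resolved in incidents)
print(f"MTTR: {total_downtime_min / len(incidents):.0f} minutes "
      f"across {len(incidents)} incidents")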
It’s great to have frequent deployments, but what’s the point if your team is constantly rolling back updates, or even worse, if updates are causing incidents or outages? You should track all deployments that end up as incidents or get rolled back. This is known as the Change Failure Rate (CFR) and is measured as a percentage.
By tracking Change Failure Rate, you learn how often your team is going back to fix earlier deployments. This alerts you to a quality breakdown somewhere in the code development or deployment process itself.
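A minimal sketch of the percentage, with made-up counts; what counts as a “failed” deployment (incident, hotfix, rollback) should be agreed on up front so teams report it consistently.

# Change failure rate = failed deployments / total deployments, as a percentage.
total_deployments = 120   # made-up sample data
failed_deployments = 9    # led to an incident, hotfix, or rollback

cfr = failed_deployments / total_deployments * 100
print(f"Change failure rate: {cfr:.1f}%")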