3. Our Goals
● Boost developer experience and productivity
● Be able to drive innovation in times of uncertainty
● Become a top performing organization
The ultimate business goal – Creating Value for the Customer!
To know what counts as “value” to a customer, we need to keep
experimenting, and our process should support the following:
● Faster feedback loop
● Quick decision making
● Fail fast & learn fast
5. DORA - Deployment frequency
Humanitec - DevOps Benchmarking Study 2023
6. DORA - Lead Time
Humanitec - DevOps Benchmarking Study 2023
7. DORA - Mean Time to Recovery (MTTR)
Humanitec - DevOps Benchmarking Study 2023
8. DORA - Change Failure Rate
Humanitec - DevOps Benchmarking Study 2023
9. What else?
● Deployment
○ Reliance on Ops to deploy features might indicate lower performance. Close to
90% of top performing teams feel confident deploying independently
● Provisioning infrastructure and managed services
○ Low performing teams disproportionately rely on Ops to provision on a
case-by-case basis
● Standardization
○ 82.19% of top performing teams manage their app config in a standardized way
for all apps
● Infrastructure configuration management
○ 100% of top performing teams store their infrastructure config in a VCS
● Degree of self-service
○ In 83.6% of top performing teams, developers can create preview
environments on the fly
Humanitec - DevOps Benchmarking Study 2023
10. Challenges in implementing DORA metrics
● Cultural Resistance
○ Implementing DORA metrics often requires a significant shift in company culture. Teams may resist the
change because it disrupts familiar routines or they fear being judged by the metrics. It requires strong
leadership and buy-in from all team members to overcome this resistance.
● Lack of Tooling
○ To accurately measure DORA metrics, you need tools to track deployments, changes, failures, and
recovery times. If these tools aren't in place, or if they can't integrate with each other, it can be difficult to
collect accurate data.
● Data Quality
○ The value of the metrics depends on the quality of the data being collected. If the data isn't accurate or
complete, it will skew the metrics and lead to incorrect conclusions.
● Interpreting the Data
○ Once you have the data, interpreting it can be a challenge. Without an understanding of what the metrics
mean and how they interact, it's hard to draw meaningful conclusions or make informed decisions.
● Misuse of Metrics
○ Metrics can be misused, leading to negative behaviors. For example, if the goal is to maximize deployment
frequency, teams might deploy changes that aren't valuable just to boost their numbers. It's important to
understand the context and use the metrics as a guide, not a strict rule.
● Lack of Standardization
○ Organizations may struggle to standardize the way they measure and report on DORA metrics. If different
teams or departments use different tools or methods to collect data, it can lead to inconsistencies and
make it difficult to compare performance.
12. Our Journey - 2 years ago
● Scheduled releases (monthly -> bi-weekly)
○ deployed to AWS EC2 instances as Debian packages
○ executed by the Ops team
● Minimal observability for services in production
● Lack of standardized and reusable components and practices
○ Many different ways to manage configuration, secrets, telemetry
○ Some projects had not been updated for many years
● Manual infrastructure deployment
○ deploying a new region was a major challenge that took months
13. Our Journey - New Platform
● Monorepo with 50+ services with multiple deployments to production
per day across 5 regions
○ Quality checks from day 1 (Sonar, Code Style, Security tools)
○ Deploy time <15 min with parallel builds
● Highly standardized services based on a new architecture
○ Deployed to Kubernetes using Helm
○ Built-in telemetry (metrics, structured logging, tracing)
● Every merge to master automatically deployed to production
○ 2700+ Unit tests, Integration tests, Contract tests with > 90% coverage
○ multiple regressions were blocked by atomic deployments & E2E tests
● >50 production deployments in the last 2 weeks
○ Mean time to merge a PR: 2 days
14. Our Journey - Legacy Services
● Modernized ~80% of legacy services
○ Containerization, deployment to k8s, integrated telemetry
● Unified CI/CD is integrated into 10 repos
○ adapted the new platform process, including all quality tools, deployment
pipelines, and E2E tests for acceptance
16. Our Journey - Infrastructure
● Terraform monorepo for 80+ services with 200+ Terraform state files
● Provide a Service Kit that enables developers to own the complete
infrastructure dependencies for their service
● All Terraform operations are automated via Atlantis, and multi-region
environment changes happen in parallel, reducing operation time
from a couple of hours to minutes (a sketch of the fan-out idea follows)
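Atlantis handles this orchestration in our setup; the minimal Python sketch below only illustrates the fan-out idea that turns a sequential, hours-long multi-region rollout into one bounded by the slowest region. The region list and the envs/<region> directory layout are hypothetical, not our actual repository structure.

# Illustration only: fan "terraform plan" out across regions concurrently
# instead of running them one after another.
import subprocess
from concurrent.futures import ThreadPoolExecutor

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]  # hypothetical

def plan_region(region):
    # One Terraform root module per region (hypothetical envs/<region> layout).
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-no-color"],
        cwd=f"envs/{region}",
        capture_output=True,
        text=True,
    )
    return region, result.returncode

# Wall-clock time is roughly the slowest single region, not the sum of all.
with ThreadPoolExecutor(max_workers=len(REGIONS)) as pool:
    for region, code in pool.map(plan_region, REGIONS):
        status = "ok" if code == 0 else f"failed ({code})"
        print(f"{region}: plan {status}")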
18. Why is it difficult to measure productivity?
● Engineering is a complex and creative task and measuring the
productivity of any knowledge worker is generally a hard problem
● Different tools and practices (e.g., monorepos vs. polyrepos)
● Complex dependencies between services and infrastructure
● Multiple non-functional requirements (architecture, security, FIPS, etc)
● Data is scattered across multiple tools
20. Apache DevLake
Apache DevLake is an open-source dev data platform that ingests,
analyzes, and visualizes the fragmented data from DevOps tools to
extract insights for engineering excellence, developer experience, and
community growth.
● Collect DevOps data across the entire Software Development Life Cycle
(SDLC) and connect the siloed data with a standard data model.
● Visualize out-of-the-box engineering metrics in a series of use-case
driven dashboards
● Easily extend DevLake to support your data sources, metrics, and
dashboards with a flexible framework for data collection and ETL
● Out-of-the-box support for DORA metrics
24. The Vision - Service Catalog
How Spotify does Developer Productivity Engineering with Backstage
25. The Vision - Platform Engineering
● Provide engineers with the best developer experience
○ Use Backstage (DevClue) as a single pane of glass
● Get a more comprehensive picture that goes beyond DORA metrics
○ Code quality metrics (incl. security)
○ Production telemetry
○ Cost
● Ability to analyze productivity and quality from different angles
○ Teams vs services
The key to successful metrics implementation is not just to measure performance
but also to use these insights to drive continual learning and improvement
Editor's Notes
Deployment frequency is a metric that tracks how frequently a development team successfully pushes updates into production. The key word in this definition is successful. A software development team that continually delivers broken updates or deployments is not good. That’s the truth, even if it hurts to hear.
This metric is easy to track and very important. Deployment frequency is often the first place a development team starts making changes. While deployment frequency will vary widely among industries and applications, high-performing teams deliver code to production anywhere from multiple times a week to multiple times a day.
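As a back-of-the-envelope illustration of the arithmetic, here is a minimal Python sketch; the timestamps are made-up sample data, and in practice they would come from your CI/CD tool's deployment history.

# Deployment frequency = successful deployments per week over a window.
from datetime import datetime

successful_deployments = [          # made-up sample data
    datetime(2023, 9, 4, 10, 12),
    datetime(2023, 9, 5, 16, 40),
    datetime(2023, 9, 11, 9, 5),
    datetime(2023, 9, 14, 13, 30),
]

window_days = (max(successful_deployments) - min(successful_deployments)).days or 1
per_week = len(successful_deployments) / (window_days / 7)
print(f"Deployment frequency: {per_week:.1f} successful deploys/week")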
The term lead time describes the time from the initial code commit to full deployment in production. When your team decides to implement a UI change, how long does it take to get into production? When your team implements a new security feature, how long does testing take before release?
Lead time is measured from when a team starts working on a code change to the moment it is in the production environment. Lead time can be further broken down by looking at what stage of change development is taking the longest. Is your team spending the most time in development or testing?
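A minimal sketch of the calculation, with made-up commit/deploy pairs (in practice these would be joined from your VCS and deployment history); lead time is usually reported as a median to dampen outliers.

# Lead time for changes = time from first commit to running in production.
from datetime import datetime
from statistics import median

changes = [  # (committed, deployed) pairs, made-up sample data
    (datetime(2023, 9, 4, 9, 0), datetime(2023, 9, 5, 14, 0)),
    (datetime(2023, 9, 6, 11, 0), datetime(2023, 9, 6, 17, 30)),
    (datetime(2023, 9, 8, 8, 15), datetime(2023, 9, 11, 10, 0)),
]

lead_times_hours = [(deployed - committed).total_seconds() / 3600
                    for committed, deployed in changes]
print(f"Median lead time: {median(lead_times_hours):.1f} hours")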
Mean Time to Recovery measures the time it takes to recover following an outage, service interruption, or product failure.
This is measured from the initial moment of an outage until the incident team has recovered all services and operations. These events are unavoidable to a certain degree, although good management can significantly lengthen the Mean Time Between Failures (MTBF). Because it’s impossible to avoid incidents completely, you need an incident plan that works.
Slow recovery times can impact your organization in more than one way. Your customers will experience a prolonged outage and will view your team negatively for not being able to get the incident resolved. You may lose customers, and the reputation of your brand may be diminished. Additionally, management is less likely to move in an experimental direction if the team cannot keep up with the current, supposedly stable software.
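A minimal sketch of the calculation, with made-up incident windows; in practice the start and resolution timestamps would come from your incident management tool.

# MTTR = average time from the start of an incident to full recovery.
from datetime import datetime

incidents = [  # (started, resolved) pairs, made-up sample data
    (datetime(2023, 9, 5, 14, 0), datetime(2023, 9, 5, 14, 45)),
    (datetime(2023, 9, 9, 2, 10), datetime(2023, 9, 9, 5, 40)),
]

total_downtime_min = sum((resolved - started).total_seconds() / 60
                         for started, resolved in incidents)
print(f"MTTR: {total_downtime_min / len(incidents):.0f} minutes "
      f"across {len(incidents)} incidents")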
It’s great to have frequent deployments, but what’s the point if your team is constantly rolling back updates, or even worse, if updates are causing incidents or outages? You should track all deployments that end up as incidents or get rolled back. This is known as the Change Failure Rate (CFR) and is measured as a percentage.
By tracking Change Failure Rate, you learn how often your team is going back to fix earlier deployments. This alerts you to a quality breakdown somewhere in the code development or deployment process itself.
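A minimal sketch of the percentage, with made-up counts; what counts as a “failed” deployment (incident, hotfix, rollback) should be agreed on up front so teams report it consistently.

# Change failure rate = failed deployments / total deployments, as a percentage.
total_deployments = 120   # made-up sample data
failed_deployments = 9    # led to an incident, hotfix, or rollback

cfr = failed_deployments / total_deployments * 100
print(f"Change failure rate: {cfr:.1f}%")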