Kanban India 2023 | Renjith Achuthanunni and Anoop Kadur Vijayakumar | DevOps Flow Metrics Transformation- Culture of kaizen in an Agile Product Org

DevOps Flow Metrics Transformation
Culture of Kaizen in an Agile Org
DECEMBER 01, 2023

Speakers
Renjith Achuthanunni, Director, Agile CoE – Engineering, Epsilon India
Anoop Kadur Vijayakumar, Senior Project Analyst – Engineering, Epsilon India

What’s Coming Up (45 mins)
Engineering System Flow
Product Dev Flow impacted – Case Study
The Journey Begins: Embrace Kaizen
Production Defect Lifecycle
Dashboard Demo
Navigate Challenges: Shifting Mindsets
Conclusion & Key Takeaways
Q&A

Do You Know?
In a product engineering team, what is the prime driver for establishing a visual dashboard?
• Align priorities across teams
• Root Cause Analysis
• Drive decision making
• All of these, but ...
‘Rear-View Mirror’ – establish transparent visual indicators

Engineering System Flow
• Ad-Tech domain
• 25+ Scrum Teams
• 2-Week Cadence
• Quarterly Product Roadmaps
• Sprint Commitment Points
• Delivery Points as per DoD
• Flow Metrics (Delivery/DevOps)
• Retrospectives

© 2023 Epsilon Data Management, LLC. All rights reserved. Proprietary and confidential.
Product Development Flow impact
CASE STUDY
Predictability
90 -> 30 Days
Defect Backlog Age
Defect Backlog Size
The Challenge
Over time, an engineering team working on a SaaS product accumulates defects that are
buried deep within the backlog. A feature-driven backlog at times tends to focus only on ‘High
Severity’ defects. There was no ‘single pane of glass’ view to measure and monitor product quality.
The Solution
DevOps Metrics Dashboard: the team created a real-time, source-data-backed visualization system
that captures ‘all’ the production-induced disruptions worked by the Dev team in a ‘single pane of
glass’.
• Key Flow Measures
• Active Sprint level prioritization
• Missing ‘Rear-view Mirror’
• Ad Hoc Prioritization
• Customer Commits Missed, scope & quality
• No Lessons Learnt – RCAs captured
• Team & Executive level governance reporting
• Periodic Actions on SLA Breach trends
Total Defects SLA Breach, 2023 vs 2022: ~30% vs ~50%

Journey Begins: Embracing Kaizen
1. Identify Key Flow Metrics
• MTTD, Open & Ageing Defects, Defect TTR, Commit to Deploy Metrics
2. Tiered Service Levels
• RAG-based Tiered Service Levels
3. Optimize Workflows
• Visualize the Flow, Optimize WIP Bottlenecks, Pull Policies
4. Pilot Data Visualization
• Instrument Beta dashboards, Iterate the SLA thresholds
5. Review & Refine
• Inspect & Adapt
6. Learn from Failures
• Celebrate Success & Capture lessons learnt

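Engineering Defect Lifecycle
Production Defect Lifecycle
Jira Ticket Created -> Triage/Fix Initiated -> Resolved -> Code pushed to Production/Closed
• Time to Resolve (TTR): Jira Ticket Created -> Resolved
• Resolution Cycle Time: Triage/Fix Initiated -> Resolved
• Commit to Deploy: Resolved -> Code pushed to Production/Closed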
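
The three lifecycle measures can be derived from four timestamps per defect. The following is a minimal sketch, not the speakers' implementation: the `DefectTimestamps` class and its field names are hypothetical, and it assumes the timestamps have already been exported from the ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DefectTimestamps:
    """Hypothetical record of the lifecycle stages for one defect."""
    created: datetime                    # Jira Ticket Created
    fix_initiated: datetime              # Triage/Fix Initiated
    resolved: datetime                   # Resolved
    deployed: Optional[datetime] = None  # Code pushed to Production/Closed


def time_to_resolve(d: DefectTimestamps) -> timedelta:
    """Time to Resolve (TTR): from ticket creation until resolution."""
    return d.resolved - d.created


def resolution_cycle_time(d: DefectTimestamps) -> timedelta:
    """Resolution Cycle Time: from the start of work until resolution."""
    return d.resolved - d.fix_initiated


def commit_to_deploy(d: DefectTimestamps) -> Optional[timedelta]:
    """Commit to Deploy: from resolution until production deployment.

    Returns None while the fix has not been deployed yet.
    """
    if d.deployed is None:
        return None
    return d.deployed - d.resolved
```

Keeping the three measures as separate functions mirrors the slide: TTR includes the waiting time before triage, while Resolution Cycle Time isolates the time the team actively spent on the fix.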

Total Defects Submitted (Count)

Total Open & Ageing Defects

Defect Time to Resolution (TTR)

Code Commit to Deploy (CTD)

Reporting Rigor
Measure & Monitor
• Monitoring and feedback system in place to aid proactive decision making
Tiered Report-outs
• Strategic vs Tactical reports, e.g. Deploy CT
Trigger Alerts & Automated Notifications
• Automated SLA Breach notification system in place using Power Automate
Defining SLA Thresholds
• SLA thresholds are defined and coded into the dashboard
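
As an illustration of thresholds "coded into" a report that then drives breach alerts, here is a minimal sketch. It is not the Power Automate flow described in the talk: the `OpenDefect` class, the `DEF-…` keys, and the message wording are hypothetical, and the breach limits assume the Red thresholds for Open & Ageing Defects from the Metrics Definition & RAG Threshold slide (Sev1/2 > 30 days, Sev3/4 > 90 days).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Assumed breach limits in days, taken from the Red column for Open & Ageing Defects.
BREACH_AFTER_DAYS = {"Sev1": 30, "Sev2": 30, "Sev3": 90, "Sev4": 90}


@dataclass
class OpenDefect:
    """Hypothetical open-defect record."""
    key: str
    severity: str
    created: date


def find_breaches(defects: List[OpenDefect], today: date) -> List[Tuple[OpenDefect, int]]:
    """Return (defect, age in days) for every open defect past its breach limit."""
    breaches = []
    for defect in defects:
        age_days = (today - defect.created).days
        if age_days > BREACH_AFTER_DAYS[defect.severity]:
            breaches.append((defect, age_days))
    return breaches


def format_alert(defect: OpenDefect, age_days: int) -> str:
    """Build the notification text; delivery is left to the alerting tool."""
    return f"SLA breach: {defect.key} ({defect.severity}) open for {age_days} days"
```

Separating detection (`find_breaches`) from message formatting keeps the thresholds in one place, so iterating on the SLA limits does not require touching the notification logic.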

Navigating Challenges: Shifting Mindsets
Challenges
• Loss of Control
• Fear of Transparency
• Blame Game
• False sense of Autonomy
• Data sensitivity
• Inter-team dependencies
Shifting Mindsets
• Aligned Vision – Contextual
• Bite-sized changes
• Focus on Problem, not people (WHY, not WHO)
• Leadership Support
• Pilot Program – Iterate & Refine
• Celebrate Early Wins

Conclusion & Takeaways
No One Size Fits All
Product
Iterate & Improve
Visual Trends for classes of service
Model Workflows
Contextual Flow Metrics

Metrics Definition & RAG Threshold

| Metric | Definition | Red | Amber | Green |
| --- | --- | --- | --- | --- |
| Mean Time to Detect (MTTD) | Average time to detect S1/S2 Major Incidents (outages) | > 60 mins | 30 – 60 mins | < 30 mins |
| Mean Time to Restore (MTTR) | Average time required to fix S1/S2 Major Incidents (outages) and return the product to acceptable production status | > 8 hours | 4 – 8 hours | < 4 hours |
| Defect Count (Trend) | Absolute count of defects submitted for the time period (Open + Resolved) | > 15% | < 15% | < 5% increase from last 3 months average |
| Open & Ageing Defects (Trend) | Absolute count of all defects Open & Ageing | Sev1/2 > 30 days; Sev3/4 > 90 days | Sev1/2 15 – 30 days; Sev3/4 30 – 90 days | Sev1/2 < 15 days; Sev3/4 < 30 days |
| Defect TTR (Trend) | Time taken from when the defect was created until it was resolved | Sev1/2 > 30 days; Sev3/4 > 90 days | Sev1/2 15 – 30 days; Sev3/4 30 – 90 days | Sev1/2 < 15 days; Sev3/4 < 30 days |
| Defect Resolution Cycle Time | Time taken to resolve/fix a defect once the team starts working on it | Sev1/2 > 7 days; Sev3 > 30 days; Sev4 > 90 days | Sev1/2 2 – 7 days; Sev3 14 – 30 days; Sev4 30 – 90 days | Sev1/2 < 2 days; Sev3 < 14 days; Sev4 < 30 days |
| Defects Committed to Deployed | Time elapsed from when the defect was Resolved until it was deployed to Production | | | |
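
The severity-based thresholds above translate directly into a lookup plus a comparison. This is a minimal sketch under stated assumptions: the dictionary and function names are hypothetical, severities are spelled `Sev1` to `Sev4`, and a value that falls exactly on a boundary (for example 15 or 30 days for Sev1/2 TTR) is treated as Amber, since the slide only defines Red as "greater than" and Green as "less than".

```python
from typing import Dict, Tuple

# (green_below, red_above) in days, per severity, from the Defect TTR row.
TTR_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "Sev1": (15, 30),
    "Sev2": (15, 30),
    "Sev3": (30, 90),
    "Sev4": (30, 90),
}

# (green_below, red_above) in days, from the Defect Resolution Cycle Time row.
CYCLE_TIME_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "Sev1": (2, 7),
    "Sev2": (2, 7),
    "Sev3": (14, 30),
    "Sev4": (30, 90),
}


def rag_status(
    severity: str,
    days: float,
    thresholds: Dict[str, Tuple[float, float]] = TTR_THRESHOLDS,
) -> str:
    """Classify a duration in days as Red, Amber, or Green for a severity."""
    green_below, red_above = thresholds[severity]
    if days > red_above:
        return "Red"
    if days < green_below:
        return "Green"
    return "Amber"  # everything between the two limits, boundaries included
```

For example, `rag_status("Sev3", 45)` falls in the 30 to 90 day band for Defect TTR, and passing `CYCLE_TIME_THRESHOLDS` as the third argument applies the Resolution Cycle Time limits instead.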
