Kanban India 2023 | Renjith Achuthanunni and Anoop Kadur Vijayakumar | DevOps Flow Metrics Transformation - Culture of Kaizen in an Agile Product Org
3. 3
What’s Coming up
45 mins
Engineering System Flow
Product Dev Flow impacted – Case Study
The Journey Begins: Embrace Kaizen
Production Defect Lifecycle
Dashboard Demo
Navigate Challenges: Shifting Mindsets
Conclusion & Key Takeaways
Q&A
4. 4
Do You Know?
In a product engineering team, what is the prime driver for establishing a visual dashboard?
Align priorities across teams
Root Cause Analysis
Drive decision making
All of these, but...
‘Rear-View Mirror’ – establish transparent visual indicators
5. 5
Engineering System Flow
Ad-Tech domain
25+ Scrum Teams
2 Weeks Cadence
Quarterly Product Roadmaps
Sprint Commitment points
Delivery Point as per DOD
Flow Metrics (Delivery/Devops)
Retrospectives
7. 7
Journey Begins
1. Identify Key Flow Metrics
• MTTD, Open & Ageing Defects, Defect TTR, Commit to Deploy metrics
2. Tiered Service Levels
• RAG-based tiered service levels
3. Optimize Workflows
• Visualize the flow, optimize WIP bottlenecks, pull policies
4. Pilot Data Visualization
• Instrument beta dashboards, iterate the SLA thresholds
5. Review & Refine
• Inspect & adapt
6. Learn from Failures
• Celebrate success & capture lessons learnt
Embracing Kaizen
8. 8
Engineering Defect Lifecycle
Production Defect Lifecycle
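Stages: Jira Ticket Created → Triage/Fix Initiated → Resolved → Code pushed to Production/Closed
Time to Resolve (TTR): Jira Ticket Created → Resolved
Resolution Cycle Time: Triage/Fix Initiated → Resolved
Commit to Deploy: Resolved → Code pushed to Production/Closed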
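The three lifecycle durations can be derived from four timestamps per defect. The following is a minimal sketch under assumptions: the `DefectTimestamps` class, its field names, and the sample dates are hypothetical and are not taken from the deck or from any Jira schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DefectTimestamps:
    """Hypothetical record of the four lifecycle events for one defect."""
    created: datetime        # Jira ticket created
    fix_initiated: datetime  # triage/fix initiated
    resolved: datetime       # defect marked resolved
    deployed: datetime       # code pushed to production / closed


def time_to_resolve(d: DefectTimestamps) -> timedelta:
    """TTR: from ticket creation until the defect is resolved."""
    return d.resolved - d.created


def resolution_cycle_time(d: DefectTimestamps) -> timedelta:
    """Resolution cycle time: from the start of work until resolved."""
    return d.resolved - d.fix_initiated


def commit_to_deploy(d: DefectTimestamps) -> timedelta:
    """Commit to deploy: from resolved until deployed to production."""
    return d.deployed - d.resolved


# Illustrative sample data only.
defect = DefectTimestamps(
    created=datetime(2023, 1, 2, 9, 0),
    fix_initiated=datetime(2023, 1, 4, 9, 0),
    resolved=datetime(2023, 1, 6, 9, 0),
    deployed=datetime(2023, 1, 9, 9, 0),
)
print(time_to_resolve(defect).days)        # 4
print(resolution_cycle_time(defect).days)  # 2
print(commit_to_deploy(defect).days)       # 3
```

Keeping the three durations as separate functions makes it easy to see which segment of the lifecycle a given metric covers.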
14. 14
Reporting Rigor
Measure & Monitor
• Monitoring and feedback system in place to aid proactive decision making
Tiered Report Outs
• Strategic vs tactical reports, e.g. Deploy CT
Trigger Alerts & Automated Notifications
• Automated SLA breach notification system in place using Power Automate
Defining SLA Thresholds
• SLA thresholds are defined and coded into the dashboard
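As an illustration of what "coding SLA thresholds" and flagging breaches could look like, here is a minimal sketch in plain Python. It is not the Power Automate flow the team used; the dictionary layout, the function names, and the sample defects are assumptions, and the limits are the Red thresholds for Defect Resolution Cycle Time from the deck's RAG table.

```python
from datetime import timedelta

# Red thresholds for Defect Resolution Cycle Time, per severity.
RED_THRESHOLD = {
    "Sev1": timedelta(days=7),
    "Sev2": timedelta(days=7),
    "Sev3": timedelta(days=30),
    "Sev4": timedelta(days=90),
}


def find_breaches(defects):
    """Return the defects whose elapsed work time exceeds the Red threshold."""
    return [d for d in defects if d["elapsed"] > RED_THRESHOLD[d["severity"]]]


def format_alert(defect):
    """Build a short notification message for one breached defect."""
    limit = RED_THRESHOLD[defect["severity"]].days
    return (
        f"SLA breach: {defect['key']} ({defect['severity']}) has been in "
        f"progress for {defect['elapsed'].days} days (limit {limit} days)"
    )


# Hypothetical sample data; keys and values are illustrative only.
open_defects = [
    {"key": "DEF-1", "severity": "Sev1", "elapsed": timedelta(days=9)},
    {"key": "DEF-2", "severity": "Sev3", "elapsed": timedelta(days=12)},
    {"key": "DEF-3", "severity": "Sev4", "elapsed": timedelta(days=120)},
]

for breached in find_breaches(open_defects):
    print(format_alert(breached))
```

In practice the message would be handed to whatever notification channel is in use; the sketch only shows the threshold comparison.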
15. 15
Navigating Challenges
• Loss of Control
• Fear of Transparency
• Blame Game
• False sense of Autonomy
• Data sensitivity
• Inter-team dependencies
Shifting Mindsets
• Aligned Vision – Contextual
• Bite-sized changes
• Focus on the Problem, not People (Why, not Who)
• Leadership Support
• Pilot Program – Iterate & Refine
• Celebrate Early Wins
16. 16
Conclusion & Takeaways
No One Size Fits all
Product
Iterate & Improve
Visual Trends for classes of services
Model Workflows
Contextual Flow Metrics
18. 18
Metrics Definition & RAG Threshold
Metric: Mean Time to Detect (MTTD)
Definition: Average time to detect S1/S2 major incidents (outages)
Red: > 60 mins
Amber: 30 – 60 mins
Green: < 30 mins
Metric: Mean Time to Restore (MTTR)
Definition: Average time required to fix an S1/S2 major incident (outage) and return the product to acceptable production status
Red: > 8 hours
Amber: 4 – 8 hours
Green: < 4 hours
Metric: Defect Count (Trend)
Definition: Absolute count of defects submitted for the time period (Open + Resolved)
Red: > 15%
Amber: < 15%
Green: < 5% increase from the last 3 months' average
Metric: Open & Ageing Defects (Trend)
Definition: Absolute count of all defects that are open and ageing
Red: Sev1/2 > 30 days; Sev3/4 > 90 days
Amber: Sev1/2 15 – 30 days; Sev3/4 30 – 90 days
Green: Sev1/2 < 15 days; Sev3/4 < 30 days
Metric: Defect TTR (Trend)
Definition: Time from when the defect was created until it was resolved
Red: Sev1/2 > 30 days; Sev3/4 > 90 days
Amber: Sev1/2 15 – 30 days; Sev3/4 30 – 90 days
Green: Sev1/2 < 15 days; Sev3/4 < 30 days
Metric: Defect Resolution Cycle Time
Definition: Time taken to resolve/fix a defect once the team starts working on it
Red: Sev1/2 > 7 days; Sev3 > 30 days; Sev4 > 90 days
Amber: Sev1/2 2 – 7 days; Sev3 14 – 30 days; Sev4 30 – 90 days
Green: Sev1/2 < 2 days; Sev3 < 14 days; Sev4 < 30 days
Metric: Defects Committed to Deployed
Definition: Time elapsed from when the defect was resolved until it was deployed to production
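The Red/Amber/Green thresholds above lend themselves to a small classification helper. The sketch below is illustrative only: `rag_status` and the constant names are hypothetical, values that fall exactly on a boundary are treated as Amber, and the Amber band for Defect Count is assumed to be 5% to 15% because the slide lists only "< 15%".

```python
from statistics import mean


def rag_status(value, green_below, red_above):
    """Green below green_below, Red above red_above, otherwise Amber."""
    if value < green_below:
        return "Green"
    if value > red_above:
        return "Red"
    return "Amber"


# (green_below, red_above) pairs taken from the thresholds listed above.
MTTD_MINUTES = (30, 60)
MTTR_HOURS = (4, 8)
AGEING_DAYS = {"Sev1/2": (15, 30), "Sev3/4": (30, 90)}
DEFECT_COUNT_PCT = (5, 15)  # assumption: Amber covers 5% to 15%


def pct_change_vs_average(current, previous_counts):
    """Percentage change of the current count against the prior average."""
    baseline = mean(previous_counts)
    return (current - baseline) / baseline * 100


print(rag_status(45, *MTTD_MINUTES))           # Amber
print(rag_status(3, *MTTR_HOURS))              # Green
print(rag_status(40, *AGEING_DAYS["Sev1/2"]))  # Red

# Hypothetical monthly defect counts for the trend metric.
change = pct_change_vs_average(110, [100, 90, 110])
print(rag_status(change, *DEFECT_COUNT_PCT))   # Amber
```

Expressing each metric as a pair of limits keeps the thresholds in one place, which makes them easier to iterate on as the SLAs are refined.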
Editor's Notes
Defect backlog pileup - running into triple digits
High inflow of Production Defects
Lack of visibility – Missing ‘Rear-view’ mirror
Unclear Alignment of Priority
Delivery Spillovers, as engineers are working on high priority ad hoc prod issues
Missing customer commits, both scope & schedule
Inability to identify trends and root causes for persistent quality issues
<Anoop – Graph screenshots>
Emphasize the importance of quality metrics in Kaizen
Quality metrics have enabled teams to identify areas for improvement, set clear objectives, make data-driven decisions, and continuously monitor progress.
Set realistic goals for a quarter on key Metrics (or slice of it)
Conversations with the team to align & commit
Operationalizing the key flow metrics (workflow analysis, modification, standardization)
Governance of measurements (Scoping/Service level Expectation)
It is crucial to ensure that measurement processes are well-defined, consistent, and aligned with organizational goals.
Effective governance, including scoping and service level expectations, ensures that our data collection and analysis efforts are well-structured, relevant, and aligned with organizational goals, ultimately leading to improved decision-making and performance management.
Reporting framework
The framework is important to ensure that reports are consistent, accurate, and aligned with organizational goals.
It supports transparency, accountability, and effective decision-making while promoting compliance with regulatory standards and facilitating continuous improvement in reporting practices.
Instilling a feedback mechanism to inspect and adapt (reporting automation, real-time SLA triggers)
Automating the reporting process has helped engineering teams save time, reduce errors, and make data-driven decisions more efficiently.
We have also set up real-time SLA triggers on our Power BI reports to monitor SLA performance. Stakeholders receive alerts that help them make informed decisions to maintain service levels and meet customer expectations.
Share practical insights on how to measure and take informed decisions.
Measuring and making informed decisions is an ongoing process that requires a balance of quantitative and qualitative data, as well as a commitment to learning and adaptation. We have put in place a few practical steps that have increased our ability to make effective and informed decisions.
Weekly Report outs, and follow up conversations with the stakeholders
POs & SMs leading this with the team to review the dashboard and take informed decisions
Working with Support & Customer delegates on active prioritization & resolution of the highest priority
Flow metrics and the Ageing information driving decisions on Urgency
Aiding proactive decision making by means of segregating SLA breaches
Impact vs. criticality
Provide examples of quality metrics that made a significant impact.
Comparing the Commit to Deploy cycle time with the Resolution cycle time helped the teams understand bottlenecks in the flow.
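One way to make that comparison concrete is to look at the median of each duration across a set of defects and see which stage takes longer. This is a sketch under assumptions, not the team's dashboard logic: the function name and the sample values (in days) are hypothetical.

```python
from statistics import median


def dominant_stage(resolution_days, deploy_days):
    """Compare median durations and report which stage takes longer."""
    resolution = median(resolution_days)
    deploy = median(deploy_days)
    stage = "commit-to-deploy" if deploy > resolution else "resolution"
    return stage, resolution, deploy


# Hypothetical durations in days for three defects.
stage, resolution, deploy = dominant_stage([2, 3, 4], [5, 6, 10])
print(stage, resolution, deploy)  # commit-to-deploy 3 6
```

When the commit-to-deploy median exceeds the resolution median, fixes are waiting longer to ship than they took to build, which points at the release path rather than the engineering work.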