The document discusses how Netflix uses actionable metrics and a decentralized approach to enable effective decision-making. It emphasizes building flexible, scalable, and self-service systems for collecting, visualizing, and setting alerts on telemetry data. Thresholds for alerts are tuned over time based on historical data to reduce alert fatigue. The systems evolved to better support change management through improved visibility into changes and faster mean time to resolution, especially for issues caused by code deployments.
Cloud Tech III: Actionable Metrics
1. Actionable Metrics
Enabling Decision-Making in Netflix’s Decentralized Environment
Cloud Tech III
October 6, 2012
Roy Rapoport
@royrapoport, rsr@netflix.com
Thursday, October 18, 12
2. Me
• Been in tech for about 20 years
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1195 days (3y:3m:1w)
• (Current) job at Netflix: Make things better
(Security Monkey, Python Platform, Central Alert Gateway, Breaking Stuff...)
37. Threshold Tuning
• An Abbreviated History ...
41. Threshold Tuning
(in the beginning)
• Systems owned by IT
• Want an alert? Submit a ticket
• Want to tune an alert? Submit a ticket
Some priests offer their prayers to alien creatures best left
forgotten. This ill-advised worship twists their minds in odd
ways. Overlords find these warped men useful due to the
unnatural powers they can channel. The dark priests most
favored by their strange gods have powerful protections, and
defeating one of them is sure to bring down a terrible curse
upon the victor.
- http://www.descentinthedark.com/_d_/dark_priests.php
45. Threshold Tuning
(It gets better)
• You get to configure your own threshold
• Freedom!
• Also, you have to configure your own thresholds
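Owning your own thresholds means picking a starting value and revisiting it as data accumulates. The slides do not say how teams chose theirs; the sketch below shows one common starting point, assumed here purely for illustration: set the threshold a few standard deviations above the historical mean. The function name `suggest_threshold`, the `sigmas` parameter, and the sample numbers are all hypothetical.

```python
import statistics


def suggest_threshold(history, sigmas=3.0):
    """Suggest an upper alert threshold from past metric samples.

    Assumption (not from the talk): values more than `sigmas` standard
    deviations above the historical mean are worth alerting on.
    """
    if not history:
        raise ValueError("need at least one historical sample")
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)  # population standard deviation
    return mean + sigmas * spread


# Hypothetical request-latency samples in milliseconds.
latencies_ms = [110, 95, 102, 120, 98, 105]
print(suggest_threshold(latencies_ms))
```

Re-running this against a longer history window is one way to tune a noisy alert: if the suggested value keeps landing inside normal behavior, raise `sigmas` or change the window rather than ignoring the pages.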
72. Chronos
• Rapidly Prototyped
• Adapters and reporters
• Easy querying
• Alarming
  • Something happened
  • ... X times in Y minutes
  • Something didn’t happen
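The three alarm conditions listed above (something happened, it happened X times in Y minutes, something didn't happen) can be sketched over a plain list of event timestamps. This is not Chronos's actual interface, which the slides do not show; the function names, parameters, and sample events below are hypothetical.

```python
from datetime import datetime, timedelta


def happened_x_times(timestamps, now, x, window):
    """True if at least `x` events fall in the window (now - window, now]."""
    cutoff = now - window
    recent = [t for t in timestamps if cutoff < t <= now]
    return len(recent) >= x


def did_not_happen(timestamps, now, window):
    """True if no event at all falls in the window (now - window, now]."""
    cutoff = now - window
    return not any(cutoff < t <= now for t in timestamps)


# Hypothetical events: three in the last five minutes, one much earlier.
now = datetime(2012, 10, 6, 12, 0)
events = [now - timedelta(minutes=m) for m in (1, 3, 4, 50)]

five_minutes = timedelta(minutes=5)
print(happened_x_times(events, now, 1, five_minutes))  # something happened
print(happened_x_times(events, now, 3, five_minutes))  # X=3 times in Y=5 minutes
print(did_not_happen(events, now, five_minutes))       # absence check
```

"Something happened" is just the X = 1 case of the counting check, so two functions cover all three conditions. The absence check is the one that catches a job that silently stopped running.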
74. Chronos
• Rapidly Prototyped
• Adapters and reporters
• Easy querying
• Alarming
• Medium volume
• Recursive
• Recursive
81. End Result
• Massive decrease in change control tickets
• Not talking about SOX or PCI
• Better visibility into changes
• Decreased TTR
• Especially for bad code deployments
• You should do this
82. I Didn’t Mention
• End-to-end testing and alerting
• External availability and performance
• Open Connect
• Jobs