Operational Insight: Concepts and Examples (w/o Presenter Notes)


The 2015-06-15 Operational Insight presentation, without presenter notes (because the way Keynote handles presenter notes makes them dominate the presentation)

Published in: Technology
  • A version of this w/ presenter notes is at http://www.slideshare.net/royrapoport/operational-insight-concepts-and-examples


  1. 1. Operational Insight, June 15, 2015. Roy Rapoport @royrapoport / linkedin.com/in/royrapoport / rrapoport@netflix.com
  2. 2. Oh, The Places We’ll Go!
  3. 3. John Boyd
  4. 4. Observe
  5. 5. Observe Orient
  6. 6. Observe Orient Decide
  7. 7. Observe Orient Decide Act
  8. 8. Observe Orient Decide Act OODA
  9. 9. Observe Orient Decide Act OODA “This approach favors agility over raw power in dealing with human opponents in any endeavor” - Wikipedia
  10. 10. This Is What We Do
  11. 11. OODA KPI
  12. 12. OODA KPI Speed
  13. 13. OODA KPI Speed Effort
  14. 14. OODA KPI Speed Effort Reliability
  15. 15. Winning Speed Effort Reliability
  16. 16. Winning Speed Effort Reliability
  17. 17. Winning Speed Effort Reliability
  18. 18. Winning Speed Effort Reliability
  19. 19. Implications … for Observation (aka measurement, telemetry, metrics)
  20. 20. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy
  21. 21. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable
  22. 22. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable
  23. 23. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable • (Eventually) Ruthlessly Cull
  24. 24. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable • (Eventually) Ruthlessly Cull “What decision will this help me make?”
  25. 25. A Joke
  26. 26. 52 48
  27. 27. % of servers in major region with an even IP address
  28. 28. Implications … for Orientation (aka graphing, visualization)
  29. 29. Implications … for Orientation (aka graphing, visualization) • First-class product
  30. 30. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz
  31. 31. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than
  32. 32. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than • High refresh rates
  33. 33. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than • High refresh rates • Deep data density
  34. 34. Better Like This …
  35. 35. Or Better Like That …
  36. 36. Implications … for Decisions (aka alerting, real-time analytics, etc)
  37. 37. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this
  38. 38. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement
  39. 39. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit
  40. 40. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit • For benefits
  41. 41. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit • For benefits • For cost
  42. 42. Implications … for Action
  43. 43. Implications … for Action 1. Humans beat bureaucracy
  44. 44. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans
  45. 45. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs
  46. 46. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs Repeatable machine processes TROUNCE one-off human bureaucracy
  47. 47. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs 4. Start with humans Repeatable machine processes TROUNCE one-off human bureaucracy
  48. 48. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs 4. Start with humans 5. If IFTTT, deprecate humans Repeatable machine processes TROUNCE one-off human bureaucracy
  49. 49. Decision: Do I Have Enough Instances?
  50. 50. Decision: Is My Canary Good?
  51. 51. 25
  52. 52. Been there. Done that. Manually. Artisanally. 25
  53. 53. Been there. • Started in the Data Center Done that. Manually. Artisanally. 25
  54. 54. Been there. • Started in the Data Center • Manual, dashboard-driven Done that. Manually. Artisanally. 25
  55. 55. Been there. Done that. Manually. 26 CPU Requests Errors
  56. 56. Been there. Done that. Manually. 27
  57. 57. Been there. Done that. Manually. • Context vs Precision 27
  58. 58. Been there. Done that. Manually. • Context vs Precision • No … 27
  59. 59. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability 27
  60. 60. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability • Trending 27
  61. 61. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability • Trending • Manual effort is manual 27
  62. 62. So Now What? 28
  63. 63. So Now What? • Automate Analysis 28
  64. 64. So Now What? • Automate Analysis • Took Some Effort 28
  65. 65. So Now What? • Automate Analysis • Took Some Effort • Approach and analytics 28
  66. 66. So Now What? • Automate Analysis • Took Some Effort • Approach and analytics • Presentation matters 28
  67. 67. Version Control System 1000 servers @ 1.0.1 Customers Build & Deployment System Automated Canary Analysis Pretty Pictures 29
  68. 68. Version Control System 1000 servers @ 1.0.1 Customers Build & Deployment System 1 server @ 1.0.2 Automated Canary Analysis Pretty Pictures 29
  69. 69. 10 servers @ 1.0.2 Version Control System 1000 servers @ 1.0.1 Customers Build & Deployment System Automated Canary Analysis Pretty Pictures 29
  70. 70. 1000 servers @ 1.0.2 Version Control System 1000 servers @ 1.0.1 Customers Build & Deployment System Automated Canary Analysis Pretty Pictures 29
  71. 71. 1000 servers @ 1.0.1 1000 servers @ 1.0.2 Pretty Pictures 30 Version Control System Build & Deployment System Automated Canary Analysis Customers
  72. 72. 1000 servers @ 1.0.2 Pretty Pictures 30 Version Control System Build & Deployment System Automated Canary Analysis Customers
  73. 73. 1000 servers @ 1.0.1 1000 servers @ 1.0.2 Pretty Pictures 31 Version Control System Build & Deployment System Automated Canary Analysis Customers
  74. 74. 1000 servers @ 1.0.1 1000 servers @ 1.0.2 Pretty Pictures 31 Version Control System Build & Deployment System Automated Canary Analysis Customers
  75. 75. Just The Stats 4-Week View
  76. 76. Just The Stats 4-Week View 6309 canary analysis cycles
  77. 77. Just The Stats 4-Week View 6309 canary analysis cycles 16% canaries failed
  78. 78. Decision: Do I Have an Outlier?
  79. 79. Outlier Detection
  80. 80. Would You Like to Play a Game?
  81. 81. Spot the Outlier
  82. 82. The Outlier Is “A”
  83. 83. Just The Stats 4-Week View
  84. 84. Just The Stats 4-Week View 739 Server Terminations
  85. 85. In a Nutshell Observe Orient Decide Act
  86. 86. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org
  87. 87. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014
  88. 88. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans
  89. 89. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans Make machines do it
  90. 90. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans Make machines do it Higher speed Lower effort Higher reliability
  91. 91. Questions, Attributions, Feedback 42
  92. 92. Questions, Attributions, Feedback @royrapoport rsr@netflix.com linkedin.com/in/royrapoport ?42
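
The "Is My Canary Good?" decision (slides 50 to 77) moves from a manual, dashboard-driven comparison of CPU, requests, and errors to automated analysis. The slides do not describe the scoring method, so the following is only a minimal sketch of one plausible approach: compare each metric's mean on the canary against the baseline fleet and pass the canary when enough metrics stay within a tolerance. The function names, the 10% tolerance, and the pass score are all assumptions made for illustration, not details from the talk.

```python
from statistics import mean


def relative_deviation(baseline_samples, canary_samples):
    """Return |canary mean - baseline mean| as a fraction of the baseline mean."""
    base = mean(baseline_samples)
    canary = mean(canary_samples)
    if base == 0:
        # Avoid dividing by zero: two zero means match, anything else is unbounded.
        return 0.0 if canary == 0 else float("inf")
    return abs(canary - base) / abs(base)


def analyze_canary(baseline, canary, tolerance=0.10, pass_score=100.0):
    """Score a canary against the baseline fleet, metric by metric.

    `baseline` and `canary` map a metric name to a list of samples.
    `tolerance` and `pass_score` are hypothetical defaults, not values from the talk.
    """
    verdicts = {
        name: relative_deviation(samples, canary[name]) <= tolerance
        for name, samples in baseline.items()
    }
    # Score is the percentage of metrics that stayed within tolerance.
    score = 100.0 * sum(verdicts.values()) / len(verdicts)
    return {"score": score, "passed": score >= pass_score, "verdicts": verdicts}


if __name__ == "__main__":
    # Made-up sample data for the three metrics named on slide 55.
    baseline = {"cpu": [40, 42, 41], "requests": [1000, 990, 1010], "errors": [5, 5, 5]}
    canary = {"cpu": [41, 43, 42], "requests": [995, 1005, 1000], "errors": [15, 16, 14]}
    print(analyze_canary(baseline, canary))
```

In a staged rollout like the one on slides 67 to 70 (1 server, then 10, then 1000), a check of this kind would run at each stage, and a failing result would stop the promotion to the next stage.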

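
The "Do I Have an Outlier?" decision (slides 78 to 84) asks which server in a group behaves unlike its peers. The deck does not say which detection technique was used, so this sketch substitutes a simple, well-known stand-in: a modified z-score based on the median absolute deviation (MAD). The function name, the 3.5 threshold, and the sample values are assumptions for illustration only.

```python
from statistics import median


def find_outliers(value_by_server, threshold=3.5):
    """Return the servers whose metric sits far from the group median.

    `value_by_server` maps a server name to one metric value (for example,
    an error rate). `threshold` is a hypothetical cutoff on the modified
    z-score, not a value from the talk.
    """
    med = median(value_by_server.values())
    deviations = {server: abs(value - med) for server, value in value_by_server.items()}
    mad = median(deviations.values())
    if mad == 0:
        # More than half the servers are identical; flag any that differ at all.
        return sorted(server for server, dev in deviations.items() if dev > 0)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return sorted(
        server for server, dev in deviations.items() if 0.6745 * dev / mad > threshold
    )


if __name__ == "__main__":
    # Made-up error rates; server "A" is deliberately far from its peers.
    error_rates = {"A": 9.5, "B": 1.0, "C": 1.2, "D": 0.9, "E": 1.1}
    print(find_outliers(error_rates))
```

A median-based measure is used here because a single badly behaved server shifts the mean and standard deviation far more than it shifts the median, which can hide the very outlier being sought.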