Operational Insight: Concepts and Examples (w/o Presenter Notes)


  1. 1. Operational Insight June 15, 2015 Roy Rapoport @royrapoport / linkedin.com/in/royrapoport / rrapoport@netflix.com
  2. 2. Oh, The Places We’ll Go!
  3. 3. John Boyd
  4. 4. Observe
  5. 5. Observe Orient
  6. 6. Observe Orient Decide
  7. 7. Observe Orient Decide Act
  8. 8. Observe Orient Decide Act OODA
  9. 9. Observe Orient Decide Act OODA “This approach favors agility over raw power in dealing with human opponents in any endeavor” - Wikipedia
  10. 10. This Is What We Do
  11. 11. OODA KPI
  12. 12. OODA KPI Speed
  13. 13. OODA KPI Speed Effort
  14. 14. OODA KPI Speed Effort Reliability
  15. 15. Winning Speed Effort Reliability
  16. 16. Winning Speed Effort Reliability
  17. 17. Winning Speed Effort Reliability
  18. 18. Winning Speed Effort Reliability
  19. 19. Implications … for Observation (aka measurement, telemetry, metrics)
  20. 20. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy
  21. 21. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable
  22. 22. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable
  23. 23. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable • (Eventually) Ruthlessly Cull
  24. 24. Implications … for Observation (aka measurement, telemetry, metrics) • Make It Easy • Make It Scalable • Make it pluggable • (Eventually) Ruthlessly Cull “What decision will this help me make?”
  25. 25. A Joke
  26. 26. 52 48
  27. 27. % of servers in major region with an even IP address
  28. 28. Implications … for Orientation (aka graphing, visualization)
  29. 29. Implications … for Orientation (aka graphing, visualization) • First-class product
  30. 30. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz
  31. 31. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than
  32. 32. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than • High refresh rates
  33. 33. Implications … for Orientation (aka graphing, visualization) • First-class product • Different decisions require different viz • Low cognitive load better than • High refresh rates • Deep data density
  34. 34. Better Like This …
  35. 35. Or Better Like That …
  36. 36. Implications … for Decisions (aka alerting, real-time analytics, etc)
  37. 37. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this
  38. 38. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement
  39. 39. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit
  40. 40. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit • For benefits
  41. 41. Implications … for Decisions (aka alerting, real-time analytics, etc) • You already have (some of) this • Incremental improvement • Sky’s the limit • For benefits • For cost
  42. 42. Implications … for Action
  43. 43. Implications … for Action 1. Humans beat bureaucracy
  44. 44. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans
  45. 45. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs
  46. 46. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs Repeatable machine processes TROUNCE one-off human bureaucracy
  47. 47. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs 4. Start with humans Repeatable machine processes TROUNCE one-off human bureaucracy
  48. 48. Implications … for Action 1. Humans beat bureaucracy 2. Machines beat humans 3. Repeatability beats one-offs 4. Start with humans 5. If IFTTT, deprecate humans Repeatable machine processes TROUNCE one-off human bureaucracy
  49. 49. Decision: Do I Have Enough Instances?
  50. 50. Decision: Is My Canary Good?
  51. 51.
  52. 52. Been there. Done that. Manually. Artisanally.
  53. 53. Been there. Done that. Manually. Artisanally. • Started in the Data Center
  54. 54. Been there. Done that. Manually. Artisanally. • Started in the Data Center • Manual, dashboard-driven
  55. 55. Been there. Done that. Manually. CPU, Requests, Errors
  56. 56. Been there. Done that. Manually.
  57. 57. Been there. Done that. Manually. • Context vs Precision
  58. 58. Been there. Done that. Manually. • Context vs Precision • No …
  59. 59. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability
  60. 60. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability • Trending
  61. 61. Been there. Done that. Manually. • Context vs Precision • No … • Repeatability • Trending • Manual effort is manual
  62. 62. So Now What?
  63. 63. So Now What? • Automate Analysis
  64. 64. So Now What? • Automate Analysis • Took Some Effort
  65. 65. So Now What? • Automate Analysis • Took Some Effort • Approach and analytics
  66. 66. So Now What? • Automate Analysis • Took Some Effort • Approach and analytics • Presentation matters
  67. 67. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; Customers; Automated Canary Analysis; Pretty Pictures
  68. 68. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 1 server @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  69. 69. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 10 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  70. 70. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 1000 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  71. 71. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 1000 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  72. 72. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  73. 73. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 1000 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  74. 74. Diagram labels: Version Control System; Build & Deployment System; 1000 servers @ 1.0.1; 1000 servers @ 1.0.2; Customers; Automated Canary Analysis; Pretty Pictures
  75. 75. Just The Stats 4-Week View
  76. 76. Just The Stats 4-Week View 6309 canary analysis cycles
  77. 77. Just The Stats 4-Week View 6309 canary analysis cycles 16% canaries failed
  78. 78. Decision: Do I Have an Outlier?
  79. 79. Outlier Detection
  80. 80. Would You Like to Play a Game?
  81. 81. Spot the Outlier
  82. 82. The Outlier Is “A”
  83. 83. Just The Stats 4-Week View
  84. 84. Just The Stats 4-Week View 739 Server Terminations
  85. 85. In a Nutshell Observe Orient Decide Act
  86. 86. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org
  87. 87. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014
  88. 88. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans
  89. 89. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans Make machines do it
  90. 90. In a Nutshell Observe Orient Decide Act Need This First http://bit.ly/nflx-atlas-2013 http://metrics20.org Understand the decision http://bit.ly/nflx-qcon-aca-2014 Make it easier for humans Make machines do it Higher speed Lower effort Higher reliability
  91. 91. Questions, Attributions, Feedback 42
  92. 92. Questions, Attributions, Feedback @royrapoport rsr@netflix.com linkedin.com/in/royrapoport ? 42
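
The deck names two decisions it hands to machines, "Is My Canary Good?" and "Do I Have an Outlier?", but the transcript does not show the analysis behind either one. The Python sketch below is an illustration only, not the analysis Netflix runs. Everything in it is an assumption made for the example: the function names `canary_passes` and `find_outliers`, the "lower is better" reading of every metric, the 10% tolerance, and the use of a median-based modified z-score with a 3.5 cutoff for outliers.

```python
from statistics import mean, median


def canary_passes(baseline, canary, tolerance=0.10):
    """Return (ok, failed_metrics) for a canary compared with its baseline.

    Assumption: every metric is "lower is better" (e.g. CPU, errors), and a
    canary fails a metric when its mean exceeds the baseline mean by more
    than `tolerance` (a relative fraction). Both rules are illustrative.
    """
    failed = []
    for metric, base_samples in baseline.items():
        base = mean(base_samples)
        cand = mean(canary[metric])
        if base == 0:
            worse = cand > 0  # any increase from zero counts as a regression
        else:
            worse = (cand - base) / base > tolerance
        if worse:
            failed.append(metric)
    return not failed, failed


def find_outliers(value_by_server, threshold=3.5):
    """Return the servers whose value is far from the cluster median.

    Assumption: scores each server with the modified z-score,
    0.6745 * |x - median| / MAD, and flags scores above `threshold`.
    The deck does not say which method it uses.
    """
    values = list(value_by_server.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    return [
        server
        for server, v in value_by_server.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Made-up sample data, not figures from the talk.
    baseline = {"cpu": [40, 42, 41], "errors": [1.0, 1.2, 0.9]}
    canary = {"cpu": [41, 43, 42], "errors": [2.5, 2.7, 2.4]}
    print(canary_passes(baseline, canary))  # (False, ['errors'])

    latency_ms = {"A": 480, "B": 101, "C": 99, "D": 102, "E": 98, "F": 100}
    print(find_outliers(latency_ms))  # ['A']
```

Both functions return a plain verdict rather than a chart, in keeping with the deck's "Make machines do it" step: the result can feed an automated action such as halting a deployment or terminating a server, with a human reviewing only the exceptions.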
