Creating a Scalable Monitoring System That Everyone Will Love (Velocity Conf)

A year ago, my company's monitoring setup was a disaster: six different monitoring tools were sending alerts all over the place. In this talk, I will share how we overhauled our entire monitoring system and created a single, centralized, easy-to-use system that fits all of our needs. Because it is so simple to use, developers have bought into the system and are actively helping to improve it.

Published in: Engineering

  1. Creating A Scalable Monitoring System That Everyone Will ❤ (@ThePracticalDev | @molly_struve | dev.to/molly_struve)
  2. Monitoring Mistakes | Overhauling the System | The Payoff
  26. Incredibly Inconsistent
  27. Inconsistent Alerts: Required no action | Reported data
  28. Inconsistent Alerts: Required no action | Reported data | Immediate action required
  29. Manual Monitoring
  30. Manual Monitoring
  36. make on-call devs miserable
  40. Coverage doesn’t matter if you have no idea what is going on!
  43. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place
  52. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable
  53. Action
  54. Action
  55. Action Required | No Action Needed
  56. Action Required: #ops_alerts, #dev_alerts | No Action Needed
  57. Action Required: #ops_alerts, #dev_alerts | No Action Needed: #ops_reporting, #dev_reporting
  58. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable
  59. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable
  60. 30, 60, 90 minutes
  62. Miss new alerts
  68. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable; 4. Track Alert History
  69. Tracking Alert History
  70. Tracking Alert History
  71. Tracking Alert History
  72. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable; 4. Track Alert History; 5. Remove ALL Manual Monitoring
  74. Manual Monitoring 🙅 Scale
  76. Automatic Alerts
  77. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable; 4. Track Alert History; 5. Remove ALL Manual Monitoring
  80. Onboarding is a breeze
  81. 3 onboarding steps:
  82. 3 onboarding steps: 1. Show them the monitoring setup
  83. 3 onboarding steps: 1. Show them the monitoring setup; 2. If an alert goes off you have to address it
  84. 3 onboarding steps: 1. Show them the monitoring setup; 2. If an alert goes off you have to address it; 3. How to mute a triggered alert
  85. Onboarding is a breeze
  86. Happier on-call developers
  87. All alerts must be actionable
  89. No more noise
  91. Alerts must be mutable
  92. Alerts must be mutable
  93. Diagnosing alerts is faster and easier
  94. Tracking Alert History
  95. High Priority Job Queue Backed Up
  96. High Priority Job Queue Backed Up: Today?
  97. High Priority Job Queue Backed Up: Today? Going on longer?
  98. High Priority Job Queue Backed Up: Today? Going on longer?
  99. High Priority Job Queue Backed Up: Today? Going on longer? Alert History
  101. Developers began helping to improve our monitoring system
  103. All alerts must be actionable
  104. 30-40 Alerts
  105. >90 Alerts
  106. Basic Alerts
  107. Basic Alerts | More Granular Alerts
  108. "High MySQL load" vs "High MySQL load on db 1 for client 5"
  109. Developers ❤ monitoring system
  110. Monitoring Must Have Benefits
  111. Monitoring Must Have Benefits: 1. Onboarding is a breeze
  112. Monitoring Must Have Benefits: 1. Onboarding is a breeze; 2. On-call developers are a lot happier
  113. Monitoring Must Have Benefits: 1. Onboarding is a breeze; 2. On-call developers are a lot happier; 3. Diagnosing alerts is faster and easier
  114. Monitoring Must Have Benefits: 1. Onboarding is a breeze; 2. On-call developers are a lot happier; 3. Diagnosing alerts is faster and easier; 4. Developers helping to improve your monitoring systems
  115. Monitoring Must Haves: 1. Consolidate Monitoring To a Single Place; 2. Make ALL Alerts Actionable; 3. Make Sure Alerts Are Mutable; 4. Track Alert History; 5. Remove ALL Manual Monitoring
  116. ❤❤❤
  117. Questions?
  118. Rate today’s session: Session page on conference website | O’Reilly Events App
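
Slides 55 to 57 sort alerts by whether they need action and by which team owns them. The rule can be sketched in a few lines of Python. The deck does not show the tooling behind it, so the `Alert` class, its fields, the `channel_for` function, and the example alert names and team assignments are all hypothetical; only the four channel names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; the fields are assumptions for illustration."""
    name: str
    team: str              # "ops" or "dev"
    action_required: bool  # True if someone must act on it now


def channel_for(alert: Alert) -> str:
    """Pick a channel: actionable alerts go to *_alerts, the rest to *_reporting."""
    if alert.team not in ("ops", "dev"):
        raise ValueError(f"unknown team: {alert.team!r}")
    suffix = "alerts" if alert.action_required else "reporting"
    return f"#{alert.team}_{suffix}"


if __name__ == "__main__":
    # Example alerts; assigning them to these teams is an assumption.
    backlog = Alert("High Priority Job Queue Backed Up", "dev", True)
    summary = Alert("Nightly usage summary", "ops", False)
    print(channel_for(backlog))  # #dev_alerts
    print(channel_for(summary))  # #ops_reporting
```

Keeping the decision in one small function means every alert source has to state up front whether it is actionable, which is the property the "Make ALL Alerts Actionable" slides ask for.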

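
The "Make Sure Alerts Are Mutable" and "Track Alert History" must-haves can be illustrated together. The sketch below is a minimal in-memory model, not the system described in the talk: the `AlertLog` class and its methods are hypothetical, and the only details taken from the slides are the 30, 60, and 90 minute mute windows and the "High Priority Job Queue Backed Up" alert name.

```python
from datetime import datetime, timedelta


class AlertLog:
    """Hypothetical in-memory store for timed mutes and alert history."""

    def __init__(self):
        self._muted_until = {}  # alert name -> datetime the mute expires
        self._history = []      # (timestamp, alert name) for every trigger

    def mute(self, name, minutes, now):
        """Silence an alert for a fixed window, e.g. 30, 60, or 90 minutes."""
        self._muted_until[name] = now + timedelta(minutes=minutes)

    def trigger(self, name, now):
        """Record the trigger and return True if it should notify someone."""
        self._history.append((now, name))
        muted_until = self._muted_until.get(name)
        return muted_until is None or now >= muted_until

    def history(self, name, since):
        """Return trigger times for one alert at or after `since`."""
        return [t for t, n in self._history if n == name and t >= since]


if __name__ == "__main__":
    log = AlertLog()
    start = datetime(2019, 1, 1, 9, 0)  # arbitrary example timestamp
    name = "High Priority Job Queue Backed Up"
    print(log.trigger(name, start))                          # True
    log.mute(name, 30, start)
    print(log.trigger(name, start + timedelta(minutes=10)))  # False
    print(log.trigger(name, start + timedelta(minutes=31)))  # True
    print(len(log.history(name, start)))                     # 3
```

Muted triggers are still written to the history, so the questions on slides 96 to 99 ("Today? Going on longer?") can be answered by querying `history` with a different `since` value. Because each mute carries an expiry, it cannot silently become permanent.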