PPB's Sensu Journey


In 2016 Paddy Power and Betfair, two gambling giants, merged to form PPB. Each company had its own monitoring baggage, and the SRE team was tasked with cleaning up and consolidating the toolsets. This Sensu Summit 2019 talk from Artur Malinowski and Killian McHale looks at their selection process, their scoring, and ultimately the decisions that led them to Sensu – which now monitors over 10,000 clients across the PPB estate.

Published in: Technology


  1. PPB's Sensu Journey, Sensu Summit 2019
  2. $ whoami: Killian McHale, Site Reliability Engineer at PPB, Dublin, Ireland
  3. $ whoishe: Artur Malinowski, Site Reliability Engineer at PPB, London, UK
  4. PPB??! Paddy Power + Betfair: merger of Paddy Power and Betfair, 2016
  5. Betfair
  6. Paddy Power
  7. Paddy Power
  8. Paddy Power
  9. Brand overview (brand logos and revenue-mix segment labels were not captured in the transcript) • Revenue mix: 51%, 21%, 10%, 18% • Market, product and channel per brand:
     • UK and Ireland: Sportsbook and Gaming; Online and Retail
     • UK&I, Europe, ROW: Sportsbook, Exchange and Gaming; Online
     • Australia: Sportsbook; Online
     • USA: Sportsbook and Daily Fantasy Sports; Online and Retail
     • USA: Advanced Deposit Wagering (Tote) and Television broadcast; Online
     • Georgia, Armenia: Sportsbook and Gaming; Online
     • …plus a growing B2B portfolio…
  10. The Before: when two stacks collide…
  11. (No text on this slide)
  12. The Selection: choosing the best tools for a new generation…
  13. The Approach
  14. Plan
  15. Requirements • Metric Collection • Documentation • User Interface • Metric Graphing • Updates/Regularity of Updates • Features • Performance • Stability • Time & Effort • Scaling • DR • Interoperability / API • API Completeness
  16. Test Environment • Scope of environment: hypervisors, VMs, network devices, storage, subset of applications • Environment designed to test each solution in a consistent manner
  17. And the short list is…
  18. Rating • Each solution is scored against these requirements • Maximum/perfect score ~240
  19. And the Winner is… [DRAMATIC PAUSE]
  20. And the Winner is…
  21. Wait? What!? • Are these guys at the wrong conference!? • Based purely on our scoring, Zenoss won • Sensu came third!? • Why are we here?
  22. Score Results (bar chart, axis 0 to 250) • Nagios OMD: 181 • Sensu: 175 • Bosun: 140 • Prometheus: 168 • Zenoss: 198
  23. Looking Deeper • Zenoss: API, Complexity • Nagios OMD: API, Updates
  24. And the Winners are… +
  25. The After
  26. Current Implementation • Sensu Self-Service: Why Self-Service?, Design, Plans • Sensu management: detecting silenced checks, detecting machines without a client, detecting client versions
  27. Sensu's design • A Sensu client runs on each machine • The Sensu client knows what to do from its SUBSCRIPTIONS
  28. Why Self-Service? • Minimize wait times • Owners know their hosts best • Satisfy customers • Fewer resources to manage • No, we are not lazy - or at least this is not the only reason!!!
  29. So how is it self-service? • We keep all our subscriptions in our GitLab repo • After upload, all subscriptions are automatically deployed to the correct Sensu instances • Changes are expected to be reflected in Sensu within a few minutes
  30. Detect Changes (Merge Requests)
  31. Plans: next? • The next step is a fully automatic pipeline that checks merge requests and, if they are approved, merges the change automatically • Do you want to make a change at 3 AM because of <reason/-s>? Sure, why not :)
  32. Fully automatic merge request process
  33. Sensu Audit • The Sensu Audit connects to the Sensu API to retrieve information on all Sensu alerting. • It tracks silenced Sensu alerts and invalid Sensu configurations. • It inserts data into the Splunk index every day.
  34. Missing TLAs (TLAs without Sensu) • Dashboard to identify missing basic checks (CPU/load, memory, disk). • Grouped by rating: good, bad and critical. Categorised by business-rated apps (Tier1-3>). • Clickable links let users drill down into more detail, open the Sensu UI and visit the configuration location.
  35. Event Analysis • Shows counts, silenced/non-silenced, and events by criticality. • Events-by-contact table displaying top callouts by team.
  36. Finding silenced checks • This dashboard gives information on all silenced checks by TLA and number of individual checks. • Information can be filtered by Business Criticality, Service, TLA, trend, support name and groups. • Useful breakdown based on risk counts by name and team.
  37. Sensu Client Versions • This dashboard gives information about Sensu client versions, by TLA or host
  38. Conclusion – Sensu API is Powerful !!! • Easy to use • Very readable JSON format • Easy to join with other information • Good, well-maintained documentation
  39. Future? Sensu Go? • Sensu Enterprise: end of support March 31, 2020 • Investigation
  40. Thank you
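
The scoring on slides 18 to 22 can be checked with a few lines of Python. This is a minimal sketch, not the team's actual scoring tool: it assumes the five values on the slide 22 chart pair with the tool names in the order they appear, and it treats the "~240" from slide 18 as an approximate ceiling.

```python
# Scores read from the slide 22 bar chart, paired with tools in chart order.
# The pairing is an assumption; it is consistent with slide 21
# (Zenoss first, Sensu third).
SCORES = {
    "Nagios OMD": 181,
    "Sensu": 175,
    "Bosun": 140,
    "Prometheus": 168,
    "Zenoss": 198,
}
MAX_SCORE = 240  # slide 18 gives the perfect score as roughly 240


def rank(scores):
    """Return (tool, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(SCORES)
for position, (tool, score) in enumerate(ranking, start=1):
    print(f"{position}. {tool}: {score} ({score / MAX_SCORE:.0%} of ~{MAX_SCORE})")
```

Sorted this way, Zenoss comes first and Sensu third, which matches the "Wait? What!?" slide.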
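
Slide 27 says a client knows what to do from its subscriptions. A rough sketch of that matching follows, assuming check definitions with a `subscribers` list and client definitions with a `subscriptions` list, as in Sensu Core and Enterprise. The check names, commands, client names and the helper `checks_for_client` are hypothetical, chosen only for illustration.

```python
def checks_for_client(client, checks):
    """Return names of checks whose subscribers overlap the client's subscriptions."""
    subscribed = set(client.get("subscriptions", []))
    return sorted(
        name
        for name, definition in checks.items()
        if subscribed & set(definition.get("subscribers", []))
    )


# Hypothetical check definitions, keyed by check name.
CHECKS = {
    "check-cpu": {"command": "check-cpu", "subscribers": ["base"], "interval": 60},
    "check-disk": {"command": "check-disk", "subscribers": ["base"], "interval": 300},
    "check-nginx": {"command": "check-nginx", "subscribers": ["webserver"], "interval": 60},
}

# Hypothetical clients: a web server and a database host.
WEB_CLIENT = {"name": "web-01", "subscriptions": ["base", "webserver"]}
DB_CLIENT = {"name": "db-01", "subscriptions": ["base"]}

print(checks_for_client(WEB_CLIENT, CHECKS))
print(checks_for_client(DB_CLIENT, CHECKS))
```

Keeping the subscription lists in a repository, as slide 29 describes, means a host owner changes what is monitored by editing a list like `WEB_CLIENT["subscriptions"]` rather than touching each check.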
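
The audit dashboards on slides 33, 36 and 37 come down to summarising data pulled from the Sensu API. The sketch below is not PPB's Sensu Audit. It assumes payloads shaped like the Sensu 1.x `/silenced` and `/clients` responses have already been fetched, and it leaves out the HTTP calls and the Splunk indexing. All sample records and both helper functions are invented for illustration.

```python
from collections import Counter


def silenced_by_subscription(silenced):
    """Count silence entries per subscription ('*' when no subscription is set)."""
    return Counter(entry.get("subscription") or "*" for entry in silenced)


def client_versions(clients):
    """Count clients per reported Sensu client version."""
    return Counter(client.get("version", "unknown") for client in clients)


# Invented sample payloads, modelled on the API responses.
SILENCED = [
    {"id": "webserver:check-nginx", "subscription": "webserver", "check": "check-nginx"},
    {"id": "webserver:check-cpu", "subscription": "webserver", "check": "check-cpu"},
    {"id": "*:check-disk", "subscription": None, "check": "check-disk"},
]
CLIENTS = [
    {"name": "web-01", "version": "1.6.1"},
    {"name": "web-02", "version": "1.6.1"},
    {"name": "db-01", "version": "1.4.3"},
    {"name": "legacy-01"},  # no version reported
]

print(silenced_by_subscription(SILENCED).most_common())
print(client_versions(CLIENTS).most_common())
```

Because the API returns plain JSON, counts like these are easy to join with other data, such as team or TLA ownership, before indexing them.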