PPB's Sensu Journey
Sensu Summit 2019
$ whoami
Killian McHale
Site Reliability Engineer at PPB
Dublin, Ireland
$ whoishe
Artur Malinowski
Site Reliability Engineer at PPB
London, UK
PPB??!
Paddy Power + Betfair
Merger of Paddy Power and Betfair in 2016
Betfair
Paddy Power
Revenue mix (chart segments, labels not recoverable): 51%, 21%, 10%, 18%
Market, product and channel by brand (brand logos not recoverable):
• UK and Ireland: Sportsbook and Gaming (Online and Retail)
• UK&I, Europe, ROW: Sportsbook, Exchange and Gaming (Online)
• Australia: Sportsbook (Online)
• USA: Sportsbook and Daily Fantasy Sports (Online and Retail)
• USA: Advanced Deposit Wagering (Tote) and Television broadcast (Online)
• Georgia, Armenia: Sportsbook and Gaming (Online)
…plus a growing B2B portfolio…
The Before
When two stacks collide…
The Selection
Choosing the best tools for a new generation…
The Approach
Plan
Requirements
• Metric Collection
• Documentation
• User Interface
• Metric Graphing
• Updates/Regularity of Updates
• Features
• Performance
• Stability
• Time & Effort
• Scaling
• DR
• Interoperability / API
• API Completeness
Test Environment
• Scope of Environment
• Hypervisors
• VMs
• Network devices
• Storage
• Subset of applications
• Design the environment so that each solution is tested in a consistent manner
And the short list is…
Rating
• Each solution is scored against these requirements
• Maximum/Perfect score ~240
And the Winner is…
[DRAMATIC PAUSE]
Wait? What!?
• Are these guys at the wrong conference!?
• Purely based on our scoring, Zenoss won
• Sensu came third!?
• Why are we here?
Results
Score by solution (bar chart, axis 0 to 250):
• Nagios OMD: 181
• Sensu: 175
• Bosun: 140
• Prometheus: 168
• Zenoss: 198
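The ranking behind "Zenoss won, Sensu came third" can be reproduced from the chart values. This is a minimal sketch, assuming the five scores pair with the five solution names in the order they appear on the chart; that pairing is an assumption, although it agrees with the placings stated in the talk. `MAX_SCORE` uses the approximate maximum of 240 quoted on the Rating slide.

```python
# Scores read off the results chart; the name-to-value pairing is assumed
# from the order in which the labels and values appear.
SCORES = {
    "Nagios OMD": 181,
    "Sensu": 175,
    "Bosun": 140,
    "Prometheus": 168,
    "Zenoss": 198,
}
MAX_SCORE = 240  # approximate maximum quoted in the talk


def rank(scores):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(SCORES)
for place, (name, score) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {score} ({score / MAX_SCORE:.0%} of maximum)")
```

A raw total like this hides which requirements each tool scored poorly on, which is why the talk goes on to look deeper than the headline number.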
Looking Deeper
• Zenoss
  - API
  - Complexity
• Nagios OMD
  - API
  - Updates
And the Winners are…
+
The After
Current Implementation
Sensu Self-Service:
- Why Self-Service?
- Design
- Plans
Sensu management:
- Detecting silenced checks
- Detecting machines without a client
- Detecting client versions
Sensu's design
• A Sensu client runs on each machine
• The Sensu client knows what to do from its SUBSCRIPTIONS
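To make the subscription model concrete: in Sensu Core a client declares `subscriptions` and a check declares `subscribers`, and a client runs the checks whose subscribers overlap its subscriptions. The sketch below imitates that matching in plain Python; the host names, check names and subscription names are hypothetical, and standalone checks are ignored.

```python
# Hypothetical client and check definitions, shaped like Sensu Core JSON config.
CLIENTS = {
    "web-01": {"subscriptions": ["base", "webserver"]},
    "db-01": {"subscriptions": ["base", "database"]},
}
CHECKS = {
    "check_cpu": {"subscribers": ["base"], "interval": 60},
    "check_http": {"subscribers": ["webserver"], "interval": 30},
    "check_mysql": {"subscribers": ["database"], "interval": 60},
}


def checks_for_client(client_name):
    """Return the sorted names of checks whose subscribers overlap the client's subscriptions."""
    subscriptions = set(CLIENTS[client_name]["subscriptions"])
    return sorted(
        name
        for name, check in CHECKS.items()
        if subscriptions & set(check["subscribers"])
    )


print(checks_for_client("web-01"))
print(checks_for_client("db-01"))
```

Because the mapping lives entirely in the subscription data, changing what a host monitors means editing a definition rather than touching the host.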
Why Self-Service?
• Minimize wait times
• Owners know their hosts best
• Satisfy customers
• Fewer resources to manage
No, we are not lazy - or at least this is not the only reason!!!
So how is it self-service?
• We keep all our subscriptions in our GitLab repo
• All subscriptions are automatically deployed to the correct Sensu instances after they are uploaded
• Changes are expected to be reflected in Sensu within a few minutes
Detect Changes (Merge Requests)
Plans
Next?
• The next step will be creating a fully automatic pipeline that checks merge requests and, if approved, merges the change automatically
• Do you want to make a change at 3 AM because of <reason/-s>? Sure, why not :)
Fully automatic merge request process
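One piece such a pipeline would need is an automated validation of the changed subscription file before it is merged. The sketch below is only an illustration of that idea: it assumes the file follows the Sensu Core layout of a top-level `checks` object, and the required keys and the interval rule are a hypothetical policy, not something stated in the talk.

```python
import json

# Assumed policy for illustration; the real approval rules are not in the talk.
REQUIRED_KEYS = ("command", "subscribers", "interval")


def validate_checks(raw_json):
    """Return a list of problems; an empty list means the change may be merged."""
    try:
        config = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    checks = config.get("checks") if isinstance(config, dict) else None
    if not isinstance(checks, dict):
        return ["top-level 'checks' object is missing"]
    problems = []
    for name, check in checks.items():
        if not isinstance(check, dict):
            problems.append(f"{name}: definition must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in check:
                problems.append(f"{name}: missing '{key}'")
        interval = check.get("interval")
        if interval is not None and (not isinstance(interval, int) or interval <= 0):
            problems.append(f"{name}: interval must be a positive integer")
    return problems


good = '{"checks": {"check_cpu": {"command": "check-cpu.rb", "subscribers": ["base"], "interval": 60}}}'
bad = '{"checks": {"check_cpu": {"command": "check-cpu.rb", "interval": 0}}}'
print(validate_checks(good))
print(validate_checks(bad))
```

A merge request would be merged automatically only when the returned list is empty; anything else is reported back to the requester.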
Sensu Audit
• The Sensu Audit connects to the Sensu API to retrieve information on all Sensu alerting.
• It tracks silenced Sensu alerts and invalid Sensu configurations.
• It inserts data into the Splunk index every day.
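As an illustration of the audit step, the sketch below turns silenced entries (shaped like the objects the Sensu Core `/silenced` endpoint returns) into JSON lines that could be indexed. It makes no network calls; the sample entries, the `sensu:audit` sourcetype, the `time`/`event` envelope and the "no reason means review" rule are all assumptions for the example, not details from the talk.

```python
import json
import time


def audit_records(silenced_entries, now=None):
    """Turn silenced entries into one JSON line each, ready to be indexed."""
    now = int(time.time()) if now is None else now
    lines = []
    for entry in silenced_entries:
        record = {
            "time": now,
            "sourcetype": "sensu:audit",  # hypothetical sourcetype name
            "event": {
                "id": entry["id"],
                "subscription": entry.get("subscription"),
                "check": entry.get("check"),
                "creator": entry.get("creator"),
                "reason": entry.get("reason"),
                # A silence without a reason is flagged for follow-up (assumed rule).
                "needs_review": not entry.get("reason"),
            },
        }
        lines.append(json.dumps(record, sort_keys=True))
    return lines


# Illustrative data only; real entries would come from the Sensu API.
sample = [
    {"id": "webserver:check_http", "subscription": "webserver",
     "check": "check_http", "creator": "alice", "reason": "planned maintenance"},
    {"id": "base:*", "subscription": "base", "check": None,
     "creator": "bob", "reason": None},
]
for line in audit_records(sample, now=1568000000):
    print(line)
```

Running this once a day and keeping the output in an index is what makes trends visible, such as silences that never expire.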
Missing TLAs (TLAs without Sensu)
• Dashboard to identify missing basic checks (CPU/load, memory, disk).
• Results are grouped by rating (good, bad and critical) and categorised by business-rated apps (Tier 1-3).
• Clickable links let users drill down into more details, open the Sensu UI and visit the configuration location.
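The logic behind such a dashboard can be sketched as a comparison between an inventory of hosts and the checks Sensu knows about for each one. The check names and the thresholds for good, bad and critical below are hypothetical; the talk does not say how the ratings are derived.

```python
BASIC_CHECKS = {"check_cpu", "check_memory", "check_disk"}  # hypothetical check names


def rate_host(configured_checks):
    """Rate a host by how many basic checks it is missing (assumed thresholds)."""
    missing = BASIC_CHECKS - set(configured_checks)
    if not missing:
        return "good", []
    if missing == BASIC_CHECKS:
        return "critical", sorted(missing)
    return "bad", sorted(missing)


def report(inventory, sensu_checks_by_host):
    """Rate every inventory host; hosts unknown to Sensu count as having no checks."""
    return {
        host: rate_host(sensu_checks_by_host.get(host, ()))
        for host in inventory
    }


inventory = ["app-01", "app-02", "app-03"]
checks_by_host = {
    "app-01": {"check_cpu", "check_memory", "check_disk", "check_http"},
    "app-02": {"check_cpu"},
}
result = report(inventory, checks_by_host)
for host, (rating, missing) in result.items():
    print(host, rating, missing)
```

Here `app-03` appears in the inventory but not in Sensu at all, which is the "machine without client" case and gets the worst rating.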
Event Analysis
Shows event counts, silenced versus non-silenced events, and events by criticality. An events-by-contact table displays the top callouts by team.
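The aggregations on this dashboard amount to a few counters over the event list. A rough sketch, assuming events shaped like Sensu Core event objects with a check `status` and a `silenced` flag; the `contact` attribute on the check and the sample events are assumptions made for the example.

```python
from collections import Counter

SEVERITY = {0: "ok", 1: "warning", 2: "critical"}  # check exit-status convention


def analyse(events):
    """Count events by severity, by silenced state and by contact."""
    by_severity = Counter(
        SEVERITY.get(event["check"]["status"], "unknown") for event in events
    )
    by_silenced = Counter(
        "silenced" if event.get("silenced") else "not silenced" for event in events
    )
    by_contact = Counter(
        event["check"].get("contact", "unassigned") for event in events
    )
    return by_severity, by_silenced, by_contact.most_common()


# Illustrative events only.
events = [
    {"check": {"name": "check_cpu", "status": 2, "contact": "payments"}, "silenced": False},
    {"check": {"name": "check_disk", "status": 1, "contact": "payments"}, "silenced": True},
    {"check": {"name": "check_http", "status": 2, "contact": "sportsbook"}, "silenced": False},
    {"check": {"name": "check_custom", "status": 3}, "silenced": False},
]
by_severity, by_silenced, top_contacts = analyse(events)
print(dict(by_severity))
print(dict(by_silenced))
print(top_contacts)
```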
Finding silenced checks
• This dashboard gives information on all silenced checks by TLA and the number of individual checks.
• Information can be filtered by business criticality, service, TLA, trend, support name and groups.
• Useful breakdown based on risk counts by name and team.
Sensu Client Versions
• This dashboard gives information about the Sensu client version, based on TLA or hosts
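A version report like this can be built by grouping client records by their reported version. The sketch assumes records shaped like Sensu Core client objects with `name` and `version` fields; the host names and version strings are illustrative, and grouping by TLA would need an extra attribute that is not shown here.

```python
from collections import defaultdict


def hosts_by_version(clients):
    """Map each Sensu client version to the sorted list of host names running it."""
    grouped = defaultdict(list)
    for client in clients:
        grouped[client.get("version", "unknown")].append(client["name"])
    return {version: sorted(names) for version, names in grouped.items()}


# Illustrative client records only.
clients = [
    {"name": "web-01", "version": "1.6.1"},
    {"name": "db-01", "version": "1.4.3"},
    {"name": "web-02", "version": "1.6.1"},
    {"name": "legacy-01"},
]
versions = hosts_by_version(clients)
for version, names in sorted(versions.items()):
    print(version, names)
```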
Conclusion – Sensu API is Powerful !!!
• Easy to use
• Very readable JSON format
• Easy to join with other information
• Good and well-maintained documentation
Future? Sensu Go?
• Sensu Enterprise - End of support March 31, 2020
• Investigation
Thank you

DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

PPB's Sensu Journey

Editor's Notes

  1. Who the heck are PPB?! PPB is the result of the 2016 merger between Paddy Power and Betfair. A merger of equals: two major online gambling companies coming together. A quick look at the two brands...
  2. Betfair: purely online, traditional Sportsbook and Exchange. The Exchange is a platform that facilitates betting between two parties. You as a user can bet for or against a given outcome at your desired odds, and we will match that bet with someone on the other side and take a small commission on the winnings.
  3. Paddy Power is a traditional Sportsbook. You are betting against Paddy Power at the odds we offer. We offer markets on hundreds of different events, mostly sporting. An interesting bet: Paddy Power had a market for the 2016 US presidential election. In fact, analysing all the data coming in, we were able to predict the result before most and paid out on that one two weeks early…
  4. Hrrmmm... We all know the actual outcome of this one. £5m... that's more than $6m. 2:30
  5. PP doesn't take itself seriously. Always getting in trouble. Cheltenham 2010: Hollywood sign in a nearby field overlooking the race track. Denmark's Nicklas Bendtner during Euro 2012: EUR 100,000 fine that PP paid. PP pants hot air balloon: tethered in a local garden for Cheltenham 2013. Arguably our most controversial ad, and topical again today: this was leaked by PP to various media outlets, and we let the keyboard warriors do their thing. It raised awareness about real issues in the host country. We printed a retraction by getting the loggers back out. 4:30
  6. As of earlier this year, we're part of Flutter Entertainment. Today we have multiple brands around the world. Strongest in Europe, but with a growing presence in other areas. FanDuel in the US: we actually have a retail shop in New Jersey. Growing B2B portfolio.
  7. The Before: two stacks collide. 5:00
  8. In the 2016 merger, two very different stacks came together with differing monitoring tools: Nagios, Opsview, a little Sensu, and various other tools in the metric and log analysis space. The SRE team was tasked with consolidating our toolset and picking the best tools to support both stacks now and into the future.
  9. So to start, we mapped out our approach… The Approach: What's the problem? What were the requirements? How can we create a framework around them? What do test environments look like? What do tests look like? Short list: look at available tooling, compare at a high level, narrow down to a short list.
  10. Every good project has a plan. A perfectly laid out and timed plan. We met all these dates:
  11. Requirements: Metric Collection / Graphing, User Interface, Documentation, Updates, Feature Set, Performance and Stability, Time & Effort, Interoperability. A series of questions, weighted.
  12. Design the environment to test each solution in a consistent manner. Scope of the environment: hypervisors, VMs, network devices, storage, and a subset of apps. 7:15
  13. Short list. As per the plan, we put these through their paces.
  14. Each solution was tested against our requirements. The perfect (maximum) score was ~240.
  15. So, having tested all the solutions... Without further ado... No surprises... The winner is...
  16. I know what you’re thinking… Are they at the wrong conference? Based on scoring, Zenoss won. Sensu came in third place. Why are we here? Let's take a look at the scores… 8:30
  17. Zenoss (right-hand side): highest score of 198. Sensu (second from left): third at 175. This didn’t feel right… At this point we eliminated Bosun.
  18. Things like API, Updates and Migration Path were quite important to us. Zenoss: a great product, full of features. API: not well documented; it wasn’t clear how we could integrate with it. Complexity: with our self-service model and vast estate, it added complexity. Nagios OMD: API not up to scratch for our requirements. Updates: not updated in over a year.
  19. There can be two. Sensu scored well across all categories and had good coverage across the things that were most important to us. When we combined it with Prometheus, we found we had something that matched the Zenoss feature set but gave us the flexibility that we needed. Also, the migration path from Opsview and Nagios to Sensu was really nice, with them all running Nagios-compatible checks.
  20. Talk about our implementation. Don't focus on how Sensu works; focus on the cool things we are doing with Sensu. My part is divided into 2 sections: Self-Service and Sensu Management (how we are using Sensu logs).
  21. A quick, very simple description of how it works. Each of our hosts should have the Sensu client installed. The Sensu client is a monitoring agent installed on the system to be monitored. The client runs on the machine and receives instructions about what to do via its subscriptions. Because it is so simple, it is very easy to understand, and as everyone knows, simplicity is always key.
  22. We have more than 10,000 machines, so it is impossible for us to implement checks for each of them. We have a lot of different applications, so we can't know in detail exactly what needs to be checked. It saves time, a lot of time. Happy customers mean happy us.
  23. Everything is kept on GitLab, as it is a great place to share code with other engineers. Basic checks are made by a Git hook. The GitLab repo is easy to control (bad changes that could break something are not allowed), to request access to (it is easy to give access to users who want to add new checks), and to share (we can simply send URL links with details to the repo). Git is a very common version-control system, so almost everyone knows how to use it, and if not, it is very easy to explain. There is also the GitLab GUI, which users can use to review their checks. Of course there is the Sensu GUI, which can be used to check clients, but in GitLab we can see the exact subscription JSON file. Subscriptions are divided into zones (dev, prd, ie1, ie2 and sre), so we always deploy new subscriptions to the correct Sensu instance.
  24. So here is the situation: you are a new user who wants to add a new check. Mitigator is a reverse proxy: it directs client requests to the appropriate backend server. You create a fork of the repo -> make your changes -> create a merge request (end of the user's work) -> GitLab webhooks then post the new changes to the correct Sensu instances -> (explain what Mitigator is) -> Mitigator posts to all Sensu API and Sensu servers -> this places a flag on the filesystem to signal that a git checkout is needed -> a specially created cron job checks for this flag -> if the flag is there, it simply runs git checkout -> it checks whether the changes are valid -> if yes, it simply restarts sensu-enterprise; if not, an alert is triggered for SRE that something is wrong and the changes are not applied. In this situation we have until 3:30 AM to fix it (why 3:30? Because all sensu-enterprise instances restart at 3:30).
  25. More self-service: a Jenkins pipeline with a special GitLab plugin that automatically merges the MR if the pipeline succeeds. Why do we need even more self-service? We use a human gate as the last step before merging to be sure that changes are correct, but because it sometimes took a lot of time and interrupted engineers, we are planning to automate it. We will check the most crucial parts of the changes. Also, if our team is really busy, customers can wait even a few hours before anyone finds time to merge, which can result in worse customer satisfaction.
  26. Instead of a human gate at the end, there will be a pipeline. A very easy workflow: the user creates a new MR, GitLab sends a notification about the MR to Jenkins, and Jenkins runs the pipeline with all the checks. If the results are fine, great: the result is sent to GitLab, and the rest is done as I explained in the previous slides. At the beginning we will keep the human gate, and Jenkins will only tell us whether the MR can be merged, but the final decision will be ours. After a few weeks, if the test results satisfy us, we will make it fully automated without any human gate at the end.
  27. As we have hundreds of applications and thousands of hosts, we need some control over how Sensu is working on them, and that is why we started using the Sensu API and Splunk. We use a specially created LRP (light reporting platform) in which we have code that collects all the data via the Sensu API, processes it, and extracts the information we are interested in.
  28. The dashboard is created from Sensu logs and information about our hosts. Explain what a TLA is: a three-letter acronym (service name). The dashboard identifies whether any of our TLAs is missing the most important checks. It helps us know who we need to contact about missing checks and how important it is, based on the business rating of the app. It lists the TLAs that are affected by missing the most crucial checks.
  29. As we have thousands of events, we needed something to visualize their status: whether we have a critical situation with a lot of bad events, or everything is fine. We can also use it to troubleshoot incidents, as we can see which contact was alerted about bad events. This dashboard helps us analyze events: how many are critical, which hosts are affected, and who was contacted about specific events.
  30. Again, when we have hundreds of users, it is very hard to stop all of them from silencing checks forever; someone can always forget, or there may be some other reason. That's why we have this dashboard. Don’t get me wrong, the silence option is great, but people should not leave checks silenced forever. We use this dashboard to see how many checks are silenced and, again, who needs to be contacted.
  31. To keep track of Sensu client versions, we have another dashboard. When we know how many TLAs are up to date, we know where we stand and how many of them are old. It is especially useful when we update our package to a new client version and want to know how many hosts have been updated and who should be convinced to redeploy their TLA to get the new version of Sensu. You can't see it here, but when you click on a section of this chart you can see the list of hosts.
  32. The Sensu API is easy and powerful. You can gather all the information you need without checking the GUI or configuration.
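
To make the reporting idea in notes 27 to 32 a little more concrete, here is a minimal Python sketch of the kind of processing such a report could do. It is not PPB's LRP code. It assumes Sensu 1.x-style `/clients` and `/silenced` endpoints that return JSON lists, assumes that a silence entry with `expire` equal to -1 and `expire_on_resolve` set to false never clears on its own, and uses a hypothetical custom client attribute `tla` to group hosts by service. The function names and the example URL are made up for illustration.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch(base_url, path):
    """GET a JSON document from the API (assumed paths: /clients, /silenced)."""
    with urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.load(resp)


def never_expiring_silences(silenced):
    """Return silence entries that will not clear on their own.

    Assumption: expire == -1 means "no expiry", and expire_on_resolve
    means the entry is removed once the check recovers.
    """
    return [
        entry for entry in silenced
        if entry.get("expire", -1) == -1
        and not entry.get("expire_on_resolve", False)
    ]


def client_versions_by_tla(clients):
    """Count client versions per TLA.

    'tla' is a hypothetical custom client attribute; clients without it
    are grouped under "unknown".
    """
    report = {}
    for client in clients:
        tla = client.get("tla", "unknown")
        version = client.get("version", "unknown")
        report.setdefault(tla, Counter())[version] += 1
    return report


if __name__ == "__main__":
    # Hypothetical API address; replace with a real Sensu API endpoint.
    base = "http://sensu-api.example.internal:4567"
    for entry in never_expiring_silences(fetch(base, "/silenced")):
        print(entry.get("id"), entry.get("creator"), entry.get("reason"))
    for tla, versions in client_versions_by_tla(fetch(base, "/clients")).items():
        print(tla, dict(versions))
```

The two report functions work on already-parsed JSON, so they can be tried against saved API responses without touching a live Sensu instance, and their output could be forwarded to Splunk or any other dashboarding tool.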