You can talk to me about:
● Making on-call better for humans
● High Availability and Load Balancing
● Mirroring free software
● Zero trust networking
● Home Automation and garage racks
● Pretty much anything
James Forman
I am a:
Linux Sysadmin, Network Engineer and People Manager
I work for:
https://en.wikipedia.org/wiki/File:Wellington_montage_2.jpg
https://en.wikipedia.org/wiki/File:New_Zealand_relief_map.jpg
Overlay by:
Wikipedia User Hazhk
Catalyst’s Wellington Team - December 2017
OSMC.de 2019
First things first
Content Warning
Photos of earthquake damage
Hot Potato is not a
monitoring system
It’s a message broker
Life before Hot Potato
One pager team
The “pager peeps”
Every person has a pager
One number, multiple pagers
A range of different
monitoring builds
Customer managed, remote, out of country, in
country with pager access, email to pager gateways
18 Monitoring Servers
Nagios 3, Icinga 1.x and Icinga2
Support Hotline
Call number, leave voicemail, wake person up
IRC based handovers
<jforman> Pager on
<redacted> Going to sleep
Why did we build it?
We already wanted a
replacement system
Aging technology
The pager system was becoming unreliable
Image credit: Matthew Inman / theoatmeal.com
The top 3 questions
1. Why not use a service?
SLAs
None of the options could meet our requirements
2. Why not use SMS?
3. Why not go staffed 24/7?
Our other motivations
Open Source
Customer data
Stop sending notifications in cleartext
https://techcrunch.com/2019/10/30/nhs-pagers-medical-health-data/
We thought we had time
to find a replacement
https://www.spark.co.nz/content/dam/kb/public/docs/media-release-paging-network-closure.pdf
At first it was good news
Aging technology
It was too good to be true
The pager network
became unreliable
“In response to Radio New Zealand
queries Spark said it had talked to many
of its customers before the
announcement was made and that
included the Fire Service.”
Time ran out
(in the middle of the night)
NO CARRIER
:(
“1st cab off the rank was those pager numbers
that had not signed up to the new pager
network were disconnected.”
“We have then worked with the customers who
have migrated across to replace their old
access points (ways they send a pager
message) to either Email or an API option.”
“This is because the old access points are
being turned off.”
Photo by:
BRENDON O'HAGAN/FAIRFAX NZ
Solving the
immediate problem
so people could sleep
eMail -> SMS
We sent all notifications via SMS
as an emergency measure
eMail == :(
Nameless project == :)
The first version of
Hot Potato
A really bad “API”
The worst thing I’ve put into production
A dodgy script
Rolled out to all the monitoring servers
Insert and Send
Add to database, send pager message
select * from notifications
A table of notifications
A handover button
sends a message saying you have the pager
v0.1 - Much more reliable than email
It worked (mostly)
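A minimal sketch of the v0.1 "insert and send" flow, assuming a SQLite table stands in for the real database. The schema, the function names and the send_page callback are hypothetical; the talk only says notifications were added to a table and then paged out. The handover button is modelled as just another notification.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Hypothetical schema: the talk only says v0.1 kept "a table of notifications".
SCHEMA = """
CREATE TABLE IF NOT EXISTS notifications (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT    NOT NULL,
    hostname   TEXT    NOT NULL,
    body       TEXT    NOT NULL,
    sent       INTEGER NOT NULL DEFAULT 0
)
"""

# Hypothetical stand-in for whatever actually delivered the page; True means success.
SendPage = Callable[[str], bool]


def insert_and_send(conn: sqlite3.Connection, hostname: str, body: str,
                    send_page: SendPage) -> int:
    """Record the notification first, then page it out and mark it sent on success."""
    cur = conn.execute(
        "INSERT INTO notifications (created_at, hostname, body) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), hostname, body),
    )
    notification_id = cur.lastrowid
    if send_page(body):
        conn.execute("UPDATE notifications SET sent = 1 WHERE id = ?", (notification_id,))
    conn.commit()
    return notification_id


def handover(conn: sqlite3.Connection, person: str, send_page: SendPage) -> int:
    """The handover button: just another notification saying who has the pager."""
    return insert_and_send(conn, "hotpotato", f"{person} has the pager", send_page)
```

Writing the row before paging means `select * from notifications` still shows the alert even when the page itself fails.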
It gave us the time and
opportunity to do better
We had some goals
Don’t get in the way
make it easy to be on-call
Enable alert reduction
let people sleep
Survive natural hazards
the reality of building systems in NZ
Volcanoes
https://www.nationalgeographic.org/news/plate-tectonics-ring-fire/
Earthquakes
Recent fatal earthquakes
22 February 2011 - Christchurch - 185 people
13 June 2011 - Christchurch - 1 person
14 November 2016 - Kaikoura - 2 people
Diagrams by:
Wikipedia User Mikenorton
Photo by:
New Zealand Defence Force
Photo by:
New Zealand Defence Force
Photo by:
RNZ / Rebekah Parsons-King
Photo of:
MP Stuart Smith
Photo by:
RNZ / Simon Morton
Photo by:
RNZ / Conan Young
Photo by:
RNZ / Aaron Smale
Photo by:
Phillip Pearson
Tsunamis
https://wremo.nz/hazards/tsunami-zones/
Survive any loss of
International Connectivity
we had 1 main undersea cable (2 landings)
Image credit: Tourism New Zealand
https://www.submarinecablemap.com/
Then we had some
requirements
Survive disasters
Earthquakes, tsunamis, volcanoes, team lunches...
Support existing
monitoring
Nagios 3, Icinga 1.x, Icinga2
Get rid of email
No more using email to deliver messages
Confirm message delivery
Move from paging and SMS to Push Notifications
Improve handover
Is your pager on yet? I want to go to sleep
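Pagers are one-way, so there is no way to know a page actually arrived. One way to make "confirm message delivery" concrete is to give each notification an explicit status and accept only the transitions that make sense. This sketch assumes four hypothetical states (queued, sent, delivered, failed); Hot Potato's actual state model is not described in the talk.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "queued"        # accepted, not yet handed to a provider
    SENT = "sent"            # handed to a provider, no receipt yet
    DELIVERED = "delivered"  # provider or device confirmed delivery
    FAILED = "failed"        # provider rejected it or gave up


# Hypothetical state model: any transition not listed here is an error.
TRANSITIONS = {
    Status.QUEUED: {Status.SENT, Status.FAILED},
    Status.SENT: {Status.DELIVERED, Status.FAILED},
    Status.DELIVERED: set(),
    Status.FAILED: set(),
}


class Notification:
    def __init__(self, body: str) -> None:
        self.body = body
        self.status = Status.QUEUED

    def advance(self, new_status: Status) -> None:
        """Move to new_status, refusing transitions the model does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status


def unconfirmed(notifications: list[Notification]) -> list[Notification]:
    """Notifications that left the system but were never confirmed as delivered."""
    return [n for n in notifications if n.status is Status.SENT]
```

With a one-way pager nothing could ever move a notification from SENT to DELIVERED, so every page would show up in unconfirmed().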
#deathToPagers
“I’d rather have a bee burrow into my
skull than carry a pager again”
- Me
What did we build?
A web app with an API
built with Python and Flask
With a funky database
and some queuing
CockroachDB and RabbitMQ
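How the pieces might fit together, as a stdlib-only sketch: an API call persists the notification and drops a message on a queue, and a separate worker drains the queue and hands each message to a provider. A dict stands in for CockroachDB and queue.Queue for RabbitMQ; the function names and message format are assumptions for illustration, not Hot Potato's code.

```python
import itertools
import json
import queue
from typing import Callable

# Stand-ins for illustration only: a dict plays the database (CockroachDB in
# Hot Potato) and queue.Queue plays the message broker (RabbitMQ).
database: dict[int, dict] = {}
broker: "queue.Queue[str]" = queue.Queue()
_ids = itertools.count(1)


def api_send_notification(payload: dict) -> int:
    """What an API endpoint might do: persist, enqueue, return quickly."""
    notification_id = next(_ids)
    database[notification_id] = {**payload, "status": "queued"}
    broker.put(json.dumps({"id": notification_id}))
    return notification_id


def worker_drain(deliver: Callable[[str], bool]) -> int:
    """What a background worker might do: hand each queued message to a provider."""
    handled = 0
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            return handled
        record = database[json.loads(message)["id"]]
        record["status"] = "sent" if deliver(record["body"]) else "failed"
        broker.task_done()
        handled += 1
```

In this sketch a slow or failing provider only delays the worker; the monitoring server that called the API gets its answer as soon as the message is queued.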
Our production
environment has 5 nodes
NZ: Porirua, Wellington and Hamilton
AU: Sydney
US: California
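Why five nodes? CockroachDB replicates data with Raft, so a range stays available only while a majority of its replicas can reach each other. The arithmetic below assumes every range is replicated to all five nodes; CockroachDB's default replication factor is 3, so this is an assumption about the deployment, not something the talk states. Under that assumption the three NZ nodes still form a majority if the country loses its international links, while Sydney and California alone do not.

```python
# Assumption: every range is replicated to all five nodes (num_replicas = 5).
SITES = {
    "Porirua": "NZ",
    "Wellington": "NZ",
    "Hamilton": "NZ",
    "Sydney": "AU",
    "California": "US",
}


def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` voting members."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority survives."""
    return replicas - quorum(replicas)


def keeps_quorum(reachable: int, replicas: int) -> bool:
    """True if `reachable` replicas are enough to keep making progress."""
    return reachable >= quorum(replicas)


if __name__ == "__main__":
    total = len(SITES)
    nz = sum(1 for country in SITES.values() if country == "NZ")
    print(f"quorum={quorum(total)}, tolerates {tolerated_failures(total)} failures")
    print(f"NZ cut off from the world keeps quorum: {keeps_quorum(nz, total)}")
```

With five replicas that is a quorum of three and two tolerated failures; with the default three replicas it drops to a quorum of two and one tolerated failure.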
How does it work?
Sending notifications
Heartbeats
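Heartbeats guard against silence: if a monitoring server cannot reach Hot Potato, no alerts arriving looks exactly like everything being fine. A minimal sketch of the receiving side, assuming each server checks in periodically and anything not heard from within a hypothetical max_age is flagged as stale. The class and method names are illustrative, not Hot Potato's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class HeartbeatTracker:
    """Tracks the last heartbeat seen from each monitoring server (illustrative only)."""

    def __init__(self, max_age: timedelta) -> None:
        self.max_age = max_age
        self.last_seen: dict[str, datetime] = {}

    def beat(self, server: str, at: Optional[datetime] = None) -> None:
        """Record a heartbeat from `server`, defaulting to the current UTC time."""
        self.last_seen[server] = at or datetime.now(timezone.utc)

    def stale(self, now: Optional[datetime] = None) -> list[str]:
        """Servers whose last heartbeat is older than max_age, i.e. possibly cut off."""
        now = now or datetime.now(timezone.utc)
        return sorted(s for s, t in self.last_seen.items() if now - t > self.max_age)
```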
How does it look?
What else can it do?
Failure notifications
the pager network is down again!
Heartbeats
ensuring connectivity
Teams
put everyone on-call!
Team escalations
because sometimes bad things happen
Reports
A breakdown of the week that was
Promote alert reduction
With the help of some NeoPixels
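Team escalations, sketched under an assumed policy: page the first person in the chain immediately, add the next person each time a fixed step passes without an acknowledgement, and stop escalating once someone acknowledges. The chain, the step length and the function name are hypothetical; the talk does not describe Hot Potato's actual escalation rules.

```python
from datetime import timedelta
from typing import Optional


def people_to_page(chain: list[str], elapsed: timedelta, step: timedelta,
                   acked_after: Optional[timedelta] = None) -> list[str]:
    """Everyone who should have been paged `elapsed` after an alert fired.

    Hypothetical policy: chain[0] is paged immediately, the next person is added
    every `step` without an acknowledgement, and escalation freezes once acked.
    """
    if acked_after is not None:
        elapsed = min(elapsed, acked_after)
    levels = elapsed // step + 1  # timedelta // timedelta gives an int
    return chain[:levels]         # slicing past the end simply returns the whole chain
```

For example, people_to_page(["alice", "bob", "carol"], timedelta(minutes=25), timedelta(minutes=10)) returns all three names, because two full steps have passed without an acknowledgement.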
What notification
providers does it support?
Twilio
For delivery of SMS messages
Modica
For delivery of SMS messages and pager messages
Pushover
For delivery of push notifications to Android and iOS
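With more than one provider, a failure of one (the pager network being down again) does not have to mean a missed alert. A minimal fallback sketch, assuming providers are tried in a fixed order: the provider names come from the talk, but the ordering, the Provider signature and the stub senders are assumptions, and no real Twilio, Modica or Pushover API is called.

```python
from typing import Callable, Optional

# Hypothetical signature: (recipient, message) -> True if the provider accepted it.
Provider = Callable[[str, str], bool]


def send_with_fallback(providers: list[tuple[str, Provider]], recipient: str,
                       message: str) -> tuple[Optional[str], list[str]]:
    """Try providers in order; return (provider that worked or None, providers that failed)."""
    failed: list[str] = []
    for name, send in providers:
        try:
            ok = send(recipient, message)
        except Exception:  # any provider error counts as a failure; try the next one
            ok = False
        if ok:
            return name, failed
        failed.append(name)
    return None, failed
```

In this sketch a non-empty failed list is what a failure notification would report, and a None result means every provider failed.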
What’s planned?
Mobile app
for Android and iOS, no more pagers
Support hotline
direct calls to the on-call person or take messages
Planned work
stop forgetting to extend downtime on things
Language support
German and Italian coming soon
What do I need to try it?
What do I need to deploy
it to production?
One server
If you don’t want redundancy,
you don’t have to have it
Demo?
James Forman
Callum Dickinson
Filip Vujičić
Zac Pullar-Strecker
Opal Symes
Rhys Davies
Michael Fincham
Tim Bruce
Jamie McClymont
Toni Gardener
Manuela Spies
Sapir Ben-Shahar
Brynn Wilde
Hemanth Sonthi
Emanuel Evans
Hazel Meehan
Baxter Gray
Sam Banks
Thank you to our contributors
Open Source Academy
https://hotpotato.nz
Questions?
https://hotpotato.nz
@teamHotPotato
#hotpotato on freenode

OSMC 2019 | Hot Potato by James Forman