Uptime Day #1
Stream your monitoring
Stanislav Osipov
2017.04.08
Stanislav Osipov
• DevOps Architect at ECommPay
• Sr. DevOps Engineer at CityAds Media
• CISO & CIO at Payler, Runet Award 2014
• H. SysOps Engineer at Mainpeople Worldwide
• Sr. Deployment Engineer & Project Manager at Mirantis IT
• Sr. DevOps at Undev (Digital October)
• … and 10 more companies …
Topics
• Context
• Global problem statistics
• Tools
• 3 metric layers for Zabbix
• Channels
• Escalations
• Additional sources
• Receive it together
• Help your managers
Online advertising industry
CPA, RTB, etc.
• A lot of traffic
• A lot of buzz and big data
• A lot of short-term initiatives
• A lot of IT support
• A lot of madness & fuckups
• A lot of fun & lulz!
Classify problems
1. Datacenter failures
2. Connectivity failures
3. /dev/hands
ENISA Annual Incident Report 2015
(issued on October 05, 2016)
https://www.enisa.europa.eu/topics/incident-reporting/for-telcos/annual-reports
Problems distribution
(Chart legend: Connectivity failures, Datacenter failures, Failures overhead, /dev/hands; data label: 98.9)
1. Datacenter failures
2. Connectivity failures
3. /dev/hands: up to 99% of failures
Tools
External:
• New Relic Synthetics
• Pingdom
Internal:
• Zabbix
• Munin
• New Relic APM (runtime context)
• Custom in-house scripts
Concept of 3 layers for Zabbix
• Base/System: OS metrics
- Disk space, RAM, CPU, LA, net
• Components: Daemon metrics
- Nginx, PHP-FPM, memcached, MySQL, PgSQL, etc.
• Advanced: Services & Apps metrics, DBA metrics, etc
- Money earned ;-)
Group or tag hosts in Zabbix
• By envs: Production, Staging, Testing, Development
• By datacenters or locations
• By VM guest type
• By operating system or type
• By projects, services & components
• By teams & ppl, if you cannot override the chaos
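The grouping dimensions above can be modelled as plain data. This is only an illustrative sketch: the host names, the group names, and the `hosts_in` helper are all hypothetical, and nothing here calls the Zabbix API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    name: str
    groups: frozenset = field(default_factory=frozenset)


# Hypothetical inventory; group names mirror the dimensions listed above.
HOSTS = [
    Host("web-01", frozenset({"env/production", "dc/ams", "os/linux", "project/shop"})),
    Host("web-02", frozenset({"env/staging", "dc/ams", "os/linux", "project/shop"})),
    Host("db-01", frozenset({"env/production", "dc/fra", "os/linux", "project/billing"})),
]


def hosts_in(hosts, *required_groups):
    """Return names of hosts that belong to every one of the required groups."""
    required = set(required_groups)
    return [h.name for h in hosts if required <= h.groups]
```

With groups assigned along several dimensions, a rule such as "production only" becomes a simple filter, for example `hosts_in(HOSTS, "env/production", "dc/ams")`.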
Channels
• Call/SMS
• E-mail
• New Relic mobile app
• Slack
• Telegram
• … and screens!
Escalations – Production only
• >= HIGH: notify Ops on duty (all channels, incl. SMS)
• >= CRITICAL: notify all Ops; add a 15-minute delay outside work hours (all channels)
• >= CRITICAL for 1 hour: notify Ops management (any way, any time, all channels)
• >= HIGH: notify SDEs during the daytime (all channels)
• >= CRITICAL: notify the SD TL of the affected system; add a 15-minute delay outside work hours (all channels)
• = DISASTER: notify IT top management & the SD team of the affected systems; add a 5-minute delay outside work hours
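The escalation rules can be written down as a small decision function. The sketch below is one possible reading of the slide, not the author's actual configuration: the recipient labels and the `recipients` function are hypothetical, the severity names follow the slide rather than any tool's built-in scale, and "daytime" is assumed to mean work hours.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Names follow the slide; the numeric order is what matters.
    HIGH = 1
    CRITICAL = 2
    DISASTER = 3


def recipients(severity, minutes_open, work_hours):
    """Return who should have been notified for a production problem by now."""
    # Extra delays apply only outside work hours.
    delay_15 = 0 if work_hours else 15
    delay_5 = 0 if work_hours else 5
    notified = []
    if severity >= Severity.HIGH:
        notified.append("ops-on-duty")
        if work_hours:
            notified.append("sdes")  # assumption: developers are notified in work hours only
    if severity >= Severity.CRITICAL:
        if minutes_open >= delay_15:
            notified += ["all-ops", "sd-team-lead"]
        if minutes_open >= 60:
            notified.append("ops-management")
    if severity == Severity.DISASTER and minutes_open >= delay_5:
        notified += ["it-top-management", "sd-team"]
    return notified
```

For example, `recipients(Severity.CRITICAL, 10, work_hours=False)` returns only the Ops engineer on duty, because the 15-minute night-time delay has not yet elapsed.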
What more?
New Relic Alerts on top of New Relic APM:
• Detect the % of fatal errors in code runtime. Create an NR Alerts policy: “If count is more than threshold”. If it fires, NR Alerts sends an e-mail to bugs@company.tld or whatever address you enter.
• The e-mail must trigger a “Bug on the production” ticket to IT support.
• IT support assigns the ticket to the responsible team.
• Every such ticket must decrease the KPI of the responsible team.
• PROFIT: nobody wants bugs on production!
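The threshold check behind such a policy is simple arithmetic. The sketch below only illustrates that logic and does not use the New Relic API; the function names, the 1% default threshold, and the subject format are assumptions made for the example.

```python
def fatal_error_percent(fatal_count, total_requests):
    """Share of requests that ended in a fatal error, in percent."""
    if total_requests == 0:
        return 0.0
    return 100.0 * fatal_count / total_requests


def should_open_ticket(fatal_count, total_requests, threshold_percent=1.0):
    """True when the fatal error share is above the (hypothetical) threshold."""
    return fatal_error_percent(fatal_count, total_requests) > threshold_percent


def ticket_subject(app_name, percent):
    """Subject line for the e-mail that opens the IT support ticket."""
    return f"Bug on the production: {app_name} fatal errors at {percent:.1f}%"
```

A fixed subject prefix makes it easy for the ticketing system to recognise these e-mails and route them as production bugs.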
What more?
Should Ops know about the deployments?
• Add Jenkins job hook notifications (for production deployments) to IMs (Slack, Telegram, etc.)
• Add the Pingdom bot to IMs
• Add New Relic Synthetics notifications to IMs
• Add anything else that happens in your SD & IT …
Too many chats!!!
Stream this.
One stream chat per IM.
For everything.
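A single stream is easier to read when every source is rendered in the same shape. The sketch below shows one way to do that; the helper names and the message format are hypothetical. The only external fact relied on is that a Slack incoming webhook accepts a JSON body with a `text` field. Nothing is sent over the network here.

```python
import json


def to_stream_message(source, severity, text):
    """Render an event from any source as one uniformly formatted chat line."""
    return f"[{source}] {severity.upper()}: {text}"


def slack_payload(message):
    """JSON body for a Slack incoming webhook, which accepts a 'text' field."""
    return json.dumps({"text": message})


# Hypothetical events from the sources mentioned above.
EVENTS = [
    ("jenkins", "info", "production deploy finished"),
    ("pingdom", "high", "checkout page is down"),
    ("zabbix", "critical", "disk space low on db-01"),
]

STREAM = [to_stream_message(*event) for event in EVENTS]
```

Posting each line of `STREAM` to the same channel gives Ops one chronological view of deployments, external checks, and internal alerts.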
Stream the monitoring
Help your manager
bonus track
Incident management
Custom report for Zabbix
Count incidents and stats every month
Implement CI/CD and monitor the uptime
PROFIT: Uptime with observable result.
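The monthly count and the resulting uptime figure can be computed from a list of incident intervals. This is a minimal sketch, not the custom Zabbix report mentioned above: the `monthly_report` function is hypothetical, each incident is attributed to the month in which it started, and a fixed 30-day month is assumed for simplicity.

```python
from collections import defaultdict
from datetime import datetime


def monthly_report(incidents, minutes_in_month=30 * 24 * 60):
    """Count incidents and downtime per month and derive an uptime percentage.

    `incidents` is an iterable of (start, end) datetime pairs; each incident is
    attributed to the month in which it started.
    """
    stats = defaultdict(lambda: {"incidents": 0, "downtime_min": 0.0})
    for start, end in incidents:
        month = start.strftime("%Y-%m")
        stats[month]["incidents"] += 1
        stats[month]["downtime_min"] += (end - start).total_seconds() / 60
    report = {}
    for month, s in sorted(stats.items()):
        # Assumption: every month is treated as minutes_in_month long.
        uptime = 100.0 * (1 - s["downtime_min"] / minutes_in_month)
        report[month] = {**s, "uptime_percent": round(uptime, 3)}
    return report


# Hypothetical incident log.
INCIDENTS = [
    (datetime(2017, 3, 1, 10, 0), datetime(2017, 3, 1, 10, 30)),
    (datetime(2017, 3, 15, 2, 0), datetime(2017, 3, 15, 2, 13, 12)),
    (datetime(2017, 4, 2, 0, 0), datetime(2017, 4, 2, 1, 0)),
]

REPORT = monthly_report(INCIDENTS)
```

Tracking these numbers month over month is what makes the effect of CI/CD and monitoring changes observable.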
Questions?
Stanislav S. Osipov
oss+uptime@gkos.name
Thanks!
