Command & Conquer: Red Alert
© Tadej Murovec, HouseTrip, 2015
Hello
Twitter: @tadejm
GitHub: github.com/tadejm
I work at HouseTrip.
The debt
184 alerts per week
=> roughly 1.095 alerts per hour (184 alerts / 168 hours)
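The hourly figure comes from spreading the weekly count over the 168 hours in a week. A quick check of the arithmetic, assuming alerts can fire at any hour of any day (the function name is only illustrative):

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def alerts_per_hour(alerts_per_week: float) -> float:
    """Average hourly alert rate for a given weekly alert count."""
    return alerts_per_week / HOURS_PER_WEEK


rate = alerts_per_hour(184)
print(f"{rate:.3f} alerts per hour")  # 1.095 alerts per hour
```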
Rethink how to do alerting
• Alerts should be urgent, important, actionable and real.
• Over-monitoring is a harder problem to solve than under-monitoring.
• Symptoms are a better way to capture more problems more comprehensively and robustly with less effort. [1]
[1] My Philosophy on Alerting – Rob Ewaschuk
Meet the happy path
• Guests can search for properties
• Guests can browse properties
• Users can log in or register
• Guests can send an enquiry or book a property
• Guests can pay
• Hosts can accept a booking
The end user does not care that the MySQL server is
unreachable, but she does care about not being able to
view a property.
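One way to read this: check what the user can do, not what the infrastructure looks like. Below is a minimal sketch of symptom-based checks over happy-path steps. Everything in it is hypothetical: the probe functions are stand-ins, and real probes would exercise the site over HTTP.

```python
from typing import Callable, Dict, List

# Hypothetical probes: each returns True when a user can complete the step.
Probe = Callable[[], bool]


def failing_steps(probes: Dict[str, Probe]) -> List[str]:
    """Return the happy-path steps users currently cannot complete."""
    failed = []
    for step, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            # A crashing probe means the user-facing step is broken,
            # whatever the underlying cause turns out to be.
            ok = False
        if not ok:
            failed.append(step)
    return failed


def broken_view() -> bool:
    # Stand-in for a page that fails because a backend is down.
    raise ConnectionError("MySQL server unreachable")


happy_path = {
    "search for properties": lambda: True,
    "view a property": broken_view,
    "pay": lambda: True,
}

print(failing_steps(happy_path))  # ['view a property']
```

The alert names the symptom ("view a property" is failing). The cause, here a made-up database outage, is something the responder finds during diagnosis.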
Tools & services
New Relic
IS AWESOME.
New Relic
• Application server response time
• Key transactions tracking with Apdex T
• Uptime monitoring
• Application segmentation into policy groups
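For context on "Apdex T": T is the response-time threshold below which a request counts as satisfied. In the standard Apdex definition, requests taking up to 4T count as tolerating (half weight) and anything slower as frustrated. The sketch below computes the score from raw response times. The sample values and the choice of T are made up, and this is not New Relic's own code.

```python
from typing import Sequence


def apdex(response_times: Sequence[float], t: float) -> float:
    """Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# Made-up response times in seconds, with T = 0.5 s:
# 3 satisfied, 2 tolerating, 1 frustrated.
samples = [0.2, 0.4, 0.5, 1.2, 1.9, 2.5]
print(round(apdex(samples, t=0.5), 2))  # 0.67
```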
Datadog
IS AWESOME.
Datadog
• Monitoring scheduled errands
• Health checks for background queues and services
• Custom metrics that make sense for the business
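As an illustration of a business-level custom metric, the sketch below builds a datagram in the DogStatsD text format (`name:value|type|#tags`) and can send it over UDP. It assumes a Datadog agent listening on its default DogStatsD port, 8125, on localhost. The metric name and tag are hypothetical, not taken from the talk.

```python
import socket
from typing import Optional, Sequence


def format_metric(name: str, value: float, metric_type: str = "c",
                  tags: Optional[Sequence[str]] = None) -> str:
    """Build a DogStatsD datagram: name:value|type, plus |#tag1,tag2 if tagged."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget the datagram over UDP to a local agent (assumed setup)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical counter: one confirmed booking in production.
print(format_metric("bookings.confirmed", 1, "c", ["env:production"]))
# bookings.confirmed:1|c|#env:production
```

Calling `send_metric(format_metric(...))` from the booking code path would then make the count available for dashboards and monitors.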
Slack
IS AWESOME.
Slack
• Chronological tracking, search
• Slack channels as urgency segmentation: #alerts, #notifications
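Splitting by urgency could be sketched as a small routing function. The rule used here (only messages that are both urgent and actionable go to #alerts) is an assumption made for illustration, not a rule stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    urgent: bool
    actionable: bool


def channel_for(message: Message) -> str:
    """Pick the Slack channel for a message based on its urgency."""
    # Assumed rule: interrupt people only for urgent, actionable problems.
    if message.urgent and message.actionable:
        return "#alerts"
    return "#notifications"


print(channel_for(Message("Guests cannot pay", urgent=True, actionable=True)))
# #alerts
print(channel_for(Message("Nightly errand finished late", urgent=False, actionable=True)))
# #notifications
```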
PagerDuty
IS OK.
And wakes you up at night. Sometimes.
PagerDuty
• Propagates alerts to on-duty engineers
• Manages the duty rotation schedule
• Keeps notes about alerts
Progress and results
4 rules for efficient alerting
• Alert on symptoms, not causes
• Get your teammates to pair review the alerts
• Prefer under-monitoring to over-monitoring
• Use notifications to prevent alerts
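The last rule can be read as a two-level threshold: a lower level that produces a notification while there is still time to act, and a higher level that produces an alert. A minimal sketch, with made-up thresholds and a hypothetical disk-usage metric:

```python
def classify(value: float, notify_at: float, alert_at: float) -> str:
    """Map a metric value to 'ok', 'notification' or 'alert'."""
    if notify_at >= alert_at:
        raise ValueError("notify_at must be below alert_at")
    if value >= alert_at:
        return "alert"
    if value >= notify_at:
        return "notification"
    return "ok"


# Hypothetical disk usage in percent: notify at 70, alert at 90.
print([classify(v, notify_at=70, alert_at=90) for v in (50, 75, 95)])
# ['ok', 'notification', 'alert']
```

If the notification at 75% gets handled during working hours, the alert at 90% never has to fire.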
Be a good citizen
That's all folks!
Thanks!
