Winning the metrics battle
These slides, from a presentation at Velocity Europe 2012, describe how the Guardian does metrics and monitoring.

The original proposal is at http://velocityconf.com/velocityeu2012/public/schedule/detail/26576 and there is also an article about it at http://www.guardian.co.uk/info/developer-blog/2012/oct/04/winning-the-metrics-battle

Published in: Technology

Transcript

  • 1. Winning the metrics battle (finally)
  • 2. Winning the metrics battle (finally) Simon Hildrew, Infrastructure Developer, The Guardian; Nick Satterly, Monitoring Engineer, The Guardian
  • 3. The metrics battlefield
  • 4. Total metrics 180,000 50,000 1,400 2,800
  • 5. http://www.flickr.com/photos/ghostsigns/6676069121 5 minutes every 15 seconds http://www.flickr.com/photos/millynet/134071210
  • 6. developer dashboards
  • 7. Physical screens, Screensaver hacks (chart; axis ticks 20, 15, 10, 5, 0)
  • 8. devhack
  • 9. business dashboards
  • 10. metrics + dashboards = culture change
  • 11. http://www.flickr.com/photos/chrisjames_taylor/5454315456
  • 12. our approach • Side project ➡ Prioritise • Incremental upgrade ➡ Understand the real problem • Use off the shelf tool ➡ Question the tools • Pragmatic solution ➡ Be ambitious • Done in a year ➡ Keep learning
  • 13. Prioritise
  • 14. drowning in work http://www.flickr.com/photos/iampeas/246738971
  • 15. a dedicated monitoring and metrics engineer
  • 16. Understand the real problem
  • 17. Urgent issue - current tool end of life
  • 18. The story so far...
  • 19. metrics were not helping us solve production outages
  • 20. ballooning number of applications
  • 21. but... difficult to instrument applications
  • 22. T.T. Detect + T.T. Fix = T.T. Diagnose + T.T. Resolve
  • 23. inaccessible tools http://www.flickr.com/photos/kdashy/2678539087
  • 24. inconsistent data http://www.flickr.com/photos/sybrenstuvel/2468506922
  • 25. hypothesising & arguing easier than measuring http://www.flickr.com/photos/nouqraz/200049988
  • 26. The ‘right’ thing • measure everything • measure frequently • measure each data point once • input and output must be open
  • 27. Question the tools
  • 28. Brute force? http://www.flickr.com/photos/epublicist/3546059144
  • 29. The safe option? http://www.flickr.com/photos/alicebartlett/2361209195
  • 30. Unintuitive? http://www.flickr.com/photos/merlijnhoek/2841785343
  • 31. Imposing a flawed model? http://www.flickr.com/photos/evansville/8953838/
  • 32. Too difficult / no progress? http://www.flickr.com/photos/ginja_andy/4165849136/
  • 33. Nagios • the “IBM” of monitoring tools • compromise over quantity and frequency of checks • < insert your criticism of nagios here >
  • 34. Zabbix • metric collection tightly coupled to monitoring tool • confusing UI with poor visualisation • needed brute force to make limited API work
  • 35. The ‘right’ thing • measure everything • measure frequently • measure each data point once • input and output must be open
  • 36. don’t compromise
  • 37. Be ambitious
  • 38. http://www.flickr.com/photos/mugley/2961131550 Throw work away
  • 39. Draw your dream
  • 40. http://www.flickr.com/photos/sk8geek/7358702704 Get as far as you can
  • 41. screens users db? alerting? Etsy dashboard message queue graphite SNMP? syslog? FITB ganglia api? network hosts applications
  • 42. Develop missing pieces http://www.flickr.com/photos/kalexanderson/5969012589
  • 43. screens users mongodb alerta elastic search Etsy dashboard message queue syslog SNMP graphite ganglia alerts alerts alerts FITB ganglia ganglia-api network hosts applications
  • 44. Guardian Management https://github.com/guardian/guardian-management
  • 45. Ganglia API https://github.com/guardian/ganglia-api
  • 46. rescale image??? Alerta https://github.com/guardian/alerta
  • 47. Current stack • Ganglia • FITB • Graphite • Etsy dashboards • Guardian management https://github.com/guardian/guardian-management • Guardian ganglia-api https://github.com/guardian/ganglia-api • Guardian alerta https://github.com/guardian/alerta
  • 48. Keep learning
  • 49. we are not there yet
  • 50. Watch the cultural changes
  • 51. detecting
  • 52. diagnosis
  • 53. diagnosis
  • 54. performance testing
  • 55. confirmation
  • 56. #monitoringsucks
  • 57. ➡ Prioritise ➡ Understand the real problem ➡ Question the tools ➡ Be ambitious ➡ Keep learning
  • 58. tools can change culture
  • 59. Thank you http://github.com/guardian http://gu.com/p/3ap5f Simon Hildrew @sihil simon.hildrew@guardian.co.uk Nick Satterly @nicksatterly nick.satterly@guardian.co.uk
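
The slides list Graphite in the current stack and call for measuring frequently with open input and output, but they do not show how a data point reaches Graphite. As a rough illustration only, the sketch below formats one data point in Graphite's plaintext protocol (a metric path, a value and a Unix timestamp on one line) and sends it to a Carbon listener. The metric name, host and port are assumptions for the example, not details from the talk, and the Guardian's own applications may well report through Ganglia or guardian-management instead.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    # One line of Graphite's plaintext protocol: "<path> <value> <timestamp>" plus newline.
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    # Host and port are assumptions; 2003 is Carbon's default plaintext port.
    line = format_graphite_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name, printed rather than sent so no server is needed.
    print(format_graphite_line("example.frontend.requests", 42), end="")
```

Calling `send_metric` on a timer (for example every 15 seconds, the interval mentioned on slide 5) would give the kind of frequent, single-source measurement the slides argue for.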
