
Why Visibility into Your Stack Matters

When you run any number of systems, gaining visibility into what they are doing can be non-trivial. Starting down the path to monitoring can be bumpy, and if you don’t measure, you don’t know. In this session, Michael Fiedler, Director of TechOps, will speak from personal experience about the scalability, deployment, and monitoring challenges he faced before using Datadog, and how that changed. He will cover how to get started, and show examples where monitoring the company's platform with Datadog guided the team toward solving its scalability problems.

  1. Why visibility into your stack matters, or: Do you see it all?
  2. Mike Fiedler, Operations, Datadog.com; Twitter: @mikefiedler; GitHub: @miketheman; OpsSchool.org; Chef Community; roller derby referee; skydiver. ©Alex Erde
  3. “The site is slow.” (the CEO, calling your cellphone at 03:00)
  4. What? • a typical monitoring implementation story • an alternative approach
  5. Image credit: (CC BY 2.0) http://www.gotcredit.com/ https://flic.kr/p/6439SA
  6. [Diagram: a simple stack with User, LB, Web, and Data components]
  7. Image credit: (CC BY 2.0) www.futurealpha.com https://flic.kr/p/8PhF4g
  8. Image credit: (CC BY 2.0) Aristocrats-hat https://flic.kr/p/6qdTC1. “In God we trust; all others bring data.” (W. Edwards Deming, as quoted in The Elements of Statistical Learning)
  9. You want more?
  10. • graphite • ganglia • mongodb • mysql • influxdb • socket.io • datadog • …
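  11. A small Bottle app that counts words in MongoDB:

          import json

          from bottle import route
          from bson.json_util import default  # serializes BSON types such as ObjectId
          from pymongo import MongoClient

          # Connection string elided on the slide; assumed to name the database
          db = MongoClient('mongodb://...').get_default_database()

          @route('/insert/<name>')
          def insert(name):
              doc = {'name': name}
              # Create the word's document if it is missing, then bump its counter
              db.words.update_one(doc, {'$inc': {'count': 1}}, upsert=True)
              return json.dumps(doc, default=default)
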
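  12. The same app, instrumented with DogStatsD:

          import json

          from bottle import route
          from bson.json_util import default
          from datadog import statsd  # Datadog's DogStatsD client (pip install datadog)
          from pymongo import MongoClient

          db = MongoClient('mongodb://...').get_default_database()

          @route('/insert/<name>')
          def insert(name):
              # Send one count per call to the local DogStatsD agent
              statsd.increment('wordcount.insert')
              doc = {'name': name}
              db.words.update_one(doc, {'$inc': {'count': 1}}, upsert=True)
              return json.dumps(doc, default=default)
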
  13. Time is a Cruel Master
  14. Image credit: (CC BY-SA 2.0) https://www.flickr.com/theilr/ https://flic.kr/p/8MC5YM
  15. Have • systems • applications • services • developers • operators • customers
  16. Have • systems • applications • services • developers • operators • customers
  17. Polyglot Platforms
  18. Complex Systems
  19. Disparate Locations
  20. Information Overload
  21. “The site is slow.” (the CEO, calling your cellphone at 03:00)
  22. Image credit: (CC BY 2.0) www.futurealpha.com https://flic.kr/p/8PhF4g
  23. Does this matter?
  24. Top-down: • work metrics • resource metrics • events
  25. Work metrics: throughput (rps), success/error, performance (latency)
  26. Resource metrics: utilization (% busy), saturation (queued), errors, availability
  27. Events: changes, builds, deploys, alerts, anything notable
  28. Trend resource metrics; notify on changes
  29. Wake people up when work metrics go awry
  30. Slice and dice: exploration and aggregation
  31. Set-and-Forget
  32. Just-In-Time Information
  33. Does it scale?
  34. Customer stats: • AdRoll, ~2M transactions/second • SimpleReach, ~7B measurements/day • MercadoLibre, ~18k hosts monitored • Airbnb, 3,000+ monitors defined
  35. “The site is slow.” (the CEO, calling your cellphone at 03:00)
  36. “Thanks. We know, and are already investigating.” (you)
  37. “[silence]” (you, because proactive data collection and alerting meant you never got that call in the first place)
  38. Questions?
  39. “If you don’t measure, you won’t know.” (M. Fiedler, Twitter: @mikefiedler)
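
The instrumented word-count handler on slide 12 emits one count of `wordcount.insert` per request. A DogStatsD client delivers that as a small plain-text UDP datagram to a local agent, by default on port 8125, in the form `<name>:<value>|<type>|@<sample_rate>|#<tags>`. The sketch below builds that line by hand to show what actually goes over the wire; `format_metric` and `send_metric` are hypothetical helpers for illustration, not part of the Datadog client.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None, sample_rate=None):
    """Build one DogStatsD line: <name>:<value>|<type>[|@<sample_rate>][|#<tag>,<tag>]."""
    line = f"{name}:{value}|{metric_type}"  # "c" = counter, "g" = gauge, "h" = histogram
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"  # the server uses this to scale sampled counts back up
    if tags:
        line += "|#" + ",".join(tags)
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one datagram to a local agent; UDP does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Counting one call to the /insert handler, tagged by environment (tag is illustrative)
    print(format_metric("wordcount.insert", 1, tags=["env:dev"]))
```

Calling `send_metric(format_metric(...))` returns as soon as the datagram is handed to the operating system, whether or not an agent is listening, which is one reason this style of instrumentation adds little overhead to the request path.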
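
The work metrics on slide 25 (throughput, success/error, latency) can all be derived from a window of request records. A minimal sketch, assuming a hypothetical `Request` record shape, treating any HTTP 5xx status as an error, and using a nearest-rank 99th percentile as the latency figure:

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    timestamp: float   # seconds since the epoch
    status: int        # HTTP status code
    latency_ms: float  # time taken to respond


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def work_metrics(requests, window_seconds):
    """Summarize one time window of requests into the three work metrics."""
    if not requests:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p99_latency_ms": None}
    # Assumption: only server-side failures (5xx) count as errors
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p99_latency_ms": percentile([r.latency_ms for r in requests], 99),
    }
```

Throughput and error rate are cheap running totals, while the percentile needs the whole window's latencies; that cost is why monitoring agents commonly aggregate latency into histograms before shipping it.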
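
Slides 24 to 29 split signals top-down into work metrics, resource metrics, and events, and treat them differently: trend resource metrics and notify on changes, but wake people up only when work metrics go awry. The sketch below encodes that split as a routing function. The `Signal`, `Kind`, and `Action` types, the single-threshold breach test, and recording events without alerting are all assumptions for illustration, not Datadog's monitor model.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    WORK = "work"          # throughput, success/error, latency
    RESOURCE = "resource"  # utilization, saturation, errors, availability
    EVENT = "event"        # changes, builds, deploys, anything notable


class Action(Enum):
    PAGE = "page"      # wake someone up
    NOTIFY = "notify"  # non-urgent: chat message, ticket, email
    RECORD = "record"  # keep for correlation on dashboards, no alert
    NONE = "none"      # within threshold, nothing to do


@dataclass
class Signal:
    name: str
    kind: Kind
    value: float = 0.0
    threshold: float = math.inf  # breached when value exceeds this


def route_signal(signal: Signal) -> Action:
    """Decide who hears about a signal, following the top-down split on the slides."""
    if signal.kind is Kind.EVENT:
        return Action.RECORD
    if signal.value <= signal.threshold:
        return Action.NONE
    # Only user-facing work metrics justify waking someone up
    return Action.PAGE if signal.kind is Kind.WORK else Action.NOTIFY


if __name__ == "__main__":
    for s in [
        Signal("web.error_rate", Kind.WORK, value=0.08, threshold=0.05),
        Signal("db.disk.used_pct", Kind.RESOURCE, value=91, threshold=85),
        Signal("deploy.web", Kind.EVENT),
    ]:
        print(s.name, route_signal(s).value)
```

A full disk is worth knowing about, but if users are still being served it can wait until morning; a rising error rate cannot, which is the reasoning behind paging only on work metrics.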
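
The slice-and-dice exploration on slide 30 boils down to filtering tagged data points, grouping them by one tag, and aggregating each group. A minimal in-memory sketch, assuming points are hypothetical `(metric, value, tags)` tuples; monitoring services typically run the same operation over stored time series.

```python
from collections import defaultdict
from statistics import mean


def slice_and_dice(points, metric, group_by, agg=mean, **filters):
    """Filter one metric's points by tag values, group by a tag, aggregate each group.

    points:   iterable of (metric_name, value, tags_dict) tuples (hypothetical shape)
    group_by: tag whose values become the keys of the result
    agg:      any function of a list of numbers, e.g. mean, max, sum
    filters:  tag=value pairs a point must match to be included
    """
    groups = defaultdict(list)
    for name, value, tags in points:
        if name != metric:
            continue
        if any(tags.get(key) != wanted for key, wanted in filters.items()):
            continue
        groups[tags.get(group_by, "untagged")].append(value)
    return {key: agg(values) for key, values in groups.items()}


if __name__ == "__main__":
    points = [
        ("cpu.user", 40, {"host": "web1", "role": "web", "az": "a"}),
        ("cpu.user", 60, {"host": "web2", "role": "web", "az": "b"}),
        ("cpu.user", 90, {"host": "db1", "role": "db", "az": "a"}),
        ("mem.used", 70, {"host": "web1", "role": "web", "az": "a"}),
    ]
    print(slice_and_dice(points, "cpu.user", group_by="role"))
    print(slice_and_dice(points, "cpu.user", group_by="az", agg=max, role="web"))
```

Tagging each point with several dimensions at collection time is what makes this possible later: the same data answers "average CPU per role" and "peak web CPU per availability zone" without re-instrumenting anything.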
