Monitoring is easy; why are we so bad at it? (presentation)

Published in: Technology, Business

  1. Monitoring is easy; why do we suck at it? / monitoring it all (Tuesday, November 8, 2011)
  2. Who is this guy? @postwait
     Author of “Scalable Internet Architectures” (Pearson, ISBN: 067232699X)
     Contributor to “Web Operations” (O’Reilly, ISBN: 978-1-4493-7744-1)
     Founder of OmniTI, Message Systems, Fontdeck, & Circonus
     I like to tackle problems that are “always on” and “always growing.”
     I am an Engineer
     A practitioner of academic computing.
     IEEE member and Senior ACM member.
     On the Editorial Board of ACM’s Queue magazine.
  3. Monitoring: let’s start with a definition.
     • analytics
     • trending
     • fault-detection / alerting
     • capacity planning
     • it is the collection and use of telemetry data
  4. What monitoring is not
     • controls
     • via monitoring you observe, you do not influence
  5. So why do we suck at it?
     tl;dr: because we think about
     • networks,
     • systems, and
     • applications
     instead of what matters: business.
  6. Your purpose
     • Your purpose is to make your company’s web business operate. (hence: “web operations”)
  8. Your purpose
     • ensure business success
  9. Understanding your purpose
     • who defines business success?
     • shareholders, ultimately
     • the board of directors, in their stead
     • the CEO on an operational, day-to-day basis
  10. Understanding your purpose
      • Assuming your CEO is doing a good job
      • the executive team understands these metrics
      • Assuming the executive team is competent
      • their reports understand these metrics (at least the pertinent ones)
  11. Pertinent == Problematic
      • You enable all aspects of the business
      • All these metrics are pertinent
  12. But why?
      • You could simply track stuff that is in your purview.
      • Why not?
  13. Technology
      • As a technology operations group, you have the technology.
      “We can rebuild him. We have the technology. We can make him better than he was. Better...stronger...faster.” - Oscar Goldman
  14. Why is our technology better?
      • Simply put: MTTD
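MTTD is commonly read as mean time to detect: the average gap between when a problem starts and when it is noticed. A minimal sketch of that arithmetic, assuming incidents are recorded as (started, detected) timestamp pairs; the function name and the sample incidents are hypothetical and not taken from the talk:

```python
from datetime import datetime, timedelta

def mean_time_to_detect(incidents):
    """Average (detected - started) over a list of (started, detected) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((detected - started for started, detected in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incidents, for illustration only.
incidents = [
    (datetime(2011, 11, 8, 9, 0), datetime(2011, 11, 8, 9, 4)),      # noticed after 4 minutes
    (datetime(2011, 11, 8, 13, 30), datetime(2011, 11, 8, 13, 40)),  # noticed after 10 minutes
]
print(mean_time_to_detect(incidents))
```

The shorter this number, the sooner a business-level problem can be traced back to something operations controls.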
  15. Now, what about your purview?
      • Obviously monitoring the business is useful.
      • However, you cannot directly affect business.
      • You indirectly affect it by operating the web portion.
  16. What can you change?
      • You can control:
      • releases,
      • performance,
      • stability,
      • computing resources,
      • networking,
      • and availability.
  17. Visualize!
      • All this information must be presented visually.
  18. Text.
      • Text is incredibly useful.
      • Consider: deployment.
  19. Code Deployment
      r82394 (by corey) 1h 7m 9s ago
      previous deploy 1h 42m 18s ago
      11 deploys today
  20. Code Deployment
      r82394 15:03:14 2011/06/15
      previous deploy 1h 42m 18s ago
      11 deploys today
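The two deployment slides differ only in how the time of the latest deploy is shown: as an age ("1h 7m 9s ago") or as an absolute timestamp. A minimal sketch of producing the relative form from an elapsed number of seconds; `format_age` is a hypothetical helper, not something shown in the talk:

```python
def format_age(seconds):
    """Render an elapsed time in seconds as e.g. '1h 7m 9s ago'."""
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")  # keep '0m' when hours are shown
    parts.append(f"{secs}s")
    return " ".join(parts) + " ago"

print(format_age(4029))  # 1h 7m 9s ago
print(format_age(6138))  # 1h 42m 18s ago
```

The relative form answers "how long ago?" without the reader having to subtract from the current time.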
  25. Text.
      • Numbers are trickier.
      • So many representations from which to choose.
  26. Beware
  30. Gauges require understanding
      • Gauges imply a deep understanding of
      • bounds, and
      • tolerances
  31. Gauges require understanding
      • General advice
      • If the range will ever change, don’t use gauges
  32. Gauges require understanding
      • Great for:
      • percentages,
      • temperature,
      • power per rack,
      • bandwidth per uplink
  33. Gauges require understanding
      • Bad for:
      • IOPS,
      • current visitor counts,
      • requests per second,
      • bandwidth overall
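One way to see why gauges need known bounds and tolerances: a gauge can only be drawn once a fixed range and a warning threshold have been chosen. The sketch below is illustrative only; `render_gauge`, its parameters, and the sample values are assumptions, not part of the talk:

```python
def render_gauge(value, low, high, warn_at, width=20):
    """Draw a fixed-range text gauge; refuses values outside [low, high]."""
    if not low < high:
        raise ValueError("gauge needs low < high")
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the gauge range [{low}, {high}]")
    filled = round((value - low) / (high - low) * width)
    status = "WARN" if value >= warn_at else "ok"
    return "[" + "#" * filled + "." * (width - filled) + f"] {value} ({status})"

# A percentage has natural bounds, so a gauge works.
print(render_gauge(72, 0, 100, warn_at=90))

# Requests per second has no fixed ceiling; any guessed maximum eventually breaks.
try:
    render_gauge(12000, 0, 10000, warn_at=8000)
except ValueError as err:
    print("cannot draw:", err)
```

Metrics such as percentages fit because `low` and `high` never move; for requests per second or visitor counts the range has to be guessed, which is the case the advice above warns about.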
  34. Graphs are often better
  35. Even little ones
  36. Think relatively
  37. Think relatively
  38. Users live all around the world
      • Users live just about everywhere
      • “Where?” is a useful question
  39. Geolocation
  40. Geolocation is interesting
      • to marketing
      • to legal
      • (okay, to everyone)
      • but not so useful to operations
  41. Geolocation is interesting
      • perhaps more interesting
  42. Geolocation is interesting
  43. Geolocation
      • Internet location != geo-political location
  44. ASN location
      • The closest thing to geo-political boundaries is peering
      -bash-4.0$ /usr/sbin/bgpctl show rib 66.78.236.243
      flags: * = Valid, > = Selected, I = via IBGP, A = Announced
      origin: i = IGP, e = EGP, ? = Incomplete
      flags destination          gateway          lpref   med aspath origin
            66.78.236.0/22       64.202.119.7       100     0 23352 4436 2914 3356 32778 i
      ### ASN 32778 is “Smart City Networks, L.P.”
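In a BGP AS path, the origin AS is the last number listed, so the route on this slide can be attributed to a network by reading the end of the `aspath` column. A minimal sketch, assuming AS paths are plain space-separated sequences (no AS sets) and that routes have already been collected into a prefix-to-path mapping; both function names are hypothetical:

```python
def origin_asn(as_path):
    """Return the origin AS: the last AS number in a space-separated AS path."""
    hops = as_path.split()
    if not hops:
        raise ValueError("empty AS path")
    return int(hops[-1])

def group_by_origin(paths_by_prefix):
    """Map origin ASN -> list of prefixes that it originates."""
    groups = {}
    for prefix, as_path in paths_by_prefix.items():
        groups.setdefault(origin_asn(as_path), []).append(prefix)
    return groups

# The single route shown on the slide.
rib = {"66.78.236.0/22": "23352 4436 2914 3356 32778"}
print(group_by_origin(rib))
```

Grouping visitors by origin AS instead of by country gives a view that follows network boundaries rather than map boundaries.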
  45. ASN location
  46. What about the business?
  47. What about the business?
      Authorizations : Hard Failed : Soft Failed : Releases
  48. Is that all?
      • Hells no.
  49. It’s all about real-time
      • Everything so far is old hat (maybe)
      • Every business unit has visualizations like this
      • You need to combine the data
      • You need to make it real-time
  50. Thanks
      • web demo ensues....
