Baking-In Transparency

This is from an invited talk I gave at the Pittsburgh Perl Workshop a few years back. It's not often that I get a chance to talk to developers, so I thought I'd take advantage of it and yell at them a bit ;-)

  1. Baking-In Transparency (Saturday, October 8, 11)
  2. About Me • Matt Simmons • 11+ year System Administrator • http://www.standalone-sysadmin.com • @standaloneSA • standalone.sysadmin@gmail.com
  3. Baking-In Transparency
  4. The Situation
  5. Devs make things • Small discrete programs • Large complex programs • Immense interconnected software suites
  6. Ops makes things go • Script using small discrete programs • Administer large complex programs • Cluster immense interconnected software suites
  7. There is a direct relationship between the software that developers write and the software that gets implemented by operations.
  8. The Problems
  9. Software needs to be monitored “When performance is measured, performance improves. When performance is measured and reported back, the rate of improvement accelerates.” --Pearson’s Law
  10. Why? “You can’t manage what you can’t measure” --Robert Kaplan
  11. Software needs to be managed “Management by objective works - if you know the objective. 90% of the time, you don’t.” --Peter Drucker
  12. Clearly we need to measure... But what do we measure? And what metrics do we use? How do we obtain the measurements?
  13. What do we measure? Software Engineers measure... • Programmer Productivity (code size/efficiency) • Defect Density (bugs / module size) • Requirement Stability (“feature creep”)
  14. What do we measure? Operations measures... • Resource Utilization (diskspace, bandwidth, etc.) • Infrastructure Stability (service uptime, MTBF, etc.) • Performance (CPU / memory efficiency, etc.)
  15. What metrics do we use? It depends. Duh.
  16. The metrics that Ops needs to monitor are not always easy to obtain...
  17. ...even though they’re really important • Reliability • Repeatability • Root Cause Identification
  18. ...so not only is monitoring important...
  19. Monitoring is hard.
  20. Monitoring correctly is hard.
  21. Why is monitoring hard? • Monitoring Software Suites are complex • Infrastructures are complex • Processes and applications are opaque to our futile requests to determine and track internal state
  22. Processes and applications are opaque to our futile requests to determine and track internal state
  23. The Solution(s)
  24. Dev/Ops working together gives: • Team Interrelationships • Knowledge Sharing • Cross Training • Tool Sharing
  25. But more specifically... Methods of monitoring software can be BUILT INTO THE SOFTWARE
  26. How things are designed now. Question: A well-designed program encounters an error. What happens? Answer: It handles the error and continues processing requests.
  27. How things are designed now. Question: A poorly-designed program encounters an error. What happens? Answer: It crashes and burns.
  28. Question: Which of those is easier to monitor?
  29. Obviously, dying to alert the monitoring system is overkill. (pun firmly intended)
  30. How do we make our statuses available to the monitoring system, then? It depends on the kind of software.
  31. Remember these? • Small discrete programs • Large complex programs • Immense interconnected software suites
  32. Small Discrete Programs • Possibly a utility • Usually scripted or run manually • Typically short-term run time
  33. Small Discrete Programs: Monitoring • Screen output • Return codes • Catch signals • Great example: ping & SIGQUIT • SIGUSR1 & SIGUSR2
  34. Signal Handling in Perl
      sub USR1_handler { drop_state_file(); }
      $SIG{'USR1'} = \&USR1_handler;
  35. Large Complex Programs • Probably a daemon or interactive program • Long running, needs to be stable • Subject to resource change over time • May need to retain state across restarts • May have a web component
  36. Large Complex Programs: Reporting • No screen output (except debugging) • Logging • SNMP Agent/Traps (seriously, read ‘man snmpd.conf’) • Named Pipes (FIFO) • State Output to DB (if appropriate)
  37. Net-SNMP Embedded Perl (snmpd.conf)
      perl use Data::Dumper;
      perl sub myroutine { print "got called: ", Dumper(@_), "\n"; }
      perl $agent->register('mylink', '.1.3.6.1.8765', \&myroutine);
  38. Immense Interconnected Software Suites (or Large Suites)
  39. Large Suites • Definitely retain state across restarts • Probably requires centralized controller • May use sockets to communicate • Probably has a web component
  40. Large Suites: Reporting. Everything under “Large Programs”, plus... • Monitoring coordinated by the “central” node or program • Aggregation of state • Provide a layer of abstraction from any in-suite monitoring or reporting • Provide XML/CSV in addition to human-parsable HTML pages
  41. What we’re really doing is IPC. So what other methods exist? Lots.
  42. Unix IPC • Sockets • RPC • Message Queues • FIFO • Shared Memory • And Many More...
  43. They shouldn’t all be used...
  44. What is important is that you use SOMETHING
  45. What is best? To crush your enemies, see them driven before you, and to hear the lamentation of their women?
  46. What is best? • An application that is easily and openly monitored • A developer that considers monitoring in all phases of design and development • A developer who writes their own monitoring checks
  47. Do us all a favor... When you develop software, be it scripts, utilities, programs, or suites, please please please...
  48. Do us all a favor... When you develop software, be it scripts, utilities, programs, or suites, please please please... Consider how we Ops folks will manage and monitor it.
  49. Baking-In Transparency. Thank you for your time. Matt Simmons • standaloneSA on Twitter • standalone.sysadmin@gmail.com • http://www.standalone-sysadmin.com
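
Slides 33 and 34 suggest letting a small program dump its internal state when it receives SIGUSR1. The sketch below fleshes that idea out into a runnable script. The state hash, the temp-file path and the body of `drop_state_file` are illustrative assumptions (the slide only names the function), and the demo signals itself so the effect is visible without a second shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Assumed (hypothetical) location for the state file; a real program
# would document a fixed path for Ops to read.
my $state_file = File::Spec->catfile(File::Spec->tmpdir, "myapp.$$.state");

# Hypothetical internal state that operations would like to see.
my %state = (started => time, requests => 0, errors => 0);

# Write the current state as key=value lines so a monitoring check can parse it.
sub drop_state_file {
    open my $fh, '>', $state_file or die "Cannot write $state_file: $!";
    print {$fh} "$_=$state{$_}\n" for sort keys %state;
    close $fh or die "Cannot close $state_file: $!";
    return;
}

# Install the handler as a code reference, the form perlipc recommends.
# Perl's deferred signals run it between operations, not mid-statement.
$SIG{USR1} = \&drop_state_file;

# Stand-in for real work; a long-running program would loop here.
$state{requests}++ for 1 .. 3;

# Demo only: signal ourselves. From another shell: kill -USR1 <pid>
kill 'USR1', $$;
print "state written to $state_file\n";
```

In a long-running program you would drop the self-signal and let an operator, or a check script, send `kill -USR1 <pid>` whenever a fresh snapshot is needed.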

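Slide 36 lists named pipes (FIFOs) as one reporting channel for large, long-running programs. Below is a minimal sketch, assuming a hypothetical FIFO path under the temp directory and a one-line `key=value` status format. A forked child plays the daemon so the example runs on its own, while the parent plays the monitoring check that reads the pipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Spec;

# Assumed (hypothetical) FIFO path; a real daemon would create it at a documented location.
my $fifo = File::Spec->catfile(File::Spec->tmpdir, "myapp.$$.status");
mkfifo($fifo, 0600) or die "mkfifo $fifo failed: $!";

# Daemon side: write one status line. This open() blocks until a reader
# (a monitoring check, or simply `cat` on the FIFO) opens the other end.
sub publish_status {
    my ($status) = @_;
    open my $out, '>', $fifo or die "open $fifo: $!";
    print {$out} "$status\n";
    close $out;
    return;
}

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child process stands in for the daemon.
    publish_status('state=OK requests=3');
    exit 0;
}

# Parent process stands in for the monitoring check.
open my $in, '<', $fifo or die "open $fifo: $!";
my $line = <$in>;
close $in;
waitpid $pid, 0;
unlink $fifo;
print "monitor read: $line";
```

Because a blocking open would stall a daemon whenever nobody is listening, a production version would more likely open the FIFO with `O_WRONLY | O_NONBLOCK` (from `Fcntl`) and skip the update when that fails with `ENXIO`, which is what POSIX specifies when no process has the FIFO open for reading.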
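
Slide 33 lists return codes as a monitoring hook, and slide 46 asks developers to write their own monitoring checks. One widely used convention for that (an assumption here; the talk does not name a specific monitoring system) is the Nagios/Icinga plugin interface: one status line on stdout and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below maps a measured value onto that convention; the `queue_depth` metric and its thresholds are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Exit codes defined by the Nagios plugin convention.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };
my @LABEL = qw(OK WARNING CRITICAL UNKNOWN);

# Compare a measured value against warning/critical thresholds and
# return (exit_code, status_line).
sub check_threshold {
    my ($name, $value, $warn, $crit) = @_;
    return (UNKNOWN, "UNKNOWN - $name: no data") unless defined $value;
    my $code = $value >= $crit ? CRITICAL
             : $value >= $warn ? WARNING
             :                   OK;
    # Text after '|' is performance data (value;warn;crit) for graphing tools.
    return ($code, "$LABEL[$code] - $name is $value | $name=$value;$warn;$crit");
}

# Hypothetical measurement; a real check would query the application,
# for example through the state file or FIFO it publishes.
my ($code, $message) = check_threshold('queue_depth', 42, 50, 100);
print "$message\n";
# A real plugin would finish with: exit $code;
```

Keeping the threshold logic in a function that returns the code, rather than calling `exit` deep inside it, makes the check easy to unit-test before Ops wires it into their monitoring system.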