
See through software


Software, by default, is opaque. It doesn't explain what it is doing or how well it's going. Visibility into systems usually comes only through development, so commonly only the needs of developers are met.

See-through software seeks to acknowledge the needs of the entire organization through the democratization of data access.

This talk focuses on logging and metrics as two sources of potential insight, and on the architecture CommerceHub has adopted to democratize access to this information.

Published in: Software

  1. See-through software Using logs, metrics and visualization to see your app at runtime and share your vision with others
  2. The best technical solutions are ones that solve for human relationships
  3. Opaque software suffers from a lack of focus on the operations user experience
  4. Software is opaque by default
  5. Software is opaque by default What's it doing?
  6. Software is opaque by default Is it going well?
  7. Software is opaque by default When's it going to be done?
  8. Software is opaque by default Does it need me to do anything?
  9. When you write opaque software
  10. This is the user experience of operations
  11. This is the user experience of support
  12. This is the user experience of your boss
  13. Opaque software leads to • Misaligned priorities • Loss of productivity • A generally "unprofessional" experience for customers • The "us vs them" attitude that is the antithesis of DevOps culture
  14. If you don't provide facts, you encourage mythology
  15. See-through software acknowledges the operations user experience of the entire organization
  16. "A good user experience should make a user feel smart, powerful and safe." –Jason Nemec
  17. We feel smart when • We understand what's going on • We understand how to change things • Others share our understanding
  18. We feel powerful when • We are able to change things • We see the results of our changes • We can find an answer to our questions
  19. We feel safe when • We know there isn't a problem • We know if there is a problem, we'll be able to understand it • We can trust others to react on our behalf
  20. These principles help to develop a roadmap for improving transparency
  21. Transparent software gets your attention • Dashboarding • Alerting
  22. Transparent software takes on a shape • Graphing • Modeling
  23. Transparent software tells stories • Logging • Auditing • Reporting
  24. Transparent software responds interactively to questions • Ad hoc queries • Post hoc analytics
  25. Transparent software is democratic • Wikis • Shared visibility • Persistent chat rooms
  26. Software does not become transparent as the result of any single project • Software evolves; its UX needs to evolve with it • Insight is rarely easy to produce, and easy-to-produce information is rarely insightful • Insight is frequently driven from the bottom up or from the outside in
  27. Democratization means everybody benefits And everybody has a role to play
  28. "Clearinghouse" services • Store data for people who can't get it themselves • Collect and persist data from many different sources • Provide a single engine for serving information • Reduce pressure on critical infrastructure from interested users
  29. "Visualization" services • Provide studio-like tools that let "non-technical" users feel safe to experiment • Allow for rapid, real-time development of new insights on old data • Allow for sharing and repurposing of insight
  30. The see-through system at runtime Logging
  31. Good logs tell a story • Each statement is a sentence: it needs verbs and nouns • Each statement has a setting -- where, when and who • It should be simple to reconstruct the story told by independent sentences
  32. Aggregate your logs to create an epic • Discover systems that are acting aberrantly • Correlate errors between coordinating systems • Graph meaningful patterns in your stories
  33. Index your logs to find interesting stories quickly • Audit individual chains of processing from start to finish • Slice up your reports so they interest a specific group or team • Build new reports quickly to solve unpredicted needs
  34. Aggregation architecture Components: log to the fastest, most convenient, least likely to fail store available (e.g. local disk)
  35. Aggregation architecture Log shippers: asynchronously publish logs to an aggregator
  36. Aggregation architecture Aggregator: parse, clean, enrich and store logs
  37. Aggregation architecture Clearinghouse: hold data and standardize access
  38. Aggregation architecture Visualization: allows data to inform and be manipulated by end users
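The aggregator stage above (slides 34-38) can be sketched as a small parse-clean-enrich step. This is a hypothetical illustration, not the actual CommerceHub implementation; the raw log format and field names are assumptions:

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical raw format: "2024-01-01T00:00:00Z INFO order-service Accepted order 42"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>\w+)\s+(?P<service>\S+)\s+(?P<message>.*)"
)

def parse_and_enrich(raw_line, host):
    """Aggregator step: parse a raw line, clean it, and enrich it with
    context the shipper knows, before handing it to the clearinghouse."""
    match = LOG_PATTERN.match(raw_line.strip())
    if match is None:
        # Keep unparseable lines rather than silently dropping data
        return {"message": raw_line.strip(), "parse_error": True, "host": host}
    event = match.groupdict()
    event["host"] = host  # enrichment: where the sentence was spoken
    event["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return event

event = parse_and_enrich(
    "2024-01-01T00:00:00Z INFO order-service Accepted order 42", "app01"
)
print(json.dumps(event))
```

Structured output like this is what lets the clearinghouse index the "who, where, when" of each sentence rather than storing opaque strings.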
  39. Log aggregation on private networks The ELK stack (Elasticsearch + Logstash + Kibana)
  40. Log aggregation in the cloud
  41. Developing apps with log aggregation in mind
  42. • Use correlation IDs throughout your system • Don't log secrets • Build log strategies with shipping and rolling in mind • Have a way to capture crashes • Log using techniques that preserve context, such as JSON
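The first and last bullets above -- correlation IDs plus context-preserving JSON -- might look like this minimal sketch using Python's standard logging module (the field names are illustrative, not a prescribed schema):

```python
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object, so shippers and indexers
    get structured fields instead of guessing at text layouts."""
    def format(self, record):
        return json.dumps({
            "when": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # The correlation ID ties independent sentences into one story
            "correlation_id": getattr(record, "correlation_id", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orders")
log.addHandler(handler)
log.setLevel(logging.INFO)

correlation_id = str(uuid.uuid4())  # generated once per request / unit of work
log.info("Accepted order %s", 42, extra={"correlation_id": correlation_id})
```

Passing the ID via `extra` keeps it out of the message text, so the aggregator can filter and join on it as a first-class field.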
  43. The see-through system at runtime Dashboarding
  44. Focus on UX
  45. Make users feel smart • Dashboards should inform without a lot of explanation or prior knowledge • Dashboards should direct the user to the next step
  46. Make users feel powerful • Dashboards should update frequently (aim for <10s) • Dashboards should help users perform their job • Dashboards should respond to the user's needs
  47. Make users feel safe • Dashboards should not overwhelm • Dashboard accuracy should be known • Thresholds should be meaningful • Using a dashboard should not endanger the running software
  48. How to build a dashboard item
  49. Are you concerned with a technical or a business issue? • Technical: Machine 123 is slow, West Coast users are slow, we're moving 80 GB/s • Business: Client ABC is slow, logins are slow, we're moving 1000k orders/s
  50. How does a stressed system look? How can you tell it from an unstressed system?
  51. What kind of comparisons do you want to provide? • Time series vs flat • Machine vs machine • Current vs previous • Current vs threshold
  52. Dashboarding architecture Metric source: a process within an app that can produce a numeric value
  53. Dashboarding architecture Metrics collection API: decouples the collection of metrics from their publishing; generally still part of the app
  54. Dashboarding architecture Stats aggregator: an out-of-process component that creates aggregate data points from a stream of metrics
  55. Dashboarding architecture Metrics clearinghouse: hold data and standardize access
  56. Dashboarding architecture Visualization: allows a user to build and correlate graphs
  57. Dashboarding architecture Dashboarding: allows a user to share a distilled vision of data
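The stats aggregator from slide 54 can be sketched as a component that collapses a stream of raw values into a few data points per flush interval. The metric name and the choice of aggregates (count, mean, max, p95) are illustrative assumptions:

```python
from collections import defaultdict

class StatsAggregator:
    """Out-of-process aggregator sketch: turn many raw metric values
    into a handful of aggregate points per flush interval."""
    def __init__(self):
        self.values = defaultdict(list)

    def record(self, name, value):
        self.values[name].append(value)

    def flush(self):
        """Produce aggregate points and reset for the next interval."""
        points = {}
        for name, vals in self.values.items():
            vals.sort()
            points[name] = {
                "count": len(vals),
                "mean": sum(vals) / len(vals),
                "upper": vals[-1],
                # p95: the value 95% of observations fall at or under
                "p95": vals[int(0.95 * (len(vals) - 1))],
            }
        self.values.clear()
        return points

agg = StatsAggregator()
for ms in [12, 15, 11, 90, 14]:
    agg.record("checkout.latency_ms", ms)
print(agg.flush())
```

Aggregating out-of-process is what keeps high-frequency metric streams from overwhelming the clearinghouse: thousands of raw values per interval become four numbers.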
  58. Dashboarding on private networks StatsD + Graphite
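StatsD accepts metrics over UDP in a simple text line protocol, `<name>:<value>|<type>`. A minimal fire-and-forget client might look like this (host, port and metric names here are just the conventional defaults, not a specific deployment):

```python
import socket

def statsd_send(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one metric in StatsD's line protocol: <name>:<value>|<type>,
    e.g. 'c' for counters and 'ms' for timers."""
    payload = f"{name}:{value}|{metric_type}"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("ascii"), (host, port))
    sock.close()
    return payload  # returned so callers can see what was sent

# Count an order and time a checkout. UDP means a missing or slow
# collector never blocks or breaks the running application itself.
statsd_send("orders.accepted", 1, "c")
statsd_send("checkout.latency_ms", 87, "ms")
```

That non-blocking, lossy transport is also why "using a dashboard should not endanger the running software" holds: instrumentation failure degrades visibility, not the app.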
  59. Dashboarding in the cloud
  60. Developing apps with dashboarding in mind
  61. • Collect and report everything that's "free" • Collect and report deep, valuable application metrics at runtime • Understand aggregation and know when to apply it • Be aware of multiplicative effects of metrics collection on bandwidth, storage and billing
  62. ScoreKeeper Gather metrics from existing data sources into statsd/Graphite
  63. See-through software • Lets the people whose jobs depend on software understand what it's doing and how well it's doing • Empowers people to ask their own questions and share their insights
  64. Help teams become more successful • By understanding when there's a problem • By focusing energy where it's needed most • By talking to customers in a competent and informed way
  65. @DataMiller
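One way to tame the multiplicative effects mentioned in the last bullet is client-side sampling, which StatsD supports via an `|@rate` suffix so the aggregator can scale counts back up. A sketch (metric name and rate are illustrative):

```python
import random

def sampled_counter(name, value, sample_rate):
    """Send only a fraction of counter events, annotated with the
    sample rate (StatsD's '|@rate' suffix) so the aggregator can
    multiply the received count back to an estimate of the true total."""
    if random.random() >= sample_rate:
        return None  # dropped locally: no bandwidth, storage or billing cost
    return f"{name}:{value}|c|@{sample_rate}"

random.seed(7)  # deterministic for the example
lines = [sampled_counter("orders.accepted", 1, 0.1) for _ in range(1000)]
sent = [line for line in lines if line is not None]
print(len(sent), "of 1000 events actually sent")
```

Roughly 10% of events cross the wire; the aggregator divides each received count by 0.1, trading a little accuracy for a tenth of the traffic.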