Make Runtime Data Useful in Teams

FOSDEM 2020, February 2020, Brussels: Talk by Florian Lautenschlager (@flolaut, Software Architect at QAware) & Robert Hoffmann (@robhoffmax, Lead Architect VPaaS at Telekom)

Abstract: We introduced distributed tracing, central logging with trace correlation and monitoring with Prometheus and Grafana in a large internationally distributed software development project from the beginning. The result: Nobody used it.

In this talk we share the good and not-so-good lessons we learned while introducing and operating the observability tools. We show which extensions and conventions were necessary to bring about a cultural change and to spark enthusiasm for these tools. Today the tools are first-class citizens, and people complain loudly when they are not available.

Watch the video of the talk: http://bofh.nikhef.nl/events/FOSDEM/2020/UA2.120/useless.webm

  1. From Zero to Useless to Hero: Make Runtime Data Useful in Teams. FOSDEM 2020. Robert Hoffmann (@robhoffmax), Florian Lautenschlager (@flolaut)
  2. Dr. Florian Lautenschlager, Software Architect, {name.surname}@qaware.de. Robert Hoffmann, Lead Architect VPaaS, {name.surname}@telekom.de. Contact us if you want. =)
  3. “Hallo Magenta”: Building a European Voice Assistant Platform
  4. From Zero to 1: international co-development, > 900 collaborators, > 500 active git repos, > 100 services
  5. T=Zero
  6. Complex Architecture. Complex Software System. Complex Analysis. [Architecture diagram: Proxy (https/http); Kubernetes pods for the Admin Gateway, the API Gateway, Voice Services (Service pods; API, Admin) and Device Services (Service pods; API, Admin); Some Cloud: SQL Databases, Storage, NoSQL Databases; External Services: IDM, CDN, …; Skills: Weather, …, Radio]
  7. Advanced toolchain needed. Standard used. [Toolchain diagram covering probes, collection, transport, storage and exploration for three data types: metrics (sampling-based), textual (event-based), spans (event-based). NEW: Humio, NEW: Jaeger, NEW: Grafana Cloud]
  8. Generic-Standard-Runtime-Data-Smarthub-Service-Data-Model: Logging Concept; Tracing Best Practices; Standard Metrics for incoming and outgoing Requests; Standard Database Metrics and specific ones; Standard Readiness and Liveness checks with Metrics; TraceId in every Response
  9. “Done. This solves all our problems. They will love it!” (We, the ignorant ones)
  10. Our team: Colorful. Platform Developers, Skill Developers, Operations Heroes, First Level Support, Data Scientists, Production Management, Testers, Mobile Developers
  11. Our solution: Monochrome. Platform Developers, Skill Developers, Operations Heroes, First Level Support, Data Scientists, Production Management, Testers, Mobile Developers
  12. T=Useless: Because we are monochrome
  13. Nobody wants to be a Beginner. Optimize for Intermediate. (About Face, Alan Cooper) [Diagram: Beginner, Intermediate, Expert scale with the Toolchain marked]
  14. Useful = Utility + Usability. 🤔 Utility: whether it provides the features you need. ✅ You can find all the information... Usability: how easy & pleasant these features are to use: Learnability, Efficiency, Memorability, Error Handling, Satisfaction. ❌ ... if you really know how and where to look (as an Expert). (Usability 101, Jakob Nielsen, https://www.nngroup.com/articles/usability-101-introduction-to-usability/) What we did to move our solution from expert to intermediate.
  15. Close Gaps: Link data and tools as much as possible. Developer-, Tester- & Operations-oriented: Dashboards with links to logs and e2e test runs
  16. Close Gaps: Link data and tools as much as possible. Developer-oriented: Pipeline UI to promote software and get runtime data
  17. Close Gaps: Link data and tools as much as possible. Developer-oriented: Pipeline dashboards with logs and traces
  18. Close Gaps: Link data and tools as much as possible. Developer-, Tester- & Operations-oriented: Gangway landing page to access k8s, logs, traces, metrics
  19. Make functional use: First-level support integration. First-Level- & Operations-oriented: GDPR-aware debugging in production with token-based, user-specific debug logging and tracing. [Diagram: Customer, First Level Support]
  20. Make functional use: Resolving tickets more easily. Developer-, Tester-, First-Level- & Operations-oriented: Referencing Trace IDs as a common base to discuss and find relevant data
  21. Lower the access hurdle: CLI & Chatbot integration. Everybody-oriented. Any project member has easy access. Just open your chat. Anyone can learn by example. See how others use the service. Support in case of an error. By others or technical: • Trace: Request Trace • Logs: Request Application log
  22. Changes in the culture that we have recognized (Everybody-oriented). Visibility and Increased Trust: The toolchain acts as a safety net because it shows the runtime behavior. People can be sure to understand their services, e.g. in case of an error. Self-Awareness: Accept and understand that software has a runtime behavior. Not all developers feel comfortable with dynamic analysis, but now they have the means to see and understand. Clear Communication: Inner- and cross-team communication is easier. Different people can easily share the same context, e.g. trace ID, log messages, request flow. Error Culture: Failures are more easily accepted. As the software system is visible and the cross-team communication is clear, people tend to accept failures and work together on solutions. Ownership: Increased acceptance is the foundation for end-to-end responsibility. Due to the visibility and increased trust, clear communication and error culture, people are more inclined to take ownership of their services.
  23. T=Hero: Because we are a little bit colorful
  24. Start here. Select Toolchain & Standardize Metrics, Logs, Traces. Tools your team. Link and combine them as far as possible. Integrate them into everyday tools & processes.
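
The slides name two conventions without showing code: "TraceId in every Response" and "token-based user-specific debug logging". The following is a minimal sketch of how the two could fit together, not the speakers' implementation. The header names `X-Trace-Id` and `X-Debug-Token`, the token value, and the `handle_request` function are all hypothetical and chosen only for illustration.

```python
import uuid

# Hypothetical debug tokens that first-level support could hand out to a
# single user (assumption; the talk does not describe the token format).
DEBUG_TOKENS = {"support-token-123"}


def handle_request(headers, body):
    """Handle one request and return (response, log_lines).

    - Reuses the caller's trace ID or creates a new one.
    - Emits DEBUG lines only if the request carries a known debug token.
    - Returns the trace ID in the response headers.
    """
    trace_id = headers.get("X-Trace-Id") or uuid.uuid4().hex
    debug_enabled = headers.get("X-Debug-Token") in DEBUG_TOKENS
    log_lines = []

    def log(level, message):
        # Drop DEBUG output unless this specific request opted in via token.
        if level == "DEBUG" and not debug_enabled:
            return
        # Every log line carries the trace ID so logs and traces correlate.
        log_lines.append(f"{level} trace_id={trace_id} {message}")

    log("INFO", "request received")
    log("DEBUG", f"payload size={len(body)}")

    response = {
        "status": 200,
        "headers": {"X-Trace-Id": trace_id},  # trace ID in every response
        "body": "ok",
    }
    return response, log_lines


if __name__ == "__main__":
    response, logs = handle_request({"X-Debug-Token": "support-token-123"}, "hello")
    print(response["headers"]["X-Trace-Id"])
    for line in logs:
        print(line)
```

Because the trace ID comes back with every response, a user or tester can paste it into a ticket or chat, which is the "common base to discuss and find relevant data" mentioned on slide 20. Scoping debug output to a token keeps verbose logging limited to the one user who agreed to it.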
