
Observability für alle


Cloud Native Night October 2018, Mainz: talk by Florian Lautenschlager (@flolaut, Senior Software Engineer at QAware) and Josef Fuchshuber (@fuchshuber, Chief Technologist at QAware)

=== Please download the slides if they appear blurred! ===

Join our Meetup group:

In this talk we do not show you how to wire OpenTracing, Prometheus, or EFK into distributed hello-world applications to see as much of the application as possible; there are tons of good talks on that out there. Instead, we show how we first established observability in a real cloud-native application and then made it usable and accessible for everyone. We call our approach “Collaborative Monitoring”. In this talk we discuss the idea, show details of the implementation, and talk about the stumbling blocks and the real added value.

Published in: Software

Observability für alle

  1. Observability für alle. Cloud Native Night, 25 October 2018. Florian Lautenschlager (@flolaut), Josef Fuchshuber (@fuchshuber)
  2. In our cloud backend we have a vital microservice ecosystem.
  3. Our team is just as vital and heterogeneous as our software: Platform Developer, App Developer, Skill Developer, Client Developer, Tester, Ops, Help Desk, Product Management, Data Scientist, UX Designer.
  4. Observability isn't just for operations.
  5. What is the hardest step in the DevOps process? DEV / OPS
  6. Much better: The 6 Cs of the DevOps Cycle. Source:
  7. Observability in the wild! A case study… and how we found collaborative monitoring.
  8. Monitoring Toolchain: Simply Cloud Native. Metrics, Events, Traces. Java (Spring Boot) or Python on Azure / Kubernetes / Openshift / Docker.
  9. Monitoring: Technical and Functional. Kubernetes. Infrastructure-Monitoring (health of platform and application): generic monitoring that does not need knowledge about the application. Application-Monitoring (telemetry data): monitoring that does need knowledge about the application.
  10. Monitoring: Technical and Functional. Kubernetes. Infrastructure-Monitoring (health of platform and application). Questions: Services are up and running; services can accept traffic. Sources: Kubestate-Exporter, Prometheus-Node-Exporter, JMX, top, iostat, etc. Application-Monitoring (telemetry data). Questions: Use-case runtimes; service level agreements. Sources: Specific instrumentation (around use cases, etc.).
  11. Four Golden Signals Dashboard
  12. Total duration. Involved services.
  13. Code-Slide: Standardize tracing logs and tags. Span logs: We model database calls as well as other expensive calls as logs, using a template to reduce the size of traces: db:<Repo>.<Call> took: xx ms. / call:<Class>.<Method> took: xx ms. Span tags: Used to model values that are valid for a span. We use a template to standardize tags: span.tag. (to mark our tags); Environment (staging, integration, etc.); db (to mark spans with db calls); param.<name>=value (call parameters).
  14. Logs for a given trace. Involved services.
  15. Code-Slide: Contextual logging. Context of a log event. Everyone can easily see the logs for a specific context (trace etc.).
  16. End-2-end tests are also integrated in our observability stack. See the logs.
  17. We provide these tools and techniques to every developer, but… Best SmartSpeaker in the World. Best Software-System in the World. Best Developers in the World. Voice input: “Blah blah Weather blah”. Response: Don’t understand “Blah blah Weather blah”. … In case of an error, the experts of the best software system in the world were often asked: what is the problem?
  18. I know. Most of you do this already. But what about… Collaborative Monitoring!?!?
  19. An example is the best explanation. Once there was a little tiny application… and a monitoring toolchain… and a chatbot…
  21. Snip Snap. Links request with trace and logs. verbose
  22. Or in case of an error
  23. Or for checking the health of the services
  24. Or for checking the status of e2e tests
  25. Our current setup: A chatbot as generic interface.
  26. Happy end.
  27. Summary. Collaborative Monitoring: Monitoring that everyone can benefit from without needing expert knowledge.
  28. Three steps to enable collaborative monitoring. Step 1 (start here): Standardize metrics, logs and traces. Structured logging + context, metric names, etc. Step 2: Link and combine them as far as possible. Correlate events and traces by context; metrics with events and traces by time. Step 3: Integrate them into everyone's tools. Tools your team
  29. Did we create an uncontrollable observability monster?
  30. There’s No Such Thing as a Free Lunch. • The more complex a microservice architecture is, the more sophisticated the observability solution must be. • For Collaborative Observability there is no out-of-the-box solution.
  31. Collaborative Monitoring by everyone. Ease of use: Simple general interface to access various monitoring tools. Integrated into everyone's daily tools (ChatBots, E-Mail, etc.). Support all kinds of teams: Operations / Dev-Ops / Developers / QA-Team / My mum =). Allow everyone to get superman insights: Decrease Mean Time To Recovery (MTTR) with a fast analysis. Integrates different kinds of monitoring data (traces, metrics and logs) of different monitoring layers. The right information: Provide relevant information for different teams, e.g. runtimes for perf. engineer. Level of detail: abstract (use case level) for management vs. details (database calls) for developers.
  32. Ease of communication within bug tickets.
  33. Lessons Learned. Tool stack is awesome: Prometheus, Sleuth / Zipkin, and logging (fluentD, elastic) are stable and well documented. Maximum flexibility compared to commercial products. But: effort for concepts, implementation and quality checks. Conventions and rulesets are important! Mindset: We found that we had to convince people first. But we have seen a high level of acceptance. Example: The chatbot with trace links is a standard tool for discussing possible bugs between all project roles. Development and system understanding: No need for “cloudy” conversations. Just provide the context, e.g. a trace id. Example: Issues typically contain the context (trace id) that points the developer to the logs and the trace. Mark customer and automatic test traffic for better dashboards and analytics. The observability tool stack is a first-class citizen: You do not make friends when it's down.
  34. 29.10.2018 QAware
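
Slide 13 describes templates for span logs and span tags but the transcript does not include the code itself. The following is a minimal sketch of how such templates could be rendered, not the authors' implementation. The function names are hypothetical, and reading "span.tag." as a key prefix for every tag is an assumption; the slide only says it is used "to mark our tags".

```python
from typing import Dict, Mapping

# Assumption: the slide's "span.tag." is interpreted as a prefix for all tag keys.
TAG_PREFIX = "span.tag."


def db_log(repo: str, call: str, millis: int) -> str:
    """Render a database call as a span log: db:<Repo>.<Call> took: xx ms."""
    return f"db:{repo}.{call} took: {millis} ms."


def call_log(cls: str, method: str, millis: int) -> str:
    """Render another expensive call as a span log: call:<Class>.<Method> took: xx ms."""
    return f"call:{cls}.{method} took: {millis} ms."


def span_tags(environment: str, has_db_call: bool,
              params: Mapping[str, object]) -> Dict[str, str]:
    """Build the standardized tag set for one span (hypothetical key layout)."""
    tags = {f"{TAG_PREFIX}environment": environment}
    if has_db_call:
        # Marks spans that contain database calls.
        tags[f"{TAG_PREFIX}db"] = "true"
    for name, value in params.items():
        # Call parameters follow the param.<name>=value template.
        tags[f"{TAG_PREFIX}param.{name}"] = str(value)
    return tags
```

Keeping the templates in one place means every service emits the same log and tag shapes, which is what makes traces searchable across the whole system.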
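
Slide 15 mentions contextual logging, where every log event carries the context (for example the trace id) it belongs to. The talk's stack is Spring Boot with Sleuth; as an illustration only, here is a rough Python equivalent using the standard `logging` and `contextvars` modules. The logger name, the log format, and the helpers `build_logger` and `handle_request` are assumptions made for this sketch.

```python
import contextvars
import io
import logging

# Holds the trace id of the request currently being handled ("-" if none).
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceContextFilter(logging.Filter):
    """Attach the current trace id to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True  # never drop a record, only enrich it


def build_logger(stream) -> logging.Logger:
    """Create a logger whose output lines always contain the trace id."""
    logger = logging.getLogger("contextual-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s"))
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, trace_id: str, city: str) -> None:
    """Simulate one request: set the context, log, then restore the old context."""
    token = trace_id_var.set(trace_id)
    try:
        logger.info("weather lookup for %s", city)
    finally:
        trace_id_var.reset(token)


if __name__ == "__main__":
    out = io.StringIO()
    log = build_logger(out)
    handle_request(log, "abc123", "Mainz")
    print(out.getvalue(), end="")
```

Because the trace id is injected by a filter rather than passed to each log call, application code does not need to know about tracing at all, and a log backend can still filter all events by trace.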
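
The second step on slide 28 is to link the signals: events and traces by context, and metrics with events and traces by time. The sketch below illustrates both joins on in-memory data. The record types, field names, and the `slack_seconds` parameter are hypothetical; in the stack named in the talk these lookups would be queries against the logging, tracing, and metrics backends rather than list comprehensions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogEvent:
    trace_id: str
    timestamp: float  # seconds since epoch
    message: str


@dataclass(frozen=True)
class Trace:
    trace_id: str
    start: float
    end: float


@dataclass(frozen=True)
class MetricSample:
    name: str
    timestamp: float
    value: float


def events_for_trace(trace: Trace, events: List[LogEvent]) -> List[LogEvent]:
    """Correlate by context: all log events sharing the trace id, in time order."""
    matching = [e for e in events if e.trace_id == trace.trace_id]
    return sorted(matching, key=lambda e: e.timestamp)


def metrics_during_trace(trace: Trace, samples: List[MetricSample],
                         slack_seconds: float = 0.0) -> List[MetricSample]:
    """Correlate by time: metric samples taken while the trace was running."""
    lower = trace.start - slack_seconds
    upper = trace.end + slack_seconds
    return [s for s in samples if lower <= s.timestamp <= upper]
```

Metrics usually carry no trace id, so time is the only available join key for them; the optional slack widens the window to catch samples scraped shortly before or after the request.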