Spring makes it easy to build and run applications quickly without boilerplate code. Once up and running though, you will want operational insight into the behavior of your system, beyond basic monitoring of system health. How can you best achieve observability with Spring applications? Spring Boot's Actuator can get you quite far with individual instances, but for observability of a distributed system, additional tools are needed. Fortunately, Spring makes using those tools pretty easy!
This talk introduces the three main pillars of observability: logging, metrics, and tracing. For each pillar, you will learn how to integrate or instrument it in your Spring Boot-based applications. The projects covered include Zipkin for distributed tracing, Micrometer as a metrics façade and exporter, and Spring Cloud Sleuth for tracing instrumentation and log correlation. Combining the distinct advantages of each pillar gives you powerful observability.
2. Assumptions:
• Basic knowledge of Spring Boot
• You care about user experience
Agenda:
• Observability: what / why
• 3 pillars of observability w/ Spring
  • Logging
  • Metrics
  • Tracing
• Putting it all together
4. What is observability?
Observability is achieved through a set of tools and practices that aims to
turn data points and context into insights.
• Beyond traditional monitoring
• Constant partial degradation/failure
• Expect the unexpected
• Answer unknown questions about your system
5. Why care about observability?
You want to provide a great experience for users of your system.
• Observability builds confidence in production
• Ownership. Give yourself the tools to be a good owner.
• MTTR* is key: failures are happening
  • early detection + fast recovery + increased understanding
* MTTR = mean time to recovery
7. Spring Boot Actuator
• Spring Boot Actuator is awesome.
• You get so much out-of-the-box.
• But... is it enough? Like most things, it depends.
• Information is inherently instance-scoped
8. Spring Boot Admin makes it easy to access and use each instance’s Actuator endpoints.
https://github.com/codecentric/spring-boot-admin
11. Distributed systems are hard
• Any request spans multiple processes
• Need to stitch together local info and slice/drill-down
• Increased points of failure
• Scaling and ephemeral instances*
* Not strictly properties of a distributed system
13. Logging and metrics and tracing… oh my!
• 3 sides to observability
• Non-functional requirements (generic/specific)
• Overlap exists, but use all 3 for best insight
Source: Peter Bourgon, access date: 2018-05-18
http://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
14. Effort to reward
When it comes to logging, metrics, and tracing:
• Common needs just work out-of-the-box.
• Custom needs can be met with a little extra effort.
See also: 80-20 rule
16. Logging in general
• Arbitrary messages you want to find later
• Formatted to give context
• Key parts of context: logging levels, timestamp
• Message examples
  • Exceptions/stack traces
  • Additional context
  • Access logs
  • Request/response bodies
17. Basic logging
[Diagram: a user who wants to check the logs has to get the log files from each VM running App1 and App2, then search each set of logs separately.]
18. Problems with basic logging
• Does not scale: too much work and knowledge required
• Multithreaded, concurrent requests intermingle logs
• Low usability: searching is limited and difficult
20. Logging and Spring
Spring Boot
• Configurable via Spring Environment (see also Spring Cloud Config)
  • log format: make a common format across applications
  • log levels (logging.level.*)
• Configurable via Actuator (at runtime)
  • log levels
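A minimal sketch of the settings above, assuming the default Logback setup. The package name com.example.orders is hypothetical; the property keys are standard Spring Boot logging properties.

```properties
# application.properties (illustrative values)
# One console pattern shared by every application, so all logs parse the same way
logging.pattern.console=%d{ISO8601} %-5level [%thread] %logger{36} - %msg%n
# Default level, then a more verbose level for one package (hypothetical name)
logging.level.root=INFO
logging.level.com.example.orders=DEBUG
```

At runtime, the Actuator loggers endpoint accepts a POST to /actuator/loggers/{logger name} with a JSON body such as {"configuredLevel": "DEBUG"}, provided the endpoint is exposed.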
21. Logging and Spring
Spring Cloud Sleuth
• adds trace ID for request correlation
• Query all collected logs by any field or full-text search
Centralized, request-correlated, formatted logs
indexed and searchable across your system
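To illustrate why a trace ID in every log line helps, here is a simplified sketch in plain Java. It is not how Sleuth is implemented; the class CorrelatedLog, the line format, and the app names and IDs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of request correlation; not Sleuth's implementation.
class CorrelatedLog {
    // Put the application name and trace ID into every log line.
    static String format(String app, String traceId, String level, String message) {
        return level + " [" + app + "," + traceId + "] " + message;
    }

    // Return only the lines that belong to one request, whichever app wrote them.
    static List<String> forTrace(List<String> lines, String traceId) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (line.contains("," + traceId + "]")) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        // Two concurrent requests interleave across two hypothetical apps.
        collected.add(format("app1", "aaa111", "INFO", "order received"));
        collected.add(format("app1", "bbb222", "INFO", "order received"));
        collected.add(format("app2", "aaa111", "ERROR", "payment declined"));
        collected.add(format("app2", "bbb222", "INFO", "payment accepted"));
        // Print only the lines for the request with trace ID aaa111.
        for (String line : forTrace(collected, "aaa111")) {
            System.out.println(line);
        }
    }
}
```

In a real setup the filtering is done by a query in the centralized log store rather than in application code.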
23. Metrics in general
Characteristics:
• Aggregate time-series data; bounded size
• Can slice based on dimensions/tags/labels*
Purpose:
• Visualize / identify trends and deviation
• Alerting based on metric queries
* See also https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
24. Metrics examples
Example metric                | Type      | Example tags
response time                 | timer     | uri, status, method
number of classes loaded      | gauge     | (none)
response body size            | histogram | uri, status, method
number of garbage collections | counter   | cause, action
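To show what slicing by tags means, here is a simplified sketch of a dimensional counter in plain Java. The class TaggedCounter and the tag values are hypothetical; a real metrics library handles time windows, concurrency, and export, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of a dimensional counter; a real metrics library does far more.
class TaggedCounter {
    // One running total per distinct tag combination, so size is bounded by tag cardinality.
    private final Map<Map<String, String>, Long> series = new HashMap<>();

    // Record one event under the given tag combination.
    void increment(Map<String, String> tags) {
        series.merge(new HashMap<>(tags), 1L, Long::sum);
    }

    // Sum every series whose tags include tagKey=tagValue.
    long total(String tagKey, String tagValue) {
        long sum = 0;
        for (Map.Entry<Map<String, String>, Long> entry : series.entrySet()) {
            if (tagValue.equals(entry.getKey().get(tagKey))) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    // Number of distinct tag combinations stored.
    int seriesCount() {
        return series.size();
    }

    public static void main(String[] args) {
        TaggedCounter requests = new TaggedCounter();
        requests.increment(Map.of("uri", "/orders", "status", "200"));
        requests.increment(Map.of("uri", "/orders", "status", "200"));
        requests.increment(Map.of("uri", "/orders", "status", "500"));
        requests.increment(Map.of("uri", "/users", "status", "200"));
        System.out.println("status=500: " + requests.total("status", "500"));
        System.out.println("uri=/orders: " + requests.total("uri", "/orders"));
        System.out.println("stored series: " + requests.seriesCount());
    }
}
```

Four requests produce only three stored series here, which is the sense in which metrics are aggregate and bounded in size: storage grows with the number of tag combinations, not with the number of requests.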
28. Metrics and Spring
• Spring Boot 2 introduced Micrometer as its native metrics library
• Micrometer supports many metrics backends
  • e.g. Atlas, Datadog, Influx, Prometheus, SignalFX, Wavefront
• Instrumentation of common components auto-configured
  • JVM/system, HTTP server/client requests, Spring Integration, DataSource…
• Custom metrics also easy to add
29. Metrics and Spring
• Configure via properties
  • management.metrics.*
  • Disable certain metrics
  • Enable percentiles/SLAs/percentile histograms
• Common tags
  • e.g. application name, instance, stack, region, zone
  • via MeterRegistryCustomizer or, from Spring Boot 2.1, via properties
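A hedged example of these settings, using property names as they exist in Spring Boot 2.x (check the reference documentation for your version, since some keys were renamed later). The tag values are hypothetical.

```properties
# application.properties (illustrative values)
# Disable every meter whose name starts with "jvm"
management.metrics.enable.jvm=false
# Publish percentiles and a percentile histogram for HTTP server request timers
management.metrics.distribution.percentiles.http.server.requests=0.5,0.95,0.99
management.metrics.distribution.percentiles-histogram.http.server.requests=true
# SLA boundaries for the same timer
management.metrics.distribution.sla.http.server.requests=100ms,500ms
# Common tags added to every meter (Spring Boot 2.1 and later); values are hypothetical
management.metrics.tags.application=order-service
management.metrics.tags.region=us-east-1
```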
32. Distributed tracing
Distributed tracing: tracing across process boundaries
• Propagate context/hierarchy; join together after
• Request-scoped latency analysis across services
• Metrics lack request context
• Logging has local context but limited distributed info
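Context propagation can be sketched in plain Java as follows. This is a simplified illustration, not Brave's or Sleuth's API: the class TraceContext and its methods are hypothetical, and IDs come from a counter to keep the sketch deterministic, whereas real tracers generate random IDs. The header names are the B3 propagation headers used by Zipkin.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of trace context propagation; not Brave's or Sleuth's API.
class TraceContext {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    final String traceId;   // shared by every span in one request
    final String spanId;    // unique per unit of work
    final String parentId;  // null for the root span

    TraceContext(String traceId, String spanId, String parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    // Counter-based IDs for determinism (an assumption of this sketch).
    static String newId() {
        return Long.toHexString(NEXT_ID.incrementAndGet());
    }

    // Start a new trace at the edge of the system.
    static TraceContext root() {
        return new TraceContext(newId(), newId(), null);
    }

    // Continue the same trace in a child span, keeping the hierarchy.
    TraceContext child() {
        return new TraceContext(traceId, newId(), spanId);
    }

    // Write the context into outgoing request headers.
    void inject(Map<String, String> headers) {
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentId != null) {
            headers.put("X-B3-ParentSpanId", parentId);
        }
    }

    // Read the context from incoming headers, or start a new trace if none is present.
    static TraceContext extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return root();
        }
        return new TraceContext(traceId, spanId, headers.get("X-B3-ParentSpanId"));
    }

    public static void main(String[] args) {
        TraceContext s1 = TraceContext.root();       // request enters service s1
        TraceContext outgoing = s1.child();          // span for the call from s1 to s2
        Map<String, String> headers = new HashMap<>();
        outgoing.inject(headers);                    // crosses the process boundary
        TraceContext s2 = TraceContext.extract(headers);
        System.out.println("same trace: " + s1.traceId.equals(s2.traceId));
        System.out.println("parent of s2 span is s1 span: " + s1.spanId.equals(s2.parentId));
    }
}
```

Because every span carries the trace ID and its parent's span ID, the backend can join the spans together after the fact and rebuild the request's call tree.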
35. Zipkin architecture
[Diagram: a tracing-instrumented system (services s1 to s4) reports spans over a transport to the Zipkin server, which consists of a collector, storage, API, and UI, backed by a datastore.]
Transport options:
• HTTP
• Kafka
• RabbitMQ
Datastore options:
• In-memory *
• MySQL *
• Elasticsearch
• Cassandra
Reference: https://zipkin.io/pages/architecture.html
36. Tracing and Spring
Tracing backend: Run Zipkin Server
Spring Cloud Sleuth:
• auto-configures tracing instrumentation (Zipkin’s Brave)
• spring-cloud-starter-zipkin dependency
• see “Integrations” section of documentation
• HTTP server/client, Runnable/Callable, Spring Messaging/Integration, etc.
• Make sure you are using instrumented components
• reports recorded spans to Zipkin asynchronously, in batches
37. Tracing and Spring
Configure via properties:
• Sampling probability (spring.sleuth.sampler.probability)
• Endpoints to skip (spring.sleuth.web.skipPattern)
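A hedged example of these properties, using names from Spring Cloud Sleuth 2.x. The host, sampling rate, and skipped paths are illustrative assumptions.

```properties
# application.properties (illustrative values)
# Where to report spans; 9411 is Zipkin's default port, the host is an assumption
spring.zipkin.base-url=http://localhost:9411/
# Sample roughly 10% of requests
spring.sleuth.sampler.probability=0.1
# Regular expression of request paths that should not be traced (paths are hypothetical)
spring.sleuth.web.skipPattern=/health.*|/internal/.*
```

A low sampling probability keeps tracing overhead and storage down in high-traffic systems, at the cost of not having a trace for every request.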
39. Correlation everywhere
Now you have correlated logging, metrics, and tracing across your
system. Find data from each based on identifiers.
Source: Adrian Cole, “Observability 3 ways: logging metrics and tracing”; access date: 2018-05-18
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing
42. Key takeaways
• System-wide observability is crucial in distributed architectures
• Tools exist and Spring makes them easy to integrate
• Most common cases are covered out-of-the-box or configurable.
Custom instrumentation is possible as needed.
• Use the right tool for the job; synergize across tools
44. Some additional observability resources
• “Distributed Systems Observability” e-book by Cindy Sridharan:
http://distributed-systems-observability-ebook.humio.com/
• Articles by Cindy Sridharan (@copyconstruct): https://medium.com/@copyconstruct
• Talks by Charity Majors (@mipsytipsy): https://speakerdeck.com/charity
• “Observability+” articles by JBD (@rakyll): https://medium.com/observability