Observability with Spring-based distributed systems

Tommy Ludwig (@TommyLudwig)
Travel Service Development Department
Rakuten, Inc.
Spring I/O
2018-05-24

Assumptions:
• Basic knowledge of Spring Boot
• You care about user experience
Agenda:
• Observability: what / why
• 3 pillars of observability w/ Spring
  • Logging
  • Metrics
  • Tracing
• Putting it all together

What is observability?
Observability is achieved through a set of tools and practices that aims to
turn data points and context into insights.
• Beyond traditional monitoring
• Constant partial degradation/failure
• Expect the unexpected
• Answer unknown questions about your system

Why care about observability?
You want to provide a great experience for users of your system.
• Observability builds confidence in production
• Ownership. Give yourself the tools to be a good owner.
• MTTR is key – failures are happening
• early detection + fast recovery + increased understanding
* MTTR = mean time to recovery
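
As a rough illustration of the MTTR figure above, the sketch below averages the recovery times of a few incidents. The class `MttrSketch` and the sample durations are hypothetical and not part of Spring or any library.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spring or any library.
class MttrSketch {

    // Mean time to recovery = total recovery time / number of incidents.
    static Duration meanTimeToRecovery(List<Duration> recoveryTimes) {
        if (recoveryTimes.isEmpty()) {
            return Duration.ZERO;
        }
        Duration total = Duration.ZERO;
        for (Duration d : recoveryTimes) {
            total = total.plus(d);
        }
        return total.dividedBy(recoveryTimes.size());
    }

    public static void main(String[] args) {
        // Made-up incidents that took 30, 10, and 20 minutes to recover from.
        List<Duration> incidents = Arrays.asList(
                Duration.ofMinutes(30), Duration.ofMinutes(10), Duration.ofMinutes(20));
        System.out.println(meanTimeToRecovery(incidents)); // prints PT20M
    }
}
```

Earlier detection and faster recovery both shorten the individual durations, which is what lowers the mean.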

Spring Boot Actuator
• Spring Boot Actuator is awesome.
• You get so much out-of-the-box.
• But... is it enough? Like most things, it depends.
• Inherently information is instance-scoped

Spring Boot Admin makes it easy to access and use each instance’s Actuator endpoints.
https://github.com/codecentric/spring-boot-admin

[Diagram: a non-distributed system next to a distributed system, each showing users and databases (DB)]

Distributed systems are hard
• Any request spans multiple processes
• Need to stitch together local info and slice/drill-down
• Increased points of failure
• Scaling and ephemeral instances*
* Not strictly properties of a distributed system

Logging and metrics and tracing… oh my!
• 3 sides to observability
• Non-functional requirements (generic/specific)
• Overlap exists, but use all 3 for best insight
Source: Peter Bourgon, access date: 2018-05-18
http://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html

Effort to reward
When it comes to logging, metrics, and tracing:
• Common needs just work out-of-the-box.
• Custom needs can be met with a little extra effort.
See also: 80-20 rule

Logging in general
• Arbitrary messages you want to find later
• Formatted to give context
• Key parts of context: logging levels, timestamp
• Message examples
  • Exceptions/stack traces
  • Additional context
  • Access logs
  • Request/response bodies

Basic logging
[Diagram: someone who wants to check the logs has to get and search the log files of App1 and App2 on each VM]

Problems with basic logging
• Does not scale; too much work and knowledge required
• Multithreaded, concurrent requests intermingle logs
• Low usability – searching is limited/difficult

Centralized logging
[Diagram: App1 and App2 on their VMs stream logs to a central log store service; a query request against the store returns the collection of matching logs]

Logging and Spring
Spring Boot
• Configurable via Spring Environment (see also Spring Cloud Config)
  • log format – make a common format across applications
  • log levels (logging.level.*)
• Configurable via Actuator (at runtime)
  • log levels

Logging and Spring
Spring Cloud Sleuth
• adds trace ID for request correlation
• Query all collected logs by any field or full-text search
Centralized, request-correlated, formatted logs indexed and searchable across your system
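
To make request correlation concrete, here is a minimal sketch of what querying a central log store by trace ID amounts to. The `LogLine` type, the application names, and the trace IDs are hypothetical; a real setup would rely on the trace ID that Spring Cloud Sleuth adds to each log line and on the query language of whichever log store is in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; stands in for a query against a central log store.
class LogCorrelationSketch {

    // Simplified log record as a central store might index it.
    static final class LogLine {
        final String app;
        final String traceId;
        final String message;

        LogLine(String app, String traceId, String message) {
            this.app = app;
            this.traceId = traceId;
            this.message = message;
        }
    }

    // Collect every line that belongs to one request, whichever app wrote it.
    static List<String> forTrace(List<LogLine> all, String traceId) {
        List<String> result = new ArrayList<>();
        for (LogLine line : all) {
            if (line.traceId.equals(traceId)) {
                result.add(line.app + ": " + line.message);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Lines from two concurrent requests, intermingled as they arrive.
        List<LogLine> store = Arrays.asList(
                new LogLine("app1", "aaaa", "received order request"),
                new LogLine("app1", "bbbb", "received search request"),
                new LogLine("app2", "aaaa", "payment declined"));
        for (String line : forTrace(store, "aaaa")) {
            System.out.println(line);
        }
    }
}
```

The same field-based filter is what untangles the intermingled, concurrent requests that make per-instance log files hard to read.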

Metrics in general
Characteristics:
• Aggregate time-series data; bounded size
• Can slice based on dimensions/tags/labels*
Purpose:
• Visualize / identify trends and deviation
• Alerting based on metric queries
* See also https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
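
The idea of slicing aggregated data by tags can be sketched with a hand-rolled counter. This is plain Java written for illustration, not Micrometer's API; the class `TaggedCounterSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a dimensional counter; not Micrometer's API.
class TaggedCounterSketch {

    // One running count per unique tag combination (one "time series" each).
    private final Map<Map<String, String>, Long> counts = new HashMap<>();

    // Tags are passed as alternating key/value pairs, e.g. "uri", "/a".
    void increment(String... keyValues) {
        Map<String, String> tags = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            tags.put(keyValues[i], keyValues[i + 1]);
        }
        counts.merge(tags, 1L, Long::sum);
    }

    // Slice: sum over every series whose tags contain key=value.
    long sumWhere(String key, String value) {
        long sum = 0;
        for (Map.Entry<Map<String, String>, Long> e : counts.entrySet()) {
            if (value.equals(e.getKey().get(key))) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Storage grows with the number of tag combinations, not with traffic.
    int seriesCount() {
        return counts.size();
    }

    public static void main(String[] args) {
        TaggedCounterSketch requests = new TaggedCounterSketch();
        requests.increment("uri", "/a", "status", "200");
        requests.increment("uri", "/a", "status", "500");
        requests.increment("uri", "/b", "status", "200");
        requests.increment("uri", "/b", "status", "200");
        System.out.println(requests.sumWhere("status", "200")); // prints 3
        System.out.println(requests.seriesCount());             // prints 3
    }
}
```

Because only one number is kept per tag combination, the size stays bounded as long as tag values have low cardinality; a tag such as a user ID would defeat that.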

Metrics examples
Example metric                 Type       Example tags
response time                  timer      uri, status, method
number of classes loaded       gauge      (none)
response body size             histogram  uri, status, method
number of garbage collections  counter    cause, action

Basic metrics
[Diagram: HTTP server requests reach the controller of my-application; its metrics are read via HTTP GET metrics or over JMX]

Basic metrics
[Diagram: HTTP server requests pass through a load balancer (LB) to two my-application instances, each with its own controller]

Metrics for observability
[Diagram: each my-application instance publishes metrics to a metrics backend, which drives alerts and visualization]

Metrics and Spring
• Spring Boot 2 introduced Micrometer as its native metrics library
• Micrometer supports many metrics backends
  • e.g. Atlas, Datadog, Influx, Prometheus, SignalFx, Wavefront
• Instrumentation of common components auto-configured
  • JVM/system, HTTP server/client requests, Spring Integration, DataSource…
• Custom metrics also easy to add

Metrics and Spring
• Configure via properties
  • management.metrics.*
  • Disable certain metrics
  • Enable percentiles/SLAs/percentile histograms
• Common tags
  • e.g. application name, instance, stack, region, zone
  • via MeterRegistryCustomizer or properties from Spring Boot 2.1

Tracing
• local tracing: Actuator /httptrace endpoint
• Latency data + request metadata
{
  "traces" : [ {
    "timestamp" : "2018-05-09T13:28:32.867Z",
    "principal" : {
      "name" : "alice"
    },
    "session" : {
      "id" : "728aebfe-8222-4dd2-856c-256104b20bfe"
    },
    "request" : {
      "method" : "GET",
      "uri" : "https://api.example.com",
      "headers" : {
        "Accept" : [ "application/json" ]
      }
    },
    "response" : {
      "status" : 200,
      "headers" : {
        "Content-Type" : [ "application/json" ]
      }
    },
    "timeTaken" : 3
  } ]
}
Source: Spring Boot Actuator Web API Documentation; access date: 2018-05-18
https://docs.spring.io/spring-boot/docs/2.0.2.RELEASE/actuator-api/html/#http-trace

Distributed tracing
Distributed tracing: tracing across process boundaries
• Propagate context/hierarchy; join together after
• Request-scoped latency analysis across services
• Metrics lack request context
• Logging has local context but limited distributed info

Distributed tracing for observability
[Diagram: a user request enters a tracing instrumented system of service1 to service4. The first service starts a span and makes the sampling decision, trace context is propagated, downstream services continue the trace, and the tracer / instrumentation in each service reports spans to the tracing backend]

Zipkin UI
Source: Spring Cloud Sleuth reference documentation; access date: 2018-05-18
http://cloud.spring.io/spring-cloud-static/spring-cloud-sleuth/2.0.0.RC1/single/spring-cloud-sleuth.html#_distributed_tracing_with_zipkin

Zipkin architecture
[Diagram: the tracing instrumented system (s1 to s4) sends spans over a transport to the Zipkin server, which consists of collector, storage, API, and UI and is backed by a datastore]
Transport:
• HTTP
• Kafka
• RabbitMQ
Datastore:
• In-memory *
• MySQL *
• Elasticsearch
• Cassandra
Reference: https://zipkin.io/pages/architecture.html

Tracing and Spring
Tracing backend: Run Zipkin Server
Spring Cloud Sleuth:
• auto-configures tracing instrumentation (Zipkin’s Brave)
• spring-cloud-starter-zipkin dependency
• see “Integrations” section of documentation
• HTTP server/client, Runnable/Callable, Spring Messaging/Integration, etc.
• Make sure you are using instrumented components
• reports recorded spans to Zipkin async/batched

Tracing and Spring
Configure via properties:
• Sampling probability (spring.sleuth.sampler.probability)
• Endpoints to skip (spring.sleuth.web.skipPattern)
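
What a sampling probability means can be sketched with a small stand-alone sampler: each new trace is kept with the configured probability. This is an illustration under the assumption of a simple random draw, not Spring Cloud Sleuth's implementation; the class `ProbabilitySamplerSketch` is hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of probability-based sampling; not Sleuth's sampler.
class ProbabilitySamplerSketch {

    private final double probability;
    private final Random random;

    ProbabilitySamplerSketch(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    // Decide once where the trace starts; nextDouble() is in [0.0, 1.0),
    // so 0.0 never samples and 1.0 always samples.
    boolean isSampled() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        ProbabilitySamplerSketch sampler = new ProbabilitySamplerSketch(0.1, new Random());
        int kept = 0;
        for (int i = 0; i < 10_000; i++) {
            if (sampler.isSampled()) {
                kept++;
            }
        }
        // Roughly 1,000 of the 10,000 traces are kept; the exact count varies.
        System.out.println(kept);
    }
}
```

A lower probability reduces the volume of spans reported to the tracing backend at the cost of not seeing every request.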

Correlation everywhere
Now you have correlated logging, metrics, and tracing across your
system. Find data from each based on identifiers.
Source: Adrian Cole, “Observability 3 ways: logging metrics and tracing”; access date: 2018-05-18
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing

Observability cycle
[Diagram: a cycle of Detect → Investigate → Recover / adjust, entered from Alerts / reports]
1. Starts with an alert/report
2. Check metrics
3. Check tracing data (if needed)
4. Check logs (if needed)
5. Triage issue
6. Make adjustment to prevent recurrence

Key takeaways
• System-wide observability is crucial in distributed architectures
• Tools exist and Spring makes them easy to integrate
• Most common cases are covered out-of-the-box or configurable. Custom instrumentation is possible as needed.
• Use the right tool for the job; synergize across tools

Some additional observability resources
• “Distributed Systems Observability” e-book by Cindy Sridharan:
http://distributed-systems-observability-ebook.humio.com/
• Articles by Cindy Sridharan (@copyconstruct): https://medium.com/@copyconstruct
• Talks by Charity Majors (@mipsytipsy): https://speakerdeck.com/charity
• “Observability+” articles by JBD (@rakyll): https://medium.com/observability
