Observability with Spring-based
distributed systems
Tommy Ludwig (@TommyLudwig)
Travel Service Development Department
Rakuten, Inc.
Spring I/O
2018-05-24
2
Assumptions:
• Basic knowledge of Spring Boot
• You care about user experience
Agenda:
• Observability: what / why
• 3 pillars of observability w/ Spring
  • Logging
  • Metrics
  • Tracing
• Putting it all together
4
What is observability?
Observability is achieved through a set of tools and practices that aims to
turn data points and context into insights.
• Beyond traditional monitoring
• Constant partial degradation/failure
• Expect the unexpected
• Answer unknown questions about your system
5
Why care about observability?
You want to provide a great experience for users of your system.
• Observability builds confidence in production
• Ownership. Give yourself the tools to be a good owner.
• MTTR is key – failures are happening
• early detection + fast recovery + increased understanding
* MTTR = mean time to recovery
7
Spring Boot Actuator
• Spring Boot Actuator is awesome.
• You get so much out-of-the-box.
• But... is it enough? Like most things, it depends.
• Information is inherently instance-scoped
Spring Boot Admin makes it easy to
access and use each instance’s
Actuator endpoints.
https://github.com/codecentric/spring-boot-admin
10
[Diagram: a non-distributed system next to a distributed system, each with a user and one or more DBs]
11
Distributed systems are hard
• Any request spans multiple processes
• Need to stitch together local info and slice/drill-down
• Increased points of failure
• Scaling and ephemeral instances*
* Not strictly properties of a distributed system
Logging, Metrics, and Tracing
13
Logging and metrics and tracing… oh my!
• 3 sides to observability
• Non-functional requirements (generic/specific)
• Overlap exists, but use all 3 for best insight
Source: Peter Bourgon, access date: 2018-05-18
http://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
14
Effort to reward
When it comes to logging, metrics, and tracing:
• Common needs just work out-of-the-box.
• Custom needs can be met with a little extra effort.
See also: 80-20 rule
Logging
16
Logging in general
• Arbitrary messages you want to find later
• Formatted to give context
• Key parts of context: logging levels, timestamp
• Message examples
• Exceptions/stack traces
• Additional context
• Access logs
• Request/response bodies
17
Basic logging
[Diagram: a user who wants to check the logs has to get and search the log files of App1 and App2 on each VM separately]
18
Problems with basic logging
• Does not scale; too much work and knowledge required
• Multithreaded, concurrent requests intermingle logs
• Low usability – searching is limited/difficult
19
Centralized logging
[Diagram: App1 and App2 on each VM stream their logs to a central log store service; the user sends a query request to query the logs and receives the collection of matching logs]
20
Logging and Spring
Spring Boot
• Configurable via Spring Environment (see also Spring Cloud Config)
  • log format – make a common format across applications
  • log levels (logging.level.*)
• Configurable via Actuator (at runtime)
  • log levels
21
Logging and Spring
Spring Cloud Sleuth
• adds trace ID for request correlation
• Query all collected logs by any field or full-text search
Centralized, request-correlated, formatted logs, indexed and searchable across your system
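To make the trace ID idea concrete, here is a minimal sketch in plain Java. It is not Spring Cloud Sleuth's implementation: the class TraceIdLogDemo, the ThreadLocal holder, the in-memory list standing in for a central log store, and the [app,traceId] line format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: a hand-rolled stand-in for the trace ID that Sleuth adds to log lines. */
final class TraceIdLogDemo {
    // Hypothetical per-thread holder for the current request's trace ID.
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();
    // Stand-in for a central log store: lines from every app and request end up here.
    private static final List<String> CENTRAL_LOG = new ArrayList<>();

    static void log(String app, String message) {
        // Common format across applications: [app,traceId] message
        String line = "[" + app + "," + TRACE_ID.get() + "] " + message;
        synchronized (CENTRAL_LOG) {
            CENTRAL_LOG.add(line);
        }
    }

    static void handleRequest(String traceId, String user) {
        TRACE_ID.set(traceId);
        try {
            log("app1", "received order from " + user);
            log("app2", "charging " + user);
        } finally {
            TRACE_ID.remove(); // do not leak the ID to the next request on this thread
        }
    }

    /** Query the collected logs by trace ID to see one request across applications. */
    static List<String> findByTraceId(String traceId) {
        synchronized (CENTRAL_LOG) {
            return CENTRAL_LOG.stream()
                    .filter(line -> line.contains("," + traceId + "]"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        handleRequest("4bf92f35", "alice");
        handleRequest("a3ce929d", "bob");
        findByTraceId("4bf92f35").forEach(System.out::println);
    }
}
```

Running main prints only the two lines that belong to alice's request, even though lines from both requests sit in the same store.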
Metrics
23
Metrics in general
Characteristics:
• Aggregate time-series data; bounded size
• Can slice based on dimensions/tags/labels*
Purpose:
• Visualize / identify trends and deviation
• Alerting based on metric queries
* See also https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
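The two characteristics above (bounded size, slicing by tags) can be sketched in a few lines of plain Java. This is not Micrometer's API: TaggedCounterDemo and its methods are hypothetical, and the metric name is only borrowed for illustration.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: a tiny dimensional counter, not Micrometer's API. */
final class TaggedCounterDemo {
    // One LongAdder per distinct (name, tags) combination: the size is bounded by
    // tag cardinality, not by the number of requests recorded.
    private final Map<Map.Entry<String, Map<String, String>>, LongAdder> series =
            new ConcurrentHashMap<>();

    void increment(String name, Map<String, String> tags) {
        // Copy into a TreeMap so the key is independent of the caller's map.
        Map<String, String> sortedTags = new TreeMap<>(tags);
        Map.Entry<String, Map<String, String>> key =
                new AbstractMap.SimpleImmutableEntry<>(name, sortedTags);
        series.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Slice: sum every series of the metric whose tags contain tagKey=tagValue. */
    long sum(String name, String tagKey, String tagValue) {
        return series.entrySet().stream()
                .filter(e -> e.getKey().getKey().equals(name))
                .filter(e -> tagValue.equals(e.getKey().getValue().get(tagKey)))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    int seriesCount() {
        return series.size();
    }

    public static void main(String[] args) {
        TaggedCounterDemo metrics = new TaggedCounterDemo();
        for (int i = 0; i < 1000; i++) {
            metrics.increment("http.server.requests", Map.of("uri", "/orders", "status", "200"));
        }
        metrics.increment("http.server.requests", Map.of("uri", "/orders", "status", "500"));
        metrics.increment("http.server.requests", Map.of("uri", "/users", "status", "500"));
        System.out.println("series stored: " + metrics.seriesCount());                        // 3
        System.out.println("status=500: " + metrics.sum("http.server.requests", "status", "500")); // 2
    }
}
```

A thousand and two requests collapse into three stored series, which is why metrics stay cheap compared with logs, as long as tag values have low cardinality.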
24
Metrics examples
Example metric | Type | Example tags
response time | timer | uri, status, method
number of classes loaded | gauge | -
response body size | histogram | uri, status, method
number of garbage collections | counter | cause, action
25
Basic metrics
[Diagram: a user sends HTTP server requests to the controller of my-application; metrics are read from the instance via HTTP GET or over JMX]
26
Basic metrics
[Diagram: HTTP server requests pass through a load balancer (LB) to two my-application instances, each with its own controller]
27
Metrics for observability
[Diagram: the my-application instances publish metrics to a metrics backend, which drives alerts and visualization]
28
Metrics and Spring
• Spring Boot 2 introduced Micrometer as its native metrics library
• Micrometer supports many metrics backends
  • e.g. Atlas, Datadog, Influx, Prometheus, SignalFx, Wavefront
• Instrumentation of common components auto-configured
  • JVM/system, HTTP server/client requests, Spring Integration, DataSource…
• Custom metrics also easy to add
29
Metrics and Spring
• Configure via properties
  • management.metrics.*
  • Disable certain metrics
  • Enable percentiles/SLAs/percentile histograms
• Common tags
  • e.g. application name, instance, stack, region, zone
  • via MeterRegistryCustomizer or properties from Spring Boot 2.1
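As a rough idea of what percentiles and SLA boundaries report, here is the underlying arithmetic in plain Java. It is only a sketch: LatencySummaryDemo is a hypothetical class, the sample durations are made up, and Micrometer computes and publishes these values differently.

```java
import java.util.Arrays;

/** Illustrative only: plain arithmetic, not how Micrometer computes or publishes these values. */
final class LatencySummaryDemo {

    /** Nearest-rank percentile: the smallest sample with at least a fraction p of samples at or below it. */
    static long percentile(long[] durationsMs, double p) {
        if (durationsMs.length == 0 || p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 1");
        }
        long[] sorted = durationsMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[rank - 1];
    }

    /** SLA-style boundary: how many requests finished within the threshold. */
    static long countWithin(long[] durationsMs, long thresholdMs) {
        return Arrays.stream(durationsMs).filter(d -> d <= thresholdMs).count();
    }

    public static void main(String[] args) {
        long[] durationsMs = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("p50 = " + percentile(durationsMs, 0.50) + " ms");  // 50
        System.out.println("p95 = " + percentile(durationsMs, 0.95) + " ms");  // 1000
        System.out.println("within 100 ms: " + countWithin(durationsMs, 100)); // 9
    }
}
```

The median looks healthy while the 95th percentile exposes the one slow request, which is the kind of deviation these settings are meant to surface.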
Tracing
31
Tracing
• Local tracing: Actuator /httptrace endpoint
• Latency data + request metadata
{
  "traces" : [ {
    "timestamp" : "2018-05-09T13:28:32.867Z",
    "principal" : {
      "name" : "alice"
    },
    "session" : {
      "id" : "728aebfe-8222-4dd2-856c-256104b20bfe"
    },
    "request" : {
      "method" : "GET",
      "uri" : "https://api.example.com",
      "headers" : {
        "Accept" : [ "application/json" ]
      }
    },
    "response" : {
      "status" : 200,
      "headers" : {
        "Content-Type" : [ "application/json" ]
      }
    },
    "timeTaken" : 3
  } ]
}
Source: Spring Boot Actuator Web API Documentation; access date: 2018-05-18
https://docs.spring.io/spring-boot/docs/2.0.2.RELEASE/actuator-api/html/#http-trace
32
Distributed tracing
Distributed tracing: tracing across process boundaries
• Propagate context/hierarchy; join together after
• Request-scoped latency analysis across services
• Metrics lack request context
• Logging has local context but limited distributed info
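Propagating context and hierarchy can be sketched as follows. The header names are the B3 names used by Zipkin; everything else (TracePropagationDemo, SpanContext, the one-span-per-service simplification) is an assumption for illustration and not Brave's or Sleuth's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: hand-rolled trace context propagation, not Brave's or Sleuth's API. */
final class TracePropagationDemo {

    /** Minimal trace context: which trace, which span, and the parent span (null for the root). */
    static final class SpanContext {
        final String traceId;
        final String spanId;
        final String parentSpanId;

        SpanContext(String traceId, String spanId, String parentSpanId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }
    }

    static String newId() {
        return Long.toHexString(ThreadLocalRandom.current().nextLong());
    }

    /** Where a request enters the system: start a new trace with a root span. */
    static SpanContext startTrace() {
        String id = newId();
        return new SpanContext(id, id, null);
    }

    /** Client side: copy the current context into the outgoing request headers. */
    static Map<String, String> inject(SpanContext current) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", current.traceId);
        headers.put("X-B3-SpanId", current.spanId);
        return headers;
    }

    /** Server side: continue the caller's trace with a new span whose parent is the caller's span. */
    static SpanContext extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        if (traceId == null) {
            return startTrace(); // no incoming context: this service is the entry point
        }
        return new SpanContext(traceId, newId(), headers.get("X-B3-SpanId"));
    }

    public static void main(String[] args) {
        SpanContext service1 = startTrace();
        SpanContext service2 = extract(inject(service1));
        SpanContext service3 = extract(inject(service2));
        System.out.println("same trace: " + service3.traceId.equals(service1.traceId));
        System.out.println("service3 parent is service2: " + service2.spanId.equals(service3.parentSpanId));
    }
}
```

Because every span carries the shared trace ID and its parent's span ID, the backend can join the separately reported spans into one request tree afterwards.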
33
Distributed tracing for observability
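[Diagram: in a tracing-instrumented system, a user request reaches service1, which starts a span and makes the sampling decision, then propagates the trace context to the downstream services (service2, service3, service4), which continue the trace; the tracer/instrumentation in each service reports spans to the tracing backend]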
34
Zipkin UI
Source: Spring Cloud Sleuth reference documentation; access date: 2018-05-18
http://cloud.spring.io/spring-cloud-static/spring-cloud-sleuth/2.0.0.RC1/single/spring-cloud-sleuth.html#_distributed_tracing_with_zipkin
35
Zipkin architecture
[Diagram: the tracing-instrumented system (s1 to s4) reports spans over a transport to the Zipkin server, whose components are the collector, storage (backed by a datastore), API, and UI]
Transport:
• HTTP
• Kafka
• RabbitMQ
Storage datastore:
• In-memory *
• MySQL *
• Elasticsearch
• Cassandra
Reference: https://zipkin.io/pages/architecture.html
36
Tracing and Spring
Tracing backend: Run Zipkin Server
Spring Cloud Sleuth:
• auto-configures tracing instrumentation (Zipkin’s Brave)
• spring-cloud-starter-zipkin dependency
• see “Integrations” section of documentation
• HTTP server/client, Runnable/Callable, Spring Messaging/Integration, etc.
• Make sure you are using instrumented components
• reports recorded spans to Zipkin async/batched
37
Tracing and Spring
Configure via properties:
• Sampling probability (spring.sleuth.sampler.probability)
• Endpoints to skip (spring.sleuth.web.skipPattern)
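What a sampling probability means in practice can be sketched like this. It is a generic probability sampler written for illustration, not Sleuth's sampler: ProbabilitySamplerDemo and isSampled are hypothetical names, and the only thing taken from the slide is the idea of a probability setting.

```java
import java.util.Random;

/** Illustrative only: a generic probability sampler, not Sleuth's implementation. */
final class ProbabilitySamplerDemo {
    private final double probability;
    private final Random random;

    ProbabilitySamplerDemo(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    /** Decide once, where the trace starts. */
    boolean sampleNewTrace() {
        // nextDouble() is in [0.0, 1.0), so 0.0 never samples and 1.0 always samples.
        return random.nextDouble() < probability;
    }

    /** Hypothetical helper: honor a propagated upstream decision, otherwise decide locally. */
    boolean isSampled(Boolean upstreamDecision) {
        return upstreamDecision != null ? upstreamDecision : sampleNewTrace();
    }

    public static void main(String[] args) {
        ProbabilitySamplerDemo sampler = new ProbabilitySamplerDemo(0.1, new Random());
        int sampled = 0;
        for (int i = 0; i < 10_000; i++) {
            if (sampler.sampleNewTrace()) {
                sampled++;
            }
        }
        System.out.println("sampled " + sampled + " of 10000 traces");
    }
}
```

Making the decision at the first service and propagating it keeps traces complete: a request is either recorded by every service it touches or by none.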
Putting It All Together
39
Correlation everywhere
Now you have correlated logging, metrics, and tracing across your
system. Find data from each based on identifiers.
Source: Adrian Cole, “Observability 3 ways: logging metrics and tracing”; access date: 2018-05-18
https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing
40
Observability cycle
[Diagram: a cycle of Detect, Investigate, and Recover / adjust, triggered by alerts / reports]
1. Starts with an alert/report
2. Check metrics
3. Check tracing data (if needed)
4. Check logs (if needed)
5. Triage issue
6. Make adjustment to prevent recurrence
Wrap Up
42
Key takeaways
• System-wide observability is crucial in distributed architectures
• Tools exist and Spring makes them easy to integrate
• Most common cases are covered out-of-the-box or configurable.
Custom instrumentation is possible as needed.
• Use the right tool for the job; synergize across tools
44
Some additional observability resources
• “Distributed Systems Observability” e-book by Cindy Sridharan:
http://distributed-systems-observability-ebook.humio.com/
• Articles by Cindy Sridharan (@copyconstruct): https://medium.com/@copyconstruct
• Talks by Charity Majors (@mipsytipsy): https://speakerdeck.com/charity
• “Observability+” articles by JBD (@rakyll): https://medium.com/observability

Observability with Spring-based distributed systems

  • 1. Observability with Spring-based distributed systems Tommy Ludwig (@TommyLudwig) Travel Service Development Department Rakuten, Inc. Spring I/O 2018-05-24
  • 2. 2 Assumptions: • Basic knowledge of Spring Boot • You care about user experience Agenda: • Observability: what / why • 3 pillars of observability w/ Spring • Logging • Metrics • Tracing • Putting it all together
  • 4. 4 What is observability? Observability is achieved through a set of tools and practices that aims to turn data points and context into insights. • Beyond traditional monitoring • Constant partial degradation/failure • Expect the unexpected • Answer unknown questions about your system
  • 5. 5 Why care about observability? You want to provide a great experience for users of your system. • Observability builds confidence in production • Ownership. Give yourself the tools to be a good owner. • MTTR is key – failures will are happening • early detection + fast recovery + increased understanding * MTTR = mean time to recovery
  • 7. 7 Spring Boot Actuator • Spring Boot Actuator is awesome. • You get so much out-of-the-box. • But... is it enough? Like most things, it depends. • Inherently information is instance-scoped
  • 8. Spring Boot Admin makes it easy to access and use each instance’s Actuator endpoints. https://github.com/codecentric/spring-boot-admin
  • 11. Distributed systems are hard • Any request spans multiple processes • Need to stitch together local info and slice/drill-down • Increased points of failure • Scaling and ephemeral instances* * Not strictly properties of a distributed system
  • 13. Logging and metrics and tracing… oh my! • 3 sides to observability • Non-functional requirements (generic/specific) • Overlap exists, but use all 3 for best insight Source: Peter Bourgon, access date: 2018-05-18 http://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
  • 14. Effort to reward When it comes to logging, metrics, and tracing: • Common needs just work out-of-the-box. • Custom needs can be met with a little extra effort. See also: 80-20 rule
  • 16. Logging in general • Arbitrary messages you want to find later • Formatted to give context • Key parts of context: logging levels, timestamp • Message examples • Exceptions/stack traces • Additional context • Access logs • Request/response bodies
  • 17. Basic logging [Diagram: App1 and App2 each write logs on their own VMs. A user who wants to check the logs has to get the logs from each VM and then search them.]
  • 18. Problems with basic logging • Does not scale: too much work and knowledge required • Multithreaded, concurrent requests intermingle logs • Low usability: searching is limited/difficult
  • 19. Centralized logging [Diagram: App1 and App2 on each VM stream logs to a central log store service. A user queries the logs with a query request and receives the collection of matching logs.]
  • 20. Logging and Spring: Spring Boot • Configurable via Spring Environment (see also Spring Cloud Config) • log format: make a common format across applications • log levels (logging.level.*) • Configurable via Actuator (at runtime) • log levels
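As a rough illustration of how `logging.level.*` properties resolve to a level for a given logger, here is a minimal plain-Java sketch. It is not Spring Boot's implementation: the class `LogLevels`, the package names, and the `INFO` fallback are assumptions made for the example. The idea shown is only that the most specific configured logger name wins, falling back to the root level.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java sketch of hierarchical log-level lookup; not Spring Boot's implementation.
final class LogLevels {

    private static final String PREFIX = "logging.level.";
    private final Map<String, String> levels = new HashMap<>();

    // Keeps only the logging.level.* entries, keyed by logger name.
    LogLevels(Map<String, String> properties) {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                String logger = entry.getKey().substring(PREFIX.length());
                levels.put(logger, entry.getValue().toUpperCase(Locale.ROOT));
            }
        }
    }

    // Walks from the most specific logger name up to the root until a level is configured.
    String effectiveLevel(String loggerName) {
        String name = loggerName;
        while (true) {
            String level = levels.get(name);
            if (level != null) {
                return level;
            }
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) {
                // Falling back to INFO when no root level is set is an assumption of this sketch.
                return levels.getOrDefault("root", "INFO");
            }
            name = name.substring(0, lastDot);
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("logging.level.root", "warn");
        properties.put("logging.level.com.example.booking", "debug"); // hypothetical package
        LogLevels levels = new LogLevels(properties);
        System.out.println(levels.effectiveLevel("com.example.booking.PriceService"));
        System.out.println(levels.effectiveLevel("org.thirdparty.Client"));
    }
}
```

Because the levels live in configuration rather than code, the same lookup applies whether the values come from the Spring Environment at startup or are changed at runtime through Actuator.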
  • 21. Logging and Spring: Spring Cloud Sleuth • adds trace ID for request correlation • Query all collected logs by any field or full-text search • Result: centralized, request-correlated, formatted logs, indexed and searchable across your system
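To make request correlation concrete, the following plain-Java sketch tags each log line with a trace ID and then filters a central collection of lines by that ID. It does not use Sleuth itself. The class `CorrelatedLogs`, the three-field `[app,traceId,spanId]` block, the IDs, and the in-memory list standing in for a central log store are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch of request-correlated log lines; not Sleuth's actual implementation.
final class CorrelatedLogs {

    // Formats a line with a correlation block: [app,traceId,spanId].
    static String format(String level, String app, String traceId, String spanId, String message) {
        return String.format("%-5s [%s,%s,%s] %s", level, app, traceId, spanId, message);
    }

    // Stand-in for a central log store query: keep only the lines that belong to one trace.
    static List<String> findByTraceId(List<String> lines, String traceId) {
        return lines.stream()
                .filter(line -> line.contains("," + traceId + ","))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical service names and IDs; two services log within the same request.
        List<String> central = new ArrayList<>();
        central.add(format("INFO", "frontend", "463ac35c9f6413ad", "463ac35c9f6413ad", "GET /hotels"));
        central.add(format("INFO", "search", "463ac35c9f6413ad", "72485a3953bb6124", "querying index"));
        central.add(format("WARN", "search", "0b7ad6b710a05164", "0b7ad6b710a05164", "slow query"));
        findByTraceId(central, "463ac35c9f6413ad").forEach(System.out::println);
    }
}
```

A real log store would index the trace ID as a field rather than match substrings, but the effect is the same: one ID retrieves every line a request produced, regardless of which service wrote it.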
  • 23. Metrics in general Characteristics: • Aggregate time-series data; bounded size • Can slice based on dimensions/tags/labels* Purpose: • Visualize / identify trends and deviation • Alerting based on metric queries * See also https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
  • 24. Metrics examples
      Example metric                 | Type      | Example tags
      response time                  | timer     | uri, status, method
      number of classes loaded       | gauge     | -
      response body size             | histogram | uri, status, method
      number of garbage collections  | counter   | cause, action
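The value of tags is that one metric name can be sliced by any dimension afterwards. The sketch below shows that idea with a tagged counter in plain Java. It is not the Micrometer API; the class `TaggedCounters`, the metric name, and the tag values are assumptions chosen to mirror the examples above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Plain-Java sketch of a tagged (dimensional) counter; a metrics library does this for real.
final class TaggedCounters {

    // metric name -> (tag set -> count); each distinct tag set is its own time series.
    private final Map<String, Map<Map<String, String>, LongAdder>> series = new ConcurrentHashMap<>();

    // Convenience helper: tags("uri", "/hotels", "status", "200").
    static Map<String, String> tags(String... keyValues) {
        Map<String, String> tags = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            tags.put(keyValues[i], keyValues[i + 1]);
        }
        return tags;
    }

    void increment(String name, Map<String, String> tags) {
        series.computeIfAbsent(name, n -> new ConcurrentHashMap<>())
                .computeIfAbsent(new TreeMap<>(tags), t -> new LongAdder())
                .increment();
    }

    // Sums every series of a metric whose tags include all of the filter's key/value pairs.
    long sum(String name, Map<String, String> filter) {
        long total = 0;
        for (Map.Entry<Map<String, String>, LongAdder> entry
                : series.getOrDefault(name, Collections.emptyMap()).entrySet()) {
            if (entry.getKey().entrySet().containsAll(filter.entrySet())) {
                total += entry.getValue().sum();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TaggedCounters metrics = new TaggedCounters();
        String name = "http.server.requests"; // hypothetical metric name
        metrics.increment(name, tags("uri", "/hotels", "status", "200", "method", "GET"));
        metrics.increment(name, tags("uri", "/hotels", "status", "500", "method", "GET"));
        metrics.increment(name, tags("uri", "/flights", "status", "200", "method", "GET"));
        System.out.println("status=200: " + metrics.sum(name, tags("status", "200")));
        System.out.println("uri=/hotels: " + metrics.sum(name, tags("uri", "/hotels")));
    }
}
```

Storage stays bounded because only one number is kept per tag combination, which is also why high-cardinality tag values (for example user IDs) are best avoided.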
  • 25. Basic metrics [Diagram: a user sends HTTP server requests to the controller of my-application; an operator reads the application's metrics via HTTP GET or over JMX.]
  • 26. Basic metrics [Diagram: HTTP server requests now pass through a load balancer (LB) to two my-application instances, each with its own controller.]
  • 28. Metrics and Spring • Spring Boot 2 introduced Micrometer as its native metrics library • Micrometer supports many metrics backends • e.g. Atlas, Datadog, Influx, Prometheus, SignalFx, Wavefront • Instrumentation of common components is auto-configured • JVM/system, HTTP server/client requests, Spring Integration, DataSource… • Custom metrics are also easy to add
  • 29. Metrics and Spring • Configure via properties • management.metrics.* • Disable certain metrics • Enable percentiles/SLAs/percentile histograms • Common tags • e.g. application name, instance, stack, region, zone • via MeterRegistryCustomizer or properties from Spring Boot 2.1
  • 31. Tracing • Local tracing: Actuator /httptrace endpoint • Latency data + request metadata { "traces" : [ { "timestamp" : "2018-05-09T13:28:32.867Z", "principal" : { "name" : "alice" }, "session" : { "id" : "728aebfe-8222-4dd2-856c-256104b20bfe" }, "request" : { "method" : "GET", "uri" : "https://api.example.com", "headers" : { "Accept" : [ "application/json" ] } }, "response" : { "status" : 200, "headers" : { "Content-Type" : [ "application/json" ] } }, "timeTaken" : 3 } ] } Source: Spring Boot Actuator Web API Documentation; access date: 2018-05-18 https://docs.spring.io/spring-boot/docs/2.0.2.RELEASE/actuator-api/html/#http-trace
  • 32. Distributed tracing Distributed tracing: tracing across process boundaries • Propagate context/hierarchy; join together after • Request-scoped latency analysis across services • Metrics lack request context • Logging has local context but limited distributed info
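Propagating context across a process boundary can be sketched as follows: the caller writes trace identifiers into outgoing request headers and the callee reads them back to continue the same trace. The header names are the B3 headers used by Zipkin. Everything else (the class `TraceContext`, its methods, and the random ID generation) is an assumption of this plain-Java sketch, not the Brave or Sleuth API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Plain-Java sketch of trace-context propagation with B3 headers; tracing libraries do this for you.
final class TraceContext {

    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the root span
    final boolean sampled;

    TraceContext(String traceId, String spanId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    // 64-bit random ID rendered as 16 lower-case hex characters.
    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Starts a new trace at the edge of the system; the sampling decision is made once, here.
    static TraceContext newRoot(boolean sampled) {
        String id = newId();
        return new TraceContext(id, id, null, sampled);
    }

    // Creates a child span in the same trace, inheriting the sampling decision.
    TraceContext newChild() {
        return new TraceContext(traceId, newId(), spanId, sampled);
    }

    // Client side: write the context into the outgoing request headers.
    void inject(Map<String, String> headers) {
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
    }

    // Server side: continue the trace if headers are present, otherwise start a new one.
    static TraceContext extract(Map<String, String> headers, boolean sampleIfNew) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return newRoot(sampleIfNew);
        }
        return new TraceContext(traceId, spanId, headers.get("X-B3-ParentSpanId"),
                "1".equals(headers.get("X-B3-Sampled")));
    }

    public static void main(String[] args) {
        TraceContext incoming = TraceContext.newRoot(true); // service1 receives the user request
        Map<String, String> headers = new HashMap<>();
        incoming.newChild().inject(headers);                 // service1 calls service2
        TraceContext continued = TraceContext.extract(headers, false);
        System.out.println(continued.traceId.equals(incoming.traceId)); // same trace on both sides
    }
}
```

Because every span carries the trace ID and its parent's span ID, a tracing backend can join the spans reported by separate processes back into one request tree afterwards.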
  • 33. Distributed tracing for observability [Diagram: a user request enters a tracing-instrumented system of service1 to service4. The first service starts a span and makes the sampling decision; trace context is propagated and downstream services continue the trace; the tracer / instrumentation in each service reports spans to the tracing backend.]
  • 34. Zipkin UI [Screenshot] Source: Spring Cloud Sleuth reference documentation; access date: 2018-05-18 http://cloud.spring.io/spring-cloud-static/spring-cloud-sleuth/2.0.0.RC1/single/spring-cloud-sleuth.html#_distributed_tracing_with_zipkin
  • 35. Zipkin architecture [Diagram: a tracing-instrumented system (s1 to s4) sends spans over a transport to the Zipkin server, which consists of collector, storage, API, and UI, backed by a datastore.] Transport: • HTTP • Kafka • RabbitMQ Storage: • In-memory * • MySQL * • Elasticsearch • Cassandra Reference: https://zipkin.io/pages/architecture.html
  • 36. Tracing and Spring Tracing backend: run Zipkin Server. Spring Cloud Sleuth: • auto-configures tracing instrumentation (Zipkin’s Brave) • spring-cloud-starter-zipkin dependency • see “Integrations” section of documentation • HTTP server/client, Runnable/Callable, Spring Messaging/Integration, etc. • Make sure you are using instrumented components • reports recorded spans to Zipkin async/batched
  • 37. Tracing and Spring Configure via properties: • Sampling probability (spring.sleuth.sampler.probability) • Endpoints to skip (spring.sleuth.web.skipPattern)
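The two settings above answer one question per request: should this request be traced at all? The plain-Java sketch below combines a skip pattern with a probability check. It is not Sleuth's own sampler. The class `TraceSampling`, the bucket scheme based on the trace ID, and the example paths are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Plain-Java sketch of a sampling decision; not Sleuth's own sampler implementation.
final class TraceSampling {

    private static final long BUCKETS = 10_000;

    private final long boundary;       // number of buckets (out of 10,000) that get sampled
    private final Pattern skipPattern; // request paths that are never traced

    TraceSampling(double probability, String skipRegex) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.boundary = Math.round(probability * BUCKETS);
        this.skipPattern = Pattern.compile(skipRegex);
    }

    // Deterministic per trace ID, so repeating the check for the same trace gives the same answer.
    boolean shouldTrace(String path, long traceId) {
        if (skipPattern.matcher(path).matches()) {
            return false;
        }
        return Long.remainderUnsigned(traceId, BUCKETS) < boundary;
    }

    public static void main(String[] args) {
        // Hypothetical values: trace 10% of requests, never trace health or info endpoints.
        TraceSampling sampling = new TraceSampling(0.1, "/health|/info.*");
        System.out.println(sampling.shouldTrace("/health", 42L));
        System.out.println(sampling.shouldTrace("/hotels", 42L));
    }
}
```

The decision is made once, where the trace starts, and then travels with the propagated context so that a trace is either recorded in every service or in none.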
  • 38. Putting It All Together
  • 39. Correlation everywhere Now you have correlated logging, metrics, and tracing across your system. Find data from each based on identifiers. Source: Adrian Cole, “Observability 3 ways: logging metrics and tracing”; access date: 2018-05-18 https://speakerdeck.com/adriancole/observability-3-ways-logging-metrics-and-tracing
  • 40. Observability cycle [Cycle diagram: Alerts / reports, Detect, Investigate, Recover / adjust] 1. Starts with an alert/report 2. Check metrics 3. Check tracing data (if needed) 4. Check logs (if needed) 5. Triage issue 6. Make adjustment to prevent recurrence
  • 42. Key takeaways • System-wide observability is crucial in distributed architectures • Tools exist and Spring makes them easy to integrate • Most common cases are covered out-of-the-box or configurable. Custom instrumentation is possible as needed. • Use the right tool for the job; synergize across tools
  • 44. Some additional observability resources • “Distributed Systems Observability” e-book by Cindy Sridharan: http://distributed-systems-observability-ebook.humio.com/ • Articles by Cindy Sridharan (@copyconstruct): https://medium.com/@copyconstruct • Talks by Charity Majors (@mipsytipsy): https://speakerdeck.com/charity • “Observability+” articles by JBD (@rakyll): https://medium.com/observability