This document discusses distributed tracing with Spring Cloud Sleuth and Zipkin. It begins with an overview of distributed tracing terminology like spans, traces, logs, and tags. It then covers how Spring Cloud Sleuth correlates logs across services and libraries. Next, it demonstrates how to visualize latency using Spring Cloud Sleuth and Zipkin by logging timing data and sending spans to Zipkin for analysis. Finally, it provides examples of adding Spring Cloud Sleuth and Zipkin dependencies to applications.
2. About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Marcin Grzejszczak @mgrzejszczak, Kraków, 11-13 May 2016
4. Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
10. Time to debug
11. It doesn’t look like this
13. On which server / instance was the exception thrown?
14. SSH and grep for ERROR to find it?
15. Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
17. Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● A span can have a parent and multiple children
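The span lifecycle described above can be sketched in plain Java. This is only an illustration: `SimpleSpan` and all of its members are hypothetical names, not the Spring Cloud Sleuth API. It shows a span that records its start time, must be stopped explicitly, and links to its parent and children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal span model; NOT the Spring Cloud Sleuth API.
final class SimpleSpan {
    final String name;
    final SimpleSpan parent;                       // null for the root span
    final List<SimpleSpan> children = new ArrayList<>();
    private final long startNanos;
    private long stopNanos;
    private boolean stopped;

    private SimpleSpan(String name, SimpleSpan parent) {
        this.name = name;
        this.parent = parent;
        this.startNanos = System.nanoTime();       // timing starts on creation
        if (parent != null) {
            parent.children.add(this);
        }
    }

    static SimpleSpan start(String name, SimpleSpan parent) {
        return new SimpleSpan(name, parent);
    }

    // Every started span has to be stopped at some point.
    void stop() {
        if (!stopped) {
            stopNanos = System.nanoTime();
            stopped = true;
        }
    }

    boolean isRunning() {
        return !stopped;
    }

    long durationNanos() {
        if (!stopped) {
            throw new IllegalStateException("span '" + name + "' has not been stopped");
        }
        return stopNanos - startNanos;
    }

    public static void main(String[] args) {
        SimpleSpan root = SimpleSpan.start("handle-request", null);
        SimpleSpan child = SimpleSpan.start("send-rpc", root);
        try {
            // the unit of work being measured would run here
        } finally {
            child.stop();                          // stop in reverse order of creation
            root.stop();
        }
        System.out.println(child.name + " took " + child.durationNanos() + " ns");
    }
}
```

Stopping the spans in a `finally` block is one way to make sure a span is closed even when the measured work throws.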
18. Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ A trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
19. Diagram: one trace across four services
● The REQUEST reaches SERVICE 1 with No Trace Id and No Span Id; SERVICE 1 starts Trace Id = X, Span Id = A
● SERVICE 1 calls SERVICE 2 in Span Id = B (Client Sent, Server Received, Server Sent, Client Received); SERVICE 2 does its work in Span Id = C
● SERVICE 2 calls SERVICE 3 in Span Id = D (same four events); SERVICE 3 does its work in Span Id = E
● SERVICE 2 calls SERVICE 4 in Span Id = F (same four events); SERVICE 4 does its work in Span Id = G
● Every span shares Trace Id = X
20. Span Id and Parent Id of every span in the trace
● Span Id = A, Parent Id = null
● Span Id = B, Parent Id = A
● Span Id = C, Parent Id = B
● Span Id = D, Parent Id = C
● Span Id = E, Parent Id = D
● Span Id = F, Parent Id = C
● Span Id = G, Parent Id = F
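The parent ids above are enough to rebuild the tree that makes up the trace. A minimal sketch, assuming spans are identified by the letters from the slide; the class `SpanTree` and its methods are hypothetical helpers, not part of Sleuth or Zipkin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that turns (span id -> parent id) pairs into an indented tree.
final class SpanTree {

    // The span/parent pairs exactly as listed on the slide.
    static Map<String, String> slideParents() {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("A", null);   // root span: no parent
        parents.put("B", "A");
        parents.put("C", "B");
        parents.put("D", "C");
        parents.put("E", "D");
        parents.put("F", "C");
        parents.put("G", "F");
        return parents;
    }

    // Inverts the parent links into parent -> children lists.
    static Map<String, List<String>> childrenOf(Map<String, String> parents) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parents.entrySet()) {
            if (entry.getValue() != null) {
                children.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        return children;
    }

    // Renders the tree depth-first, two spaces of indentation per level.
    static String render(Map<String, String> parents) {
        Map<String, List<String>> children = childrenOf(parents);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : parents.entrySet()) {
            if (entry.getValue() == null) {
                append(out, children, entry.getKey(), 0);
            }
        }
        return out.toString();
    }

    private static void append(StringBuilder out, Map<String, List<String>> children,
                               String id, int level) {
        for (int i = 0; i < level; i++) {
            out.append("  ");
        }
        out.append(id).append('\n');
        for (String child : children.getOrDefault(id, Collections.<String>emptyList())) {
            append(out, children, child, level + 1);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(slideParents()));
    }
}
```

Running it prints A at the top, B under A, C under B, and then two branches under C: D with child E, and F with child G.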
22. Is it that simple?
How do you pass tracing information (incl. Trace ID) between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
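To see why thread pools are a problem, here is a rough sketch that assumes the trace id lives in a `ThreadLocal`. The names `TRACE_ID`, `withCurrentTrace` and `demo` are made up for this illustration and are not how Sleuth is implemented. A task handed to a pool runs on another thread and does not see the caller's value unless the value is captured at submission time and restored inside the task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of carrying a trace id into a thread pool.
final class TraceContextPropagation {

    // Assumption: the current trace id is stored per thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace id now and restores it on the worker thread later.
    static <T> Callable<T> withCurrentTrace(Callable<T> task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // leave the worker thread as we found it
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Returns {trace id seen by a plain task, trace id seen by a wrapped task}.
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TRACE_ID.set("X");
            Callable<String> readTraceId = TRACE_ID::get;
            String plain = pool.submit(readTraceId).get();
            String wrapped = pool.submit(withCurrentTrace(readTraceId)).get();
            return new String[] {plain, wrapped};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            TRACE_ID.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        // prints "plain=null, wrapped=X"
        System.out.println("plain=" + seen[0] + ", wrapped=" + seen[1]);
    }
}
```

The plain task loses the trace id while the wrapped one keeps it; the same kind of hand-over is needed for every library boundary and asynchronous hop.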
23. Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● Rest Template
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected, there’s nothing you need to do to make Sleuth work. Check the docs for more info.
24. Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use the ELK stack to stream and visualize the logs
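Once the logs sit in one place, finding everything that belongs to a single request comes down to filtering by trace id. The sketch below assumes each log line carries a bracketed `[application,traceId,spanId,exportable]` section, which is the shape Sleuth's default log pattern is expected to produce; the sample lines, ids and the `TraceLogFilter` class are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter over aggregated log lines; the log layout is an assumption.
final class TraceLogFilter {

    // Matches "[application,traceId,spanId,exportable]" anywhere in a line.
    private static final Pattern TRACING_SECTION =
            Pattern.compile("\\[([^,\\]]+),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Keeps only the lines whose trace id equals the requested one.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> matching = new ArrayList<>();
        for (String line : lines) {
            Matcher matcher = TRACING_SECTION.matcher(line);
            if (matcher.find() && matcher.group(2).equals(traceId)) {
                matching.add(line);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        // Invented sample lines from three services.
        List<String> aggregated = Arrays.asList(
                "INFO  [service1,aa11,aa11,true] Hello from service1",
                "INFO  [service2,bb22,bb22,false] Unrelated request",
                "ERROR [service3,aa11,cc33,true] Something went wrong");
        for (String line : linesForTrace(aggregated, "aa11")) {
            System.out.println(line);
        }
    }
}
```

In an ELK setup the same filtering would typically be a query on the trace id field rather than hand-written code.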
25. Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
</dependencies>
26. Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
27. Log correlation with Spring Cloud Sleuth
DEMO
31. Great! We’ve found the exception!
But meanwhile....
32. The system is slow...
33. One of the services is slow?
34. Which one?
How to measure that?
35. Let’s log events!
● Client Sent (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Sent (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
37. Conclusions
CS 0 ms, SR 100 ms, SS 200 ms, CR 300 ms
● The request started at T=0 ms
● It took 300 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 100 ms
● Why is there a delay between sending and receiving messages?
40. Logs
A log represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● The event should be the stable name of some notable moment in the lifetime of a span
● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
41. Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
42. Main logs
● Server Send (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
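The arithmetic behind the four main logs can be sketched as below, using the example timestamps from the earlier slide (CS 0 ms, SR 100 ms, SS 200 ms, CR 300 ms). The class `SpanTimings` and its method names are hypothetical. The response-side latency is computed as CR minus SS so that it comes out positive, and the network figures are only meaningful if the client and server clocks are reasonably in sync.

```java
// Hypothetical holder for the four main timestamps of one client/server span (in ms).
final class SpanTimings {
    final long cs;  // Client Send
    final long sr;  // Server Received
    final long ss;  // Server Send
    final long cr;  // Client Received

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // SR - CS: time the request spent on the network (assumes synchronized clocks).
    long requestNetworkLatency() {
        return sr - cs;
    }

    // SS - SR: time the server spent processing the request.
    long serverProcessingTime() {
        return ss - sr;
    }

    // CR - CS: total time the client waited for the response.
    long timeToReceiveResponse() {
        return cr - cs;
    }

    // CR - SS: time the response spent on the network (assumes synchronized clocks).
    long responseNetworkLatency() {
        return cr - ss;
    }

    public static void main(String[] args) {
        SpanTimings timings = new SpanTimings(0, 100, 200, 300);
        System.out.println("request network latency:  " + timings.requestNetworkLatency() + " ms");
        System.out.println("server processing time:   " + timings.serverProcessingTime() + " ms");
        System.out.println("response network latency: " + timings.responseNetworkLatency() + " ms");
        System.out.println("time to receive response: " + timings.timeToReceiveResponse() + " ms");
    }
}
```

With these numbers, only 100 ms of the 300 ms round trip is spent in the server; the remaining 200 ms is spent in transit.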
43. Tag
Key-value pair
● Every span may also have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
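The difference between logs and tags can be shown with a small data model. This is a sketch with hypothetical names (`AnnotatedSpan`, `LogEntry`, `logEvent`, `tag`), not the Sleuth API: logs are timestamped events, while tags are plain key/value pairs without a timestamp.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical span that collects logs (timestamped events) and tags (key/value pairs).
final class AnnotatedSpan {

    // A log: an event name plus the moment it happened.
    static final class LogEntry {
        final long timestampMillis;
        final String event;

        LogEntry(long timestampMillis, String event) {
            this.timestampMillis = timestampMillis;
            this.event = event;
        }
    }

    final String name;
    final List<LogEntry> logs = new ArrayList<>();
    final Map<String, String> tags = new LinkedHashMap<>();

    AnnotatedSpan(String name) {
        this.name = name;
    }

    void logEvent(long timestampMillis, String event) {
        logs.add(new LogEntry(timestampMillis, event));
    }

    // Tags carry no timestamp; a later value for the same key replaces the earlier one.
    void tag(String key, String value) {
        tags.put(key, value);
    }

    public static void main(String[] args) {
        AnnotatedSpan span = new AnnotatedSpan("call-service2");
        span.logEvent(0, "cs");               // Client Send at T=0 ms
        span.logEvent(300, "cr");             // Client Received at T=300 ms
        span.tag("http.method", "GET");       // tag key taken from the slide, value invented
        for (LogEntry log : span.logs) {
            System.out.println(log.timestampMillis + " ms: " + log.event);
        }
        System.out.println("tags: " + span.tags);
    }
}
```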
44. How to visualise latency in a distributed system?
45. The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather timing data needed to troubleshoot latency problems in microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
46. How does Zipkin work?
[Diagram: each APP sends its spans to the collectors, the collectors store them in the DB, and the UI queries for trace info via the API]
47. Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries / contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run the Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can also add the Zipkin UI dependency to that app!
48. Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
    </dependency>
</dependencies>
49. Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
50. Diagram: one trace across four services, with endpoints
● The REQUEST reaches SERVICE 1 (/start) with No Trace Id and No Span Id; SERVICE 1 starts Trace Id = X, Span Id = A
● SERVICE 1 calls SERVICE 2 (/foo) in Span Id = B (Client Sent, Server Received, Server Sent, Client Received); SERVICE 2 does its work in Span Id = C
● SERVICE 2 calls SERVICE 3 (/bar) in Span Id = D (same four events); SERVICE 3 does its work in Span Id = E
● SERVICE 2 calls SERVICE 4 (/baz) in Span Id = F (same four events); SERVICE 4 does its work in Span Id = G
● Every span shares Trace Id = X
52. Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code: https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
55. Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
57. THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://brewery-zipkin-web.cfapps.io - Zipkin deployed to PCF for Brewery Sample app