Microservices Tracing with Spring Cloud and Zipkin (devoxx)

Microservices tracing with
Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 24 June 2016

About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM

Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?

An ordinary system...

UI calls backend
UI -> BACKEND

Everything is awesome
CLICK 200

Until it’s not
CLICK 500

Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1

It doesn’t look like this

More like this

On which server / instance
was the exception thrown?

SSH and grep for ERROR to find it?

Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)

Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● Has a parent and can have multiple children

Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ Trace could be retriving a list of available books
○ Assuming that to retrive the books you have to send 3 requests to 3 services
then you could have at least 3 spans (1 for each hop) forming 1 trace

SERVICE 1
REQUEST
No Trace Id
No Span Id
RESPONSE
SERVICE 2
SERVICE 3
Trace Id = X
Span Id = A
Trace Id = X
Span Id = A
Trace Id = X
Span Id = A
REQUEST
RESPONSE
Trace Id = X
Span Id = B
Client Send
Trace Id = X
Span Id = B
Client Received
Trace Id = X
Span Id = B
Server Received
Trace Id = X
Span Id = C
Trace Id = X
Span Id = B
Server Sent
REQUEST
RESPONSE
Trace Id = X
Span Id = D
Client Send
Trace Id = X
Span Id = D
Client Received
Trace Id = X
Span Id = D
Server Received
Trace Id = X
Span Id = E
Trace Id = X
Span Id = D
Server Sent
Trace Id = X
Span Id = E
SERVICE 4
REQUEST
RESPONSE
Trace Id = X
Span Id = F
Client Send
Trace Id = X
Span Id = F
Client Received
Trace Id = X
Span Id = F
Server Received
Trace Id = X
Span Id = G
Trace Id = X
Span Id = F
Server Sent
Trace Id = X
Span Id = G
Trace Id = X
Span Id = C

Span Id = A
Parent Id = null
Span Id = B
Parent Id = A
Span Id = C
Parent Id = B
Span Id = D
Parent Id = C
Span Id = E
Parent Id = D
Span Id = F
Parent Id = C
Span Id = G
Parent Id = F

Is it that simple?

Is it that simple?
How do you pass tracing information (incl. Trace ID)
between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?

What if you forget about a thread pool?
SERVICE 1
REQUEST
NO TRACE
RESPONSE
SERVICE 2
SERVICE 3
A
A
A
REQUEST
RESPONSE
A
A
A B
A
REQUEST
RESPONSE
B
B
C C
C C
SERVICE 4
REQUEST
RESPONSE
B
B
D D
D D
B

Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● Rest Template
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected there’s nothing you need to do to make
Sleuth work. Check the docs for more info.

Now let’s aggregate the logs!
Instead of SSHing to the machines aggregate the logs!
● With Cloud Foundry’s (CF) Loggergator the logs from different instances are
streamed into a single place
● You can harvest your logs with Logstash Forwarder / FileBeat
● You can use ELK stack to stream and visualize the logs

Spring Cloud Sleuth with Maven
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>Brixton.SR1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependency>
<artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

Spring Cloud Sleuth with Gradle
dependencies {
compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
imports {
mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
}
}

SERVICE 1
/start
REQUEST
RESPONSE
SERVICE 2
SERVICE 3
REQUEST
RESPONSE
REQUEST
RESPONSE
SERVICE 4
REQUEST
RESPONSE
“Hello from service3”
“Hello from service4”
“Hello from service2, response from
service3 [Hello from service3] and from
service4 [Hello from service4]”

SERVICE 1
/readtimeout
REQUEST
BOOM!
SERVICE 2
REQUEST
BOOM!
REQUEST
BOOM!

Log correlation with Spring Cloud Sleuth
DEMO

Great! We’ve found the exception!
But meanwhile....

The system is slow...
CLICK 200

One of the services is slow?

Which one?
How to measure that?

● Client Send (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing
● Server Send (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from
the server side
Let’s log events!

CS 0 ms SR 100 ms
SS 300 msCR 450 ms

● The request started at T=0ms
● It took 450 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 200 ms
Conclusions
CS 0 ms SR 100 ms
SS 300 msCR 450 ms

Why is there a delay between sending and receiving messages?!!11!one!?!1!
Conclusions
CS 0 ms SR 100 ms
SS 300 msCR 450 ms

https://blogs.oracle.com/jag/resource/Fallacies.html

Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for
each of the Performance.timing moments (check https://developer.mozilla.
org/en-US/docs/Web/API/PerformanceTiming)

Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms

Main logs
● Server Send (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms
SS 300 msCR 450 ms

Key-value pair
● Every span may have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
Tag

How to visualise latency in
a distributed system?

● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot
application)
● It helps gather timing data needed to troubleshoot latency problems in
microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations
as horizontal bars
The answer is: Zipkin

How does Zipkin work?
SPANS SENT TO
COLLECTORS
SPANS SENT TO
COLLECTORS
STORE
IN DB
APP
APP
UI QUERIES
FOR TRACE
INFO VIA API

Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries /
contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-
cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!

Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
<dependencies>
<dependency>
<artifactId>spring-cloud-dependencies</artifactId>
<version>Brixton.SR1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependency>
<artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

Spring Cloud Sleuth Zipkin with Gradle
dependencies {
compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
imports {
mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
}
}

HOLD IT!
● If I have billion services that emit gazillion spans - won’t I kill Zipkin?

Sampling to the rescue!
● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by changing the property
spring.sleuth.sampler.percentage (for 100% pass 1.0)
● Or register a custom org.springframework.cloud.sleuth.Sampler
implementation

SERVICE 1
/start
REQUEST
RESPONSE
SERVICE 2
/foo
SERVICE 3
/barREQUEST
RESPONSE
REQUEST
RESPONSE
SERVICE 4
/baz
REQUEST
RESPONSE
DEVOXX
SERVICE
/devoxx
REQUEST
RESPONSE

DEMO

Traced call

Traced call
TOTAL DURATION
END
START

Traced call
CLIENT
SENT
CLIENT
RECEIVED
SERVICE 2CLIENT

Traced call
SERVER
RECEIVED
SERVER
SENT
SERVICE 4SERVER

Traced call
LATENCY
SERVER
RECEIVED
CLIENT
SENT

Traced call
SERVER
RECEIVED
CLIENT
SENT
DIFF IS
LATENCY

Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved

Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation

THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone
and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud
Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app

Microservices Tracing with Spring Cloud and Zipkin (devoxx)

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to Microservices Tracing with Spring Cloud and Zipkin (devoxx)

Similar to Microservices Tracing with Spring Cloud and Zipkin (devoxx) (9)

More from Marcin Grzejszczak

More from Marcin Grzejszczak (10)

Recently uploaded

Recently uploaded (20)

Microservices Tracing with Spring Cloud and Zipkin (devoxx)